When looking at scanned print comics, one thing usually sticks out. Unless the scanner takes great care, each page will differ in size just a little bit. Scanners use Fit Width to make sure all the pages are the same width, but unless your scan is perfect (or you go back through later to crop for consistency, though that has problems of its own), you’re looking at pages that may be sized 1280×1028, 1280×1020, or 1280×1030, as a few random examples. It’s not a huge deal for the reader, and really you’ll only notice if you’re paying close attention, which I imagine is why this is generally true of comics that were physically scanned. There are a few other things that are specific to print scans, too, like the occasional hair that got scanned in, artifacts, evidence of where someone joined a two-page spread, and moire.
The new hotness is digital scans. Common sense suggests that the scanners take the digital comics themselves, strip them out of the reader, and then package them up. They don’t just use the iPad screenshot function, either. The scans are higher quality than that, and aren’t subject to the brightness setting on the iPad. They’re also of a uniform resolution: a recent digital scan of Daredevil is 988×1500 throughout, save for a two-page spread and one other exception. The recap page is 995×1500. Ultimate X-Men #7 is 1280×1943. Avengers 22, which wasn’t even released digitally but was released as a digital scan, is 1280×1944.
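If you want to check the "uniform resolution" thing yourself rather than eyeballing it, here's a rough Python sketch. It takes a list of page sizes and reports the most common size plus any pages that don't match. The sizes are the ones quoted above, but the page counts and ordering are made up for illustration, and `summarize_page_sizes` is just a name I picked, not anything from a real tool.

```python
from collections import Counter


def summarize_page_sizes(sizes):
    """Return (most_common_size, outliers) for a list of (width, height) tuples.

    outliers is a list of (page_index, size) for every page that differs
    from the most common size.
    """
    counts = Counter(sizes)
    most_common_size, _ = counts.most_common(1)[0]
    outliers = [(i, s) for i, s in enumerate(sizes) if s != most_common_size]
    return most_common_size, outliers


# Sizes quoted in the post; page counts and page order are hypothetical.
print_scan = [(1280, 1028), (1280, 1020), (1280, 1030), (1280, 1028)]
digital_scan = [(995, 1500)] + [(988, 1500)] * 5  # recap page first (assumed)

for label, pages in (("print scan", print_scan), ("digital scan", digital_scan)):
    common, odd = summarize_page_sizes(pages)
    print(label, "most common:", common, "outliers:", odd)
```

A print scan tends to throw a handful of outliers with slightly different heights. A digital scan should show one dominant size with only the odd exception, like a recap page or a spread.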
It’s possible that these are just print scans, sure, but not likely. I’ve been talking through this conundrum with David Uzumeri for a couple of weeks now. We’re both interested in the technology behind how this works, if only for curiosity’s sake. We got our Nancy Drew on and found something interesting. We’re pretty sure that the digital scans of Marvel’s comics aren’t being scanned by who or how you’d think they are.
Print scans tend to be around 150dpi or higher, for the sake of image quality. The recent digital scans I’ve looked through have been 72dpi. Most of them have been created using Adobe Photoshop CS3 for Windows, though a couple scanners use CS5. There’s an aura of perfection around these scans that makes it unlikely that someone is just posting print comics with a digital tag for the sake of shenanigans.
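For the curious, the DPI number is just metadata stored in the file, and one place it can live in a JPEG is the JFIF header at the very start. Below is a minimal Python sketch that reads it using only the standard library. To be clear, this is not how I checked the files, and it's an assumption that a given scan carries a JFIF header at all: Photoshop can also record resolution in EXIF or its own metadata blocks, which this sketch ignores. The sample bytes are hand-built for the demo, not pulled from a real scan.

```python
import struct


def jfif_density(data):
    """Return (units, x_density, y_density) from a JFIF APP0 header, or None.

    Only handles files where the APP0 segment directly follows the SOI marker.
    """
    # SOI marker (FFD8) followed by the APP0 marker (FFE0).
    if len(data) < 18 or data[:4] != b"\xff\xd8\xff\xe0":
        return None
    # Bytes 4-5 are the segment length; the identifier comes next.
    if data[6:11] != b"JFIF\x00":
        return None
    # Bytes 11-12 are the version; then units (1 byte) and X/Y density (2 bytes each).
    units_code, x_density, y_density = struct.unpack(">BHH", data[13:18])
    units = {0: "aspect ratio only", 1: "dots per inch", 2: "dots per cm"}.get(
        units_code, "unknown"
    )
    return units, x_density, y_density


# Hand-built header for illustration: JFIF 1.02, 72x72 dots per inch, no thumbnail.
sample_header = (
    b"\xff\xd8\xff\xe0"
    + struct.pack(">H", 16)
    + b"JFIF\x00"
    + b"\x01\x02"
    + struct.pack(">BHH", 1, 72, 72)
    + b"\x00\x00"
)

print(jfif_density(sample_header))
```

On a real file you'd pass in the first few dozen bytes, something like `open(path, "rb").read(32)`. If the function returns `None`, the resolution is stored somewhere else or not at all.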
Where do they come from? They’re not iPad screenshots. I don’t think it’s someone taking screenshots off ComiXology, either. The images are too clean and too perfect for that, plus ComiXology’s web reader sucks. Good luck getting anything readable out of that thing. Edit: Several people have pointed out that it’s actually really easy to pull images from ComiXology using simple functions that are built into your web browser. I tried it out and yeah, man, I was totally wrong there. My bad.
A clue. Here’s the print cover to Daredevil 9 and the digital scan cover right after it.
They look fine, right? Both are totally reasonable covers, and the lack of UPC feels right for digital comics. The rub, though, is that digital comics have a copyright notice on the cover, every single time. “©2012 Marvel Characters, Inc. All rights reserved. WWW.MARVEL.COM.” The digital scans don’t have that warning, and show no sign of it having been photoshopped out.
Here’s the raw cover from the solicitations:
Now, unless the scanners are carefully photoshopping in the logo each time (they aren’t, don’t be ridiculous), then something’s up. The plot thickens when you realize that the font and placement of the credits on the cover of the legit digital version differ from the digital scan. They don’t differ a lot, usually, but the fonts are visually different and sometimes the credits are off by a few inches. In this case the credits are different.
There’s a chance the scanners could have figured out how to hack ComiXology to dump the pages, but that would still require buying a whole lot of comics once they go live and processing them immediately. Considering how small the digital scan groups are, that’s pretty unlikely. All of the DC scans slowly trickle in after 2pm EST, their official release time. Scanners aren’t likely to obey idiotic online street dates, and they can’t scan books without buying them. That limits them to ComiXology’s release time.
All the Marvel books arrive at once, though, and in pristine condition. There’s something undeniably fishy there. My first thought was that people were scanning Marvel comics they got on Tuesday at their comic shops, but if that were true, we’d see DC and Image books following that pattern, too. My second thought, and one that seems more reasonable and likely, is that there’s a leak somewhere in the supply chain.
Taken all together: There are covers that differ from any legit cover. Pages that are pitch-perfect. DPI that matches across the board. Recap pages that occasionally vary in size, just like they do in official Marvel electronic review copies. Every Marvel comic is available early and all at once on the scan site du jour. DC comics appear in a trickle after 2pm EST. By 7pm EST on Wednesday, every big two comic is available for download, but well before that, Marvel’s entire line-up for the week is ready to go.
It’s pretty clear from this evidence that there’s a leak somewhere along the supply chain. Someone’s getting access to a PDF, or something, and dumping it to JPG before releasing it to the net. Converting a PDF to a series of JPGs is simple in Photoshop, and once you set up a good action to save the images, this is something that takes no more than five minutes to do, RAM and size of PDF depending.
The PDF thing is easy to prove, due mainly to the janky fonts on the covers and in the issue. Whatever tool the scanners use to dump the jpegs doesn’t actually have the fonts the comics require, so we get a next-best and unobtrusive replacement. They use InDesign to dump, is my guess, and then Photoshop to re-size. You can actually see this error at work in all of DC’s preview comics, because the price and issue number are incorrect. (quick edit: check the comments for something I screwed up on the PDF front, though it doesn’t really change the thrust of the post…)
Actually, quick sidebar: DC’s preview images are enormous in size, usually weighing in at a megabyte a page or more. DC’s doing no post-processing on their previews, basically, so the pages are too large to actually read in a web browser comfortably, too high-res to be worth saving, and clear enough to see all of the weird PDF signatures that books have before they go to print. Dear DC Comics: you gotta do better with that. It’s embarrassingly amateur. Drop it to like 1280 on the long side and maybe 350 or 500kb max per page. Or, y’know, look at how every comics site reformats your previews and format it like that.
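The "1280 on the long side" math is simple enough to sketch. This Python snippet works out the target dimensions while keeping the aspect ratio, and leaves pages alone if they're already small enough. The input size in the example is made up, since I don't have the actual pixel dimensions of DC's previews handy, and `fit_long_side` is a name I invented for the sketch.

```python
def fit_long_side(width, height, target=1280):
    """Scale (width, height) down so the longer side equals target.

    Images whose long side is already at or under target are returned unchanged.
    """
    long_side = max(width, height)
    if long_side <= target:
        return width, height
    new_width = max(1, round(width * target / long_side))
    new_height = max(1, round(height * target / long_side))
    return new_width, new_height


# Hypothetical oversized preview page, 2000 pixels wide by 3000 tall.
print(fit_long_side(2000, 3000))
```

This only covers the dimensions. Hitting a file size like 350 or 500kb is a separate job that depends on the JPEG quality setting, which this doesn't touch.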
Anyway, if the DC thing doesn’t convince you, Uzumeri found a smoking gun. The photo is from the issue of Uncanny X-Men he bought on Wednesday. The clean image is from the scan.
The scan is missing the musical notes, which are presumably some type of font that the scanners do not have access to, or maybe a layer that was missed out of the source of the scan.
We found another gun in Daredevil 9. The captions on this page are from the following page. Daredevil is underground and tracking the Mole Man. The captions have nothing to do with the Black Cat, though it is funny how they almost work with the scene, at least in terms of how they’re positioned on the page.
Or this other other gun, in Winter Soldier 1. There’s a scene that’s out of place in the digital scan. Pages 11 and 12 of the digital comic come before page 7 in the digital scan. It’s a mistake that’s easy to make, but there’s a subtle transition between pages 6 and 7 in the digital comic that shows it as a definite mistake. This is curious, because if you’re dumping a PDF, all the pages come pre-numbered. Did someone have a bum PDF or InDesign file?
The clean covers begin to make sense now, too. If I had to guess, I would assume that the copyright, credits, and UPC are separate layers in one file. When they export to print or digital, they can tick a box and show the UPC or copyright, depending on the requirements of the situation. Cover elements like the credits can be maneuvered around pretty easily, but the scans always have them near where they are on the printed comic. Actually, looking at the Daredevil cover… the credits and Marvel logo are terribly placed. They’re high enough that something should go below them. What is THAT about?
So, who is it? Who’s got the PDFs?
Who it isn’t:
Fans: The fans who scan use printed comics, or have figured out how to dump ComiXology’s images (maybe dipping into wherever Flash stores its cache?). This is too perfect for that, and the little problems that crop up are unlikely in that situation.
Retailers: I asked around and spoke to a retailer. Retailers do get electronic preview copies on their retail resource site, but strictly at Marvel’s whim. The last one they got was Chris Yost and Ryan Stegman’s Scarlet Spider #1 around a month before that came out. Right now, there are no previews on the page. No previews mean no scans. Nah son.
Press: Marvel doles out preview PDFs on Thursdays, but they just have eight pages. When Marvel wants to use you to pimp some new comic of theirs, they’ll shoot you a link to a full PDF. The sheer volume of books available, though, suggests that it ain’t the press. Marvel’s got no reason to flood the press with books when the press is more than happy to review every Marvel comic every week.
ComiXology: This was my first guess, actually. They’d have access to the files, and they messed around and released Justice League several hours early, enabling scanners to get it up before it even came out last year. But the files are different than the actual ComiXology files in very specific ways. ComiXology would probably have the fonts needed to convert Marvel books to whatever digital proprietary format they use, too, so the fonts would look how they should. Not to mention the recap pages, which are specific to the digital edition due to how the indicia is formatted.
It’s not the fans, retailers, press, or ComiXology. I’m pretty certain that at least three of those are 100% correct. Let’s go back to Daredevil 9’s busted page. It is impossible for a scanner to make that mistake. There’s no way for it to happen. There are four different page elements that have been transposed onto another page, leaving the next page silent save for its hand drawn sound effects. If the text is a single layer on its own, though… I could see how that could happen.
Who it is:
Marvel?: That means that either the scanners have access to Marvel’s pre-press files, which is amazing, or someone who works closely with Marvel, whether on the production side or at the printer, is slipping a scanner PDFs. There’s no way that the lettering could be transposed by a third party, and the font issue suggests that it’s someone fairly high up on the supply chain.
One last smoking gun. The Ultimate X-Men 7 digital scan includes a page advertising the digital edition of the comic. There’s a big blank space where the redemption code should go. I don’t have a digital comic version of this to check, but I’m willing to bet cash money that no digital comic version of this issue would include an access code for the digital comic. This is from a print comic.
More proof: the digital scans carry the print indicia, including the date the comic was manufactured. In the case of Ultimate X-Men 7, it was between 01/20/2012 and 01/31/2012 by Quad/Graphics Jonesboro, in Jonesboro, AR. Every printed comic has this info in it. The digital ones have a seriously abridged version of the print indicia, and no info on the printer.
Messed up fonts, print indicia, missing digital comics redemption codes, the fact that Avengers 22 is available as a digital scan despite not being available on ComiXology (or on Marvel’s stupidly exclusive app), the standard DPI, the rigid resolution, the perfect scans… it’s obvious what this is. Someone’s got Marvel’s print-ready files before they’re finalized, and they’re slapping them up online as digital scans. Clever girl.
Marvel: your ship is leaking, whether it’s internal, an FTP hack, or on the way to the printer.
Edit: Thanks to Uzumeri’s dogged determination, we figured out what the hack is this morning. It isn’t a person, it’s a security leak, and we emailed Marvel about it.
one more edit: Marvel closed the hole we found, though I don’t think it’ll lessen how often or easily Marvel’s books can be pirated, except in a few very specific instances. This was one hole that was very easy to exploit. There are others that are completely unavoidable.