Scanning Old Paperbacks

I’m a book addict. I love books, I love people who write books, I love people who publish books, I love other people who love books.

My wife and I own a lot of books. (5690 at current count, all cataloged in Delicious Library. And, yes, the shelves are categorized and alphabetized.)

But… books are bulky. And heavy. And if you’re away from home, you’re limited to only having one title with you. (Except in extreme situations, like vacations. It used to be that, to go on a seven-day cruise, I’d have to bring fourteen paperbacks. That’s a small suitcase right there!)

About three years ago, the mental switch flipped in my head, and I realized that — except for art, photography, maps, or other specialty graphics-intense publications — I’d rather have the electronic version.

I blogged about ebooks five years ago, but new hardware has intensified my preference. I know I said I didn’t like the Kindle in my previous post, but I was wrong. The battery life on e-ink readers is long enough that I don’t have to pack a charger, and I love the light weight compared to my iPad.

I bought a couple of generations of Kindle, but have switched to the Nook Simple Touch because the Kindle cut too many corners with its hardware design. (I’m willing to switch again if someone builds a better reader.) My wife loves reading on her iPad mini, but I can’t read my iPad outdoors, where e-ink excels. And, besides, I spend too much of my life staring at glowing rectangles already. I currently have about 1000 ebooks, and seem to buy at least a couple per week. I keep everything in Calibre (iTunes for ebooks). And a 16 GB MicroSD card in my Nook means I can have all of them with me, all the time.

But… there are all those books that I already own. What I want is something like iTunes Match for Books… once I prove that I own a physical copy, then an electronic copy just shows up. For lots of legal reasons, that’s probably not going to happen. So… scanning?

I own, um, six scanners (don’t judge; it’s a sickness), not counting the high-volume Ricoh scanner at the office. I have a lot of experience with scanning everything from slides to photographs to long multi-page documents. But sorry, my time is worth more than the hours it would take to scan and tweak an entire book. So I’ve been looking at the various scanning services out there. I got very intrigued by the reviews of 1dollarscan.com, and decided to try them.

Over the years, I’ve accumulated duplicate copies of some books, so I decided to sacrifice a few paperbacks (the 1dollarscan process destroys the books; you don’t get them back) as a trial. I set up an account, navigated the website, and quickly realized that the “1 dollar scan” is just a marketing come-on. It’s $1 for each 100 pages… so most books are going to be $3 or more. And running OCR software to convert the scans to text costs about the same. So now you’re at $6/book… plus you have to ship them to California. Even parcel post, that’s going to be a couple of bucks. But I guess “8dollarscan.com” doesn’t sound as compelling — at that price, you’re probably better off just buying an electronic copy for $10, and donating the physical copies to Better World Books.
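
If you want to run that math for your own shelf, here’s a rough Python sketch. It assumes that partial sets of 100 pages round up, that OCR costs exactly the same as scanning (the real price was only “about the same”), and that shipping works out to a flat $2 per book. The function name and defaults are made up for illustration; they aren’t anything 1dollarscan publishes.

```python
import math

def estimate_cost_per_book(pages, rate_per_set=1.00, ocr=True, shipping=2.00):
    """Rough per-book cost: scanning (and optional OCR) per 100-page set, plus shipping."""
    sets = math.ceil(pages / 100)                    # assumption: partial sets round up
    scan_cost = sets * rate_per_set
    ocr_cost = sets * rate_per_set if ocr else 0.0   # assumption: OCR priced like scanning
    return scan_cost + ocr_cost + shipping           # assumption: flat shipping per book

print(estimate_cost_per_book(300))  # 3 + 3 + 2 = 8.0
```

Turn off `ocr` or change `shipping` to see how quickly the “1 dollar” part stops mattering.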

Except for out-of-print titles that will never be available as ebooks. Which is a big part of my collection. So I shipped three paperbacks off to 1dollarscan on July 18th as a test. I didn’t pay extra for rush service or anything, but I got an email this evening (two weeks later) that my files were available for download. Yay!

First, as the Wired review warned me, they’re friggin’ huge. Each file weighed in at over 150 megabytes (and took forever to download, even with my high-speed connection… I suspect the company needs to buy some bandwidth on their end).

Quality? The images were probably as good as can be expected. Here’s a sample of the original huge scan (click to embiggen):

[Image: original scan, large]

That’s pretty much what the real pages look like… yellowed paper, and never the highest quality in the first place. These were mass-market paperbacks, not the Book of Kells. But, hey, I know how to use the Export filters in Mac OS X Preview (and you should, too):

[Image: Export dialog]

After running “Black and White” and “Reduce File Size” filters, the file size dropped from 155 MB to 19.6 MB… a factor of eight. And the text remains legible and searchable. This is a nice archival destination for the book, I think.

[Image: B&W searchable version, large]

But, like all PDFs, it looks lousy on the Nook/Kindle. Once you download the ginormous file, you also get an option to “fine tune” the output PDF for your device (which means another trip to the website, and another download… mercifully, a much smaller one this time). Here’s the Nook version:

[Image: Nook-optimized version]

This was a mistake. The file had more visual artifacts than the one I got by just running the original through the B&W filter on my Mac, it lost the searchable text, and the bottom line of every page was cropped in half. (You can see half of the page number “7” in the embiggened version; that’s really irritating when it’s a full line of text.) And the file was the same 19.6 MB. So… this isn’t the right way to get a version of the book that you’d be willing to read on your e-ink device.

Well, what about the OCR? Marginal. Extracting the 531 kB of text from my book is a simple cut-and-paste on the Mac, and making that into an EPUB is just a few more clicks in Calibre. But here’s the first paragraph of Hogan’s editorial as delivered in the 1dollarscan.com searchable PDF:

It’s difficult to believe, but twenty years have qone by since that July 1969 when the ft*t-footptit t maitd the surface of the Moon-twenty years sin-ce the heady {ay-s- of a decade rvhen Ameica took up a poliucdl cballenge on behalf of its President.

Yeah, you can get the gist of it, but it’s really not acceptable for pleasure reading.
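
As an aside, the text-to-EPUB step doesn’t have to be done by clicking: Calibre ships a command-line converter called ebook-convert. Here’s a minimal sketch of driving it from Python, assuming Calibre’s command-line tools are installed and on your PATH. The file names, title, and author are placeholders, and the two helper functions are hypothetical names for illustration.

```python
import shutil
import subprocess

def build_convert_command(text_file, epub_file, title, author):
    """Return the ebook-convert argument list for a plain-text to EPUB conversion."""
    return ["ebook-convert", text_file, epub_file,
            "--title", title, "--authors", author]

def convert(text_file, epub_file, title, author):
    """Run the conversion; raises if Calibre's CLI is missing or the conversion fails."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH")
    subprocess.run(build_convert_command(text_file, epub_file, title, author),
                   check=True)

# Placeholder names; swap in your own extracted text file and metadata.
command = build_convert_command("paperback.txt", "paperback.epub",
                                "Placeholder Title", "Placeholder Author")
print(" ".join(command))
```

Calling `convert(...)` with the same arguments would actually produce the EPUB. It won’t fix the OCR errors, of course; garbage in, garbage out.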

My books were around 280 pages each, so I wound up paying $19.80 to 1dollarscan.com. I overpaid for shipping, since UPS picks up in my building and I’m lazy, but the books would have easily fit into a Priority Mail Small Flat-Rate Box ($5.80). Counting shipping at that flat rate, that’s $8.53/book. Plus I had to invest a bit of time to manually name, filter, and convert each received PDF. 1dollarscan offers a high-quality OCR option at an additional $2 per 100 pages. Next time, I’ll try that… but at that point, you’re up over $15/book, and the economics begin to look a bit silly.
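
For anyone checking the per-book figure, here’s the arithmetic as a tiny Python sketch. It uses only the numbers from the paragraph above, with the flat-rate box standing in for the actual (higher) UPS charge; the variable names are just for illustration.

```python
scan_total = 19.80   # what 1dollarscan charged for the three books
shipping = 5.80      # Priority Mail Small Flat-Rate Box, used as a stand-in
books = 3

per_book = (scan_total + shipping) / books
print(f"${per_book:.2f} per book")  # $8.53 per book
```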

Conclusion? I really, really wanted to like this service. I wanted to ship them a few hundred old paperbacks and get a DVD back with digital copies of all of them. And, if you just want archival copies with searchable text to use on your laptop or desktop, I think it’s a good solution, albeit still more expensive than I’d like.

If you want copies that you can read on your Kindle or Nook… I can’t recommend it. I wish I could.