[extropy-chat] Science & Consciousness Review web site

Jef Allbright jef at jefallbright.net
Thu Dec 8 21:15:32 UTC 2005


On 12/8/05, Dirk Bruere <dirk.bruere at gmail.com> wrote:
>
>
> On 12/8/05, Herb Martin <HerbM at learnquick.com> wrote:
> >
> > BTW, how do you digitize your library -- a guess would make your
> > "entire library" rather large....?  (I have a scanner but cannot
> > seriously imagine actually scanning all of my books; actually it
> > would be tedious to do even one.)
> >
> >
>
>  Photograph each page and run them through an OCR as a batch job.
>  That has to be the quickest method.

I've digitized over 500 books in my library, starting about three
years ago.  Main motivator was that with a few full bookcases and a
few hundred pounds of books in boxes since my last change of
residence, they were a chore to move and manage and I didn't have good
visibility or access when I wanted one of them.

I process the books by first cutting off the binding.  I took the
first hundred or so to Kinko's and paid a dollar each for them to cut
the bindings off, but I found that they often cut too close to the
binding, leaving some pages glued together.  I learned that I can
remove the bindings easily myself, using two slices with an X-Acto
knife to remove the front and back cover, and then carefully pulling
loose about 30 pages at a time by hand.  I then trim the sheets, about
20 at a time, using a paper slicer.

The next step is to feed the sheets through a *good* scanner with a
sheet-feeder attachment.  I've used a few models and they keep getting
better.  With my current setup, I can strip the binding, trim the
sheets and scan a 300-page book double-sided to PDF at 300x300 dpi in
about 30-40 minutes.

I then inspect the scanned pages, rescan any that need improved
grayscale or color, and make sure no pages are missing (from sheets
sticking together and going through the scanner as one).
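
A minimal sketch of the missing-page check, assuming the printed page
numbers have already been read off each scanned image (for example
from the OCR text).  The function name and its inputs are hypothetical
and not part of any scanner or OCR product.

```python
def find_missing_pages(seen_pages, first_page, last_page):
    """Return the printed page numbers that never showed up in the scan.

    seen_pages: page numbers recovered from the scanned images
                (assumed to be available; how they are extracted is
                outside this sketch).
    first_page, last_page: the printed page range the book should cover.
    """
    seen = set(seen_pages)
    return [p for p in range(first_page, last_page + 1) if p not in seen]


# Two sheets stuck together: the scanner captures the front of the
# first sheet (page 3) and the back of the second (page 6), so the two
# facing pages in between are lost.
print(find_missing_pages([1, 2, 3, 6, 7, 8], 1, 8))
```

Any gap reported this way points at the sheets that need to be
separated and fed through again.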

Next I run the PDF through OmniPage (currently V15) and come back in
an hour or so to save the OCR'd file, usually as a PDF with the page
image overlaying the text, and also as an RTF file to be moved onto my
PDA.  The OCR is so good these days that I almost never bother with
any correcting.

From that point, I do most of my reading from the OCR'd PDF on my
notebook PC.  From there, I can highlight, add comments and search for
text across my entire library.  The PDA copy is most useful for
fiction (no commenting or highlighting usually) and for reading while
waiting in line or traveling without my notebook.
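
Searching across a whole library can be sketched in a few lines,
assuming each book has also been exported as a plain-text file in one
directory tree.  That plain-text export is an assumption made for this
illustration (the workflow above produces PDF and RTF), and the
function name is hypothetical.

```python
from pathlib import Path


def search_library(root, term):
    """Case-insensitive search of every .txt file under root.

    Returns a list of (file name, line number, line text) tuples,
    ordered by file path and then by line number.
    """
    term_lower = term.lower()
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        # errors="replace" keeps the scan going over stray OCR bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if term_lower in line.lower():
                    hits.append((path.name, line_no, line.strip()))
    return hits
```

Calling search_library("library", "memory") would list every line
mentioning "memory" in any book under the (hypothetical) "library"
directory.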

Total time investment is about an hour for a standard 300-400 page
book, creating a PDF file of typically 20MB and an RTF file of about
500kB to 1MB.  I buy a lot of books from Amazon and partners, and
usually convert them the same day as received (throwing away the dead
tree corpse) so that I can read with all the electronic benefits. 
I've tried buying e-books, but after a half-dozen experiments have
decided that the copy protection interferes too much with my intended
usage, and it's not worth the effort necessary to work around the
protection for each one.
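
The per-book figures above scale up as follows.  This is rough
arithmetic only, assuming exactly 500 books (the stated lower bound),
the typical sizes quoted above, and 1 GB = 1000 MB.

```python
BOOKS = 500                          # "over 500 books", taken as 500
HOURS_PER_BOOK = 1.0                 # about an hour per book
PDF_MB = 20                          # typical PDF size per book
RTF_MB_LOW, RTF_MB_HIGH = 0.5, 1.0   # typical RTF size range per book


def library_estimate(books=BOOKS):
    """Scale the per-book time and file sizes up to the whole library."""
    return {
        "hours": books * HOURS_PER_BOOK,
        "pdf_gb": books * PDF_MB / 1000,
        "rtf_mb_range": (books * RTF_MB_LOW, books * RTF_MB_HIGH),
    }


print(library_estimate())
```

That works out to roughly 500 hours of processing, about 10 GB of
PDFs and 250 to 500 MB of RTF files.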

Lizbeth and I both read in bed most nights, with our self-illuminating,
one-handed, easy-scrolling books on PDA, or, when I'm studying, on the
notebook PC.

A little further into the future, my intention is to have the entire
collection, including my annotations, indexed and cross-referenced
automatically and available to a personal agent to augment my
relatively weak and unreliable biological memory.

- Jef


