web site optimization



John Wilkin on the U-M - Google Digitization Project

We heard John Wilkin, Associate Librarian for Library IT and TAS at the University of Michigan Library, discuss the partnership between U-M and Google on Monday, April 3, at 7 pm at the public library in Ann Arbor, Michigan. The talk was part of the 2006 Library Director's Program of National Library Week. Each year AADL Director Josie Parker chooses a topic of current interest to highlight during this week-long celebration of libraries.

Mr. Wilkin spoke about his involvement in the massive digitization project that Google, U-M, Harvard, and other universities are undertaking. Google is sponsoring U-M's effort to digitize seven million bound volumes and make them available online, both at books.google.com and through the University of Michigan. The project started in July 2005, and John estimates it will be completed within six years (from the date of his talk).

Before Google stepped in, U-M was digitizing books at a rate of 5,000 to 8,000 per year while receiving 100,000 new volumes per year. At that rate it would take on the order of 1,000 years to digitize the 7 million volumes in U-M libraries. The University needed someone like Google to speed up the process. John is under a non-disclosure agreement with Google, but he did say that they've seen several doublings in the number of volumes scanned, and that within six months they'll be scanning more than six times faster.
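The backlog arithmetic is easy to check with a short sketch. The figures are the ones quoted in the talk; the function name and the assumption of constant yearly rates are ours, for illustration only.

```python
from typing import Optional


def years_to_digitize(backlog: int,
                      scanned_per_year: int,
                      acquired_per_year: int = 0) -> Optional[float]:
    """Years needed to clear a backlog at constant yearly rates.

    Hypothetical helper: returns None when new acquisitions arrive at
    least as fast as volumes are scanned, so the backlog never shrinks.
    """
    net_per_year = scanned_per_year - acquired_per_year
    if net_per_year <= 0:
        return None
    return backlog / net_per_year


BACKLOG = 7_000_000  # volumes in U-M libraries, as quoted in the talk

# Existing collection only, at the slow and fast pre-Google rates.
slow = years_to_digitize(BACKLOG, 5_000)
fast = years_to_digitize(BACKLOG, 8_000)
print(f"{fast:.0f} to {slow:.0f} years")

# Counting the 100,000 new volumes received each year.
print(years_to_digitize(BACKLOG, 8_000, 100_000))
```

For the existing collection alone this gives between 875 and 1,400 years. Once the 100,000 volumes received each year are counted, the helper returns None: at the old rate the library would never have caught up.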

Google developed its own scanning technology to scan bound volumes non-destructively, turning the pages and rapidly scanning and OCRing them. In an annual machine translation contest, Google trounced the competition, especially for Arabic and Slavic languages. The challenge in OCR lies with non-Roman scripts such as Arabic, Chinese, Japanese, and Korean. Translations are getting better with time and volume, as the same works in different languages get translated and correlated.

While publishers (including my own) worry that online versions of books will decrease sales, there is some anecdotal evidence that bringing print material online increases sales. Amazon reported that sales of books that you can "search inside" have increased by 9%. Copyright is respected: page images are available only as fair-use snippets, while the text is searchable. Google is talking about selling material online, although no definite plans were announced.

Google's ultimate goal is to provide a kind of online Alexandria, with the world's information available at our fingertips. Digitizing millions of volumes of published information is one component of this strategy.

Further Reading