Anthony Grafton, discussing "Future Reading" in the New Yorker, describes efforts by Google and Microsoft to digitize the world's collection of books with a mixture of approval and horror. Grafton loves knowledge, and his approval comes from the realization that the Internet can reach an audience even greater than his beloved New York Public Library, which "admitted everyone … not only presentable young scholars … but also many wild figures who haunted the reading rooms." But he regards with horror the idea that digitized books could ever replace "real" ones. For quality research, Grafton believes, one must still return to books and paper. Online research might work for amateurs, but serious scholars and professionals need books.
Sit in your local coffee shop, and your laptop can tell you a lot. If you want deeper, more local knowledge, you will have to take the narrower path that leads between the lions and up the stairs. [a reference to the lion statues at the New York Public Library] … Duguid describes watching a fellow-historian systematically sniff two-hundred-and-fifty-year-old letters in an archive. By detecting the smell of vinegar - which had been sprinkled, in the eighteenth century, on letters from towns struck by cholera, in the hope of disinfecting them - he could trace the history of disease outbreaks. Historians of the book - a new and growing tribe - read books as scouts read trails.
Grafton's devotion to books — to the continuation of the "old and reassuring story: bookish boy or girl enters the cool, dark library and discovers loneliness and freedom" — obscures the basic fact that today, and for all time to come, online information will dominate books. The numbers are unequivocal. The Library of Congress, which is larger than the New York Public Library, contains about 11 terabytes of information. That is a huge amount. Yet it is dwarfed by the amount of information already accessible online through search engines: about 167 terabytes, roughly fifteen times the holdings of the Library of Congress, a figure which even Grafton admits is impressive. But the information available through search engines like Google in turn shrinks to a dot compared with the material for which no ready directory exists: the so-called Deep Web, the part of the Internet for which there is no street map. The University of California, Berkeley estimates the Deep Web to be 91,000 terabytes in size — 545 times larger than all the material indexed by search engines and 8,150 times larger than the holdings of the Library of Congress. The difference between paper and online holdings is the difference between a small chicken and a fully grown Tyrannosaurus rex. And if Google, Microsoft and the others ever finish their plan to migrate books online, it will simply mean that the T. rex has eaten the chicken. Grafton describes the digital migration efforts already underway.
Google and Microsoft are flanked by other big efforts. Some are largely philanthropic, like the old standby Project Gutenberg, which provides hand-keyboarded texts of English and American classics, and the distinctive Million Book Project, founded by Raj Reddy, at Carnegie Mellon University. Reddy works with partners around the world to provide, among other things, online texts in many languages for which character-recognition software is not yet available. There are hundreds of smaller efforts in specialized fields - like Perseus, a site, based at Tufts, specializing in Greek and Latin - and new commercial enterprises like Alexander Street Press, which offers libraries beautifully produced collections of everything from Harper's Weekly to the letters and diaries of American immigrants. It has become impossible for ordinary scholars to keep abreast of what's available in this age of electronic abundance - though D-Lib Magazine, an online publication, helps by highlighting new digital sources and collections, rather as material libraries used to advertise their acquisition of a writer's papers or a collection of books with fine bindings.
Books are great, but digital storage is the wave of the future. Yet we cannot see the wave in its entirety. We do not know where most of that avalanche of knowledge is, or how to find it easily. Most information on the Web is locked up in databases and cannot be "spidered," the term for the automated software indexing of Internet material. Web pages generated from databases only "exist" when a query is run: an online telephone directory, for example, does not keep a separate page for every person, but creates one in response to a request. Such database-generated pages have a transient existence and cannot easily be indexed. Password-protected websites, like locked apartments or private telephone numbers, defy our attempts to see within them. Much of the world's information lives on the Deep Web: it is there, but we cannot see it without taking special steps.
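The distinction between spiderable pages and query-generated ones can be sketched in a few lines of Python. This is a hypothetical illustration only, not any real search engine's code: the page store, phone database, and function names are all invented for the example.

```python
# Illustrative sketch (hypothetical): why query-driven pages escape a
# crawler's index while static pages do not.

# Static pages exist at fixed addresses a crawler can enumerate.
STATIC_PAGES = {
    "/home": "Welcome",
    "/about": "About us",
}

# Directory data sits in a database, reachable only through a query form.
PHONE_DB = {"Alice": "555-0100", "Bob": "555-0199"}

def crawl(site_pages):
    """A crawler can only index the pages it can enumerate."""
    return set(site_pages)

def directory_lookup(name):
    """A 'page' for a person is generated only when someone asks for it."""
    number = PHONE_DB.get(name)
    return f"/person?name={name} -> {number}" if number else None

indexed = crawl(STATIC_PAGES)
# The crawler sees /home and /about, but no URL for Alice or Bob exists
# until a user submits a query -- that content stays in the Deep Web.
```

The crawler's index ends up holding only the two static addresses; the directory entries, though present in the database, never appear in it.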
The immense size of the unindexed Internet has prompted consultants and online services to offer help in finding information in the Deep Web, much as traditional librarians guided scholars through the stacks in days gone by.
Modern librarians can spend as much time helping readers with online searches as they do finding books or paper documents. But despite their best efforts, researchers can never be certain whether something they are looking for has been missed. There is no single map showing where everything is. The explosive growth of online information may in fact outstrip our ability to catalogue it. The cost of indexing means the picture of what we know will always be out of date or incomplete; often it will be both. Like some vast terra incognita, the undiscovered country of human knowledge expands constantly, defying all attempts to survey it.
Anthony Grafton describes the amazement which the author Alfred Kazin felt on entering the New York Public Library in 1938. It was an Aladdin's cave which contained "anything I had heard of and wanted to see." Its books and publications let Kazin ramble through "lonely small towns, prairie villages, isolated colleges, dusty law offices, national magazines, and provincial 'academies'." But like any ramble it was a hit-or-miss affair, with what it turned up owing something to chance. Despite today's technical advances, twenty-first-century researchers are in principle no further ahead than Kazin. The lack of a roadmap means we may often miss what we are looking for, and just as frequently find something even better by accident. Knowledge has always defied efforts at easy unification. And if Google's digitizing efforts do not produce what Grafton hoped would be a "universal library" and instead become "a patchwork of interfaces and databases," there is no help for it. We too will have to ramble. But the sheer growth of online information means that the future of reading will, inevitably, be reading online.
Through the unknown, unremembered gate
When the last of earth left to discover
Is that which was the beginning;
At the source of the longest river
The voice of the hidden waterfall
And the children in the apple-tree
Not known, because not looked for
– "Little Gidding", T. S. Eliot
Richard Fernandez is PJM Sydney editor; he also writes at the Belmont Club.