William Y. Arms
Professor of Computer Science
Cornell University
As digital libraries become ubiquitous, scholarship is changing. We can foresee a day when humanities and social science scholars carry out library research with computer programs that act as their agents. This briefing describes the implementation of the Cornell Web Library, which has been designed for this new style of social science research. The library is currently loading some 10 billion Web pages, about 240 terabytes, drawn from the historical collections of the Internet Archive. The briefing will discuss the tools needed to carry out research on such a large corpus, as well as the capabilities and limitations of today’s high-performance computing for very large-scale digital libraries.
Web Site:
http://www.infosci.cornell.edu/SIN/WebLib/
Handout (MS Word)