Timothy W. Cole
Professor and Mathematics Digital Content Access Librarian
University of Illinois at Urbana-Champaign
Harriett Green
English and Digital Humanities Librarian
University of Illinois at Urbana-Champaign
The HathiTrust Digital Library is an archive of more than 10 million volumes. The digital book and serial surrogates in the HathiTrust were digitized from the collective print holdings of more than 80 major research libraries. The HathiTrust corpus offers scholars a unique opportunity to interact with nearly the whole body of published research literature housed in academic libraries in ways and at a scale not possible before. To tap into this potential and maximize the usefulness of the corpus for research, scholars must be provided with ways to define, identify and select the specific slice of the collection (i.e., the workset for analysis) most relevant to their research investigations.
This project briefing will introduce the Workset Creation for Scholarly Analysis (WCSA) project, a new initiative of the HathiTrust Research Center (HTRC) undertaken with the support of The Andrew W. Mellon Foundation. The goal of the WCSA project is to prototype and demonstrate new tools that will allow scholars to create a broad range of useful worksets of varying size and complexity. These range from a handful of volumes pertaining to a narrow scholarly interest, to worksets of tens or even hundreds of thousands of volumes, to more granular worksets composed of components extracted from volumes (e.g., images relevant to a particular scholarly inquiry). To help achieve this goal, the HTRC will make four sub-awards in spring 2014 to research groups responding to our Request for Proposals (RFP). Successful respondents will describe a relevant scholarly use case requiring a workset creation capability not currently available and propose a prototype experiment to help address this gap.
The briefing will describe the RFP, discuss the limitations of existing tools and approaches, detail results from ongoing focus group studies that have informed development of the RFP, and highlight concurrent complementary work underway at HTRC to develop interoperable models of worksets and best practices to support citation and persistence of worksets over time. The session will also discuss preliminary findings from the user requirements study conducted for the WCSA project, which gathered qualitative data on scholarly practices with text corpora through a series of focus groups and interviews with researchers who use large-scale digitized text corpora.
http://www.hathitrust.org/htrc
http://htrc2.pti.indiana.edu/