Joel Herndon
Head, Data and GIS Services
Duke University
Molly Tamarkin
Associate University Librarian for Information Technology
Duke University
Though research libraries face an increasing demand for collections and services that facilitate text mining, most digital text and e-journal collections are licensed for use and hosted by vendors in ways that prevent data mining. However, a few publishers have provided hard drives as “backup” copies of these licensed databases. Unsure what to do with its growing collection of hard drives, and realizing that replacement copies of this data could be easily obtained should a “backup” fail, Duke University Library decided to create a text mining collection within its Center for Data & GIS Services. Researchers at Duke can now access large-volume text collections either in a lab designed for big data research or on their own machines, via a system that provides working copies of large-scale text collections. Furthermore, the library has launched a series of workshops on research strategies for text mining, covering topics that range from managing text data structures to latent Dirichlet allocation. This presentation will describe the new services and data analytic methodologies while exploring continuing issues in text mining, from licensing and access to research support.