Kenning Arlitsch, Associate Dean for IT Services, University of Utah
Patrick O’Brien, Search Engine Optimization Manager, University of Utah
Google Scholar (GS) has difficulty indexing the contents of institutional repositories (IRs) because most IRs use Dublin Core metadata, which cannot adequately express bibliographic citation information for academic papers. GS’s Webmaster Inclusion Guidelines caution webmasters to “use Dublin Core only as a last resort” and recommend other metadata schemas instead. The guidelines also give specific advice for facilitating crawlers, including writing metadata from the repository database to HTML headers. Surveys of institutional and disciplinary repositories across the United States yielded indexing ratios that support the hypothesis that IRs that do not follow these metadata and crawl guidelines suffer from low indexing ratios. The survey results also show that the problem cuts across institutions and repository software. Three pilot projects transformed the metadata of a subset of papers from USpace, the University of Utah’s institutional repository, and examined the results of Google Scholar’s subsequent harvest. The pilot projects were successful, achieving a 90% indexing ratio.
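As a rough illustration of what writing metadata from the repository database to HTML headers could look like, the following Python sketch renders a bibliographic record as `<meta>` elements. It is not the method used in the pilot projects, which the abstract does not describe in detail. The tag names (`citation_title`, `citation_author`, `citation_publication_date`, `citation_pdf_url`) follow the Highwire Press-style convention listed in Google Scholar's inclusion guidelines; the abstract itself does not name a schema, so that choice is an assumption. The record layout, the function names, and the sample values are hypothetical.

```python
from html import escape


def meta_tag(name, content):
    """Render one <meta> element, escaping &, <, > and quotes in both values."""
    return '<meta name="{}" content="{}">'.format(
        escape(name, quote=True), escape(content, quote=True)
    )


def citation_meta_tags(record):
    """Build citation <meta> tags for one repository record.

    `record` is a hypothetical dict exported from the repository database.
    Only the title is required here; the other fields are emitted when present.
    """
    tags = [meta_tag("citation_title", record["title"])]
    # One tag per author, in the order the authors appear on the paper.
    for author in record.get("authors", []):
        tags.append(meta_tag("citation_author", author))
    if record.get("publication_date"):
        tags.append(meta_tag("citation_publication_date", record["publication_date"]))
    if record.get("pdf_url"):
        tags.append(meta_tag("citation_pdf_url", record["pdf_url"]))
    return tags


# Invented sample record, for illustration only.
sample_record = {
    "title": "Indexing & Discovery in Repositories",
    "authors": ["Doe, Jane", "Roe, Richard"],
    "publication_date": "2012/03/01",
    "pdf_url": "https://example.edu/repository/sample.pdf",
}

if __name__ == "__main__":
    # These lines would be placed inside the <head> of the item's landing page.
    print("\n".join(citation_meta_tags(sample_record)))
```

Emitting each author as a separate tag, rather than as one delimited string, keeps author names unambiguous for a crawler, which is the kind of citation detail the abstract says Dublin Core expresses poorly.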
This presentation will cover the highlights of a paper to be published in March in Library Hi Tech. The broader research initiative emphasizes search engine optimization for all digital repositories, including general digital library collections, and was recently funded by a three-year National Leadership Grant from the Institute of Museum and Library Services.