Martin Klein
Scientist
Los Alamos National Laboratory
Herbert Van de Sompel
Chief Innovation Officer
Data Archiving and Networked Services
Increasingly, scholars across disciplines and throughout the research life cycle use a wide variety of online portals such as GitHub, FigShare, Publons, and SlideShare to conduct aspects of their research and to communicate research outcomes. However, these portals, whether dedicated to scholarly use or general purpose, exist outside the traditional scholarly publishing system, and no infrastructure exists to systematically and comprehensively archive the deposited artifacts. We have shown in previous work that, without adequate infrastructure, scholarly artifacts will vanish from the web in much the same way, and at a similar rate, as regular web resources do.
In the “Scholarly Orphans” project, we assume that research institutions are interested in collecting the scholarly artifacts created by their researchers. We therefore devised an institutional pipeline to track, capture, and archive these artifacts. Tracking is crucial because institutions are usually not even aware that artifacts created by their researchers exist in online portals. For capture, our newly developed Memento Tracer framework [1] plays a central role in creating high-fidelity Mementos of artifacts. With Memento Tracer, a human curator interacts with a web-based artifact to establish its essential components, and these interactions are recorded as a Trace. A Trace can then guide the automatic capture of other artifacts of the same class, and Traces can be shared with a community of practice. These characteristics give Memento Tracer the potential to bring about significant progress in high-quality web archiving at scale.
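To make the track, capture, and archive flow described above more concrete, the following is a minimal Python sketch of how such a pipeline might be wired together. It is an illustration under stated assumptions, not the project's implementation: the names `Trace`, `Artifact`, `Archive`, `track`, `capture`, and `run_pipeline`, the artifact class label, and the example URLs are all hypothetical, and replaying a Trace is only simulated here by pairing each recorded step with the artifact URL.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Trace:
    """Hypothetical stand-in for the interactions a curator recorded once
    for one class of artifact (for example, all decks on one portal)."""
    artifact_class: str
    steps: tuple  # recorded interactions, kept as opaque strings in this sketch


@dataclass(frozen=True)
class Artifact:
    """An artifact discovered in an online portal (hypothetical structure)."""
    url: str
    artifact_class: str


@dataclass
class Archive:
    """Toy in-memory archive mapping an artifact URL to its captured resources."""
    captures: dict = field(default_factory=dict)

    def store(self, url, resources):
        self.captures[url] = resources


def track(researcher_feeds):
    """Tracking step: flatten per-researcher lists of discovered artifacts."""
    return [a for feed in researcher_feeds.values() for a in feed]


def capture(artifact, traces):
    """Capture step: look up the Trace for the artifact's class and 'replay' it.
    Replay is simulated; returns None when no Trace exists for the class."""
    trace = traces.get(artifact.artifact_class)
    if trace is None:
        return None
    return [f"{artifact.url}#{step}" for step in trace.steps]


def run_pipeline(researcher_feeds, traces, archive):
    """Track, capture, and archive; return URLs skipped for lack of a Trace."""
    skipped = []
    for artifact in track(researcher_feeds):
        resources = capture(artifact, traces)
        if resources is None:
            skipped.append(artifact.url)  # a curator would need to record a Trace
        else:
            archive.store(artifact.url, resources)
    return skipped
```

In this sketch, one Trace serves every artifact of the same class, which mirrors the idea that a curator's recorded interactions can be reused for automatic capture; artifacts whose class has no Trace yet are reported back rather than captured blindly.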
In this talk, we will demonstrate the pipeline [2] and share insights gained by developing and operating it. We will also share initial statistics regarding artifacts deposited in web portals by a group of volunteer researchers, and captured by our pipeline. We hope to spark a discussion with the CNI audience about the desirability, feasibility, and architecture of institutional processes aimed at capturing scholarly orphans.
[1] http://tracer.mementoweb.org/
[2] https://myresearch.institute/