Jefferson Bailey
Director, Archiving & Data Services
Internet Archive
Nick Ruest
Associate Librarian, Digital Scholarship Infrastructure Department
York University
Abigail Potter
Senior Innovation Specialist
Library of Congress
Meghan Ferriter
Senior Innovation Specialist
Library of Congress
Every year, more scholars use computational methods such as data mining, natural language processing, and machine learning (ML) to conduct research on terabytes and even petabytes of digital library and archive collections, which poses many challenges for the research libraries supporting this work. In 2020, Internet Archive Research Services and Archives Unleashed received funding to combine their tools for computational analysis of web and digital archives to support joint technology development, community building, and selected research projects by sponsored cohort teams. The session will feature programs that are building technologies, resources, and communities to support data-driven research. It will also review the beta platform, Archives Research Compute Hub, and discuss working with digital humanities, social and computer science researchers, and industry partners in support of large-scale digital research methods.
Concurrently, LC Labs are investigating computational research service models and infrastructure requirements for cloud-based access to data packages through Computing Cultural Heritage in the Cloud (CCHC), supported by the Mellon Foundation. Processing and analyzing large digital collections often relies on ML and other automated methods. LC Labs have summarized four years of applied research into the applications of ML in library and archival contexts and developed a proposed framework to analyze the risks, benefits, and performance of artificial intelligence (AI) and ML with cultural and historic collections.
Considering and documenting the implications of AI and ML methods at the dataset, model, task, system, organizational, or sector level, and developing standards of quality and shared technical frameworks for using AI/ML in libraries, archives, and museums, can make large-scale computational research transparent, practical, responsible, and coherent.
https://archivesunleashed.org/arch/
https://webservices.archive.org/pages/arch
https://labs.loc.gov/work/experiments/?st=gallery