Vessela Ensberg
Associate Director of Product & Strategy
University of California, Davis
The session includes an overview of the University of California, Davis Library’s assessment framework for InstructBLIP and Gemini; its key purpose is to call out the silos in which experimentation is conducted and to propose a remedy. New models and tools have been released at an undiminished pace since ChatGPT first became public, and libraries would benefit from a ready-made setup that yields a baseline assessment of what a new tool can do. To create a setup that broadly enables an AI-savvy library community, libraries need defined scoring categories, a prepared dataset, metrics that reflect a clear understanding of what constitutes acceptable output, and a team available to conduct testing. Within a month of a new product’s launch, the entire library community should have access to statistics on the model’s zero-shot performance on relevant library functions: captioning images from defined periods, extracting key points from works in selected fields, and transcribing speech in multiple languages.
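To make the proposed setup concrete, the evaluation loop described above could be sketched as follows. This is a minimal illustration, not the library's actual framework: the task names, the `Sample` structure, the term-overlap scoring metric, and the `evaluate` harness are all hypothetical stand-ins for the defined scoring categories, prepared dataset, and acceptability metrics the abstract calls for.

```python
from dataclasses import dataclass

# Hypothetical task categories drawn from the abstract's three library functions.
TASKS = ["image_captioning", "key_point_extraction", "speech_transcription"]

@dataclass
class Sample:
    task: str
    prompt: str
    reference: set  # terms an acceptable output is expected to contain

def score_output(output: str, sample: Sample) -> float:
    """Toy acceptability metric: fraction of reference terms present in the output."""
    words = set(output.lower().split())
    return len(sample.reference & words) / len(sample.reference)

def evaluate(model_fn, dataset):
    """Zero-shot baseline: call the model once per sample, average scores per task."""
    totals, counts = {}, {}
    for sample in dataset:
        s = score_output(model_fn(sample.prompt), sample)
        totals[sample.task] = totals.get(sample.task, 0.0) + s
        counts[sample.task] = counts.get(sample.task, 0) + 1
    return {task: totals[task] / counts[task] for task in totals}

# Usage with a stand-in model function (a real run would call the new tool's API).
dataset = [
    Sample("image_captioning", "Caption this 1890s photograph.",
           {"sepia", "cathedral"}),
]
results = evaluate(lambda prompt: "a sepia photograph of a cathedral", dataset)
print(results)  # per-task baseline scores, e.g. {'image_captioning': 1.0}
```

A shared harness of this shape would let any library team swap in a newly released model behind `model_fn` and publish comparable baseline statistics, which is the de-siloing the session advocates.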