Maciej Ceglowski
Lead Developer
National Institute for Technology and Liberal Education
Clara Yu
Director
National Institute for Technology and Liberal Education/CET
John L. Cuadrado
Consultant
National Institute for Technology and Liberal Education
Much of the digital content becoming available online lacks meaningful metadata descriptors, but metadata creation is both time-consuming and expensive. Using latent semantic indexing (LSI) techniques, the National Institute for Technology and Liberal Education (NITLE) has developed a search and archiving tool that can infer document similarity from patterns of word use across a collection. These similarity values, in turn, allow the tool to assign documents to categories based on their content. The procedure is language-neutral and fully automatic. While the tool can make use of existing metadata, it can also sort and organize raw documents with a high degree of accuracy, across databases, in centralized or distributed mode.
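As a rough illustration of the general LSI idea described above (not the NITLE tool itself, whose implementation is not shown here), the following Python sketch builds a term-document matrix from a tiny made-up collection, reduces it with a truncated singular value decomposition, and compares documents by cosine similarity in the reduced space. The sample documents, the function names, the whitespace tokenizer, and the choice of k = 2 dimensions are all assumptions made for this example; NumPy is assumed to be installed.

```python
import numpy as np

# Hypothetical toy collection: two documents about cats, two about markets.
docs = [
    "the cat sat on the mat",
    "a cat and a kitten played on the mat",
    "stock markets fell as investors sold shares",
    "investors bought shares as markets rose",
]


def term_document_matrix(texts):
    """Count how often each word occurs in each document (rows = terms)."""
    vocab = sorted({word for text in texts for word in text.split()})
    row_of = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(texts)))
    for col, text in enumerate(texts):
        for word in text.split():
            matrix[row_of[word], col] += 1.0
    return matrix, vocab


def lsi_document_vectors(matrix, k):
    """Project documents into a k-dimensional space via truncated SVD."""
    _, singular_values, vt = np.linalg.svd(matrix, full_matrices=False)
    # Each row of the result is one document in the reduced space.
    return (np.diag(singular_values[:k]) @ vt[:k]).T


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


matrix, vocab = term_document_matrix(docs)
doc_vectors = lsi_document_vectors(matrix, k=2)
similarities = cosine_similarity_matrix(doc_vectors)
print(np.round(similarities, 2))
```

In this toy collection the two cat documents end up close to each other and far from the two market documents, using word counts alone and no metadata. A categorizer along the lines the abstract describes could then, for instance, give a new document the category of its most similar labeled neighbors; that step is left out of the sketch.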
http://www.nitle.org/lsi.php
Handout:
Managing Unstructured Data with Latent Semantic Indexing