Scribe: A Clustering Approach To Semantic Information Retrieval

Langley, Joseph R

Authors: Joseph R Langley
Publication date: 5 August 2006
Publisher: Scholars Junction

Abstract

Information retrieval is the process of fulfilling a user's need for information by locating items in a data collection that are similar to a complex query, which is often posed in natural language. Latent Semantic Indexing (LSI) was the predominant technique employed at the National Institute of Standards and Technology's Text Retrieval Conference for many years, until limitations in its scalability to large data sets were discovered. This thesis describes SCRIBE, a modification of LSI with improved scalability. SCRIBE clusters its semantic index into discrete volumes described by high-dimensional extensions of computer graphics data structures. SCRIBE's clustering strategy limits the number of items that must be searched and provides sub-linear time complexity in the number of documents. Experimental results with a large natural language document collection demonstrate that SCRIBE achieves retrieval accuracy similar to LSI but requires one-tenth of the time.
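
The abstract does not say which computer graphics data structure SCRIBE extends, so the following is only a minimal Python sketch of the general idea of searching a clustered semantic index, under stated assumptions. Document vectors (in LSI these would be the reduced document coordinates from a truncated SVD; here they are random stand-ins) are grouped with plain k-means, each group is enclosed in a bounding sphere, and a query visits clusters in order of the smallest distance any member could have, stopping once no remaining sphere can beat the best match found so far. The helper names `build_clusters` and `search`, the use of k-means, and bounding spheres as the "volumes" are all hypothetical substitutions, not SCRIBE's actual method.

```python
import math
import random

# Hypothetical sketch: bounding-sphere clusters over unit document vectors.
# This is NOT SCRIBE's data structure, only an illustration of cluster pruning.

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vecs):
    """Component-wise mean of a non-empty list of vectors."""
    d = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(d)]

def build_clusters(docs, k, iters=10, seed=0):
    """Partition docs into k groups with plain k-means, then record each
    non-empty group's centroid and bounding-sphere radius."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(docs, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, v in enumerate(docs):
            j = min(range(k), key=lambda c: dist(v, centroids[c]))
            groups[j].append(i)
        centroids = [mean([docs[i] for i in g]) if g else centroids[c]
                     for c, g in enumerate(groups)]
    clusters = []
    for c, g in zip(centroids, groups):
        if g:
            # Radius measured from the final centroid, so every member lies inside.
            radius = max(dist(docs[i], c) for i in g)
            clusters.append((c, radius, g))
    return clusters

def search(query, docs, clusters):
    """Exact nearest neighbour by branch and bound over the bounding spheres.
    Returns (doc index, cosine similarity, number of documents examined)."""
    q = normalize(query)

    def lower_bound(cl):
        # Triangle inequality: no member can be closer than |q - c| - r.
        centroid, radius, _ = cl
        return max(0.0, dist(q, centroid) - radius)

    best_i, best_d, examined = None, math.inf, 0
    for cl in sorted(clusters, key=lower_bound):
        if lower_bound(cl) > best_d:
            break  # clusters are sorted, so every remaining one is farther away
        for i in cl[2]:
            examined += 1
            d = dist(q, docs[i])
            if d < best_d:
                best_i, best_d = i, d
    # For unit vectors, cosine similarity = 1 - d^2 / 2.
    return best_i, 1 - best_d ** 2 / 2, examined

if __name__ == "__main__":
    rng = random.Random(42)
    docs = [normalize([rng.gauss(0, 1) for _ in range(8)]) for _ in range(500)]
    clusters = build_clusters(docs, k=20)
    idx, cos, examined = search(docs[7], docs, clusters)
    print(idx, round(cos, 3), examined, "of", len(docs))
```

Because the vectors are unit-normalized, the nearest neighbour by Euclidean distance is also the one with the highest cosine similarity, so the pruning never changes the answer; the `examined` count shows how many documents were actually compared. How much work is saved depends on how well the data clusters, and uniformly random vectors like those in the usage example may prune little.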

Available Versions

Scholars Junction - Mississippi State University Institutional Repository

oai:scholarsjunction.msstate.e...