Signature file access methodologies for text retrieval: a literature review with additional test cases

Caviglia, Karen

Authors: Karen Caviglia
Publication date: 1 January 1987
Publisher: RIT Scholar Works

Abstract

Signature files are extremely compressed versions of text files which can be used as access or index files to facilitate searching documents for text strings. These access files, or signatures, are generated by storing hashed codes for individual words. Because the hashing or storing process can generate similar codes for different words, the primary concern in researching signature files is the accuracy of retrieval. Inaccuracy always takes the form of a false drop: the signature falsely signals the presence of a text string. Two suggested ways to alter false drop rates are: 1) to determine whether either of the two methodologies for storing hashed codes, superimposing them or concatenating them, is more efficient; and 2) to determine whether the choice of hashing algorithm has any impact. To assess these issues, the history of superimposed coding is traced from its development as a tool for compressing information onto punched cards in the 1950s to its incorporation into proposed signature file methodologies in the mid-1980s. Likewise, the concept of compressing individual words by various algorithms, or by hashing them, is traced through the research literature. Following this literature review, benchmark trials are performed using both superimposed and concatenated methodologies while varying hashing algorithms. It is determined that while one combination of hashing algorithm and storage methodology is better, all signature file methods can be considered viable.
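The two storage methodologies named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the hash function (MD5 via `hashlib`), the signature width, the number of bits set per word, and all function names are assumptions chosen only for illustration.

```python
import hashlib


def word_hash(word: str, salt: int = 0) -> int:
    """Deterministic hash of a word; MD5 is an arbitrary illustrative choice."""
    digest = hashlib.md5(f"{salt}:{word.lower()}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def word_pattern(word: str, width: int, bits_per_word: int) -> int:
    """Sparse bit pattern for one word: up to `bits_per_word` bits set in `width`."""
    pattern = 0
    for k in range(bits_per_word):
        pattern |= 1 << (word_hash(word, k) % width)
    return pattern


def superimposed_signature(words, width: int = 64, bits_per_word: int = 3) -> int:
    """Superimposed coding: OR every word's pattern into one fixed-width mask."""
    sig = 0
    for w in words:
        sig |= word_pattern(w, width, bits_per_word)
    return sig


def superimposed_may_contain(sig: int, word: str, width: int = 64,
                             bits_per_word: int = 3) -> bool:
    """True if all of the word's bits are set (may be a false drop)."""
    p = word_pattern(word, width, bits_per_word)
    return (sig & p) == p


def concatenated_signature(words, code_bits: int = 8) -> list:
    """Concatenated coding: store one short hashed code per word, in order."""
    return [word_hash(w) % (1 << code_bits) for w in words]


def concatenated_may_contain(sig, word: str, code_bits: int = 8) -> bool:
    """True if the word's short code appears in the list (may be a false drop)."""
    return word_hash(word) % (1 << code_bits) in sig
```

In both schemes a word that is present always matches, so the only possible error is a false drop, where a query word that is absent happens to collide with the stored codes. Under these assumptions, the false drop rate depends on the signature width, the number of words per block, and how evenly the hash spreads words across codes.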
