Signature file access methodologies for text retrieval: a literature review with additional test cases
Signature files are extremely compressed versions of text files which can be used as access or index files to facilitate searching documents for text strings. These access files, or signatures, are generated by storing hashed codes for individual words. Because similar codes can be generated in the hashing or storing process, the primary concern in researching signature files is to determine the accuracy of retrieving information. Inaccuracy always takes the form of a false drop: the false signaling of the presence of a text string. Two suggested ways to reduce false drop rates are: 1) to determine whether either of the two methodologies for storing hashed codes, superimposing them or concatenating them, is more efficient; and 2) to determine whether a particular hashing algorithm has any impact. To assess these issues, the history of superimposed coding is traced from its development as a tool for compressing information onto punched cards in the 1950s to its incorporation into proposed signature file methodologies in the mid-1980s. Likewise, the concept of compressing individual words by various algorithms, or by hashing them, is traced through the research literature. Following this literature review, benchmark trials are performed using both superimposed and concatenated methodologies while varying hashing algorithms. It is determined that while one combination of hashing algorithm and storage methodology is better, all signature file methods can be considered viable.
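The two storage methodologies named in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the reviewed work: MD5 as the hashing algorithm, a 64-bit superimposed signature, three bits set per word, 8-bit concatenated word codes, and all function names.

```python
import hashlib

SIG_BITS = 64       # assumed width of a superimposed signature
BITS_PER_WORD = 3   # assumed number of bit positions each word sets
CODE_BITS = 8       # assumed width of each concatenated word code


def _digest(word):
    """Hash a lower-cased word with MD5 (an illustrative choice of hash)."""
    return hashlib.md5(word.lower().encode("utf-8")).digest()


def word_bits(word):
    """Bit positions a word sets in a superimposed signature."""
    d = _digest(word)
    return {int.from_bytes(d[2 * i:2 * i + 2], "big") % SIG_BITS
            for i in range(BITS_PER_WORD)}


def superimposed_signature(words):
    """OR the bit patterns of all words into a single integer."""
    sig = 0
    for w in words:
        for b in word_bits(w):
            sig |= 1 << b
    return sig


def concatenated_signature(words):
    """Keep one short hashed code per word, in document order."""
    return [int.from_bytes(_digest(w)[:2], "big") % (1 << CODE_BITS)
            for w in words]


def may_contain_superimposed(sig, word):
    """True if every bit of the query word is set; may be a false drop."""
    q = superimposed_signature([word])
    return sig & q == q


def may_contain_concatenated(codes, word):
    """True if the word's code appears in the list; may be a false drop."""
    return concatenated_signature([word])[0] in codes


doc = "signature files compress text for fast string search".split()
sup = superimposed_signature(doc)
cat = concatenated_signature(doc)
print(may_contain_superimposed(sup, "signature"))
print(may_contain_concatenated(cat, "signature"))
```

In both schemes a word that is present always matches, so errors can only be false drops: an absent word whose bits or code collide with those of stored words. Any candidate match therefore still has to be checked against the text itself.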
An investigation to study the feasibility of on-line bibliographic information retrieval system using an APP
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. This thesis reports an investigation into the feasibility of a
searching mechanism using an APP suitable for on-line bibliographic
retrieval operations, especially for retrospective searches.
From the study of the searching methods used in the conventional
systems it is seen that elaborate file- and data- structures are
introduced to improve the response time of the system. These
consequently lead to software and hardware redundancies. To mask
these complexities of the system an expensive computer with higher
capabilities and more powerful instruction set is commonly used.
Thus the service of the system becomes cost-ineffective.
On the other hand, the primitive operations of a searching mechanism,
such as association, domain selection, intersection and union, are
the intrinsic features of an associative parallel processor. Therefore
it is important to establish the feasibility of an APP as a cost-effective
searching mechanism.
In this thesis a searching mechanism using an 'ON-THE-FLY' searching
technique has been proposed. The parallel search unit uses a Byte-oriented
VRL-APP for efficient character string processing.
At the time of undertaking this work, the specifications for neither the
retrieval systems nor the BO-VRL-APPs were well established; hence a
two-phase investigation was originated. In Phase I of the work a
bottom-up approach was adopted to derive a formal and precise
specification for the BO-VRL-APP. During Phase II of the work
a top-down approach was adopted for the implementation of the searching
mechanism.
An experimental research vehicle has been developed to establish
the feasibility of an APP as a cost-effective searching mechanism.
Although rigorous proof of the feasibility has not been obtained,
the thesis establishes that the APP is well suited for on-line
bibliographic information retrieval operations where substring searches
including boolean selection and threshold weights are efficiently
supported.
ICL Technical Journal 4(4): CAFS-ISP
The special issue of the ICL Technical Journal on CAFS-ISP. This closely followed the award to ICL of the Queen's Award for Technology in April 1985. The contents include the history of the hardware and software, its status and future, perspectives from leading developers and users, and a list of related patents.
Economic data bank management in a developing nation
This dissertation describes the results of a research project which was
undertaken at Loughborough University of Technology. The basic objectives of the research project were: (1) to investigate the management elements required for organising the
development of an Economic Data Bank (EDB), with particular emphasis
on the requirements of a developing nation; (2) to investigate the sociological, political and technical implications
associated with organising the development of an EDB in a developing
nation.
A theoretical framework was established for this study. This was done
after an extensive search and review of literature was performed in the
areas of data and data base management systems, management information
systems, and computer technology in general. [Continues.