thesis

Search algorithms on structured and unstructured data in a large database

Abstract

This project is concerned with the development of a search algorithm for a large archival database. The Port Elizabeth Genealogical Information System (PEGIS) contains a database of almost 600,000 individuals. The standard search algorithms are no longer sufficient to locate individuals in the database. A new algorithm was required that allows searches on any of the words or dates in the database and provides a means to specify where in the desired record a word should occur. A function to rank the retrieved records was also required. A literature study was done on the field of Information Retrieval and on algorithms designed specifically for the PEGIS. These algorithms were adapted and hybridized to yield a search algorithm that allows for the boolean formulation of queries and the specification of the structure of search words in the desired records. The algorithm ranks retrieved records by their assumed relevance to the user. The new algorithms were evaluated with regard to retrieval speed and accuracy and were found to be very effective.
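To make the three requirements named in the abstract concrete (boolean queries, specifying where in a record a word occurs, and ranking), the following is a minimal sketch only. It is not the algorithm developed in the thesis: the record fields (name, place, date), the sample data, the function names, and the scoring rule (counting matched optional terms) are all assumptions introduced for illustration.

```python
from collections import defaultdict

# Hypothetical records: field names and values are invented for illustration
# and are not taken from the PEGIS database.
RECORDS = {
    1: {"name": "John Smith", "place": "Port Elizabeth", "date": "1850"},
    2: {"name": "Elizabeth Smith", "place": "Uitenhage", "date": "1850"},
    3: {"name": "John Brown", "place": "Grahamstown", "date": "1872"},
}


def build_index(records):
    """Map each (field, word) pair to the ids of the records containing it."""
    index = defaultdict(set)
    for rec_id, fields in records.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[(field, word)].add(rec_id)
    return index


def lookup(index, word, field=None):
    """Return ids of records containing `word`, optionally only in `field`."""
    word = word.lower()
    if field is not None:
        return set(index.get((field, word), set()))
    hits = set()
    for (_, indexed_word), ids in index.items():
        if indexed_word == word:
            hits |= ids
    return hits


def search(index, required, optional=()):
    """Boolean AND over `required` terms, ranked by matched `optional` terms.

    Each term is a (word, field) pair; a field of None means "anywhere in
    the record". The score is an assumed stand-in for a relevance function.
    """
    if not required:
        return []
    result = None
    for word, field in required:
        hits = lookup(index, word, field)
        result = hits if result is None else result & hits
    scores = {rec_id: 0 for rec_id in result}
    for word, field in optional:
        for rec_id in lookup(index, word, field) & result:
            scores[rec_id] += 1
    # Highest score first; ties are broken by record id for a stable order.
    return sorted(scores, key=lambda rec_id: (-scores[rec_id], rec_id))


index = build_index(RECORDS)
# "smith" must occur in the name field; records naming "elizabeth" rank higher.
ranked = search(index, [("smith", "name")], [("elizabeth", "name")])
```

In this sketch the field restriction is what separates a person named Elizabeth from a record that merely mentions Port Elizabeth as a place: searching for "elizabeth" anywhere matches both hypothetical records 1 and 2, while restricting the word to the name field matches only record 2.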
