
    Knowledge-based document retrieval with application to TEXPROS

    Document retrieval in an information system is most often accomplished through keyword search, and the common technique behind keyword search is indexing. The major drawback of this technique is its limited effectiveness and accuracy: a typical keyword search over the Internet can return hundreds or even thousands of records as potentially relevant, yet often few of them match the user's interests. This dissertation presents a knowledge-based document retrieval architecture with application to TEXPROS. The architecture is based on a dual document model that consists of a document type hierarchy and a folder organization. Using the knowledge collected during document filing, the search space can be narrowed significantly. Combining classical text-based retrieval methods with knowledge-based retrieval can substantially improve both search efficiency and effectiveness. With the proposed predicate-based query language, users can specify their search criteria, and their knowledge about the documents to be retrieved, more precisely. To help users formulate a query, a guided search is presented as part of an intelligent user interface. Supported by an intelligent question generator, an inference engine, a question base, and a predicate-based query composer, the guided search collects the most important information the user knows in order to retrieve the documents that satisfy the user's particular interests. A knowledge-based query processing and search engine is presented as the core component of this architecture, and algorithms are developed for it to retrieve the documents that match a query effectively and efficiently. A cache is introduced to speed up query refinement. A theoretical proof and a performance analysis are given to demonstrate the efficiency and effectiveness of this knowledge-based document retrieval approach.
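
    The narrowing step described above can be sketched as follows. This is a minimal, hypothetical illustration, not the TEXPROS implementation: the `Doc` record, the `TYPE_PARENT` type hierarchy, the sample documents, and the use of Python's `functools.lru_cache` as a stand-in for the refinement cache are all assumptions. Knowledge-based predicates on document type and folder filter the candidates first; keyword matching then runs only over that reduced set.

```python
from dataclasses import dataclass
from functools import lru_cache

# Hypothetical document record; TEXPROS's real data model is not shown here.
@dataclass(frozen=True)
class Doc:
    doc_id: int
    doc_type: str  # a node in the (assumed) document type hierarchy
    folder: str    # folder path assigned when the document was filed
    text: str

# Hypothetical type hierarchy, stored as child type -> parent type.
TYPE_PARENT = {"memo": "correspondence", "letter": "correspondence", "invoice": "financial"}

DOCS = (
    Doc(1, "memo", "projects/texpros", "Meeting about retrieval cache"),
    Doc(2, "letter", "admin", "Retrieval of archived files"),
    Doc(3, "invoice", "projects/texpros", "Cache hardware purchase"),
)

def is_subtype(doc_type, ancestor):
    # Walk up the hierarchy until the ancestor is found or the root is passed.
    while doc_type is not None:
        if doc_type == ancestor:
            return True
        doc_type = TYPE_PARENT.get(doc_type)
    return False

@lru_cache(maxsize=128)
def candidates(doc_type, folder):
    # Knowledge-based predicates: narrow by type and folder before any text matching.
    return tuple(d for d in DOCS
                 if is_subtype(d.doc_type, doc_type) and d.folder.startswith(folder))

def search(doc_type, folder, keywords):
    # Text-based predicate: every keyword must occur, checked only on the candidates.
    wanted = [k.lower() for k in keywords]
    return [d.doc_id for d in candidates(doc_type, folder)
            if all(k in d.text.lower() for k in wanted)]

print(search("correspondence", "projects", ["retrieval"]))           # [1]
print(search("correspondence", "projects", ["retrieval", "cache"]))  # [1], reuses cached candidates
```

    In this sketch, refining a query by adding a keyword while keeping the same type and folder predicates reuses the cached candidate set instead of rescanning every document, which is one plausible reading of how a cache can speed up query refinement.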

    Managing complex taxonomic data in an object-oriented database

    This thesis addresses the problem of multiple overlapping classifications in object-oriented databases through the example of plant taxonomy. Multiple overlapping classifications are independent simple classifications that share information (nodes and leaves) and therefore overlap. Plant taxonomy was chosen as the motivating application domain because taxonomic classifications are especially complex and have changed over long periods of time, so they overlap significantly. This work extracts basic requirements for supporting multiple overlapping classifications in general, and in the context of plant taxonomy in particular. These requirements form the basis on which a prototype is defined and built. The prototype, an extended object-oriented database, builds on an ODMG-based object-oriented model by adding a relationship management mechanism. These relationships are the main feature used to build classifications. This emphasis on relationships allows classifications to be described orthogonally to the classified data (enabling reuse, integration of the mechanism with existing databases, and classification of non-cooperating data), and allows easier and more powerful management of semantic data, both within and outside a classification. Additional mechanisms, such as integrity constraints, are investigated and implemented. Finally, the implementation of the prototype is presented and evaluated, both for its usability and expressiveness (using plant taxonomy as the application) and for its performance as a database system. This evaluation shows that the prototype meets the needs of taxonomists.
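
    One way to picture classifications described orthogonally to the classified data is to store each classification as a named set of parent-child relationships, kept apart from the taxa themselves. The sketch below is hypothetical: the `ClassificationStore` class, its single-parent integrity constraint, and the two revisions named `rev_A` and `rev_B` are illustrative assumptions, not the thesis prototype or its ODMG-based API, and the sample placements are not taxonomic claims.

```python
from collections import defaultdict


class ClassificationStore:
    """Hypothetical sketch: each classification is a named set of
    (parent, child) relationships kept apart from the classified objects."""

    def __init__(self):
        self._edges = defaultdict(set)  # classification name -> {(parent, child)}

    def relate(self, classification, parent, child):
        edges = self._edges[classification]
        # Assumed integrity constraint: a node has at most one parent within a
        # single classification, though it may have different parents in others.
        if any(c == child and p != parent for p, c in edges):
            raise ValueError(f"{child!r} already has a parent in {classification!r}")
        edges.add((parent, child))

    def parent(self, classification, child):
        for p, c in self._edges.get(classification, ()):
            if c == child:
                return p
        return None

    def children(self, classification, parent):
        return sorted(c for p, c in self._edges.get(classification, ()) if p == parent)

    def _nodes(self, classification):
        return {n for edge in self._edges.get(classification, ()) for n in edge}

    def shared_nodes(self, a, b):
        # Nodes present in both classifications are where the two overlap.
        return self._nodes(a) & self._nodes(b)


store = ClassificationStore()
store.relate("rev_A", "Rosa", "Rosa sect. Caninae")
store.relate("rev_A", "Rosa sect. Caninae", "Rosa canina")
store.relate("rev_B", "Rosa", "Rosa canina")
print(store.parent("rev_A", "Rosa canina"))              # Rosa sect. Caninae
print(store.parent("rev_B", "Rosa canina"))              # Rosa
print(sorted(store.shared_nodes("rev_A", "rev_B")))      # ['Rosa', 'Rosa canina']
```

    Because the taxa are plain values referenced by the relationships rather than objects that carry their own parent pointers, the same taxon can sit in several classifications at once without modifying the underlying data.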