    An extended relational document retrieval model

    Relational Data Base Management Systems offer a commercially available tool with which to build effective document retrieval systems. The full potential of the relational model for supporting the kind of ad hoc inquiry characteristic of document retrieval has only recently been explored. In addition, commercially available relational DBMSs provide effective tools for managing document data bases, including facilities for concurrency control, data migration and reorganization routines, authorization mechanisms, enforcement of integrity constraints, and dynamic data definition. This article will present a relational logical model to support a sophisticated document retrieval system in which flexible forms of inferential and associative searching can be performed. Examples of ad hoc inquiry will be presented in SQL. Several problems of particular importance to document retrieval will be discussed, including the importance of Conjunctive Normal Form in query formulation, unique aspects of document retrieval storage and processing overhead, and techniques for reducing the size of storage without severely impacting retrieval effectiveness. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/27543/1/0000587.pd
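
    As a rough illustration of the kind of query this abstract alludes to, the sketch below stores term postings in one relation and evaluates a query in Conjunctive Normal Form: each clause is an OR over terms, and the clauses are ANDed together with INTERSECT. The table name doc_term, its columns, and both helper functions are hypothetical and are not taken from the article; SQLite (through Python's sqlite3 module) merely stands in for a commercial relational DBMS.

```python
import sqlite3


def build_index(postings):
    """Create an in-memory relation of (doc_id, term) postings.

    The schema is an assumption made for this sketch only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_term ("
        " doc_id INTEGER NOT NULL,"
        " term TEXT NOT NULL,"
        " PRIMARY KEY (doc_id, term))"
    )
    conn.executemany("INSERT INTO doc_term VALUES (?, ?)", postings)
    return conn


def cnf_search(conn, clauses):
    """Return sorted doc_ids that satisfy every clause of a CNF query.

    `clauses` is a list of lists of terms: terms inside one clause are
    ORed (SQL IN), and the clauses themselves are ANDed (SQL INTERSECT).
    """
    if not clauses or any(not clause for clause in clauses):
        raise ValueError("need at least one clause, and no empty clauses")
    parts, params = [], []
    for clause in clauses:
        placeholders = ", ".join("?" for _ in clause)
        parts.append(
            f"SELECT DISTINCT doc_id FROM doc_term WHERE term IN ({placeholders})"
        )
        params.extend(clause)
    sql = " INTERSECT ".join(parts) + " ORDER BY doc_id"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = build_index([
        (1, "relational"), (1, "retrieval"),
        (2, "database"), (2, "retrieval"),
        (3, "relational"), (3, "storage"),
        (4, "retrieval"),
    ])
    # (relational OR database) AND retrieval
    print(cnf_search(conn, [["relational", "database"], ["retrieval"]]))
```

    Keeping each clause as its own subquery means the query text grows linearly with the number of clauses, which is one plausible reason CNF is convenient for this style of formulation; the article itself may argue the point differently.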

    A Database Approach to Content-based XML retrieval

    This paper describes a first prototype system for content-based retrieval from XML data. The system's design supports both XPath queries and complex information retrieval queries based on a language modelling approach to information retrieval. Evaluation using the INEX benchmark shows that it is beneficial to bias the system towards retrieving large XML fragments rather than small ones.

    On Region Algebras, XML Databases, and Information Retrieval

    This paper describes some new ideas on developing a logical algebra for databases that manage textual data and support information retrieval functionality. We describe a first prototype of such a system.

    The Mirror DBMS at TREC-8

    The database group at the University of Twente participates in TREC-8 using the Mirror DBMS, a prototype database system especially designed for multimedia and web retrieval. From a database perspective, the purpose has been to check whether we can get sufficient performance, and to prepare for the very large corpus track in which we plan to participate next year. From an IR perspective, the experiments have been designed to learn more about the effect of global statistics on the ranking.

    From Frequency to Meaning: Vector Space Models of Semantics

    Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
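
    To make the first of the three matrix classes concrete, here is a minimal sketch of a term-document matrix built from raw counts, with documents compared by the cosine of their column vectors. The whitespace tokenizer, the function names, and the toy documents are assumptions for illustration; the survey covers weighting and smoothing schemes that this sketch omits.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, matrix): rows are terms, columns are documents.

    Tokenization is a naive lowercase whitespace split, chosen only
    to keep the sketch short.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted({term for c in counts for term in c})
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def column(matrix, j):
    """Extract the vector for document j (one column of the matrix)."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    docs = ["cat sat mat", "cat sat hat", "dog ran"]
    vocab, matrix = term_document_matrix(docs)
    print(vocab)
    print(cosine(column(matrix, 0), column(matrix, 1)))
    print(cosine(column(matrix, 0), column(matrix, 2)))
```

    Reading the same matrix by rows instead of columns gives term vectors, which is the usual starting point for measuring term similarity from shared document contexts.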

    COGMIR: A Computer Model for Knowledge Integration.

    Knowledge integration is an important topic in knowledge engineering. In this dissertation, we explore some aspects of knowledge integration, namely the accumulation of scientific knowledge and analogical reasoning over the acquired knowledge. The knowledge to be integrated is conveyed in paragraph-like pieces, referred to as documents. By incorporating results from cognitive science, the Deutsch-Kraft model of information retrieval is extended to a model for knowledge engineering that integrates acquired knowledge and performs intelligent retrieval. The resulting computer model is termed COGMIR, which stands for COGnitive Model for Intelligent Retrieval. A scheme named query invoked memory reorganization is used in COGMIR for knowledge integration. Unlike schemes that realize knowledge integration through subjective understanding, by representing new knowledge in terms of existing knowledge, the proposed scheme records at storage time only the possible connections among knowledge acquired from different documents. The actual binding of that knowledge is deferred to query time and depends on the needs of the query. Therefore, although there is only one way to store knowledge, there are potentially numerous ways to use it. From the classical information retrieval viewpoint, the original model is extended in the following sense: not only is each document represented as a whole, but the meaning of each document can also be represented. In addition, since facts are constructed from the documents, document retrieval and fact retrieval are treated in a unified way. Moreover, when the requested knowledge is not available, query invoked memory reorganization can generate suggestions from the available knowledge through analogical reasoning. This is done by revising the algorithms developed for document retrieval and fact retrieval, and by incorporating Gentner's structure mapping theory. Analogical reasoning is treated as a natural extension of intelligent retrieval, so that two previously separate research areas are combined. A case study is provided to demonstrate the fundamental ideas. All the components are implemented as list structures, which bear an interesting similarity to relational databases.