36 research outputs found

    Extending the relational model with uncertainty and ignorance

    Get PDF
    It has been widely recognized that many real-life database applications increasingly need to model uncertainty and ignorance. However, the relational model does not provide this possibility. Over the years, a number of efforts have been devoted to capturing uncertainty and ignorance in databases. Most of these efforts attempted to capture uncertainty using classical probability theory. As a consequence, these approaches inherit the limitations of probability theory, such as the problem of information loss. In this paper, we extend the relational model with uncertainty and ignorance without the limitations of these other approaches. Our approach is based on the so-called theory of belief functions, which may be considered a generalization of probability theory. Belief functions have an attractive mathematical underpinning and many intuitively appealing properties.
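
    The basic objects of the theory of belief functions can be sketched in a few lines. The sketch below is a generic illustration, not the paper's relational extension. It assumes an attribute value whose true value lies in a small frame of discernment, a mass function stored as a dict from frozensets to masses, and the hypothetical helper names `validate`, `belief` and `plausibility`. Mass placed on the whole frame records ignorance explicitly instead of being spread over individual values.

```python
from typing import Dict, FrozenSet

# A mass function (basic probability assignment) maps focal sets to masses.
MassFunction = Dict[FrozenSet[str], float]


def validate(m: MassFunction, frame: FrozenSet[str]) -> None:
    """Raise ValueError unless m is a valid mass function over the frame."""
    for focal in m:
        if not focal or not focal <= frame:
            raise ValueError(f"invalid focal set: {set(focal)}")
    if abs(sum(m.values()) - 1.0) > 1e-9:
        raise ValueError("masses must sum to 1")


def belief(m: MassFunction, a: FrozenSet[str]) -> float:
    """Bel(A): total mass committed to focal sets contained in A."""
    return sum((v for focal, v in m.items() if focal <= a), 0.0)


def plausibility(m: MassFunction, a: FrozenSet[str]) -> float:
    """Pl(A): total mass of focal sets that intersect A (do not contradict it)."""
    return sum((v for focal, v in m.items() if focal & a), 0.0)


# Hypothetical attribute value: the true value lies in {"a", "b", "c"}.
FRAME = frozenset({"a", "b", "c"})
m = {
    frozenset({"a"}): 0.5,        # evidence pointing at "a"
    frozenset({"a", "b"}): 0.25,  # evidence that cannot separate "a" from "b"
    FRAME: 0.25,                  # ignorance: mass left on the whole frame
}
validate(m, FRAME)

for value in sorted(FRAME):
    s = frozenset({value})
    print(value, belief(m, s), plausibility(m, s))
```

    Here the value "b" gets the interval [Bel, Pl] = [0.0, 0.5]: nothing supports it directly, but half of the mass does not rule it out. A single probability distribution would have to split the 0.25 of ignorance among "a", "b" and "c" in some arbitrary way, which is the kind of information the belief-function representation keeps.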

    Utilizing Structural Knowledge for Information Retrieval in XML Databases

    Get PDF
    In this paper, we address the problem of directly translating eXtensible Markup Language (XML) information retrieval (IR) queries into relational database expressions, and stress the benefits of using an intermediate XML-specific algebra over relational algebra. We show how adding an XML-specific algebra at the logical level of a DBMS provides a level of abstraction from both the query languages for information retrieval in XML and the underlying physical storage and manipulation. We picked a region algebra as the basis for defining the structure-aware (SA) view on XML, in which we can distinguish among different XML entities, such as element nodes, text nodes, and words, and determine their containment relation. Region algebras are already well established in semi-structured document processing, as shown by the extensive overview of region algebra approaches given in this paper. Furthermore, we propose a variant of region algebra that supports ranking operators in an elegant way while remaining algebraic. Because relevance scores are computed for regions in this algebra, we named it score region algebra (SRA). The benefits of introducing score region algebra are illustrated with a set of example queries. Besides abstracting from the query language and the physical implementation, SRA provides a certain degree of abstraction from the retrieval model, and makes it possible to apply query optimization at the logical level of the database. Various retrieval models can be instantiated at the physical level based on the abstract specification of the SRA operators. We also discuss numerous region algebra operator properties that provide a firm ground for query rewriting and optimization at the SA level, which is an important premise for the existence of such a logical view on XML.
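
    The idea of a region algebra whose regions carry relevance scores can be sketched as follows. This is not the paper's set of SRA operators. It assumes regions are inclusive token-position intervals, and it introduces a hypothetical ranking operator `score_by_terms` that scores each element region by the term regions it contains (a bare term-frequency model). The point is that the result of ranking is still a list of regions, so further region operators can be applied to it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """A region of the document: token positions start..end (inclusive)."""
    start: int
    end: int
    label: str           # element name or the word itself
    score: float = 1.0   # relevance score carried by the region


def contains(outer: Region, inner: Region) -> bool:
    """Region containment, the core relation of a region algebra."""
    return outer.start <= inner.start and inner.end <= outer.end


def score_by_terms(elements: List[Region], terms: List[Region]) -> List[Region]:
    """Hypothetical ranking operator: score each element region by the summed
    scores of the term regions it contains (a bare term-frequency model)."""
    return [
        Region(e.start, e.end, e.label,
               sum((t.score for t in terms if contains(e, t)), 0.0))
        for e in elements
    ]


def top_k(regions: List[Region], k: int) -> List[Region]:
    """Order regions by descending score and keep the best k."""
    return sorted(regions, key=lambda r: r.score, reverse=True)[:k]


# Toy document of 10 tokens: <article> spans 0..9 and holds two <section>s.
elements = [
    Region(0, 9, "article"),
    Region(1, 4, "section"),
    Region(5, 8, "section"),
]
# Occurrences of the query word "xml" at token positions 2, 3 and 6.
terms = [Region(p, p, "xml") for p in (2, 3, 6)]

for r in top_k(score_by_terms(elements, terms), k=3):
    print(r.label, r.start, r.end, r.score)
```

    Containment on intervals is transitive: if the article contains a section and the section contains a term, the article contains the term. Algebraic properties like this are what make rewriting at the logical level possible. A real retrieval model would replace the summed counts with its own scoring function.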

    Moa: extensibility and efficiency in querying nested data

    Get PDF
    Advanced, non-traditional application domains such as geographic information systems and digital library systems demand advanced data management support. To cope with this demand, we present a novel multi-model DBMS architecture that provides efficient evaluation of queries on complexly structured data. A vital role in this architecture is played by the Moa language, which features a nested relational data model based on XNF2, a model in which we have placed renewed interest. Furthermore, the architecture allows extensibility at all of its levels, providing the means to better integrate domain-specific algorithms into the system. In addition, the extensibility of the Moa language is designed so that the optimization obstacles caused by black-box treatment of ADTs are avoided. This combination of well-integrated domain-specific algorithms, extensibility that remains open to optimization, and a mapping of queries on complexly structured data to an efficient physical algebra expression via a nested relational algebra enables the Moa system to handle complex queries from non-traditional application domains efficiently.
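
    Mapping a query on nested data to flat physical operations can be illustrated with a small example. This is a hypothetical sketch, not Moa syntax and not its actual physical storage. It assumes a nested relation with one set-valued attribute, stores it as two flat tables linked by a surrogate object id, and evaluates the nested predicate "value is in the set" as a flat semi-join. The helper names `unnest`, `decompose` and `titles_containing` are illustrative only.

```python
from typing import Dict, List, Tuple

# A tuple of a nested (NF2-style) relation: atomic and set-valued attributes.
Row = Dict[str, object]


def unnest(rel: List[Row], attr: str) -> List[Row]:
    """UNNEST: emit one flat tuple per element of the set-valued attribute."""
    out = []
    for t in rel:
        for v in sorted(t[attr]):
            flat = {k: val for k, val in t.items() if k != attr}
            flat[attr] = v
            out.append(flat)
    return out


def decompose(rel: List[Row], attr: str) -> Tuple[List[Tuple[int, Row]], List[Tuple[int, object]]]:
    """Store the nested relation as two flat tables joined on a surrogate oid."""
    parent = [(oid, {k: v for k, v in t.items() if k != attr}) for oid, t in enumerate(rel)]
    child = [(oid, v) for oid, t in enumerate(rel) for v in sorted(t[attr])]
    return parent, child


def titles_containing(parent, child, value) -> List[str]:
    """Nested predicate 'value IN attr', evaluated as a flat semi-join on oid."""
    oids = {oid for oid, v in child if v == value}
    return [row["title"] for oid, row in parent if oid in oids]


docs = [
    {"title": "Maps", "keywords": frozenset({"gis", "raster"})},
    {"title": "Catalogue", "keywords": frozenset({"dl"})},
    {"title": "Atlas", "keywords": frozenset({"gis"})},
]
parent, child = decompose(docs, "keywords")
print(titles_containing(parent, child, "gis"))   # ['Maps', 'Atlas']
print(len(unnest(docs, "keywords")))             # 4
```

    Because the flat representation only needs ordinary selections and joins, a physical algebra optimized for flat tables can evaluate the nested query without treating the set-valued attribute as an opaque value.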

    CIRQUID: complex information retrieval queries in a database

    Get PDF
    The CIRQUID project plans to design and build a DBMS that seamlessly integrates relevance-oriented querying of semi-structured data (XML) with traditional querying of this data. The project is funded by the Netherlands Organisation for Scientific Research.

    A Selectivity Model for Fragmented Relations: Evaluated for different standard data distributions

    No full text
    In selectivity estimation, many models assume that data is uniformly distributed, which does not hold for many applications. In this paper, we discuss a generalized selectivity model, the so-called l##-model, which is independent of the data distribution. The model predicts the fraction of a relation that must be selected in order to process a query. We evaluated this model for different data distributions to determine its accuracy. The distributions considered are the uniform, normal, exponential, Pearson, and Zipf distributions. Our experiments indicate that the l##-model predicts selectivity well, especially for skewed distributions. Applying the l##-model to different fragment sizes of a relation also yields quite acceptable selectivity values.
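
    Why the uniformity assumption breaks down on skewed data can be shown with a small calculation. This sketch does not implement the l##-model, whose definition is not given here. It only compares the textbook uniform estimate for an equality predicate, 1/n for n distinct values, with the actual selectivity when value frequencies follow Zipf's law. The names `zipf_fractions` and `uniform_estimate` and the choice n = 10, s = 1 are assumptions made for illustration.

```python
from typing import List


def zipf_fractions(n: int, s: float = 1.0) -> List[float]:
    """Fraction of tuples holding the value of rank k (1..n) under Zipf's law."""
    weights = [1.0 / k ** s for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def uniform_estimate(n: int) -> float:
    """Selectivity of 'A = v' if all n distinct values are assumed equally frequent."""
    return 1.0 / n


n = 10
actual = zipf_fractions(n)
estimate = uniform_estimate(n)
for rank in (1, 5, 10):
    true_sel = actual[rank - 1]
    print(f"rank {rank:2d}: actual {true_sel:.3f}, uniform estimate {estimate:.3f}, "
          f"ratio {true_sel / estimate:.2f}")
```

    With only ten distinct values and exponent 1, the uniform estimate is off by roughly a factor of three in both directions: it underestimates the most frequent value and overestimates the rarest one. A distribution-independent model is meant to avoid exactly this kind of systematic error.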

    Handling Uncertainty and Ignorance in Databases: A Rule to Combine Dependent Data

    Get PDF
    In many applications, uncertainty and ignorance go hand in hand. Therefore, to deliver database support for effective decision making, an integrated view of uncertainty and ignorance should be taken. So far, most efforts have attempted to capture uncertainty and ignorance with probability theory. In this paper, we discuss the weakness of probability theory in capturing ignorance, and propose an approach inspired by the Dempster-Shafer theory to capture both uncertainty and ignorance. We then present a rule to combine dependent data that are represented in different relations. Such a rule is required to perform joins in a consistent way. We illustrate that our rule solves the so-called problem of information loss, which had so far been considered an open problem.
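
    For contrast, the classical way to combine two bodies of evidence in Dempster-Shafer theory is Dempster's rule of combination, which assumes the sources are independent. The paper's contribution is a different rule for dependent data held in different relations; that rule is not reproduced here. The sketch below shows only the standard independent-evidence rule, assuming mass functions are stored as dicts from frozensets to masses and using the hypothetical helper name `dempster_combine`.

```python
from itertools import product
from typing import Dict, FrozenSet

MassFunction = Dict[FrozenSet[str], float]


def dempster_combine(m1: MassFunction, m2: MassFunction) -> MassFunction:
    """Dempster's rule: multiply the masses of every pair of focal sets, assign
    the product to their intersection, and renormalise away conflicting mass."""
    combined: MassFunction = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # the two pieces of evidence contradict each other
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {focal: v / (1.0 - conflict) for focal, v in combined.items()}


FRAME = frozenset({"a", "b"})
m1 = {frozenset({"a"}): 0.5, FRAME: 0.5}  # source 1: half sure of "a", rest unknown
m2 = {frozenset({"b"}): 0.5, FRAME: 0.5}  # source 2: half sure of "b", rest unknown
result = dempster_combine(m1, m2)
for focal, mass in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(focal), round(mass, 3))
```

    In this example a quarter of the product mass is conflicting and is normalised away, leaving one third each on {"a"}, {"b"} and the whole frame, so the combined result still records ignorance explicitly. The independence assumption behind this rule is what makes it unsuitable, on its own, for combining dependent tuples during a join.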
    corecore