
    m-tables: Representing Missing Data

    Representation systems have been widely used to capture different forms of incomplete data in various settings. However, existing representation systems are not expressive enough to handle the more complex scenarios of missing data that occur in practice: these range from missing attribute values, to a known number of missing tuples, to an unknown number of missing tuples. In this work, we propose a new representation system called m-tables that can represent many different types of missing data. We show that m-tables form a closed, complete and strong representation system under both set and bag semantics, and are strictly more expressive than conditional tables under both the closed and open world assumptions. We further study the complexity of computing certain and possible answers in m-tables. Finally, we discuss how to "interpret" m-tables through a novel labeling scheme that marks a type of generalized tuples as certain or possible.

    A fractional number based labeling scheme for dynamic XML updating

    Recently, XML query processing based on labeling schemes has been proposed. With a labeling scheme, the structural relationship between XML nodes can be determined quickly without accessing the XML document. However, existing labeling schemes have to relabel pre-existing nodes or recalculate label values when a new node is inserted into the XML document during an update. In this paper, we propose a novel labeling scheme based on fractional numbers. The key feature of fractional numbers is that an infinite number of fractional numbers can be inserted between any two unequal fractional numbers. Therefore, the problem of relabeling pre-existing nodes during XML updating can be solved if the XML nodes are labeled with fractional numbers.
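The insertion property described in this abstract can be illustrated with a minimal sketch. It is not the paper's actual scheme: the function name `label_between`, the choice of the midpoint as the new label, and the sample sibling labels are all assumptions made for illustration.

```python
from fractions import Fraction


def label_between(left: Fraction, right: Fraction) -> Fraction:
    """Return a label strictly between two unequal labels (assumed choice: the midpoint)."""
    if left == right:
        raise ValueError("labels must be unequal")
    return (left + right) / 2


# Hypothetical labels of two sibling nodes, in document order.
siblings = [Fraction(1), Fraction(2)]

# Insert a new node between the two existing siblings; neither is relabeled.
siblings.insert(1, label_between(siblings[0], siblings[1]))

# Insert again, between the first sibling and the node just added.
siblings.insert(1, label_between(siblings[0], siblings[1]))

print(siblings)
```

Comparing two labels still gives the document order of the nodes, and the existing labels never change. With the midpoint rule, repeated insertions at the same position double the denominator each time, so label size grows with the number of insertions.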

    Knowledge Refinement via Rule Selection

    In several different applications, including data transformation and entity resolution, rules are used to capture aspects of knowledge about the application at hand. Often, a large set of such rules is generated automatically or semi-automatically, and the challenge is to refine the encapsulated knowledge by selecting a subset of rules based on the expected operational behavior of the rules on available data. In this paper, we carry out a systematic complexity-theoretic investigation of the following rule selection problem: given a set of rules specified by Horn formulas, and a pair of an input database and an output database, find a subset of the rules that minimizes the total error, that is, the number of false positive and false negative errors arising from the selected rules. We first establish computational hardness results for the decision problems underlying this minimization problem, as well as upper and lower bounds for its approximability. We then investigate a bi-objective optimization version of the rule selection problem in which both the total error and the size of the selected rules are taken into account. We show that testing for membership in the Pareto front of this bi-objective optimization problem is DP-complete. Finally, we show that a similar DP-completeness result holds for a bi-level optimization version of the rule selection problem, where one minimizes first the total error and then the size.
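The total-error objective in this abstract can be made concrete with a small brute-force sketch. It rests on simplifying assumptions that are not from the paper: each rule is represented only by the set of facts it derives from the input database (no Horn-formula evaluation is performed), and the rule names, facts, and function names are hypothetical. The exhaustive search is exponential in the number of rules, which is in keeping with the hardness results the abstract reports.

```python
from itertools import combinations


def total_error(selected, derived, expected):
    """Number of false positives plus false negatives for a selection of rules."""
    produced = set().union(*(derived[rule] for rule in selected))
    false_positives = len(produced - expected)  # derived but not in the output database
    false_negatives = len(expected - produced)  # in the output database but not derived
    return false_positives + false_negatives


def best_subset(derived, expected):
    """Exhaustively find a subset with minimum total error, preferring smaller subsets."""
    rules = sorted(derived)
    best = None
    for size in range(len(rules) + 1):  # increasing size, so ties keep the smaller subset
        for subset in combinations(rules, size):
            error = total_error(subset, derived, expected)
            if best is None or error < best[0]:
                best = (error, subset)
    return best


# Hypothetical data: facts each rule derives, and the facts in the output database.
derived = {
    "r1": {("a",), ("b",)},
    "r2": {("b",), ("c",)},
    "r3": {("d",)},
}
expected = {("a",), ("b",), ("c",)}

print(best_subset(derived, expected))
```

Because subsets are tried in order of increasing size and replaced only on a strictly smaller error, the result minimizes the total error first and the number of rules second, mirroring the bi-level variant mentioned in the abstract.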

    Optimizing Spatial Databases

    This paper describes the best way to optimize spatial databases: spatial indexes. The most common and widely used spatial indexes, R-tree and Quadtree, are presented, analyzed and compared. A few examples are also given of queries that run in Oracle Spatial and are supported by an R-tree spatial index. Spatial databases offer special features that are very helpful for representing spatial data, but such data can require substantial storage and processing time. This is why optimizing the database is one of the most important aspects of working with large volumes of data.
    Keywords: Spatial Database, Spatial Index, R-tree, Quadtree, Optimization
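To show how a spatial index avoids scanning every point, here is a minimal point-quadtree sketch. It is not taken from the paper and is unrelated to Oracle Spatial's implementation; the class name, the `capacity` and `max_depth` parameters, and the sample points are assumptions for illustration.

```python
class Quadtree:
    """Point quadtree over the half-open region [x0, x1) x [y0, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=2, depth=0, max_depth=8):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity    # points a leaf holds before it splits
        self.depth = depth
        self.max_depth = max_depth  # stops endless splitting on duplicate points
        self.points = []
        self.children = None

    def insert(self, x, y):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x < x1 and y0 <= y < y1):
            return False  # point lies outside this node
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y))
                return True
            self._split()
        return any(child.insert(x, y) for child in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            Quadtree(x0, y0, mx, my, *args), Quadtree(mx, y0, x1, my, *args),
            Quadtree(x0, my, mx, y1, *args), Quadtree(mx, my, x1, y1, *args),
        ]
        for px, py in self.points:  # push stored points down into the quadrants
            any(child.insert(px, py) for child in self.children)
        self.points = []

    def query(self, qx0, qy0, qx1, qy1):
        """Return all points inside the closed rectangle [qx0, qx1] x [qy0, qy1]."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []  # no overlap: the whole subtree is skipped
        found = [(x, y) for x, y in self.points
                 if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        for child in self.children or []:
            found.extend(child.query(qx0, qy0, qx1, qy1))
        return found


# Hypothetical sample data in an 8 x 8 region.
points = [(1, 1), (2, 5), (6, 3), (7, 7), (3, 3), (5, 5), (0.5, 7.5)]
tree = Quadtree(0, 0, 8, 8)
for px, py in points:
    tree.insert(px, py)

print(sorted(tree.query(2, 2, 6, 6)))
```

The pruning test at the top of `query` is where the saving comes from: any quadrant that does not overlap the query rectangle is discarded without inspecting its points. An R-tree applies the same idea with bounding rectangles fitted to the data rather than a fixed subdivision of space.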