m-tables: Representing Missing Data
Representation systems have been widely used to capture different forms of incomplete data in various settings. However, existing representation systems are not expressive enough to handle the more complex scenarios of missing data that can occur in practice: these range from missing attribute values to a known number of missing tuples, or even an unknown number of missing tuples. In this work, we propose a new representation system called m-tables that can represent many different types of missing data. We show that m-tables form a closed, complete and strong representation system under both set and bag semantics and are strictly more expressive than conditional tables under both the closed and open world assumptions. We further study the complexity of computing certain and possible answers in m-tables. Finally, we discuss how to "interpret" m-tables through a novel labeling scheme that marks a type of generalized tuples as certain or possible.
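As an illustration of the certain and possible answers mentioned in this abstract, the sketch below uses the standard possible-worlds definitions: an answer is certain if it appears in every possible world and possible if it appears in at least one. It does not implement m-tables themselves; the explicit list of worlds, the `certain_and_possible` helper, and the sample data are assumptions made for the example.

```python
def certain_and_possible(worlds, query):
    """Evaluate `query` in every possible world (hypothetical helper).

    Certain answers are in every world's result; possible answers are
    in at least one. `worlds` must be a non-empty list of relations.
    """
    answers = [set(query(world)) for world in worlds]
    certain = set.intersection(*answers)
    possible = set.union(*answers)
    return certain, possible


# Assumed toy data: Alice's city is missing and may be "NY" or "LA",
# which yields two possible worlds. Bob's tuple is fully known.
worlds = [
    {("alice", "NY"), ("bob", "NY")},
    {("alice", "LA"), ("bob", "NY")},
]


def people_in_ny(world):
    # Query: names of people whose city is "NY".
    return {name for (name, city) in world if city == "NY"}


certain, possible = certain_and_possible(worlds, people_in_ny)
```

Here Bob is a certain answer, while Alice is only a possible one. Enumerating worlds explicitly is only feasible for tiny examples; a representation system exists precisely to avoid doing so.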
A fractional number based labeling scheme for dynamic XML updating
Recently, XML query processing based on labeling schemes has been proposed. Based on labeling schemes, the structural relationship between XML nodes can be determined quickly without the need to access the XML document. However, labeling schemes have to re-label the pre-existing nodes or re-calculate the label values when a new node is inserted into the XML document during the update process. In this paper, we propose a novel labeling scheme based on fractional numbers. The key feature of fractional numbers is that an infinite number of fractional numbers can be inserted between any two unequal fractional numbers. Therefore, the problem of re-labeling the pre-existing nodes during XML updating can be solved if the XML nodes are labeled by fractional numbers.
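The property this abstract relies on, that a new fraction always fits between two unequal fractions, can be sketched as follows. The abstract does not say how the paper picks the new label, so the midpoint rule and the `label_between` helper are assumptions for illustration only.

```python
from fractions import Fraction


def label_between(left: Fraction, right: Fraction) -> Fraction:
    """Return a label strictly between two distinct labels.

    The midpoint is an assumed choice; any fraction in the open
    interval (left, right) would preserve document order.
    """
    if left >= right:
        raise ValueError("left must be strictly less than right")
    return (left + right) / 2


# Sibling labels in document order. Existing labels are never rewritten.
labels = [Fraction(1), Fraction(2)]

# Insert a new node between the two existing siblings.
first_new = label_between(labels[0], labels[1])
labels.insert(1, first_new)

# Insert again in front of the node that was just added.
second_new = label_between(labels[0], labels[1])
labels.insert(1, second_new)
```

Because `Fraction` is exact, repeated insertions never collide, although numerators and denominators grow with each insertion at the same position.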
Knowledge Refinement via Rule Selection
In several different applications, including data transformation and entity
resolution, rules are used to capture aspects of knowledge about the
application at hand. Often, a large set of such rules is generated
automatically or semi-automatically, and the challenge is to refine the
encapsulated knowledge by selecting a subset of rules based on the expected
operational behavior of the rules on available data. In this paper, we carry
out a systematic complexity-theoretic investigation of the following rule
selection problem: given a set of rules specified by Horn formulas, and a pair
of an input database and an output database, find a subset of the rules that
minimizes the total error, that is, the number of false positive and false
negative errors arising from the selected rules. We first establish
computational hardness results for the decision problems underlying this
minimization problem, as well as upper and lower bounds for its
approximability. We then investigate a bi-objective optimization version of the
rule selection problem in which both the total error and the size of the
selected rules are taken into account. We show that testing for membership in
the Pareto front of this bi-objective optimization problem is DP-complete.
Finally, we show that a similar DP-completeness result holds for a bi-level
optimization version of the rule selection problem, where one minimizes first
the total error and then the size.
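The total-error objective described in this abstract can be sketched with a brute-force search. This is only an illustration under simplifying assumptions: each rule is represented by the set of facts it derives from the input database (precomputed, with no chaining between rules), and the names `total_error`, `best_subset`, and the sample data are hypothetical. Exhaustive search is exponential in the number of rules, which is consistent with the hardness results the abstract reports.

```python
from itertools import combinations


def total_error(selected, derived_by_rule, expected):
    """Number of false positives plus false negatives for a rule subset."""
    produced = set().union(*(derived_by_rule[r] for r in selected))
    false_positives = produced - expected
    false_negatives = expected - produced
    return len(false_positives) + len(false_negatives)


def best_subset(derived_by_rule, expected):
    """Exhaustively find a subset minimizing total error.

    Subsets are tried in order of increasing size and replaced only on
    strict improvement, so ties are broken in favor of fewer rules
    (minimize error first, then size).
    """
    rules = sorted(derived_by_rule)
    best = None
    for size in range(len(rules) + 1):
        for subset in combinations(rules, size):
            err = total_error(subset, derived_by_rule, expected)
            if best is None or err < best[0]:
                best = (err, subset)
    return best


# Assumed toy instance: facts each rule derives, and the output database.
derived_by_rule = {"r1": {"a", "b"}, "r2": {"b", "c"}, "r3": {"d"}}
expected = {"a", "b", "c"}

result = best_subset(derived_by_rule, expected)
```

In this instance, selecting `r1` and `r2` together reproduces the output database exactly, while `r3` only contributes a false positive.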
Okapi-based XML indexing
Purpose
– Being an important data exchange and information storage standard, XML has generated a great deal of interest and particular attention has been paid to the issue of XML indexing. Clear use cases for structured search in XML have been established. However, most of the research in the area is either based on relational database systems or specialized semi‐structured data management systems. This paper aims to propose a method for XML indexing based on the information retrieval (IR) system Okapi.
Design/methodology/approach
– First, the paper reviews the structure of inverted files and explains why this indexing mechanism cannot properly support XML retrieval, using the underlying data structures of Okapi as an example. Then the paper explores a revised method implemented on Okapi using path indexing structures. The paper evaluates these index structures through the metrics of indexing run time, path search run time and space costs using the INEX and Reuters RCV1 collections.
Findings
– Initial results on the INEX collections show that there is a substantial overhead in space costs for the method, but this increase does not affect run time adversely. Indexing results on Reuters RCV1 sub‐collections of different sizes show that space costs increase significantly as the collection grows, but run time increases linearly. Path search results show sub‐millisecond run times, demonstrating minimal overhead for XML search.
Practical implications
– Overall, the results show the method implemented to support XML search in a traditional IR system such as Okapi is viable.
Originality/value
– The paper provides useful information on a method for XML indexing based on the IR system Okapi
Optimizing Spatial Databases
This paper describes the best way to optimize spatial databases: spatial indexes. The most common and widely used spatial indexes, the R-tree and the Quadtree, are presented, analyzed and compared. The paper also gives a few examples of queries that run in Oracle Spatial and are supported by an R-tree spatial index. Spatial databases offer special features that are very helpful for representing spatial data, but spatial data can require a lot of resources in terms of storage and time. This is why optimizing the database is one of the most important aspects of working with large volumes of data.
Keywords: Spatial Database, Spatial Index, R-tree, Quadtree, Optimization
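To make the idea of a spatial index concrete, here is a minimal point quadtree sketch. It is not the paper's code and not how Oracle Spatial implements its indexes; the `QuadTree` class, its `capacity` and `max_depth` parameters, and the sample points are all assumptions for illustration. The point of the example is the pruning step in `query`, which skips any quadrant that cannot overlap the search rectangle.

```python
class QuadTree:
    """Point quadtree over the half-open region [x0, x1) x [y0, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0, max_depth=16):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity      # points a leaf holds before splitting
        self.depth = depth
        self.max_depth = max_depth    # guards against endless splitting
        self.points = []
        self.children = None

    def insert(self, x, y):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x < x1 and y0 <= y < y1):
            return False              # point lies outside this node
        if self.children is None:
            full = len(self.points) >= self.capacity
            if not full or self.depth >= self.max_depth:
                self.points.append((x, y))
                return True
            self._split()
        return any(child.insert(x, y) for child in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(x0, y0, mx, my, *args),
            QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args),
            QuadTree(mx, my, x1, y1, *args),
        ]
        # Push the stored points down into the new quadrants.
        for px, py in self.points:
            any(child.insert(px, py) for child in self.children)
        self.points = []

    def query(self, qx0, qy0, qx1, qy1):
        """Return points inside the closed rectangle [qx0, qx1] x [qy0, qy1]."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []                 # prune: no overlap with this node
        found = [(x, y) for (x, y) in self.points
                 if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        for child in self.children or []:
            found.extend(child.query(qx0, qy0, qx1, qy1))
        return found


tree = QuadTree(0, 0, 100, 100, capacity=2)
for point in [(10, 10), (20, 20), (80, 80), (90, 10), (55, 55)]:
    tree.insert(*point)

nearby = sorted(tree.query(0, 0, 50, 50))
```

A quadtree partitions space into fixed quadrants, whereas an R-tree groups objects into bounding rectangles that adapt to the data; the sketch covers only the former.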