Reasoning About Integrity Constraints for Tree-Structured Data
We study a class of integrity constraints for tree-structured data modelled as data trees, whose nodes have a label from a finite alphabet and store a data value from an infinite data domain. The constraints require each tuple of nodes selected by a conjunctive query (using navigational axes and labels) to satisfy a positive combination of equalities and a positive combination of inequalities over the stored data values. Such constraints are instances of the general framework of XML-to-relational constraints proposed recently by Niewerth and Schwentick. They cover some common classes of constraints, including W3C XML Schema key and unique constraints, as well as domain restrictions and denial constraints, but cannot express inclusion constraints, such as reference keys. Our main result is that consistency of such integrity constraints with respect to a given schema (modelled as a tree automaton) is decidable. An easy extension gives decidability for the entailment problem. Equivalently, we show that validity and containment of unions of conjunctive queries using navigational axes, labels, data equalities and inequalities are decidable, as long as none of the conjunctive queries uses both equalities and inequalities; without this restriction, both problems are known to be undecidable. In the context of XML data exchange, our result can be used to establish decidability for a consistency problem for XML schema mappings. All the decision procedures are doubly exponential, with matching lower bounds. The complexity may be lowered to singly exponential when conjunctive queries are replaced by tree patterns and the number of data comparisons is bounded.
Validation of schema mappings with nested queries
With the emergence of the Web and the wide use of XML for representing data, the ability to map not only flat relational but also nested data has become crucial. The design of schema mappings is a semi-automatic process. A human designer is needed to guide the process, choose among mapping candidates, and successively refine the mapping. The designer needs a way to figure out whether the mapping is what was intended. Our approach to mapping validation allows the designer to check whether the mapping satisfies certain desirable properties. In this paper, we focus on the validation of mappings between nested relational schemas, in which the mapping assertions are either inclusions or equalities of nested queries. We focus on the nested relational setting since most XML Document Type Definitions (DTDs) can be represented in this model. We perform the validation by reasoning on the schemas and mapping definition. We take into account the integrity constraints defined on both the source and target schema.
Semantic Query Reformulation in Social PDMS
We consider social peer-to-peer data management systems (PDMS), where each
peer maintains both semantic mappings between its schema and some
acquaintances, and social links with peer friends. In this context,
reformulating a query from a peer's schema into other peers' schemas is a hard
problem, as it may generate as many rewritings as the set of mappings from that
peer to the outside and transitively on, by eventually traversing the entire
network. However, not all the obtained rewritings are relevant to a given
query. In this paper, we address this problem by inspecting semantic mappings
and social links to find only relevant rewritings. We propose a new notion of
'relevance' of a query with respect to a mapping, and, based on this notion, a
new semantic query reformulation approach for social PDMS, which achieves great
accuracy and flexibility. To rapidly find the most interesting mappings, we
combine several techniques: (i) social links are expressed as FOAF (Friend of a
Friend) links to characterize peers' friendship, and compact mapping summaries
are used to obtain mapping descriptions; (ii) local semantic views are special
views that contain information about external mappings; and (iii) gossiping
techniques improve the search of relevant mappings. Our experimental
evaluation, based on a prototype on top of PeerSim and a simulated network,
demonstrates that our solution yields greater recall compared to traditional
query translation approaches proposed in the literature.
Comment: 29 pages, 8 figures, query rewriting in PDM
Peer Data Management
Peer Data Management (PDM) deals with the management of structured data in unstructured peer-to-peer (P2P) networks. Each peer can store data locally and define relationships between its data and the data provided by other peers. Queries posed to any of the peers are then answered by also considering the information implied by those mappings.
The overall goal of PDM is to provide semantically well-founded integration and exchange of heterogeneous and distributed data sources. Unlike traditional data integration systems, peer data management systems (PDMSs) thereby allow for full autonomy of each member and need no central coordinator. The promise of such systems is to provide flexible data integration and exchange at low setup and maintenance costs.
However, building such systems raises many challenges. Besides the obvious scalability problem, choosing an appropriate semantics that can deal with arbitrary, even cyclic topologies, data inconsistencies, or updates while at the same time allowing for tractable reasoning has been an area of active research in the last decade. In this survey we provide an overview of the different approaches suggested in the literature to tackle these problems, focusing on appropriate semantics for query answering and data exchange rather than on implementation-specific problems.
Four Lessons in Versatility or How Query Languages Adapt to the Web
Exposing not only human-centered information, but machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled new kinds of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: Some providers choose XML, others RDF, still others JSON or OWL, for their data, even in similar domains. This fracturing stifles innovation as application builders have to cope not only with one Web stack (e.g., XML technology) but with several, each of considerable complexity. With Xcerpt we have developed a rule- and pattern-based query language that aims to shield application builders from much of this complexity: In a single query language XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C's GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply to querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards a more convenient, yet highly efficient data access in a "Web of Data".
Exchange-Repairs: Managing Inconsistency in Data Exchange
In a data exchange setting with target constraints, it is often the case that
a given source instance has no solutions. In such cases, the semantics of
target queries trivialize. The aim of this paper is to introduce and explore a
new framework that gives meaningful semantics in such cases by using the notion
of exchange-repairs. Informally, an exchange-repair of a source instance is
another source instance that differs minimally from the first, but has a
solution. Exchange-repairs give rise to a natural notion of exchange-repair
certain answers (XR-certain answers) for target queries. We show that for
schema mappings specified by source-to-target GAV dependencies and target
equality-generating dependencies (egds), the XR-certain answers of a target
conjunctive query can be rewritten as the consistent answers (in the sense of
standard database repairs) of a union of conjunctive queries over the source
schema with respect to a set of egds over the source schema, making it possible
to use a consistent query-answering system to compute XR-certain answers in
data exchange. We then examine the general case of schema mappings specified by
source-to-target GLAV constraints, a weakly acyclic set of target tgds and a
set of target egds. The main result asserts that, for such settings, the
XR-certain answers of conjunctive queries can be rewritten as the certain
answers of a union of conjunctive queries with respect to the stable models of
a disjunctive logic program over a suitable expansion of the source schema.
Comment: 29 pages, 13 figures, submitted to the Journal on Data Semantic
Universal Solutions in Temporal Data Exchange
During the past fifteen years, data exchange has been explored in depth and in a variety of different settings. Even though temporal databases constitute a mature area of research studied over several decades, the investigation of temporal data exchange was initiated only very recently. We analyze the properties of universal solutions in temporal data exchange with emphasis on the relationship between universal solutions in the context of concrete time and universal solutions in the context of abstract time. We show that challenges arise even in the setting in which the data exchange specifications involve a single temporal variable. After this, we identify settings, including data exchange settings that involve multiple temporal variables, in which these challenges can be overcome.
Composition with Target Constraints
It is known that the composition of schema mappings, each specified by
source-to-target tgds (st-tgds), can be specified by a second-order tgd (SO
tgd). We consider the question of what happens when target constraints are
allowed. Specifically, we consider the question of specifying the composition
of standard schema mappings (those specified by st-tgds, target egds, and a
weakly acyclic set of target tgds). We show that SO tgds, even with the
assistance of arbitrary source constraints and target constraints, cannot
specify in general the composition of two standard schema mappings. Therefore,
we introduce source-to-target second-order dependencies (st-SO dependencies),
which are similar to SO tgds, but allow equations in the conclusion. We show
that st-SO dependencies (along with target egds and target tgds) are sufficient
to express the composition of every finite sequence of standard schema
mappings, and further, every st-SO dependency specifies such a composition. In
addition to this expressive power, we show that st-SO dependencies enjoy other
desirable properties. In particular, they have a polynomial-time chase that
generates a universal solution. This universal solution can be used to find the
certain answers to unions of conjunctive queries in polynomial time. It is easy
to show that the composition of an arbitrary number of standard schema mappings
is equivalent to the composition of only two standard schema mappings. We show
that, surprisingly, the analogous result also holds for schema mappings
specified by just st-tgds (no target constraints). This is proven by showing
that every SO tgd is equivalent to an unnested SO tgd (one where there is no
nesting of function symbols). Similarly, we prove unnesting results for st-SO
dependencies, with the same types of consequences.
Comment: This paper is an extended version of: M. Arenas, R. Fagin, and A.
Nash. Composition with Target Constraints. In 13th International Conference
on Database Theory (ICDT), pages 129-142, 201