250 research outputs found

Extending Dependencies with Conditions

Author: Bravo Loreto
Fan Wenfei
Ma Shuai
Publication date: 01/01/2007

Functional Dependencies Unleashed for Scalable Data Exchange

Author: Bonifati Angela
Ileana Ioana
Linardi Michele
Publication date: 16/04/2016

We address the problem of efficiently evaluating target functional dependencies (fds) in the Data Exchange (DE) process. Target fds naturally occur in many DE scenarios, including those in the Life Sciences in which multiple source relations need to be structured under a constrained target schema. However, despite their wide use, the evaluation of target fds is still a bottleneck in state-of-the-art DE engines. Systems relying on an all-SQL approach typically do not support target fds unless additional information is provided. Alternatively, DE engines that do include these dependencies typically pay the price of a significant drop in performance and scalability. In this paper, we present a novel chase-based algorithm that can efficiently handle arbitrary fds on the target. Our approach essentially relies on exploiting the interactions between source-to-target (s-t) tuple-generating dependencies (tgds) and target fds. This allows us to tame the size of the intermediate chase results through a careful ordering of chase steps that interleaves fds and (chosen) tgds. As a direct consequence, we substantially reduce the fd application scope, often a central cause of the dramatic overhead induced by target fds. Moreover, reasoning on dependency interaction further leads us to interesting parallelization opportunities, yielding additional scalability gains. We provide a proof-of-concept implementation of our chase-based algorithm and an experimental study gauging its scalability with respect to a number of parameters, among which are the size of source instances and the number of dependencies of each tested scenario. Finally, we empirically compare with the latest DE engines, and show that our algorithm outperforms them.
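To make the interaction between s-t tgds and target fds concrete, here is a minimal sketch of a textbook fd chase step, not the algorithm of the paper. It assumes a hypothetical target relation Dept(dept, mgr) whose tuples were produced by two s-t tgds, one of which invents labeled nulls (written "_N1", "_N2") for unknown managers. The function name chase_fd and the null naming scheme are illustrative assumptions.

```python
def is_null(value):
    # Assumption for this sketch: labeled nulls are strings starting with "_N".
    return isinstance(value, str) and value.startswith("_N")


def chase_fd(rows, lhs, rhs):
    """Enforce the fd lhs -> rhs on a list of dict rows by unifying values."""
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for i, r in enumerate(rows):
            for s in rows[i + 1:]:
                if all(r[a] == s[a] for a in lhs) and r[rhs] != s[rhs]:
                    # Prefer replacing a null; two distinct constants cannot be unified.
                    old, new = (r[rhs], s[rhs]) if is_null(r[rhs]) else (s[rhs], r[rhs])
                    if not is_null(old):
                        raise ValueError("chase fails: two distinct constants")
                    # Replace the null everywhere in the instance.
                    for t in rows:
                        for a in t:
                            if t[a] == old:
                                t[a] = new
                    changed = True
    # Unification may leave duplicate tuples; keep one copy of each.
    unique = []
    for r in rows:
        if r not in unique:
            unique.append(r)
    return unique


target = [
    {"dept": "cs", "mgr": "_N1"},     # from a tgd that invents an unknown manager
    {"dept": "cs", "mgr": "alice"},   # from a tgd that copies a known manager
    {"dept": "math", "mgr": "_N2"},
]
result = chase_fd(target, lhs=["dept"], rhs="mgr")
```

In this sketch every fd step scans all pairs of tuples, which hints at why the fd application scope matters for performance on large intermediate instances.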

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

HAL

Hal-Diderot

Improving Data Quality: Consistency and Accuracy

Author: Cong Gao
Fan Wenfei
Geerts Floris
Jia Xibei
Ma Shuai
Publication date: 01/01/2007

Edinburgh Research Explorer

Institutional Repository Universiteit Antwerpen

Distribution Constraints: The Chase for Distributed Data

Author: Geck Gaetano
Neven Frank
Schwentick Thomas
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 23rd International Conference on Database Theory (ICDT 2020)
Publication date: 01/01/2020

This paper introduces a declarative framework to specify and reason about distributions of data over computing nodes in a distributed setting. More specifically, it proposes distribution constraints, which are tuple- and equality-generating dependencies (tgds and egds) extended with node variables ranging over computing nodes. In particular, they can express co-partitioning constraints and constraints about range-based data distributions by using comparison atoms. The main technical contribution is the study of the implication problem of distribution constraints. While implication is undecidable in general, relevant fragments of so-called data-full constraints are exhibited for which the corresponding implication problems are complete for EXPTIME, PSPACE and NP. These results yield bounds on deciding parallel-correctness for conjunctive queries in the presence of distribution constraints.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Propagating functional dependencies with conditions

Author: Fan Wenfei
Hu Yanli
Liu Jie
Ma Shuai
Wu Yinghui
Publication date: 01/01/2008

The dependency propagation problem is to determine, given a view defined on data sources and a set of dependencies on the sources, whether another dependency is guaranteed to hold on the view. This paper investigates dependency propagation for recently proposed conditional functional dependencies (CFDs). The need for this study is evident in data integration, exchange and cleaning, since dependencies on data sources often only hold conditionally on the view. We investigate dependency propagation for views defined in various fragments of relational algebra, with CFDs as view dependencies, and for source dependencies given as either CFDs or traditional functional dependencies (FDs). (a) We establish lower and upper bounds, all matching, ranging from PTIME to undecidable. These not only provide the first results for CFD propagation, but also extend the classical work on FD propagation by giving new complexity bounds in the presence of finite domains. (b) We provide the first algorithm for computing a minimal cover of all CFDs propagated via SPC views; the algorithm has the same complexity as one of the most efficient algorithms for computing a cover of FDs propagated via a projection view, despite the increased expressive power of CFDs and SPC views. (c) We experimentally verify that the algorithm is efficient.

Crossref

Edinburgh Research Explorer

Relational to RDF Data Exchange in Presence of a Shape Expression Schema

Author: Boneva Iovka
Lozano Jose
Staworko Sławek
Publication venue: HAL CCSD
Publication date: 21/05/2018

We study the relational to RDF data exchange problem, where the target constraints are specified using a Shape Expression schema (ShEx). We investigate two fundamental problems: 1) consistency, which is checking, for a given data exchange setting, whether there always exists a solution for any source instance, and 2) constructing a universal solution, which is a solution that represents the space of all solutions. We propose to use typed IRI constructors in source-to-target tuple-generating dependencies to create the IRIs of the RDF graph from the values in the relational instance, and we translate ShEx into a set of target dependencies. We also identify data exchange settings that are key covered, a property that is decidable and guarantees consistency. Furthermore, we show that this property is a sufficient and necessary condition for the existence of universal solutions for a practical subclass of weakly-recursive ShEx.

On Chase Termination Beyond Stratification

Author: Lausen Georg
Meier Michael
Schmidt Michael
Publication date: 01/01/2009

We study the termination problem of the chase algorithm, a central tool in various database problems such as the constraint implication problem, conjunctive query optimization, rewriting queries using views, data exchange, and data integration. The basic idea of the chase is, given a database instance and a set of constraints as input, to fix constraint violations in the database instance. It is well known that, for an arbitrary set of constraints, the chase does not necessarily terminate (in general, it is even undecidable whether it does). Addressing this issue, we review the limitations of existing sufficient termination conditions for the chase and develop new techniques that allow us to establish weaker sufficient conditions. In particular, we introduce two novel termination conditions called safety and inductive restriction, and use them to define the so-called T-hierarchy of termination conditions. We then study the interrelations of our termination conditions with previous conditions and the complexity of checking our conditions. This analysis leads to an algorithm that checks membership in a level of the T-hierarchy and accounts for the complexity of termination conditions. As another contribution, we study the problem of data-dependent chase termination and present sufficient termination conditions w.r.t. fixed instances. They might guarantee termination although the chase does not terminate in the general case. As an application of our techniques beyond those already mentioned, we transfer our results into the field of query answering over knowledge bases where the chase on the underlying database may not terminate, making existing algorithms applicable to broader classes of constraints. Comment: Technical Report of VLDB 2009 conference version
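The non-termination mentioned above can be seen on a single constraint. The following is a minimal sketch, assuming one hypothetical tgd E(x, y) -> exists z. E(y, z) over an edge relation; it is not one of the termination conditions from the paper. The names violations and chase, the "_N" null labels, and the max_steps cutoff are illustrative assumptions.

```python
from itertools import count


def violations(instance):
    # A node y violates the tgd if it has an incoming edge but no outgoing edge.
    sources = {x for x, _ in instance}
    return sorted({y for _, y in instance if y not in sources})


def chase(edges, max_steps):
    """Chase E(x, y) -> exists z. E(y, z) for at most max_steps steps.

    Returns (instance, terminated), where terminated is True only if
    no violation remains.
    """
    instance = set(edges)
    fresh = count(1)
    for _ in range(max_steps):
        pending = violations(instance)
        if not pending:
            return instance, True
        # Fix one violation by adding an edge to a fresh labeled null.
        instance.add((pending[0], f"_N{next(fresh)}"))
    return instance, not violations(instance)


# Each fix creates a new node without an outgoing edge, so the chase never stops.
growing, growing_done = chase({("a", "b")}, max_steps=5)

# On a cycle the constraint already holds, so the chase stops immediately.
cyclic, cyclic_done = chase({("a", "b"), ("b", "a")}, max_steps=5)
```

The second call illustrates the data-dependent case: the same constraint that diverges on one instance terminates on another.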

arXiv.org e-Print Archive

CiteSeerX

Consistent Query Answers in the Presence of Universal Constraints

Author: Jan Chomicki
Sławomir Staworko
Publication date: 01/01/2008

The framework of consistent query answers and repairs has been introduced to alleviate the impact of inconsistent data on the answers to a query. A repair is a minimally different consistent instance and an answer is consistent if it is present in every repair. In this article we study the complexity of consistent query answers and repair checking in the presence of universal constraints. We propose an extended version of the conflict hypergraph which allows capturing all repairs w.r.t. a set of universal constraints. We show that repair checking is in PTIME for the class of full tuple-generating dependencies and denial constraints, and we present a polynomial repair algorithm. This algorithm is sound, i.e. it always produces a repair, and also complete, i.e. every repair can be constructed. Next, we present a polynomial-time algorithm computing consistent answers to ground quantifier-free queries in the presence of denial constraints, join dependencies, and acyclic full tuple-generating dependencies. Finally, we show that extending the class of constraints leads to intractability. For arbitrary full tuple-generating dependencies consistent query answering becomes coNP-complete. For arbitrary universal constraints consistent query answering is \Pi_2^p-complete and repair checking coNP-complete. Comment: Submitted to Information Systems
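The definitions of repair and consistent answer can be illustrated by brute force on a tiny instance. This is a minimal sketch, assuming a single key constraint on a hypothetical binary relation (name, dept), for which each repair keeps exactly one tuple per key value; it enumerates all repairs and is therefore exponential, unlike the polynomial algorithms described above. The function names and sample data are illustrative assumptions.

```python
from collections import defaultdict
from itertools import product


def repairs(rows):
    """Yield every repair of rows under the key constraint 'first column is a key'."""
    groups = defaultdict(list)
    for key, value in sorted(set(rows)):
        groups[key].append((key, value))
    # A repair picks exactly one tuple from each group of conflicting tuples.
    for choice in product(*groups.values()):
        yield set(choice)


def consistent_answers(rows, query):
    """Return the answers that the query yields in every repair."""
    answer_sets = [query(repair) for repair in repairs(rows)]
    return set.intersection(*answer_sets)


employees = {("ann", "sales"), ("ann", "hr"), ("bob", "it")}

# Which employee names exist? Both survive in every repair.
names = consistent_answers(employees, lambda r: {name for name, _ in r})

# Which full tuples are certain? Only bob's, since ann's department is disputed.
certain_rows = consistent_answers(employees, lambda r: set(r))
```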

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Analyses and Validation of Conditional Dependencies with Built-in Predicates

Author: J. Chomicki
L.E. Bertossi
M. Baudinet
M. Garey
M.J. Maher
S. Abiteboul
S. Flesca
W. Chen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

This paper proposes a natural extension of conditional functional dependencies (CFDs [14]) and conditional inclusion dependencies (CINDs [8]), denoted by CFD(p)s and CIND(p)s, respectively, by specifying patterns of data values with ≠, <, ≤, > and ≥ predicates. As data quality rules, CFD(p)s and CIND(p)s are able to capture errors that commonly arise in practice but cannot be detected by CFDs and CINDs. We establish two sets of results for central technical problems associated with CFD(p)s and CIND(p)s. (a) One concerns the satisfiability and implication problems for CFD(p)s and CIND(p)s, taken separately or together. These are important for, e.g., deciding whether data quality rules are dirty themselves, and for removing redundant rules. We show that despite the increased expressive power, the static analyses of CFD(p)s and CIND(p)s retain the same complexity as their CFD and CIND counterparts. (b) The other concerns validation of CFD(p)s and CIND(p)s. We show that given a set Sigma of CFD(p)s and CIND(p)s on a database D, a set of SQL queries can be automatically generated that, when evaluated against D, return all tuples in D that violate some dependencies in Sigma. This provides commercial DBMS with an immediate capability to detect errors based on CFD(p)s and CIND(p)s.
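The idea of detecting violations with SQL can be sketched with hand-written queries. The example below is an assumption-laden illustration, not the query generation procedure of the paper: the orders table, its columns, and both rules are hypothetical. The first rule uses a built-in predicate (price > 100) in its pattern; the second is an FD zip -> street that only applies when country = 'UK'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, country TEXT, zip TEXT, "
    "street TEXT, price REAL, shipping TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "UK", "EH8", "Mayfield", 150, "free"),
        (2, "UK", "EH8", "Crichton", 80, "paid"),
        (3, "UK", "W1", "Oxford", 120, "paid"),
        (4, "US", "EH8", "Main", 200, "paid"),
    ],
)

# Rule 1 (constant pattern with a predicate): UK orders with price > 100
# must have shipping = 'free'. A single tuple can violate it.
CONSTANT_RULE_SQL = """
    SELECT id FROM orders
    WHERE country = 'UK' AND price > 100 AND shipping <> 'free'
    ORDER BY id
"""

# Rule 2 (conditional FD): among UK orders, zip determines street.
# A violation needs two tuples, so the query is a self-join.
VARIABLE_RULE_SQL = """
    SELECT DISTINCT a.id
    FROM orders a JOIN orders b ON a.zip = b.zip
    WHERE a.country = 'UK' AND b.country = 'UK' AND a.street <> b.street
    ORDER BY a.id
"""

const_violations = [row[0] for row in conn.execute(CONSTANT_RULE_SQL)]
var_violations = [row[0] for row in conn.execute(VARIABLE_RULE_SQL)]
```

Order 4 shares a zip code with orders 1 and 2 but is outside the UK pattern, so the conditional rule does not flag it.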

Crossref

Edinburgh Research Explorer

Canonical queries as a query answering device (Information Science)

Author: Graham Marc Henry
Publication venue: Georgia Institute of Technology
Publication date: 01/01/1983

Issued as Annual reports [nos. 1-2], and Final report, Project no. G-36-60

Scholarly Materials And Research @ Georgia Tech