38 research outputs found

    On Equivalence and Cores for Incomplete Databases in Open and Closed Worlds

    Get PDF
    Data exchange heavily relies on the notion of incomplete database instances. Several semantics for such instances have been proposed and include open (OWA), closed (CWA), and open-closed (OCWA) world. For all these semantics important questions are: whether one incomplete instance semantically implies another; when two are semantically equivalent; and whether a smaller or smallest semantically equivalent instance exists. For OWA and CWA these questions are fully answered. For several variants of OCWA, however, they remain open. In this work we adress these questions for Closed Powerset semantics and the OCWA semantics of [Leonid Libkin and Cristina Sirangelo, 2011]. We define a new OCWA semantics, called OCWA*, in terms of homomorphic covers that subsumes both semantics, and characterize semantic implication and equivalence in terms of such covers. This characterization yields a guess-and-check algorithm to decide equivalence, and shows that the problem is NP-complete. For the minimization problem we show that for several common notions of minimality there is in general no unique minimal equivalent instance for Closed Powerset semantics, and consequently not for the more expressive OCWA* either. However, for Closed Powerset semantics we show that one can find, for any incomplete database, a unique finite set of its subinstances which are subinstances (up to renaming of nulls) of all instances semantically equivalent to the original incomplete one. We study properties of this set, and extend the analysis to OCWA*

    Efficient Multi-way Theta-Join Processing Using MapReduce

    Full text link
    Multi-way Theta-join queries are powerful in describing complex relations and therefore widely employed in real practices. However, existing solutions from traditional distributed and parallel databases for multi-way Theta-join queries cannot be easily extended to fit a shared-nothing distributed computing paradigm, which is proven to be able to support OLAP applications over immense data volumes. In this work, we study the problem of efficient processing of multi-way Theta-join queries using MapReduce from a cost-effective perspective. Although there have been some works using the (key,value) pair-based programming model to support join operations, efficient processing of multi-way Theta-join queries has never been fully explored. The substantial challenge lies in, given a number of processing units (that can run Map or Reduce tasks), mapping a multi-way Theta-join query to a number of MapReduce jobs and having them executed in a well scheduled sequence, such that the total processing time span is minimized. Our solution mainly includes two parts: 1) cost metrics for both single MapReduce job and a number of MapReduce jobs executed in a certain order; 2) the efficient execution of a chain-typed Theta-join with only one MapReduce job. Comparing with the query evaluation strategy proposed in [23] and the widely adopted Pig Latin and Hive SQL solutions, our method achieves significant improvement of the join processing efficiency.Comment: VLDB201

    A Bounded Degree Property and Finite-Cofiniteness of Graph Queries

    Get PDF
    We provide new techniques for the analysis of the expressive power of query languages for nested collections. These languages may use set or bag semantics and may be further complicated by the presence of aggregate functions. We exhibit certain classes of graphics and prove that properties of these graphics that can be tested in such languages are either finite or cofinite. This result settles that conjectures of Grumbach, Milo, and Paredaens that parity test, transitive closure, and balanced binary tree test are not expressible in bah languages like BALG of Grumbach and Milo and BQL of Libkin and Wong. Moreover, it implies that many recursive queries, including simple ones like test for a chain, cannot be expressed in a nested relational language even when aggregate functions are available. In an attempt to generalize the finite-cofiniteness result, we study the bounded degree property which says that the number of distinct in- and out-degrees in the output of a graph query does not depend on the size of the input if the input is simple. We show that such a property implies a number of inexpressibility results in a uniform fashion. We then prove the bounded degree property for the nested relational language

    Parallel-Correctness and Transferability for Conjunctive Queries under Bag Semantics

    Get PDF
    Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate for computing a given query. This property is referred to as parallel-correctness. Another key problem is to detect whether the data reshuffle step can be avoided when evaluating subsequent queries. The latter problem is referred to as transfer of parallel-correctness. This paper extends the study of parallel-correctness and transfer of parallel-correctness of conjunctive queries to incorporate bag semantics. We provide semantical characterizations for both problems, obtain complexity bounds and discuss the relationship with their set semantics counterparts. Finally, we revisit both problems under a modified distribution model that takes advantage of a linear order on compute nodes and obtain tight complexity bounds

    The Bag Semantics of Ontology-Based Data Access

    Full text link
    Ontology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of a shared ontology. The ontology is linked to the sources using mappings, which assign views over the data to ontology predicates. Motivated by the need for OBDA systems supporting database-style aggregate queries, we propose a bag semantics for OBDA, where duplicate tuples in the views defined by the mappings are retained, as is the case in standard databases. We show that bag semantics makes conjunctive query answering in OBDA coNP-hard in data complexity. To regain tractability, we consider a rather general class of queries and show its rewritability to a generalisation of the relational calculus to bags

    Query Rewriting by Contract under Privacy Constraints

    Get PDF
    In this paper we show how Query Rewriting rules and Containment checks of aggregate queries can be combined with Contract-based programming techniques. Based on the combination of both worlds, we are able to find new Query Rewriting rules for queries containing aggregate constraints. These rules can either be used to improve the overall system performance or, in our use case, to implement a privacy-aware way to process queries. By integrating them in our PArADISE framework, we can now process and rewrite all types of OLAP queries, including complex aggregate functions and group-by extensions. In our framework, we use the whole network structure, from data producing sensors up to cloud computers, to automatically deploy an edge computing subnetwork. On each edge node, so-called fragment queries of a genuine query are executed to filter and to aggregate data on resource restricted sensor nodes. As a result of integrating Contract-based programming approaches, we are now able to not only process less data but also to produce less data in the result. Thus, the privacy principle of data minimization is accomplished

    Graphical Conjunctive Queries

    Get PDF
    The Calculus of Conjunctive Queries (CCQ) has foundational status in database theory. A celebrated theorem of Chandra and Merlin states that CCQ query inclusion is decidable. Its proof transforms logical formulas to graphs: each query has a natural model - a kind of graph - and query inclusion reduces to the existence of a graph homomorphism between natural models. We introduce the diagrammatic language Graphical Conjunctive Queries (GCQ) and show that it has the same expressivity as CCQ. GCQ terms are string diagrams, and their algebraic structure allows us to derive a sound and complete axiomatisation of query inclusion, which turns out to be exactly Carboni and Walters\u27 notion of cartesian bicategory of relations. Our completeness proof exploits the combinatorial nature of string diagrams as (certain cospans of) hypergraphs: Chandra and Merlin\u27s insights inspire a theorem that relates such cospans with spans. Completeness and decidability of the (in)equational theory of GCQ follow as a corollary. Categorically speaking, our contribution is a model-theoretic completeness theorem of free cartesian bicategories (on a relational signature) for the category of sets and relations
    corecore