The Complexity of Resilience
One focus area in data management research is to understand how changes in the data can affect the output of a view or standing query. Example applications are explaining query results and propagating updates through views. In this thesis we study the complexity of the Resilience problem, which is the problem of finding the minimum number of tuples that need to be deleted from the database in order to change the result of a query. We will see that resilience is closely related to the well-studied problems of deletion propagation and causal responsibility, and that analyzing its complexity offers important insight for solving those problems as well.
Our contributions include the definition of the concept of triads for conjunctive queries, a crucial tool in our analysis, and the characterization of an NP versus P dichotomy for the resilience problem over the class of conjunctive queries without self-joins. Moreover, this result allowed us to show dichotomies for the same class of queries for both the deletion propagation with source side-effects and causal responsibility problems. We also completely characterize how the presence of functional dependencies can change the complexity of these problems.
The class of conjunctive queries with self-joins is far richer and more complicated than the self-join-free one. We therefore focus on binary queries without variable repetition, i.e., queries formed from unary or binary relations only, in which each atom contains at most one occurrence of any variable. For this restricted case, we identify three main query structures that help us determine complexity: chains, permutations, and confluences. Using these, we characterize classes of queries for which resilience is NP-complete and others for which it is in PTIME.
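To illustrate the definition (this is a brute-force toy sketch, not one of the thesis's algorithms), resilience of a Boolean conjunctive query can be computed by trying deletion sets of increasing size. The relations and the query q :- R(x, y), S(y, z) below are hypothetical examples:

```python
from itertools import combinations

# Hypothetical instance: relations as sets of tuples.
R = {(1, 2), (1, 3)}
S = {(2, 4), (3, 4)}

def has_answer(R, S):
    """Boolean CQ q :- R(x, y), S(y, z): true iff some join witness exists."""
    return any(y == y2 for (x, y) in R for (y2, z) in S)

def resilience(R, S):
    """Minimum number of tuples to delete so that q has no answer (brute force)."""
    tuples = [('R', t) for t in R] + [('S', t) for t in S]
    for k in range(len(tuples) + 1):
        for deleted in combinations(tuples, k):
            R2 = R - {t for (rel, t) in deleted if rel == 'R'}
            S2 = S - {t for (rel, t) in deleted if rel == 'S'}
            if not has_answer(R2, S2):
                return k
    return len(tuples)

print(resilience(R, S))  # -> 2: no single tuple hits both witnesses
```

The brute force is exponential in the database size; the point of the dichotomy results above is to characterize exactly which queries admit a PTIME algorithm instead.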
A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations
Resilience is one of the key algorithmic problems underlying various forms of
reverse data management (such as view maintenance, deletion propagation, and
various interventions for fairness): What is the minimal number of tuples to
delete from a database in order to remove all answers from a query? A long-open
question is determining those conjunctive queries (CQs) for which this problem
can be solved in guaranteed PTIME. We shed new light on this and the related
problem of causal responsibility by proposing a unified Integer Linear
Programming (ILP) formulation. It is unified in that it can solve both prior
studied restrictions (e.g., self-join-free CQs under set semantics that allow a
PTIME solution) and new cases (e.g., all CQs under set or bag semantics). It is
also unified in that all queries and all instances are treated with the same
approach, and the algorithm is guaranteed to terminate in PTIME for the easy
cases. We prove that, for all easy self-join-free CQs, the Linear Programming
(LP) relaxation of our encoding is identical to the ILP solution and thus
standard ILP solvers are guaranteed to return the solution in PTIME. Our
approach opens up the door to new variants and new fine-grained analysis: 1) It
also works under bag semantics and we give the first dichotomy result for bag
semantics in the problem space. 2) We give a more fine-grained analysis of the
complexity of causal responsibility. 3) We recover easy instances for generally
hard queries, such as instances with read-once provenance and instances that
become easy because of functional dependencies in the data. 4) We solve an open
conjecture from PODS 2020. 5) Experiments confirm that our results indeed
predict the asymptotic running times, and that our universal ILP encoding is at
times even faster to solve for the PTIME cases than a prior proposed dedicated
flow algorithm.
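The general flavor of such an encoding (a hedged sketch only; the paper's actual unified formulation is more refined) is a hitting-set ILP over the query's witnesses: one binary variable per tuple, one covering constraint per witness, minimizing total deletions. The toy instance below is hypothetical and uses SciPy's `milp`:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy instance for q :- R(x, y), S(y, z).  Variable x_t in {0, 1} means
# "delete tuple t"; every witness (join result) must lose >= 1 tuple.
tuples = ['R(1,2)', 'R(1,3)', 'S(2,4)', 'S(3,4)']
witnesses = [[0, 2],   # R(1,2) joins S(2,4)
             [1, 3]]   # R(1,3) joins S(3,4)

c = np.ones(len(tuples))                  # minimize total deletions
A = np.zeros((len(witnesses), len(tuples)))
for i, w in enumerate(witnesses):
    A[i, w] = 1                           # sum of x_t over witness i >= 1

res = milp(c,
           constraints=LinearConstraint(A, lb=1),
           integrality=np.ones(len(tuples)),
           bounds=Bounds(0, 1))
print(int(res.fun))  # -> 2, the resilience of this instance
```

For this self-join-free instance the LP relaxation (dropping `integrality`) already attains the same integral optimum, which is the kind of behavior the abstract proves for all easy self-join-free CQs.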
Explain3D: Explaining Disagreements in Disjoint Datasets
Data plays an important role in applications, analytic processes, and many
aspects of human activity. As data grows in size and complexity, we are met
with an imperative need for tools that promote understanding and explanations
over data-related operations. Data management research on explanations has
focused on the assumption that data resides in a single dataset, under one
common schema. But the reality of today's data is that it is frequently
un-integrated, coming from different sources with different schemas. When
different datasets provide different answers to semantically similar questions,
understanding the reasons for the discrepancies is challenging and cannot be
handled by the existing single-dataset solutions.
In this paper, we propose Explain3D, a framework for explaining the
disagreements across disjoint datasets (3D). Explain3D focuses on identifying
the reasons for the differences in the results of two semantically similar
queries operating on two datasets with potentially different schemas. Our
framework leverages the queries to perform a semantic mapping across the
relevant parts of their provenance; discrepancies in this mapping point to
causes of the queries' differences. Exploiting the queries gives Explain3D an
edge over traditional schema matching and record linkage techniques, which are
query-agnostic. Our work makes the following contributions: (1) We formalize
the problem of deriving optimal explanations for the differences of the results
of semantically similar queries over disjoint datasets. (2) We design a 3-stage
framework for solving the optimal explanation problem. (3) We develop a
smart-partitioning optimizer that improves the efficiency of the framework by
orders of magnitude. (4)~We experiment with real-world and synthetic data to
demonstrate that Explain3D can derive precise explanations efficiently.
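The underlying intuition can be sketched in a highly simplified, hypothetical form (this is not Explain3D's actual 3-stage framework): evaluate two semantically similar queries over two differently-schemed sources while tracking which tuples contribute to each answer, then map the provenance across sources and diff it. All names and data below are made up for illustration:

```python
# Two hypothetical sources answering "which flights are delayed?"
src_a = [  # schema: (carrier, flight_no, delay_minutes)
    ('UA', 101, 35), ('UA', 102, 0), ('DL', 201, 50),
]
src_b = [  # schema: (airline, flight, status)
    ('UA', 101, 'delayed'), ('UA', 102, 'on-time'), ('DL', 201, 'on-time'),
]

def delayed_a(rows):
    """Query over source A; the result set doubles as its provenance."""
    return {t for t in rows if t[2] > 15}

def delayed_b(rows):
    """Semantically similar query over source B's schema."""
    return {t for t in rows if t[2] == 'delayed'}

prov_a, prov_b = delayed_a(src_a), delayed_b(src_b)
# Map provenance across sources on the shared key (carrier, flight number).
keys_a = {(c, f) for (c, f, _) in prov_a}
keys_b = {(c, f) for (c, f, _) in prov_b}
# Tuples whose mapped counterpart disagrees point to the discrepancy.
print(sorted(keys_a ^ keys_b))  # -> [('DL', 201)]
```

Here the symmetric difference of the mapped provenance isolates flight DL 201, whose delay status disagrees between the sources; Explain3D's contribution is doing this optimally and efficiently for real queries and schemas.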
Dichotomies in Ontology-Mediated Querying with the Guarded Fragment
We study the complexity of ontology-mediated querying when ontologies are
formulated in the guarded fragment of first-order logic (GF). Our general aim
is to classify the data complexity on the level of ontologies where query
evaluation w.r.t. an ontology O is considered to be in PTime if all (unions of
conjunctive) queries can be evaluated in PTime w.r.t. O and coNP-hard if at
least one query is coNP-hard w.r.t. O. We identify several large and relevant
fragments of GF that enjoy a dichotomy between PTime and coNP, some of them
additionally admitting a form of counting. In fact, almost all ontologies in
the BioPortal repository fall into these fragments or can easily be rewritten
to do so. We then establish a variation of Ladner's Theorem on the existence of
NP-intermediate problems and use this result to show that for other fragments,
there is provably no such dichotomy. Again for other fragments (such as full
GF), establishing a dichotomy implies the Feder-Vardi conjecture on the
complexity of constraint satisfaction problems. We also link these results to
Datalog-rewritability and study the decidability of whether a given ontology
enjoys PTime query evaluation, presenting both positive and negative results.
Workshop on Database Programming Languages
These are the revised proceedings of the Workshop on Database Programming Languages held at Roscoff, Finistère, France in September of 1987. The last few years have seen enormous activity in the development of new programming languages and new programming environments for databases. The purpose of the workshop was to bring together researchers from both databases and programming languages to discuss recent developments in the two areas in the hope of overcoming some of the obstacles that appear to prevent the construction of a uniform database programming environment. The workshop, which follows a previous workshop held in Appin, Scotland in 1985, was extremely successful. The organizers were delighted with both the quality and volume of the submissions for this meeting, and it was regrettable that more papers could not be accepted. Both the stimulating discussions and the excellent food and scenery of the Brittany coast made the meeting thoroughly enjoyable.
There were three main foci for this workshop: the type systems suitable for databases (especially object-oriented and complex-object databases), the representation and manipulation of persistent structures, and extensions to deductive databases that allow for more general and flexible programming. Many of the papers describe recent results, or work in progress, and are indicative of the latest research trends in database programming languages.
The organizers are extremely grateful for the financial support given by CRAI (Italy), Altaïr (France) and AT&T (USA). We would also like to acknowledge the organizational help provided by Florence Deshors, Hélène Gans and Pauline Turcaud of Altaïr, and by Karen Carter of the University of Pennsylvania.