A Dichotomy on the Complexity of Consistent Query Answering for Atoms with Simple Keys
We study the problem of consistent query answering under primary key
violations. In this setting, the relations in a database violate the key
constraints and we are interested in maximal subsets of the database that
satisfy the constraints, which we call repairs. For a boolean query Q, the
problem CERTAINTY(Q) asks whether every such repair satisfies the query or not;
the problem is known to be always in coNP for conjunctive queries. However,
there are queries for which it can be solved in polynomial time. It has been
conjectured that there exists a dichotomy on the complexity of CERTAINTY(Q) for
conjunctive queries: it is either in PTIME or coNP-complete. In this paper, we
prove that the conjecture is indeed true for the case of conjunctive queries
without self-joins, where each atom has as a key either a single attribute
(simple key) or all attributes of the atom.
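The definitions of repair and CERTAINTY(Q) in the abstract above can be illustrated with a small brute-force sketch. It is not the algorithm of any listed paper: it enumerates every repair, which takes exponential time, and the relation `emp`, the helper names, and the example queries are all hypothetical. It assumes a single relation whose key is its first `key_len` attributes.

```python
from itertools import product


def repairs(relation, key_len=1):
    """Yield every repair: a maximal subset with exactly one tuple per key value."""
    blocks = {}
    for t in relation:
        # Tuples that agree on the key form one block; a repair keeps one of them.
        blocks.setdefault(t[:key_len], []).append(t)
    for choice in product(*blocks.values()):
        yield set(choice)


def certainty(relation, query, key_len=1):
    """Return True iff the Boolean query holds in every repair (brute force)."""
    return all(query(r) for r in repairs(relation, key_len))


# Hypothetical relation Emp(name, city) with key `name`; "ann" violates the key.
emp = [("ann", "paris"), ("ann", "rome"), ("bob", "paris")]


def someone_in_paris(r):
    return any(city == "paris" for _, city in r)


def ann_in_paris(r):
    return ("ann", "paris") in r


print(certainty(emp, someone_in_paris))  # holds in both repairs
print(certainty(emp, ann_in_paris))      # fails in the repair that keeps ("ann", "rome")
```

The dichotomy results listed here concern when this exhaustive enumeration can be avoided, which the sketch makes no attempt to do.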
Consistent Query Answering for Primary Keys in Logspace
We study the complexity of consistent query answering on databases that may violate primary key constraints. A repair of such a database is any consistent database that can be obtained by deleting a minimal set of tuples. For every Boolean query q, CERTAINTY(q) is the problem that takes a database as input and asks whether q evaluates to true on every repair. In [Koutris and Wijsen, ACM TODS, 2017], the authors show that for every self-join-free Boolean conjunctive query q, the problem CERTAINTY(q) is either in P or coNP-complete, and it is decidable which of the two cases applies. In this paper, we sharpen this result by showing that for every self-join-free Boolean conjunctive query q, the problem CERTAINTY(q) is either expressible in symmetric stratified Datalog (with some aggregation operator) or coNP-complete. Since symmetric stratified Datalog is in L, we thus obtain a complexity-theoretic dichotomy between L and coNP-complete. Another new finding of practical importance is that CERTAINTY(q) is on the logspace side of the dichotomy for queries q where all join conditions express foreign-to-primary key matches, which is undoubtedly the most common type of join condition.
Query Answering in Probabilistic Data and Knowledge Bases
Probabilistic data and knowledge bases are becoming increasingly important in academia and industry. They are continuously extended with new data, powered by modern information extraction tools that associate probabilities with knowledge base facts. The state of the art to store and process such data is founded on probabilistic database systems, which are widely and successfully employed. Beyond all the success stories, however, such systems still lack the fundamental machinery to convey some of the valuable knowledge hidden in them to the end user, which limits their potential applications in practice. In particular, in their classical form, such systems are typically based on strong, unrealistic limitations, such as the closed-world assumption, the closed-domain assumption, the tuple-independence assumption, and the lack of commonsense knowledge. These limitations not only lead to unwanted consequences, but also put such systems on weak footing in important tasks, query answering being a very central one. In this thesis, we enhance probabilistic data and knowledge bases with more realistic data models, thereby allowing for better means for querying them. Building on the long endeavor of unifying logic and probability, we develop different rigorous semantics for probabilistic data and knowledge bases, analyze their computational properties, identify sources of (in)tractability, and design practical scalable query answering algorithms whenever possible. To achieve this, the current work brings together some recent paradigms from logics, probabilistic inference, and database theory.
Consistent Query Answering for Primary Keys on Rooted Tree Queries
We study the data complexity of consistent query answering (CQA) on databases
that may violate the primary key constraints. A repair is a maximal subset of
the database satisfying the primary key constraints. For a Boolean query q, the
problem CERTAINTY(q) takes a database as input, and asks whether or not each
repair satisfies q. The computational complexity of CERTAINTY(q) has been
established whenever q is a self-join-free Boolean conjunctive query, or a (not
necessarily self-join-free) Boolean path query. In this paper, we take one more
step towards a general classification for all Boolean conjunctive queries by
considering the class of rooted tree queries. In particular, we show that for
every rooted tree query q, CERTAINTY(q) is in FO, NL-hard ∩ LFP, or
coNP-complete, and it is decidable (in polynomial time), given q, which of the
three cases applies. We also extend our classification to larger classes of
queries with simple primary keys. Our classification criteria rely on query
homomorphisms and our polynomial-time fixpoint algorithm is based on a novel
use of context-free grammar (CFG). Comment: To appear in PODS'2
Rewritability in Monadic Disjunctive Datalog, MMSNP, and Expressive Description Logics
We study rewritability of monadic disjunctive Datalog programs, (the
complements of) MMSNP sentences, and ontology-mediated queries (OMQs) based on
expressive description logics of the ALC family and on conjunctive queries. We
show that rewritability into FO and into monadic Datalog (MDLog) are decidable,
and that rewritability into Datalog is decidable when the original query
satisfies a certain condition related to equality. We establish
2NExpTime-completeness for all studied problems except rewritability into MDLog
for which there remains a gap between 2NExpTime and 3ExpTime. We also analyze
the shape of rewritings, which in the MMSNP case correspond to obstructions,
and give a new construction of canonical Datalog programs that is more
elementary than existing ones and also applies to formulas with free variables.