117 research outputs found
Naive Evaluation of Queries over Incomplete Databases
The term naive evaluation refers to evaluating queries over incomplete databases as if nulls were usual data values, i.e., to using the standard database query evaluation engine. Since the semantics of query answering over incomplete databases is that of certain answers, we would like to know when naive evaluation computes them: i.e., when certain answers can be found without inventing new specialized algorithms. For relational databases it is well known that unions of conjunctive queries possess this desirable property, and results on preservation of formulae under homomorphisms tell us that within relational calculus, this class cannot be extended under the open-world assumption. Our goal here is twofold. First, we develop a general framework that allows us to determine, for a given semantics of incompleteness, classes of queries for which naive evaluation computes certain answers. Second, we apply this approach to a variety of semantics, showing that for many classes of queries beyond unions of conjunctive queries, naive evaluation makes perfect sense under assumptions different from open-world. Our key observations are: (1) naive evaluation is equivalent to monotonicity of queries with respect to a semantics-induced ordering, and (2) for most reasonable semantics of incompleteness, such monotonicity is captured by preservation under various types of homomorphisms. Using these results we find classes of queries for which naive evaluation works, e.g., positive first-order formulae for the closed-world semantics. Even more, we introduce a general relation-based framework for defining semantics of incompleteness, show how it can be used to capture many known semantics and to introduce new ones, and describe classes of first-order queries for which naive evaluation works under such semantics.
Coping with Incomplete Data: Recent Advances
Handling incomplete data in a correct manner is a notoriously hard problem in databases. Theoretical approaches rely on the computationally hard notion of certain answers, while practical solutions rely on ad hoc query evaluation techniques based on three-valued logic. Can we find a middle ground, and produce correct answers efficiently? The paper surveys results of the last few years motivated by this question. We re-examine the notion of certainty itself, and show that it is much more varied than previously thought. We identify cases when certain answers can be computed efficiently and, short of that, provide deterministic and probabilistic approximation schemes for them. We look at the role of three-valued logic as used in SQL query evaluation, and discuss the correctness of the choice, as well as the necessity of such a logic for producing query answers
On First-Order Definable Colorings
We address the problem of characterizing H-coloring problems that are
first-order definable on a fixed class of relational structures. In this
context, we give several characterizations of homomorphism dualities arising
in a class of structures
When do homomorphism counts help in query algorithms?
A query algorithm based on homomorphism counts is a procedure for determining
whether a given instance satisfies a property by counting homomorphisms between
the given instance and finitely many predetermined instances. In a left query
algorithm, we count homomorphisms from the predetermined instances to the given
instance, while in a right query algorithm we count homomorphisms from the
given instance to the predetermined instances. Homomorphisms are usually
counted over the semiring N of non-negative integers; it is also meaningful,
however, to count homomorphisms over the Boolean semiring B, in which case the
homomorphism count indicates whether or not a homomorphism exists. We first
characterize the properties that admit a left query algorithm over B by showing
that these are precisely the properties that are both first-order definable and
closed under homomorphic equivalence. After this, we turn attention to a
comparison between left query algorithms over B and left query algorithms over
N. In general, there are properties that admit a left query algorithm over N
but not over B. The main result of this paper asserts that if a property is
closed under homomorphic equivalence, then that property admits a left query
algorithm over B if and only if it admits a left query algorithm over N. In
other words and rather surprisingly, homomorphism counts over N do not help as
regards properties that are closed under homomorphic equivalence. Finally, we
characterize the properties that admit both a left query algorithm over B and a
right query algorithm over B. Comment: 24 pages
On the Complexity of Existential Positive Queries
We systematically investigate the complexity of model checking the
existential positive fragment of first-order logic. In particular, for a set of
existential positive sentences, we consider model checking where the sentence
is restricted to fall into the set; a natural question is then to classify
which sentence sets are tractable and which are intractable. With respect to
fixed-parameter tractability, we give a general theorem that reduces this
classification question to the corresponding question for primitive positive
logic, for a variety of representations of structures. This general theorem
allows us to deduce that an existential positive sentence set having bounded
arity is fixed-parameter tractable if and only if each sentence is equivalent
to one in bounded-variable logic. We then use the lens of classical complexity
to study these fixed-parameter tractable sentence sets. We show that such a set
can be NP-complete, and consider the length needed by a translation from
sentences in such a set to bounded-variable logic; we prove superpolynomial
lower bounds on this length using the theory of compilability, obtaining an
interesting type of formula size lower bound. Overall, the tools, concepts, and
results of this article set the stage for the future consideration of the
complexity of model checking on more expressive logics
Preservation and decomposition theorems for bounded degree structures
We provide elementary algorithms for two preservation theorems for
first-order sentences (FO) on the class C_d of all finite structures of degree
at most d: For each FO-sentence that is preserved under extensions
(homomorphisms) on C_d, a C_d-equivalent existential (existential-positive)
FO-sentence can be constructed in 5-fold (4-fold) exponential time. This is
complemented by lower bounds showing that a 3-fold exponential blow-up of the
computed existential (existential-positive) sentence is unavoidable. Both
algorithms can be extended (while maintaining the upper and lower bounds on
their time complexity) to input first-order sentences with modulo m counting
quantifiers (FO+MODm). Furthermore, we show that for an input FO-formula, a
C_d-equivalent Feferman-Vaught decomposition can be computed in 3-fold
exponential time. We also provide a matching lower bound. Comment: 42 pages and 3 figures. This is the full version of: Frederik
Harwath, Lucas Heimberg, and Nicole Schweikardt. Preservation and
decomposition theorems for bounded degree structures. In Joint Meeting of the
23rd EACSL Annual Conference on Computer Science Logic (CSL) and the 29th
Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), CSL-LICS'14,
pages 49:1-49:10. ACM, 201
Representing and Querying Incomplete Information: a Data Interoperability Perspective
This habilitation thesis presents some of my most recent work, which has been done in collaboration with several other people. In particular, this thesis concentrates on our contributions to the study of incomplete information in the context of data interoperability. In this scenario data is heterogeneous and decentralized, and needs to be integrated from several sources and exchanged between different applications. Incompleteness, i.e. the presence of "missing" or "unknown" portions of data, is naturally generated in data exchange and integration, due to data heterogeneity. The management of incomplete information poses new challenges in this context. The focus of our study is the development of models of incomplete information suitable to data interoperability tasks, and the study of techniques for efficiently querying several forms of incompleteness
Pattern logics and auxiliary relations
A common theme in the study of logics over finite structures is adding auxiliary predicates to enhance expressiveness and convey additional information. Examples include adding an order or arithmetic for capturing complexity classes, or the power of real-life declarative languages. A recent trend is to add a data-value comparison relation to words, trees, and graphs, for capturing modern data models such as XML and graph databases. Such additions often result in the loss of good properties of the underlying logic. Our goal is to show that such a loss can be avoided if we use pattern-based logics, standard in XML and graph data querying. The essence of such logics is that auxiliary relations are tested locally with respect to other relations in the structure. These logics are shown to admit strong versions of Hanf and Gaifman locality theorems, which are used to prove a homomorphism preservation theorem, and a decidability result for the satisfiability problem. We discuss applications of these results to pattern logics over data forests, and consequently to querying XML data