On Incomplete XML Documents with Integrity Constraints
Abstract. We consider incomplete specifications of XML documents in the presence of schema information and integrity constraints. We show that integrity constraints such as keys and foreign keys affect consistency of such specifications. We prove that the consistency problem for incomplete specifications with keys and foreign keys can always be solved in NP. We then show a dichotomy result, classifying the complexity of the problem as NP-complete or PTIME, depending on the precise set of features used in incomplete descriptions.
Fragments of bag relational algebra: Expressiveness and certain answers
While all relational database systems are based on the bag data model, much of theoretical research still views relations as sets. Recent attempts to provide theoretical foundations for modern data management problems under the bag semantics concentrated on applications that need to deal with incomplete relations, i.e., relations populated by constants and nulls. Our goal is to provide a complete characterization of the complexity of query answering over such relations in fragments of bag relational algebra. The main challenges that we face are twofold. First, bag relational algebra has more operations than its set analog (e.g., additive union, max-union, min-intersection, duplicate elimination) and the relationship between various fragments is not fully known. Thus we first fill this gap. Second, we look at query answering over incomplete data, which again is more complex than in the set case: rather than certainty and possibility of answers, we now have numerical information about occurrences of tuples. We then fully classify the complexity of finding this information in all the fragments of bag relational algebra.
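As a rough illustration of the bag operations named in this abstract, the sketch below models a bag as a Python `collections.Counter` mapping each tuple to its multiplicity. The function names and the sample bags are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter

# A bag is modelled as a Counter: tuple -> number of occurrences.
# All names below are illustrative, not the paper's notation.

def additive_union(a: Counter, b: Counter) -> Counter:
    """Multiplicities are added."""
    return a + b

def max_union(a: Counter, b: Counter) -> Counter:
    """Each tuple keeps the larger of its two multiplicities."""
    return a | b

def min_intersection(a: Counter, b: Counter) -> Counter:
    """Each tuple keeps the smaller of its two multiplicities."""
    return a & b

def duplicate_elimination(a: Counter) -> Counter:
    """Every tuple that occurs at least once is kept exactly once."""
    return Counter(set(a.elements()))

A = Counter({("t",): 2, ("u",): 1})
B = Counter({("t",): 1, ("v",): 3})

print(additive_union(A, B))
print(max_union(A, B))
print(min_intersection(A, B))
print(duplicate_elimination(A))
```

On sets, additive union and max-union coincide, which is one reason the bag algebra has more distinct fragments to compare.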
Queries with Arithmetic on Incomplete Databases
The standard notion of query answering over incomplete databases is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In the majority of real-life databases, relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies, we explain that it becomes much more problematic in situations when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behavior of volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach.
Coping with Incomplete Data: Recent Advances
Handling incomplete data in a correct manner is a notoriously hard problem in databases. Theoretical approaches rely on the computationally hard notion of certain answers, while practical solutions rely on ad hoc query evaluation techniques based on three-valued logic. Can we find a middle ground, and produce correct answers efficiently? The paper surveys results of the last few years motivated by this question. We re-examine the notion of certainty itself, and show that it is much more varied than previously thought. We identify cases when certain answers can be computed efficiently and, short of that, provide deterministic and probabilistic approximation schemes for them. We look at the role of three-valued logic as used in SQL query evaluation, and discuss the correctness of the choice, as well as the necessity of such a logic for producing query answers.
Relating Structure and Power: Comonadic Semantics for Computational Resources
Combinatorial games are widely used in finite model theory, constraint
satisfaction, modal logic and concurrency theory to characterize logical
equivalences between structures. In particular, Ehrenfeucht-Fraisse games,
pebble games, and bisimulation games play a central role. We show how each of
these types of games can be described in terms of an indexed family of comonads
on the category of relational structures and homomorphisms. The index k is a
resource parameter which bounds the degree of access to the underlying
structure. The coKleisli categories for these comonads can be used to give
syntax-free characterizations of a wide range of important logical
equivalences. Moreover, the coalgebras for these indexed comonads can be used
to characterize key combinatorial parameters: tree-depth for the
Ehrenfeucht-Fraisse comonad, tree-width for the pebbling comonad, and
synchronization-tree depth for the modal unfolding comonad. These results pave
the way for systematic connections between two major branches of the field of
logic in computer science which hitherto have been almost disjoint: categorical
semantics, and finite and algorithmic model theory. Comment: To appear in Proceedings of Computer Science Logic 201
Computer-supported Exploration of a Categorical Axiomatization of Modeloids
A modeloid, a certain set of partial bijections, emerges from the idea to
abstract from a structure to the set of its partial automorphisms. It comes
with an operation, called the derivative, which is inspired by
Ehrenfeucht-Fraïssé games. In this paper we develop a generalization of a
modeloid first to an inverse semigroup and then to an inverse category using an
axiomatic approach to category theory. We then show that this formulation
enables a purely algebraic view on Ehrenfeucht-Fraïssé games. Comment: 24 pages; accepted for conference: Relational and Algebraic Methods
in Computer Science (RAMICS 2020)
Naive Evaluation of Queries over Incomplete Databases
The term naive evaluation refers to evaluating queries over incomplete databases as if nulls were usual data values, i.e., to using the standard database query evaluation engine. Since the semantics of query answering over incomplete databases is that of certain answers, we would like to know when naive evaluation computes them: i.e., when certain answers can be found without inventing new specialized algorithms. For relational databases it is well known that unions of conjunctive queries possess this desirable property, and results on preservation of formulae under homomorphisms tell us that within relational calculus, this class cannot be extended under the open-world assumption. Our goal here is twofold. First, we develop a general framework that allows us to determine, for a given semantics of incompleteness, classes of queries for which naive evaluation computes certain answers. Second, we apply this approach to a variety of semantics, showing that for many classes of queries beyond unions of conjunctive queries, naive evaluation makes perfect sense under assumptions different from open-world. Our key observations are: (1) naive evaluation is equivalent to monotonicity of queries with respect to a semantics-induced ordering, and (2) for most reasonable semantics of incompleteness, such monotonicity is captured by preservation under various types of homomorphisms. Using these results we find classes of queries for which naive evaluation works, e.g., positive first-order formulae for the closed-world semantics. Even more, we introduce a general relation-based framework for defining semantics of incompleteness, show how it can be used to capture many known semantics and to introduce new ones, and describe classes of first-order queries for which naive evaluation works under such semantics.
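To make the idea of naive evaluation concrete, here is a minimal sketch for a single conjunctive query, assuming marked nulls and the open-world reading mentioned in the abstract. The `Null` class, the relations `R` and `S`, and the helper names are all hypothetical and chosen only for this example. The query is evaluated as if nulls were ordinary values, and answers that still contain a null are then discarded.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Null:
    """A marked null: equal only to a null carrying the same label."""
    label: str

def naive_join(r, s):
    """Q(x, z) :- R(x, y), S(y, z), evaluated as if nulls were constants."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

def drop_nulls(answers):
    """Keep only the answer tuples that contain no null."""
    return {t for t in answers if not any(isinstance(v, Null) for v in t)}

# Hypothetical incomplete relations.
R = {(1, Null("n1")), (2, 3)}
S = {(Null("n1"), 4), (3, Null("n2")), (Null("n3"), 5)}

naive = naive_join(R, S)      # contains (1, 4) and (2, Null("n2"))
certain = drop_nulls(naive)   # only (1, 4) survives

print(naive)
print(certain)
```

The tuple `(1, 4)` is an answer however the null `n1` is interpreted, because the same unknown value links the two facts. The tuple `(2, Null("n2"))` depends on an unknown value and is therefore not reported as certain.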
Descriptive Complexity of Deterministic Polylogarithmic Time and Space
We propose logical characterizations of problems solvable in deterministic
polylogarithmic time (PolylogTime) and polylogarithmic space (PolylogSpace). We
introduce a novel two-sorted logic that separates the elements of the input
domain from the bit positions needed to address these elements. We prove that
the inflationary and partial fixed point variants of this logic capture
PolylogTime and PolylogSpace, respectively. In the course of proving that our
logic indeed captures PolylogTime on finite ordered structures, we introduce a
variant of random-access Turing machines that can access the relations and
functions of a structure directly. We investigate whether an explicit predicate
for the ordering of the domain is needed in our PolylogTime logic. Finally, we
present the open problem of finding an exact characterization of
order-invariant queries in PolylogTime. Comment: Submitted to the Journal of Computer and System Science
Identifiers in Registers - Describing Network Algorithms with Logic
We propose a formal model of distributed computing based on register automata
that captures a broad class of synchronous network algorithms. The local memory
of each process is represented by a finite-state controller and a fixed number
of registers, each of which can store the unique identifier of some process in
the network. To underline the naturalness of our model, we show that it has the
same expressive power as a certain extension of first-order logic on graphs
whose nodes are equipped with a total order. Said extension lets us define new
functions on the set of nodes by means of a so-called partial fixpoint
operator. In spirit, our result bears close resemblance to a classical theorem
of descriptive complexity theory that characterizes the complexity class PSPACE
in terms of partial fixpoint logic (a proper superclass of the logic we
consider here). Comment: 17 pages (+ 17 pages of appendices), 1 figure (+ 1 figure in the
appendix)