4,087 research outputs found
Recommended from our members
Structure identification in relational data
This paper presents several investigations into the prospects for identifying meaningful structures in empirical data, namely, structures permitting effective organization of the data to meet requirements of future queries. We propose a general framework whereby the notion of identifiability is given a precise formal definition similar to that of learnability. Using this framework, we then explore if a tractable procedure exists for deciding whether a given relation is decomposable into a constraint network or a CNF theory with desirable topology and, if the answer is positive, identifying the desired decomposition. Finally, we address the problem of expressing a given relation as a Horn theory and, if this is impossible, finding the best k-Horn approximation to the given relation. We show that both problems can be solved in time polynomial in the length of the data
Querying the Guarded Fragment
Evaluating a Boolean conjunctive query Q against a guarded first-order theory
F is equivalent to checking whether "F and not Q" is unsatisfiable. This
problem is relevant to the areas of database theory and description logic.
Since Q may not be guarded, well known results about the decidability,
complexity, and finite-model property of the guarded fragment do not obviously
carry over to conjunctive query answering over guarded theories, and had been
left open in general. By investigating finite guarded bisimilar covers of
hypergraphs and relational structures, and by substantially generalising
Rosati's finite chase, we prove for guarded theories F and (unions of)
conjunctive queries Q that (i) Q is true in each model of F iff Q is true in
each finite model of F and (ii) determining whether F implies Q is
2EXPTIME-complete. We further show the following results: (iii) the existence
of polynomial-size conformal covers of arbitrary hypergraphs; (iv) a new proof
of the finite model property of the clique-guarded fragment; (v) the small
model property of the guarded fragment with optimal bounds; (vi) a
polynomial-time solution to the canonisation problem modulo guarded
bisimulation, which yields (vii) a capturing result for guarded bisimulation
invariant PTIME.Comment: This is an improved and extended version of the paper of the same
title presented at LICS 201
Optimal Joins Using Compact Data Structures
Worst-case optimal join algorithms have gained a lot of attention in the database literature. We now count with several algorithms that are optimal in the worst case, and many of them have been implemented and validated in practice. However, the implementation of these algorithms often requires an enhanced indexing structure: to achieve optimality we either need to build completely new indexes, or we must populate the database with several instantiations of indexes such as B+-trees. Either way, this means spending an extra amount of storage space that may be non-negligible.
We show that optimal algorithms can be obtained directly from a representation that regards the relations as point sets in variable-dimensional grids, without the need of extra storage. Our representation is a compact quadtree for the static indexes, and a dynamic quadtree sharing subtrees (which we dub a qdag) for intermediate results. We develop a compositional algorithm to process full join queries under this representation, and show that the running time of this algorithm is worst-case optimal in data complexity. Remarkably, we can extend our framework to evaluate more expressive queries from relational algebra by introducing a lazy version of qdags (lqdags). Once again, we can show that the running time of our algorithms is worst-case optimal
The complexity of acyclic conjunctive queries revisited
In this paper, we consider first-order logic over unary functions and study
the complexity of the evaluation problem for conjunctive queries described by
such kind of formulas. A natural notion of query acyclicity for this language
is introduced and we study the complexity of a large number of variants or
generalizations of acyclic query problems in that context (Boolean or not
Boolean, with or without inequalities, comparisons, etc...). Our main results
show that all those problems are \textit{fixed-parameter linear} i.e. they can
be evaluated in time where is the
size of the query , the database size, is
the size of the output and is some function whose value depends on the
specific variant of the query problem (in some cases, is the identity
function). Our results have two kinds of consequences. First, they can be
easily translated in the relational (i.e., classical) setting. Previously known
bounds for some query problems are improved and new tractable cases are then
exhibited. Among others, as an immediate corollary, we improve a result of
\~\cite{PapadimitriouY-99} by showing that any (relational) acyclic conjunctive
query with inequalities can be evaluated in time
. A second consequence of our method is
that it provides a very natural descriptive approach to the complexity of
well-known algorithmic problems. A number of examples (such as acyclic subgraph
problems, multidimensional matching, etc...) are considered for which new
insights of their complexity are given.Comment: 30 page
Communication Steps for Parallel Query Processing
We consider the problem of computing a relational query on a large input
database of size , using a large number of servers. The computation is
performed in rounds, and each server can receive only
bits of data, where is a parameter that controls
replication. We examine how many global communication steps are needed to
compute . We establish both lower and upper bounds, in two settings. For a
single round of communication, we give lower bounds in the strongest possible
model, where arbitrary bits may be exchanged; we show that any algorithm
requires , where is the fractional vertex
cover of the hypergraph of . We also give an algorithm that matches the
lower bound for a specific class of databases. For multiple rounds of
communication, we present lower bounds in a model where routing decisions for a
tuple are tuple-based. We show that for the class of tree-like queries there
exists a tradeoff between the number of rounds and the space exponent
. The lower bounds for multiple rounds are the first of their
kind. Our results also imply that transitive closure cannot be computed in O(1)
rounds of communication
- …