373 research outputs found
Tree Projections and Constraint Optimization Problems: Fixed-Parameter Tractability and Parallel Algorithms
Tree projections provide a unifying framework to deal with most structural
decomposition methods of constraint satisfaction problems (CSPs). Within this
framework, a CSP instance is decomposed into a number of sub-problems, called
views, whose solutions are either already available or can be computed
efficiently. The goal is to arrange portions of these views in a tree-like
structure, called tree projection, which determines an efficiently solvable CSP
instance equivalent to the original one. Deciding whether a tree projection
exists is NP-hard. Solution methods have therefore been proposed in the
literature that do not require a tree projection to be given, and that either
correctly decide whether the given CSP instance is satisfiable, or return that
a tree projection actually does not exist. These approaches had not been
generalized so far on CSP extensions for optimization problems, where the goal
is to compute a solution of maximum value/minimum cost. The paper fills the
gap, by exhibiting a fixed-parameter polynomial-time algorithm that either
disproves the existence of tree projections or computes an optimal solution,
with the parameter being the size of the expression of the objective function
to be optimized over all possible solutions (and not the size of the whole
constraint formula, used in related works). Tractability results are also
established for the problem of returning the best K solutions. Finally,
parallel algorithms for such optimization problems are proposed and analyzed.
Given that the classes of acyclic hypergraphs, hypergraphs of bounded
treewidth, and hypergraphs of bounded generalized hypertree width are all
covered as special cases of the tree projection framework, the results in this
paper directly apply to these classes. These classes are extensively considered
in the CSP setting, as well as in conjunctive database query evaluation and
optimization
Answering Conjunctive Queries under Updates
We consider the task of enumerating and counting answers to -ary
conjunctive queries against relational databases that may be updated by
inserting or deleting tuples. We exhibit a new notion of q-hierarchical
conjunctive queries and show that these can be maintained efficiently in the
following sense. During a linear time preprocessing phase, we can build a data
structure that enables constant delay enumeration of the query results; and
when the database is updated, we can update the data structure and restart the
enumeration phase within constant time. For the special case of self-join free
conjunctive queries we obtain a dichotomy: if a query is not q-hierarchical,
then query enumeration with sublinear delay and sublinear update time
(and arbitrary preprocessing time) is impossible.
For answering Boolean conjunctive queries and for the more general problem of
counting the number of solutions of k-ary queries we obtain complete
dichotomies: if the query's homomorphic core is q-hierarchical, then size of
the the query result can be computed in linear time and maintained with
constant update time. Otherwise, the size of the query result cannot be
maintained with sublinear update time. All our lower bounds rely on the
OMv-conjecture, a conjecture on the hardness of online matrix-vector
multiplication that has recently emerged in the field of fine-grained
complexity to characterise the hardness of dynamic problems. The lower bound
for the counting problem additionally relies on the orthogonal vectors
conjecture, which in turn is implied by the strong exponential time hypothesis.
By sublinear we mean for some
, where is the size of the active domain of the current
database
Ranked Enumeration of Conjunctive Query Results
We study the problem of enumerating answers of Conjunctive Queries ranked according to a given ranking function. Our main contribution is a novel algorithm with small preprocessing time, logarithmic delay, and non-trivial space usage during execution. To allow for efficient enumeration, we exploit certain properties of ranking functions that frequently occur in practice. To this end, we introduce the notions of decomposable and compatible (w.r.t. a query decomposition) ranking functions, which allow for partial aggregation of tuple scores in order to efficiently enumerate the output. We complement the algorithmic results with lower bounds that justify why restrictions on the structure of ranking functions are necessary. Our results extend and improve upon a long line of work that has studied ranked enumeration from both a theoretical and practical perspective
Provenance in Collaborative Data Sharing
This dissertation focuses on recording, maintaining and exploiting provenance information in Collaborative Data Sharing Systems (CDSS). These are systems that support data sharing across loosely-coupled, heterogeneous collections of relational databases related by declarative schema mappings. A fundamental challenge in a CDSS is to support the capability of update exchange --- which publishes a participant\u27s updates and then translates others\u27 updates to the participant\u27s local schema and imports them --- while tolerating disagreement between them and recording the provenance of exchanged data, i.e., information about the sources and mappings involved in their propagation. This provenance information can be useful during update exchange, e.g., to evaluate provenance-based trust policies. It can also be exploited after update exchange, to answer a variety of user queries, about the quality, uncertainty or authority of the data, for applications such as trust assessment, ranking for keyword search over databases, or query answering in probabilistic databases.
To address these challenges, in this dissertation we develop a novel model of provenance graphs that is informative enough to satisfy the needs of CDSS users and captures the semantics of query answering on various forms of annotated relations. We extend techniques from data integration, data exchange, incremental view maintenance and view update to define the formal semantics of unidirectional and bidirectional update exchange. We develop algorithms to perform update exchange incrementally while maintaining provenance information. We present strategies for implementing our techniques over an RDBMS and experimentally demonstrate their viability in the Orchestra prototype system. We define ProQL, a query language for provenance graphs that can be used by CDSS users to combine data querying with provenance testing as well as to compute annotations for their data, based on their provenance, that are useful for a variety of applications. Finally, we develop a prototype implementation ProQL over an RDBMS and indexing techniques to speed up provenance querying, evaluate experimentally the performance of provenance querying and the benefits of our indexing techniques
Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries
We study ranked enumeration of join-query results according to very general
orders defined by selective dioids. Our main contribution is a framework for
ranked enumeration over a class of dynamic programming problems that
generalizes seemingly different problems that had been studied in isolation. To
this end, we extend classic algorithms that find the k-shortest paths in a
weighted graph. For full conjunctive queries, including cyclic ones, our
approach is optimal in terms of the time to return the top result and the delay
between results. These optimality properties are derived for the widely used
notion of data complexity, which treats query size as a constant. By performing
a careful cost analysis, we are able to uncover a previously unknown tradeoff
between two incomparable enumeration approaches: one has lower complexity when
the number of returned results is small, the other when the number is very
large. We theoretically and empirically demonstrate the superiority of our
techniques over batch algorithms, which produce the full result and then sort
it. Our technique is not only faster for returning the first few results, but
on some inputs beats the batch algorithm even when all results are produced.Comment: 50 pages, 19 figure
- …