Enumeration Complexity of Conjunctive Queries with Functional Dependencies
We study the complexity of enumerating the answers of Conjunctive Queries (CQs) in the presence of Functional Dependencies (FDs). Our focus is on the ability to list output tuples with a constant delay in between, following a linear-time preprocessing. A known dichotomy classifies the acyclic self-join-free CQs into those that admit such enumeration, and those that do not. However, this classification no longer holds in the common case where the database exhibits dependencies among attributes. That is, some queries that are classified as hard are in fact tractable if dependencies are accounted for. We establish a generalization of the dichotomy to accommodate FDs; hence, our classification determines which combination of a CQ and a set of FDs admits constant-delay enumeration with a linear-time preprocessing.
In addition, we generalize a hardness result for cyclic CQs to accommodate a common type of FDs. Further conclusions of our development include a dichotomy for enumeration with linear delay, and a dichotomy for CQs with disequalities. Finally, we show that all our results apply to the known class of "cardinality dependencies" that generalize FDs (e.g., by stating an upper bound on the number of genres per movie, or friends per person).
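As a minimal sketch of why an FD can turn a "hard" query into a tractable one, consider Q(x, z) :- R(x, y), S(y, z), a standard acyclic CQ that the dependency-free dichotomy classifies as hard for constant-delay enumeration. The Python representation (relations as sets of pairs) and the function names are illustrative assumptions, not the paper's algorithm. Assuming the FD x -> y holds in R (a cardinality dependency with bound 1), each x occurs in exactly one R tuple, so distinct join results project to distinct (x, z) answers and can be listed with constant work between outputs after one linear pass.

```python
from collections import defaultdict
from typing import Hashable, Iterator, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def satisfies_cardinality(rel: Set[Pair], bound: int) -> bool:
    """Check a cardinality dependency: every first-column value has at most
    `bound` distinct second-column values. bound == 1 is the FD first -> second."""
    values = defaultdict(set)
    for left, right in rel:
        values[left].add(right)
    return all(len(vals) <= bound for vals in values.values())


def enumerate_xz(R: Set[Pair], S: Set[Pair]) -> Iterator[Pair]:
    """Enumerate Q(x, z) :- R(x, y), S(y, z), assuming the FD x -> y holds in R."""
    assert satisfies_cardinality(R, 1), "this sketch assumes the FD x -> y"
    # Preprocessing (linear time): index S by the join attribute y.
    z_by_y = defaultdict(list)
    for y, z in S:
        z_by_y[y].append(z)
    # Keep only R tuples that join; under x -> y each x appears in at most one of them.
    live = [(x, y) for x, y in R if y in z_by_y]
    # Enumeration: each live (x, y) has at least one z, and every inner step
    # emits a new answer, so the work between consecutive answers is constant.
    for x, y in live:
        for z in z_by_y[y]:
            yield (x, z)
```

With R = {(1, 'a'), (2, 'b')} and S = {('a', 10), ('a', 11), ('b', 20)}, the generator yields (1, 10), (1, 11) and (2, 20) in an unspecified order. Without the FD, two different y values for the same x could produce the same (x, z) answer, which is exactly what breaks this simple argument.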
Database Repairing with Soft Functional Dependencies
A common interpretation of soft constraints penalizes the database for every violation of every constraint, where the penalty is the cost (weight) of the constraint. A computational challenge is that of finding an optimal subset: a collection of database tuples that minimizes the total penalty when each tuple has a cost of being excluded. When the constraints are strict (i.e., have an infinite cost), this subset is a "cardinality repair" of an inconsistent database; in soft interpretations, this subset corresponds to a "most probable world" of a probabilistic database, a "most likely intention" of a probabilistic unclean database, and so on. Within the class of functional dependencies, the complexity of finding a cardinality repair is thoroughly understood. Yet, very little is known about the complexity of finding an optimal subset for the more general soft semantics. This paper makes significant progress in this direction. In addition to general insights about the hardness and approximability of the problem, we present algorithms for two special cases: a single functional dependency, and a bipartite matching. The latter is the problem of finding an optimal "almost matching" of a bipartite graph where a penalty is paid for every lost edge and every violation of monogamy.
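The soft semantics can be made concrete with a small sketch: the penalty of a subset is the exclusion cost of every dropped tuple plus the weight of an FD for every pair of kept tuples that violates it. The representation (rows as dicts, FDs as (lhs, rhs, weight) triples) and the function names are assumptions for illustration. The optimizer is a plain exhaustive search over all 2^n subsets, usable only on tiny inputs; it is not one of the paper's algorithms.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Set, Tuple

Row = Dict[str, object]
FD = Tuple[Sequence[str], Sequence[str], float]  # (lhs attributes, rhs attributes, weight)


def penalty(kept: Set[int], tuples: List[Tuple[Row, float]], fds: List[FD]) -> float:
    """Exclusion costs of dropped tuples plus FD weights for every violating kept pair."""
    total = sum(cost for i, (_, cost) in enumerate(tuples) if i not in kept)
    for i, j in combinations(sorted(kept), 2):
        ri, rj = tuples[i][0], tuples[j][0]
        for lhs, rhs, weight in fds:
            agree_lhs = all(ri[a] == rj[a] for a in lhs)
            agree_rhs = all(ri[a] == rj[a] for a in rhs)
            if agree_lhs and not agree_rhs:  # the pair violates lhs -> rhs
                total += weight
    return total


def optimal_subset(tuples: List[Tuple[Row, float]], fds: List[FD]) -> Tuple[float, Set[int]]:
    """Exhaustive search over all subsets (exponential; for tiny examples only)."""
    best: Optional[Tuple[float, Set[int]]] = None
    for mask in range(1 << len(tuples)):
        kept = {i for i in range(len(tuples)) if mask >> i & 1}
        p = penalty(kept, tuples, fds)
        if best is None or p < best[0]:
            best = (p, kept)
    assert best is not None
    return best
```

Setting an FD weight to float('inf') makes every violation unaffordable, so the optimum becomes a cardinality repair; a small finite weight can instead make it cheaper to keep conflicting tuples and pay for the violation.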
Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries
We study the question of when we can provide logarithmic-time direct access
to the k-th answer to a Conjunctive Query (CQ) with a specified ordering over
the answers, following a preprocessing step that constructs a data structure in
time quasilinear in the size of the database. Specifically, we embark on the
challenge of identifying the tractable answer orderings that allow for ranked
direct access with such complexity guarantees. We begin with lexicographic
orderings and give a decidable characterization (under conventional complexity
assumptions) of the class of tractable lexicographic orderings for every CQ
without self-joins. We then continue to the more general orderings by the sum
of attribute weights and show for it that ranked direct access is tractable
only in trivial cases. Hence, to better understand the computational challenge
at hand, we consider the more modest task of providing access to only a single
answer (i.e., finding the answer at a given position), a task that we refer to
as the selection problem. We indeed achieve a quasilinear-time algorithm for a
subset of the class of full CQs without self-joins, by adopting a solution of
Frederickson and Johnson to the classic problem of selection over sorted
matrices. We further prove that none of the other queries in this class admit
such an algorithm.
Comment: 17 pages
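To illustrate what direct access in a lexicographic order means, the sketch below handles one query that is easy to check by hand: Q(x, y, z) :- R(x, y), S(y, z) under the order x, then y, then z. It is a hand-specialized example, not the paper's general construction, and the class and method names are hypothetical. Preprocessing sorts and counts in O(n log n) time, and each access is a binary search over prefix sums of per-row answer counts, so it takes O(log n) time.

```python
import bisect
from collections import defaultdict
from typing import Hashable, Set, Tuple


class PathDirectAccess:
    """Direct access to answers of Q(x, y, z) :- R(x, y), S(y, z),
    ordered lexicographically by (x, y, z)."""

    def __init__(self, R: Set[Tuple[Hashable, Hashable]], S: Set[Tuple[Hashable, Hashable]]):
        # For every y, the sorted list of z values that join with it.
        self.zs = defaultdict(list)
        for y, z in S:
            self.zs[y].append(z)
        for y in self.zs:
            self.zs[y].sort()
        # R tuples with at least one match, sorted by (x, y); each contributes
        # a contiguous block of len(zs[y]) answers in the target order.
        self.rows = sorted((x, y) for x, y in R if y in self.zs)
        self.prefix = []  # prefix[i] = number of answers in blocks 0..i
        total = 0
        for _, y in self.rows:
            total += len(self.zs[y])
            self.prefix.append(total)

    def __len__(self) -> int:
        return self.prefix[-1] if self.prefix else 0

    def access(self, k: int) -> Tuple[Hashable, Hashable, Hashable]:
        """Return the k-th answer (0-based) using one binary search."""
        if not 0 <= k < len(self):
            raise IndexError(k)
        i = bisect.bisect_right(self.prefix, k)  # first block whose prefix exceeds k
        before = self.prefix[i - 1] if i > 0 else 0
        x, y = self.rows[i]
        return (x, y, self.zs[y][k - before])
```

The construction relies on each (x, y) prefix owning a contiguous block of answers. An order that places z before y does not have this property, and deciding which orders admit such guarantees is what the characterization above addresses.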
Trade-offs in Static and Dynamic Evaluation of Hierarchical Queries
We investigate trade-offs in static and dynamic evaluation of hierarchical
queries with arbitrary free variables. In the static setting, the trade-off is
between the time to partially compute the query result and the delay needed to
enumerate its tuples. In the dynamic setting, we additionally consider the time
needed to update the query result in the presence of single-tuple inserts and
deletes to the input database.
Our approach observes the degree of values in the database and uses different
computation and maintenance strategies for high-degree (heavy) and low-degree
(light) values. For the light values it partially computes the result, while
for the heavy values it computes enough information to allow for on-the-fly
enumeration.
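A minimal sketch of the heavy/light idea on the hierarchical query Q(a, c) :- R(a, b), S(b, c), with free variables a and c and bound variable b. The threshold theta, the choice to measure degree in R, and all names are illustrative assumptions. Answers through light b values are materialized during preprocessing, which costs at most theta * |S| work because each light b joins with at most theta values of a; answers through heavy b values (at most |R| / theta of them) are produced on the fly. For simplicity the sketch deduplicates with a hash set, so it does not attain the paper's delay guarantees.

```python
from collections import defaultdict
from typing import Hashable, Iterator, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def heavy_light_answers(R: Set[Pair], S: Set[Pair], theta: int) -> Iterator[Pair]:
    """Answers of Q(a, c) :- R(a, b), S(b, c), splitting b values by degree threshold theta."""
    a_by_b = defaultdict(list)
    for a, b in R:
        a_by_b[b].append(a)
    c_by_b = defaultdict(list)
    for b, c in S:
        c_by_b[b].append(c)
    join_values = a_by_b.keys() & c_by_b.keys()
    # Hypothetical degree measure: b is light if it occurs at most theta times in R.
    light = [b for b in join_values if len(a_by_b[b]) <= theta]
    heavy = [b for b in join_values if len(a_by_b[b]) > theta]
    # Preprocessing: materialize every answer reachable through a light value.
    materialized = {(a, c) for b in light for a in a_by_b[b] for c in c_by_b[b]}

    def enumerate_answers() -> Iterator[Pair]:
        yield from materialized
        # Heavy values are expanded on the fly; skip answers already emitted.
        seen = set(materialized)
        for b in heavy:
            for a in a_by_b[b]:
                for c in c_by_b[b]:
                    if (a, c) not in seen:
                        seen.add((a, c))
                        yield (a, c)

    return enumerate_answers()
```

Raising theta shifts work into preprocessing (more values count as light and are materialized), while lowering it leaves more heavy values to expand during enumeration; this is the trade-off the threshold controls.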
The main result of this work defines the preprocessing time, the update time,
and the enumeration delay as functions of the light/heavy threshold and of the
factorization width of the hierarchical query. By conveniently choosing this
threshold, our approach can recover a number of prior results when restricted
to hierarchical queries.
For a restricted class of hierarchical queries, our approach can achieve
worst-case optimal update time and enumeration delay conditioned on the Online
Matrix-Vector Multiplication Conjecture.
Comment: Technical Report; 52 pages. The updated version contains: new
diagrams and plots summarizing known results and putting the results of the
paper into context; introduction of delta_i-hierarchical queries, for any
non-negative integer i; optimality results for delta_0- and
delta_1-hierarchical queries.