A Negative Conjunctive Query is Easy if and only if it is Beta-Acyclic
It is known that the data complexity of a Conjunctive Query (CQ) is determined only by the way its variables are shared between atoms, as reflected by its hypergraph. In particular, Yannakakis [18, 3] proved that a CQ is decidable in linear time when it is α-acyclic, i.e. when its hypergraph is α-acyclic; Bagan et al. [2] even state, under certain hypotheses: any CQ is decidable in linear time iff it is α-acyclic. (By linear time, we mean that a query on a structure S can be decided in time O(|S|).) A natural question is: since the complexity of a Negative Conjunctive Query (NCQ), a conjunctive query where all atoms are negated, also depends only on its hypergraph, can we find a similar dichotomy in this case? To answer this question, we revisit a result of Ordyniak et al. [17], which states that satisfiability of a β-acyclic CNF formula is decidable in polynomial time, by proving that some part of their procedure can be done in linear time. This implies, under an algorithmic hypothesis that is likely true (precisely: one cannot decide whether a graph is triangle-free in time O(n^2 log n), where n is the number of vertices), the following: any NCQ is decidable in quasi-linear time iff it is β-acyclic. (By quasi-linear time, we mean that a query on a structure S can be decided in time O(|S| log |S|).) We extend the easiness result to Signed Conjunctive Queries (SCQ), where some atoms are negated. This is of great interest, since using some negated atoms is natural in the frameworks of databases and CSP. Furthermore, it straightforwardly implies the following: any β-acyclic existential first-order query is decidable in quasi-linear time.
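As a rough illustration of the β-acyclicity condition the abstract refers to, the sketch below tests a hypergraph using the standard nest-point elimination characterization: a hypergraph is β-acyclic iff its vertices can be removed one by one, each time choosing a vertex whose incident edges form a chain under inclusion. The function names are hypothetical, and this naive version is not the linear-time procedure developed in the paper.

```python
def _is_nest_point(v, hyper):
    # v is a nest point if the edges containing it are totally ordered by inclusion.
    incident = sorted((e for e in hyper if v in e), key=len)
    return all(a <= b for a, b in zip(incident, incident[1:]))


def is_beta_acyclic(edges):
    # Store edges as a set of frozensets so that duplicate edges collapse.
    hyper = {frozenset(e) for e in edges if e}
    while hyper:
        vertices = set().union(*hyper)
        nest = next((v for v in vertices if _is_nest_point(v, hyper)), None)
        if nest is None:
            # No nest point is left, so no elimination order exists.
            return False
        # Remove the nest point from every edge and drop edges that became empty.
        hyper = {e - {nest} for e in hyper}
        hyper.discard(frozenset())
    return True


path = [{"a", "b"}, {"b", "c"}]
triangle = [{"a", "b"}, {"b", "c"}, {"a", "c"}]
covered_triangle = triangle + [{"a", "b", "c"}]
print(is_beta_acyclic(path), is_beta_acyclic(triangle), is_beta_acyclic(covered_triangle))
```

The covered triangle is the usual example separating the two notions: adding the edge {a, b, c} makes the hypergraph α-acyclic, yet it still has no nest point, so it remains β-cyclic.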
Hypergraph Acyclicity and Propositional Model Counting
We show that the propositional model counting problem #SAT for CNF-formulas
with hypergraphs that allow a disjoint branches decomposition can be solved in
polynomial time. We show that this class of hypergraphs is incomparable to
hypergraphs of bounded incidence cliquewidth which were the biggest class of
hypergraphs for which #SAT was known to be solvable in polynomial time so far.
Furthermore, we present a polynomial time algorithm that computes a disjoint
branches decomposition of a given hypergraph if it exists and rejects
otherwise. Finally, we show that some slight extensions of the class of
hypergraphs with disjoint branches decompositions lead to intractable #SAT,
leaving open how to generalize the counting result of this paper.
Compressed Representations of Conjunctive Query Results
Relational queries, and in particular join queries, often generate large
output results when executed over a huge dataset. In such cases, it is often
infeasible to store the whole materialized output if we plan to reuse it
further down a data processing pipeline. Motivated by this problem, we study
the construction of space-efficient compressed representations of the output of
conjunctive queries, with the goal of supporting the efficient access of the
intermediate compressed result for a given access pattern. In particular, we
initiate the study of an important tradeoff: minimizing the space necessary to
store the compressed result, versus minimizing the answer time and delay for an
access request over the result. Our main contribution is a novel parameterized
data structure, which can be tuned to trade off space for answer time. The
tradeoff allows us to control the space requirement of the data structure
precisely, and depends both on the structure of the query and the access
pattern. We show how we can use the data structure in conjunction with query
decomposition techniques, in order to efficiently represent the outputs for
several classes of conjunctive queries.
Comment: To appear in PODS'18; 35 pages; comments welcome
Enumerating Answers to First-Order Queries over Databases of Low Degree
A class of relational databases has low degree if for all $\delta > 0$, all but
finitely many databases in the class have degree at most $n^{\delta}$, where $n$
is the size of the database. Typical examples are databases of bounded
degree or of degree bounded by $\log n$.
It is known that over a class of databases having low degree, first-order
boolean queries can be checked in pseudo-linear time, i.e. for all
$\epsilon > 0$ in time bounded by $n^{1+\epsilon}$. We generalize this result by
considering query evaluation.
We show that counting the number of answers to a query can be done in
pseudo-linear time and that after a pseudo-linear time preprocessing we can
test in constant time whether a given tuple is a solution to a query or
enumerate the answers to a query with constant delay.
Beyond Worst-Case Analysis for Joins with Minesweeper
We describe a new algorithm, Minesweeper, that is able to satisfy stronger
runtime guarantees than previous join algorithms (colloquially, "beyond
worst-case guarantees") for data in indexed search trees. Our first
contribution is developing a framework to measure this stronger notion of
complexity, which we call certificate complexity, that extends notions of
Barbay et al. and Demaine et al.; a certificate is a set of propositional
formulae that certifies that the output is correct. This notion captures a
natural class of join algorithms. In addition, the certificate allows us to
define a strictly stronger notion of runtime complexity than traditional
worst-case guarantees. Our second contribution is to develop a dichotomy
theorem for the certificate-based notion of complexity. Roughly, we show that
Minesweeper evaluates β-acyclic queries in time linear in the certificate
plus the output size, while for any β-cyclic query there is some instance
that takes superlinear time in the certificate (and for which the output is no
larger than the certificate size). We also extend our certificate-complexity
analysis to queries with bounded treewidth and the triangle query.
Comment: [This is the full version of our PODS'2014 paper.]
Trade-offs in Static and Dynamic Evaluation of Hierarchical Queries
We investigate trade-offs in static and dynamic evaluation of hierarchical
queries with arbitrary free variables. In the static setting, the trade-off is
between the time to partially compute the query result and the delay needed to
enumerate its tuples. In the dynamic setting, we additionally consider the time
needed to update the query result in the presence of single-tuple inserts and
deletes to the input database.
Our approach observes the degree of values in the database and uses different
computation and maintenance strategies for high-degree and low-degree values.
For the latter it partially computes the result, while for the former it
computes enough information to allow for on-the-fly enumeration.
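The degree-based split described above can be pictured with a minimal sketch, assuming a relation stored as a list of tuples and a plain integer threshold. The function name and the toy relation are hypothetical; the paper defines its threshold in terms of the database and treats the two parts with its own maintenance strategies, which are not reproduced here.

```python
from collections import Counter


def split_by_degree(relation, column, threshold):
    # Degree of a value = number of tuples in which it occurs in the given column.
    degree = Counter(row[column] for row in relation)
    # Tuples whose value occurs fewer than `threshold` times form the light part.
    light = [row for row in relation if degree[row[column]] < threshold]
    # All remaining tuples form the heavy part.
    heavy = [row for row in relation if degree[row[column]] >= threshold]
    return light, heavy


R = [(1, "x"), (1, "y"), (1, "z"), (2, "x")]
light, heavy = split_by_degree(R, column=0, threshold=2)
print(light)
print(heavy)
```

With this toy input, value 1 has degree 3 and lands in the heavy part, while value 2 has degree 1 and lands in the light part.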
The main result of this work defines the preprocessing time, the update time,
and the enumeration delay as functions of the light/heavy threshold and of the
factorization width of the hierarchical query. By conveniently choosing this
threshold, our approach can recover a number of prior results when restricted
to hierarchical queries.
For a restricted class of hierarchical queries, our approach can achieve
worst-case optimal update time and enumeration delay conditioned on the Online
Matrix-Vector Multiplication Conjecture.
Comment: Technical Report; 52 pages. The updated version contains: new
diagrams and plots summarizing known results and putting the results of the
paper into context; introduction of delta_i-hierarchical queries, for any
non-negative integer i; optimality results for delta_0- and
delta_1-hierarchical queries
Trade-offs in Static and Dynamic Evaluation of Hierarchical Queries
We investigate trade-offs in static and dynamic evaluation of hierarchical
queries with arbitrary free variables. In the static setting, the trade-off is
between the time to partially compute the query result and the delay needed to
enumerate its tuples. In the dynamic setting, we additionally consider the time
needed to update the query result under single-tuple inserts or deletes to the
database.
Our approach observes the degree of values in the database and uses different
computation and maintenance strategies for high-degree (heavy) and low-degree
(light) values. For the latter it partially computes the result, while for the
former it computes enough information to allow for on-the-fly enumeration.
We define the preprocessing time, the update time, and the enumeration delay
as functions of the light/heavy threshold. By appropriately choosing this
threshold, our approach recovers a number of prior results when restricted to
hierarchical queries.
We show that for a restricted class of hierarchical queries, our approach
achieves worst-case optimal update time and enumeration delay conditioned on
the Online Matrix-Vector Multiplication Conjecture.