### A Dichotomy on the Complexity of Consistent Query Answering for Atoms with Simple Keys

We study the problem of consistent query answering under primary key
violations. In this setting, the relations in a database violate the key
constraints and we are interested in maximal subsets of the database that
satisfy the constraints, which we call repairs. For a boolean query Q, the
problem CERTAINTY(Q) asks whether every such repair satisfies the query or not;
the problem is known to be always in coNP for conjunctive queries. However,
there are queries for which it can be solved in polynomial time. It has been
conjectured that there exists a dichotomy on the complexity of CERTAINTY(Q) for
conjunctive queries: it is either in PTIME or coNP-complete. In this paper, we
prove that the conjecture is indeed true for the case of conjunctive queries
without self-joins, where each atom has as a key either a single attribute
(simple key) or all attributes of the atom.
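
To make the definitions of repair and CERTAINTY(Q) concrete, here is a minimal brute-force sketch. It is not the algorithm from the paper: it simply enumerates every repair, so it runs in exponential time and only illustrates the semantics. The function names `repairs` and `certain`, the convention that the key occupies the first `key_len` positions of a tuple, and the sample relations `R` and `S` are all assumptions made for this illustration.

```python
from itertools import product


def repairs(relation, key_len):
    """Yield every repair of one relation.

    Assumption: the primary key is the first `key_len` positions of each tuple.
    A repair keeps exactly one tuple from each group of tuples sharing a key.
    """
    blocks = {}
    for t in relation:
        blocks.setdefault(t[:key_len], []).append(t)
    for choice in product(*blocks.values()):
        yield set(choice)


def certain(db, keys, query):
    """Return True iff `query` holds in every repair of `db` (brute force)."""
    names = sorted(db)
    per_relation = [list(repairs(db[name], keys[name])) for name in names]
    return all(
        query(dict(zip(names, combo))) for combo in product(*per_relation)
    )


def q(rep):
    """Boolean query: exists x, y, z such that R(x, y) and S(y, z)."""
    return any(y == y2 for (_, y) in rep["R"] for (y2, _) in rep["S"])


keys = {"R": 1, "S": 1}  # both atoms have a simple (single-attribute) key

# R violates its key: two tuples share the key value 1.
db_yes = {"R": {(1, "a"), (1, "b")}, "S": {("a", 10), ("b", 20)}}
db_no = {"R": {(1, "a"), (1, "b")}, "S": {("a", 10)}}

answer_yes = certain(db_yes, keys, q)
answer_no = certain(db_no, keys, q)
```

In `db_yes` both repairs of `R` join with some tuple of `S`, so the query is certain. In `db_no` the repair that keeps `(1, "b")` has no join partner, so it is not.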

### The Design of Arbitrage-Free Data Pricing Schemes

Motivated by a growing market that involves buying and selling data over the
web, we study pricing schemes that assign value to queries issued over a
database. Previous work studied pricing mechanisms that compute the price of a
query by extending a data seller's explicit prices on certain queries, or
investigated the properties that a pricing function should exhibit without
detailing a generic construction. In this work, we present a formal framework
for pricing queries over data that allows the construction of general families
of pricing functions, with the main goal of avoiding arbitrage. We consider two
types of pricing schemes: instance-independent schemes, where the price depends
only on the structure of the query, and answer-dependent schemes, where the
price also depends on the query output. Our main result is a complete
characterization of the structure of pricing functions in both settings, by
relating it to properties of a function over a lattice. We use our
characterization, together with information-theoretic methods, to construct a
variety of arbitrage-free pricing functions. Finally, we discuss various
tradeoffs in the design space and present techniques for efficient computation
of the proposed pricing functions.

### Communication Steps for Parallel Query Processing

We consider the problem of computing a relational query $q$ on a large input
database of size $n$, using a large number $p$ of servers. The computation is
performed in rounds, and each server can receive only $O(n/p^{1-\varepsilon})$
bits of data, where $\varepsilon \in [0,1]$ is a parameter that controls
replication. We examine how many global communication steps are needed to
compute $q$. We establish both lower and upper bounds, in two settings. For a
single round of communication, we give lower bounds in the strongest possible
model, where arbitrary bits may be exchanged; we show that any algorithm
requires $\varepsilon \geq 1-1/\tau^*$, where $\tau^*$ is the fractional vertex
cover number of the hypergraph of $q$. We also give an algorithm that matches the
lower bound for a specific class of databases. For multiple rounds of
communication, we present lower bounds in a model where routing decisions for a
tuple are tuple-based. We show that for the class of tree-like queries there
exists a tradeoff between the number of rounds and the space exponent
$\varepsilon$. The lower bounds for multiple rounds are the first of their
kind. Our results also imply that transitive closure cannot be computed in $O(1)$
rounds of communication.
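
As a worked illustration of the one-round bound, consider the triangle query. This example is chosen here for illustration and does not appear in the abstract. A fractional vertex cover assigns a nonnegative weight to each variable so that the weights of the variables in every atom sum to at least 1, and $\tau^*$ is the minimum total weight.

$$
\begin{aligned}
q(x,y,z) &= R(x,y),\ S(y,z),\ T(z,x) \\
\text{constraints:}\quad & v_x + v_y \geq 1,\quad v_y + v_z \geq 1,\quad v_z + v_x \geq 1,\quad v_x, v_y, v_z \geq 0 \\
\text{summing the three constraints:}\quad & 2\,(v_x + v_y + v_z) \geq 3 \\
\text{attained by } v_x = v_y = v_z = \tfrac{1}{2}:\quad & \tau^* = \tfrac{3}{2} \\
\text{hence}\quad & \varepsilon \geq 1 - \frac{1}{\tau^*} = 1 - \frac{2}{3} = \frac{1}{3}
\end{aligned}
$$

Under this bound, a one-round algorithm for the triangle query needs each server to receive on the order of $n/p^{1-1/3} = n/p^{2/3}$ bits.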

### Worst-Case Optimal Algorithms for Parallel Query Processing

In this paper, we study the communication complexity for the problem of
computing a conjunctive query on a large database in a parallel setting with
$p$ servers. In contrast to previous work, where upper and lower bounds on the
communication were specified for particular structures of data (either data
without skew, or data with specific types of skew), in this work we focus on
worst-case analysis of the communication cost. The goal is to find worst-case
optimal parallel algorithms, similar to the work of [18] for sequential
algorithms.

We first show that for a single round we can obtain an optimal worst-case
algorithm. The optimal load for a conjunctive query $q$ when all relations have
size equal to $M$ is $O(M/p^{1/\psi^*})$, where $\psi^*$ is a new query-related
quantity called the edge quasi-packing number, which is different from both the
edge packing number and edge cover number of the query hypergraph. For multiple
rounds, we present algorithms that are optimal for several classes of queries.
Finally, we show a surprising connection to the external memory model, which
allows us to translate parallel algorithms to external memory algorithms. This
technique allows us to recover (within a polylogarithmic factor) several recent
results on the I/O complexity for computing join queries, and also obtain
optimal algorithms for other classes of queries.

### The Fine-Grained Complexity of CFL Reachability

Many problems in static program analysis can be modeled as the context-free
language (CFL) reachability problem on directed labeled graphs. The CFL
reachability problem can generally be solved in time $O(n^3)$, where $n$ is the
number of vertices in the graph, with some specific cases that can be solved
faster. In this work, we ask the following question: given a specific CFL, what
is the exact exponent in the monomial of the running time? In other words, for
which cases do we have linear, quadratic or cubic algorithms, and are there
problems with intermediate runtimes? This question is inspired by recent
efforts to classify classic problems in terms of their exact polynomial
complexity, known as *fine-grained complexity*. Although recent efforts
have shown some conditional lower bounds (mostly for the class of combinatorial
algorithms), a general picture of the fine-grained complexity landscape for CFL
reachability is missing.

Our main contribution is a set of lower bounds that pinpoint the exact running
time of several classes of CFLs or specific CFLs under widely believed lower
bound conjectures (Boolean Matrix Multiplication and $k$-Clique). We
focus in particular on the family of Dyck-$k$ languages (the languages of
well-matched strings over $k$ types of parentheses), a fundamental class of CFL reachability problems. We
present new lower bounds for the case of sparse input graphs where the number
of edges $m$ is the input parameter, a common setting in the database
literature. For this setting, we show a cubic lower bound for Andersen's
Pointer Analysis, which significantly strengthens previously known results.

Comment: Appeared in POPL 2023. Please note the erratum on the first page.
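
For readers unfamiliar with CFL reachability, the following is a minimal sketch of the standard worklist-style fixpoint computation, specialized to Dyck-1 (a single parenthesis type, grammar `S -> ε | S S | ( S )`). It illustrates the problem statement only; it is not one of the algorithms or lower-bound constructions from the paper, and it makes no attempt to be efficient. The function name `dyck1_reachable` and the edge encoding `(u, label, v)` are assumptions made for this illustration.

```python
from collections import deque


def dyck1_reachable(num_nodes, edges):
    """Return all pairs (u, v) joined by a path spelling a balanced string.

    `edges` is a list of (u, label, v) triples with label "(" or ")".
    Nodes are the integers 0 .. num_nodes - 1.
    """
    opens_into = {v: set() for v in range(num_nodes)}   # a -> {u : u -(-> a}
    closes_from = {v: set() for v in range(num_nodes)}  # b -> {v : b -)-> v}
    for u, label, v in edges:
        if label == "(":
            opens_into[v].add(u)
        elif label == ")":
            closes_from[u].add(v)
        else:
            raise ValueError(f"unexpected label: {label!r}")

    succ = {v: set() for v in range(num_nodes)}  # derived S-edges, forward
    pred = {v: set() for v in range(num_nodes)}  # derived S-edges, backward
    reach = set()
    work = deque()

    def add(u, v):
        """Record a newly derived S-edge and schedule it for processing."""
        if (u, v) not in reach:
            reach.add((u, v))
            succ[u].add(v)
            pred[v].add(u)
            work.append((u, v))

    # Rule S -> ε: every node reaches itself.
    for v in range(num_nodes):
        add(v, v)

    while work:
        a, b = work.popleft()
        # Rule S -> ( S ): wrap the S-edge (a, b) in a matching pair.
        for u in opens_into[a]:
            for v in closes_from[b]:
                add(u, v)
        # Rule S -> S S: extend (a, b) on the right, then on the left.
        for c in list(succ[b]):
            add(a, c)
        for c in list(pred[a]):
            add(c, b)
    return reach


# Path 0 -(-> 1 -(-> 2 -)-> 3 -)-> 4 spells "(())".
chain = [(0, "(", 1), (1, "(", 2), (2, ")", 3), (3, ")", 4)]
pairs = dyck1_reachable(5, chain)
```

On this chain the only non-trivial balanced pairs are `(1, 3)` and `(0, 4)`; a pair such as `(0, 3)` is excluded because the path between those nodes spells the unbalanced string `(()`.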