m-tables: Representing Missing Data
Representation systems have been widely used to capture different forms of incomplete data in various settings. However, existing representation systems are not expressive enough to handle the more complex scenarios of missing data that can occur in practice: these range from missing attribute values to a known, or even unknown, number of missing tuples. In this work, we propose a new representation system called m-tables that can represent many different types of missing data. We show that m-tables form a closed, complete, and strong representation system under both set and bag semantics, and are strictly more expressive than conditional tables under both the closed- and open-world assumptions. We further study the complexity of computing certain and possible answers in m-tables. Finally, we discuss how to "interpret" m-tables through a novel labeling scheme that marks a type of generalized tuples as certain or possible.
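The distinction between certain and possible answers that the abstract relies on can be made concrete with a toy example. This is our own illustration, not the paper's m-table syntax; the employee relation and the two possible worlds are hypothetical:

```python
from functools import reduce

# A toy illustration of incomplete data: an answer is "certain" if it
# appears in every possible world of the database, and "possible" if it
# appears in at least one -- the two notions the labeling scheme marks.

# Two possible completions of an incomplete employee relation:
possible_worlds = [
    {("alice", "sales"), ("bob", "hr")},
    {("alice", "sales"), ("carol", "it")},
]

def names(world):
    """Query: project out the employee names."""
    return {name for name, _dept in world}

answers = [names(w) for w in possible_worlds]
certain = reduce(set.intersection, answers)   # in every world
possible = reduce(set.union, answers)         # in some world
print(sorted(certain))   # ['alice']
print(sorted(possible))  # ['alice', 'bob', 'carol']
```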
Answering UCQs under Updates and in the Presence of Integrity Constraints
We investigate the query evaluation problem for fixed queries over
fully dynamic databases where tuples can be inserted or deleted.
The task is to design a dynamic data structure that can immediately
report the new result of a fixed query after every database update.
We consider unions of conjunctive queries (UCQs) and focus on the query evaluation tasks testing (decide whether an input tuple belongs to the query result), enumeration (enumerate, without repetition,
all tuples in the query result), and counting (output the number of tuples in the query result).
We identify three increasingly restrictive classes of UCQs which we
call t-hierarchical, q-hierarchical, and exhaustively q-hierarchical UCQs.
Our main results provide the following dichotomies:
If the query's homomorphic core is t-hierarchical (q-hierarchical,
exhaustively q-hierarchical), then the testing (enumeration, counting)
problem can be solved with constant update time and constant testing time (delay, counting time). Otherwise, it cannot be solved with sublinear update time and sublinear testing time (delay, counting time), unless the OV-conjecture and/or the OMv-conjecture fails.
We also study the complexity of query evaluation in the dynamic setting in the presence of integrity constraints, and we obtain similar dichotomy results for the special case of small domain constraints (i.e., constraints which state that
all values in a particular column of a relation belong to a fixed domain of constant size)
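The positive side of the dichotomy can be illustrated on the simplest kind of q-hierarchical query. The sketch below is our illustration, not the paper's construction (relation and variable names are made up): it maintains the Boolean query Q :- R(x), S(x) under single-tuple insertions and deletions with constant update time and constant answer time, by keeping per-value multiplicities and one counter of values present in both relations.

```python
from collections import defaultdict

# Dynamic evaluation of the Boolean query  Q :- R(x), S(x)
# with O(1) update time and O(1) answer time.

r = defaultdict(int)  # multiplicity of each value in R
s = defaultdict(int)  # multiplicity of each value in S
joined = 0            # number of values x with r[x] > 0 and s[x] > 0

def update(rel, x, delta):
    """Insert (delta=+1) or delete (delta=-1) the tuple (x) in R or S."""
    global joined
    this, other = (r, s) if rel == "R" else (s, r)
    before = this[x] > 0 and other[x] > 0
    this[x] += delta
    after = this[x] > 0 and other[x] > 0
    joined += int(after) - int(before)

def answer():
    """Does some value occur in both R and S?"""
    return joined > 0

update("R", 1, +1); update("S", 2, +1)
print(answer())   # False: no common value yet
update("S", 1, +1)
print(answer())   # True: x=1 is now in both relations
update("R", 1, -1)
print(answer())   # False again after the deletion
```

Note that the same counter also answers the counting problem (the number of join values), matching the abstract's three evaluation tasks.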
Tight Bounds for Graph Problems in Insertion Streams
Despite the large amount of work on solving graph problems in the data stream model, there do not exist tight space bounds for almost any of them, even in a stream with only edge insertions. For example, for testing connectivity, the upper bound is O(n * log(n)) bits, while the lower bound is only Omega(n) bits. We remedy this situation by providing the first tight Omega(n * log(n)) space lower bounds for randomized algorithms which succeed with constant probability in a stream of edge insertions for a number of graph problems. Our lower bounds apply to testing bipartiteness, connectivity, cycle-freeness, whether a graph is Eulerian, planarity, H-minor freeness, finding a minimum spanning tree of a connected graph, and testing if the diameter of a sparse graph is constant. We also give the first Omega(n * k * log(n)) space lower bounds for deterministic algorithms for k-edge connectivity and k-vertex connectivity; these are optimal in light of known deterministic upper bounds (for k-vertex connectivity we also need to allow edge duplications, which known upper bounds allow). Finally, we give an Omega(n * log^2(n)) lower bound for randomized algorithms approximating the minimum cut up to a constant factor with constant probability in a graph with integer weights between 1 and n, presented as a stream of insertions and deletions to its edges. This lower bound also holds for cut sparsifiers, and gives the first separation of maintaining a sparsifier in the data stream model versus the offline model
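For context, the O(n * log(n))-bit upper bound for connectivity cited above is achieved by the standard approach of feeding the edge-insertion stream into a union-find structure, which stores one O(log n)-bit pointer per vertex. A minimal sketch (our illustration, not from the paper):

```python
# Connectivity under edge insertions with union-find:
# n parent pointers of O(log n) bits each, i.e. O(n log n) bits total.

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # one word per vertex
        self.components = n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[rx] = ry
            self.components -= 1

n = 4
uf = UnionFind(n)
for u, v in [(0, 1), (2, 3)]:    # the edge-insertion stream so far
    uf.union(u, v)
print(uf.components == 1)        # False: {0,1} and {2,3} are separate
uf.union(1, 2)
print(uf.components == 1)        # True: the graph is now connected
```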
An Asymptotically Optimal Algorithm for Maximum Matching in Dynamic Streams
We present an algorithm for the maximum matching problem in dynamic
(insertion-deletion) streams with *asymptotically optimal* space complexity:
for any $n$-vertex graph, our algorithm with high probability outputs an
$\alpha$-approximate matching in a single pass using $O(n^2/\alpha^3)$ bits of
space.
A long line of work on the dynamic streaming matching problem has reduced the
gap between space upper and lower bounds first to $\mathrm{polylog}(n)$ factors
[Assadi-Khanna-Li-Yaroslavtsev; SODA 2016] and subsequently to $O(\log n)$
factors [Dark-Konrad; CCC 2020]. Our upper bound now matches the Dark-Konrad
lower bound up to constant factors, thus completing this research direction.
Our approach consists of two main steps: we first (provably) identify a
family of graphs, similar to the instances used in prior work to establish the
lower bounds for this problem, as the only "hard" instances to focus on. These
graphs include an induced subgraph which is both sparse and contains a large
matching. We then design a dynamic streaming algorithm for this family of
graphs which is more efficient than prior work. The key to this efficiency is a
novel sketching method, which bypasses the typical loss of
$\mathrm{polylog}(n)$-factors in space compared to standard $\ell_0$-sampling
primitives, and can be of independent interest in designing optimal algorithms
for other streaming problems.

Comment: Full version of the paper accepted to ITCS 2022. 42 pages, 5 figures.
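As background on the sampling primitives that the abstract contrasts its sketch with: samplers for dynamic streams are typically built from 1-sparse recovery, where two counters suffice to recover the single surviving item after insertions and deletions cancel out. The following is a standard textbook construction, not the paper's new sketching method, and the variable names are ours:

```python
# 1-sparse recovery, the building block of l0-sampling in dynamic streams.
# If exactly one coordinate of the frequency vector is nonzero, the two
# counters recover its index and its count.

state = {"total": 0, "weighted": 0}

def update(i, delta):
    """Apply an insertion (delta=+1) or deletion (delta=-1) of item i."""
    state["total"] += delta          # sum_i f_i
    state["weighted"] += i * delta   # sum_i i * f_i

def recover():
    """If the frequency vector is 1-sparse, return (index, count)."""
    if state["total"] == 0:
        return None
    return (state["weighted"] // state["total"], state["total"])

# Item 7 inserted twice; item 3 inserted and then deleted:
update(7, +1); update(3, +1); update(3, -1); update(7, +1)
print(recover())  # (7, 2)
```

In a full sampler this recovery is run on randomly subsampled substreams so that, with good probability, some substream is 1-sparse; that machinery incurs the extra space factors the paper's method avoids.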
Weighted Maximum Independent Set of Geometric Objects in Turnstile Streams
We study the Maximum Independent Set problem for geometric objects given in
the data stream model. A set of geometric objects is said to be independent if
the objects are pairwise disjoint. We consider geometric objects in one and two
dimensions, i.e., intervals and disks. Let $\alpha$ be the cardinality of the
largest independent set. Our goal is to estimate $\alpha$ in a small amount of
space, given that the input is received as a one-pass stream. We also consider
a generalization of this problem by assigning weights to each object and
estimating $\alpha_w$, the largest value of a weighted independent set. We
initiate the study of this problem in the turnstile streaming model
(insertions and deletions) and provide the first algorithms for estimating
$\alpha$ and $\alpha_w$.
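As an offline point of comparison (our illustration, not from the paper): for intervals, the exact size of a largest independent set can be computed by the classical earliest-right-endpoint greedy, whereas the streaming setting described above only allows small-space estimates.

```python
# Exact offline computation of the largest independent set of intervals:
# scan intervals by right endpoint and greedily keep each one that is
# disjoint from everything chosen so far.

def alpha(intervals):
    """Size of a largest set of pairwise disjoint closed intervals."""
    count, last_right = 0, float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > last_right:   # disjoint from all chosen intervals
            count += 1
            last_right = right
    return count

print(alpha([(0, 2), (1, 3), (4, 6), (2.5, 3.5)]))  # 3
```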
For unit-length intervals, we obtain a $(2+\epsilon)$-approximation to $\alpha$
and $\alpha_w$ in $\mathrm{poly}(\log(n), 1/\epsilon)$ space. We also show a
matching lower bound. Combined with the $(3/2+\epsilon)$-approximation for
insertion-only streams by Cabello and Pérez-Lantero [CP15], our result implies
a separation between the insertion-only and turnstile models. For unit-radius
disks, we obtain an $(8\sqrt{3}/\pi)$-approximation to $\alpha$ and $\alpha_w$
in $\mathrm{poly}(\log(n), 1/\epsilon)$ space, which is closely related to
the hexagonal circle packing constant.
We provide algorithms for estimating $\alpha$ for arbitrary-length intervals
under a bounded-intersection assumption and study the parameterized space
complexity of estimating $\alpha$ and $\alpha_w$, where the parameter is the
ratio of maximum to minimum interval length.

Comment: The lower bound for arbitrary-length intervals in the previous
version contains a bug; we are updating the submission to reflect this.