m-tables: Representing Missing Data
Representation systems have been widely used to capture different forms of incomplete data in various settings. However, existing representation systems are not expressive enough to handle the more complex scenarios of missing data that can occur in practice: these range from missing attribute values to a known, or even unknown, number of missing tuples. In this work, we propose a new representation system called m-tables that can represent many different types of missing data. We show that m-tables form a closed, complete, and strong representation system under both set and bag semantics, and are strictly more expressive than conditional tables under both the closed- and open-world assumptions. We further study the complexity of computing certain and possible answers in m-tables. Finally, we discuss how to "interpret" m-tables through a novel labeling scheme that marks a type of generalized tuples as certain or possible.
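The distinction between certain and possible answers that the abstract relies on can be made concrete with a toy example. This is our own illustration, not the paper's m-table syntax; the employee relation and the two possible worlds are hypothetical:

```python
from functools import reduce

# A toy illustration of incomplete data: an answer is "certain" if it
# appears in every possible world of the database, and "possible" if it
# appears in at least one -- the two notions the labeling scheme marks.

# Two possible completions of an incomplete employee relation:
possible_worlds = [
    {("alice", "sales"), ("bob", "hr")},
    {("alice", "sales"), ("carol", "it")},
]

def names(world):
    """Query: project out the employee names."""
    return {name for name, _dept in world}

answers = [names(w) for w in possible_worlds]
certain = reduce(set.intersection, answers)   # in every world
possible = reduce(set.union, answers)         # in some world
print(sorted(certain))   # ['alice']
print(sorted(possible))  # ['alice', 'bob', 'carol']
```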
Answering UCQs under Updates and in the Presence of Integrity Constraints
We investigate the query evaluation problem for fixed queries over
fully dynamic databases where tuples can be inserted or deleted.
The task is to design a dynamic data structure that can immediately
report the new result of a fixed query after every database update.
We consider unions of conjunctive queries (UCQs) and focus on the query evaluation tasks testing (decide whether an input tuple belongs to the query result), enumeration (enumerate, without repetition,
all tuples in the query result), and counting (output the number of tuples in the query result).
We identify three increasingly restrictive classes of UCQs which we
call t-hierarchical, q-hierarchical, and exhaustively q-hierarchical UCQs.
Our main results provide the following dichotomies:
If the query's homomorphic core is t-hierarchical (q-hierarchical,
exhaustively q-hierarchical), then the testing (enumeration, counting)
problem can be solved with constant update time and constant testing time (delay, counting time). Otherwise, it cannot be solved with sublinear update time and sublinear testing time (delay, counting time), unless the OV-conjecture and/or the OMv-conjecture fails.
We also study the complexity of query evaluation in the dynamic setting in the presence of integrity constraints, and we obtain similar dichotomy results for the special case of small domain constraints (i.e., constraints which state that
all values in a particular column of a relation belong to a fixed domain of constant size)
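The positive side of the dichotomy can be illustrated on the simplest kind of q-hierarchical query. The sketch below is our illustration, not the paper's construction (relation and variable names are made up): it maintains the Boolean query Q :- R(x), S(x) under single-tuple insertions and deletions with constant update time and constant answer time, by keeping per-value multiplicities and one counter of values present in both relations.

```python
from collections import defaultdict

# Dynamic evaluation of the Boolean query  Q :- R(x), S(x)
# with O(1) update time and O(1) answer time.

r = defaultdict(int)  # multiplicity of each value in R
s = defaultdict(int)  # multiplicity of each value in S
joined = 0            # number of values x with r[x] > 0 and s[x] > 0

def update(rel, x, delta):
    """Insert (delta=+1) or delete (delta=-1) the tuple (x) in R or S."""
    global joined
    this, other = (r, s) if rel == "R" else (s, r)
    before = this[x] > 0 and other[x] > 0
    this[x] += delta
    after = this[x] > 0 and other[x] > 0
    joined += int(after) - int(before)

def answer():
    """Does some value occur in both R and S?"""
    return joined > 0

update("R", 1, +1); update("S", 2, +1)
print(answer())   # False: no common value yet
update("S", 1, +1)
print(answer())   # True: x=1 is now in both relations
update("R", 1, -1)
print(answer())   # False again after the deletion
```

Note that the same counter also answers the counting problem (the number of join values), matching the abstract's three evaluation tasks.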
Tight Bounds for Graph Problems in Insertion Streams
Despite the large amount of work on solving graph problems in the data stream model, there do not exist tight space bounds for almost any of them, even in a stream with only edge insertions. For example, for testing connectivity, the upper bound is O(n * log(n)) bits, while the lower bound is only Omega(n) bits. We remedy this situation by providing the first tight Omega(n * log(n)) space lower bounds for randomized algorithms which succeed with constant probability in a stream of edge insertions for a number of graph problems. Our lower bounds apply to testing bipartiteness, connectivity, cycle-freeness, whether a graph is Eulerian, planarity, H-minor freeness, finding a minimum spanning tree of a connected graph, and testing if the diameter of a sparse graph is constant. We also give the first Omega(n * k * log(n)) space lower bounds for deterministic algorithms for k-edge connectivity and k-vertex connectivity; these are optimal in light of known deterministic upper bounds (for k-vertex connectivity we also need to allow edge duplications, which known upper bounds allow). Finally, we give an Omega(n * log^2(n)) lower bound for randomized algorithms approximating the minimum cut up to a constant factor with constant probability in a graph with integer weights between 1 and n, presented as a stream of insertions and deletions to its edges. This lower bound also holds for cut sparsifiers, and gives the first separation of maintaining a sparsifier in the data stream model versus the offline model
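For context, the O(n * log(n))-bit upper bound for connectivity cited above is achieved by the standard approach of feeding the edge-insertion stream into a union-find structure, which stores one O(log n)-bit pointer per vertex. A minimal sketch (our illustration, not from the paper):

```python
# Connectivity under edge insertions with union-find:
# n parent pointers of O(log n) bits each, i.e. O(n log n) bits total.

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # one word per vertex
        self.components = n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[rx] = ry
            self.components -= 1

n = 4
uf = UnionFind(n)
for u, v in [(0, 1), (2, 3)]:    # the edge-insertion stream so far
    uf.union(u, v)
print(uf.components == 1)        # False: {0,1} and {2,3} are separate
uf.union(1, 2)
print(uf.components == 1)        # True: the graph is now connected
```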
An Asymptotically Optimal Algorithm for Maximum Matching in Dynamic Streams
We present an algorithm for the maximum matching problem in dynamic
(insertion-deletion) streams with *asymptotically optimal* space complexity:
for any $n$-vertex graph, our algorithm with high probability outputs an
$\alpha$-approximate matching in a single pass using $O(n^2/\alpha^3)$ bits of
space.
A long line of work on the dynamic streaming matching problem has reduced the
gap between space upper and lower bounds first to $\mathrm{polylog}(n)$ factors
[Assadi-Khanna-Li-Yaroslavtsev; SODA 2016] and subsequently to $O(\log n)$
factors [Dark-Konrad; CCC 2020]. Our upper bound now matches the Dark-Konrad
lower bound up to constant factors, thus completing this research direction.
Our approach consists of two main steps: we first (provably) identify a
family of graphs, similar to the instances used in prior work to establish the
lower bounds for this problem, as the only "hard" instances to focus on. These
graphs include an induced subgraph which is both sparse and contains a large
matching. We then design a dynamic streaming algorithm for this family of
graphs which is more efficient than prior work. The key to this efficiency is a
novel sketching method, which bypasses the typical loss of
$\mathrm{polylog}(n)$-factors in space compared to standard $\ell_0$-sampling
primitives, and can be of independent interest in designing optimal algorithms
for other streaming problems.

Comment: Full version of the paper accepted to ITCS 2022. 42 pages, 5 figures.
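As background on the sampling primitives that the abstract contrasts its sketch with: samplers for dynamic streams are typically built from 1-sparse recovery, where two counters suffice to recover the single surviving item after insertions and deletions cancel out. The following is a standard textbook construction, not the paper's new sketching method, and the variable names are ours:

```python
# 1-sparse recovery, the building block of l0-sampling in dynamic streams.
# If exactly one coordinate of the frequency vector is nonzero, the two
# counters recover its index and its count.

state = {"total": 0, "weighted": 0}

def update(i, delta):
    """Apply an insertion (delta=+1) or deletion (delta=-1) of item i."""
    state["total"] += delta          # sum_i f_i
    state["weighted"] += i * delta   # sum_i i * f_i

def recover():
    """If the frequency vector is 1-sparse, return (index, count)."""
    if state["total"] == 0:
        return None
    return (state["weighted"] // state["total"], state["total"])

# Item 7 inserted twice; item 3 inserted and then deleted:
update(7, +1); update(3, +1); update(3, -1); update(7, +1)
print(recover())  # (7, 2)
```

In a full sampler this recovery is run on randomly subsampled substreams so that, with good probability, some substream is 1-sparse; that machinery incurs the extra space factors the paper's method avoids.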
Weighted Maximum Independent Set of Geometric Objects in Turnstile Streams
We study the Maximum Independent Set problem for geometric objects given in
the data stream model. A set of geometric objects is said to be independent if
the objects are pairwise disjoint. We consider geometric objects in one and two
dimensions, i.e., intervals and disks. Let $\alpha$ be the cardinality of the
largest independent set. Our goal is to estimate $\alpha$ in a small amount of
space, given that the input is received as a one-pass stream. We also consider
a generalization of this problem by assigning weights to each object and
estimating $\alpha_w$, the largest value of a weighted independent set. We
initiate the study of this problem in the turnstile streaming model
(insertions and deletions) and provide the first algorithms for estimating
$\alpha$ and $\alpha_w$.
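As an offline point of comparison (our illustration, not from the paper): for intervals, the exact size of a largest independent set can be computed by the classical earliest-right-endpoint greedy, whereas the streaming setting described above only allows small-space estimates.

```python
# Exact offline computation of the largest independent set of intervals:
# scan intervals by right endpoint and greedily keep each one that is
# disjoint from everything chosen so far.

def alpha(intervals):
    """Size of a largest set of pairwise disjoint closed intervals."""
    count, last_right = 0, float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > last_right:   # disjoint from all chosen intervals
            count += 1
            last_right = right
    return count

print(alpha([(0, 2), (1, 3), (4, 6), (2.5, 3.5)]))  # 3
```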
For unit-length intervals, we obtain a $(2+\epsilon)$-approximation to $\alpha$
and $\alpha_w$ in $\mathrm{poly}(\log(n), 1/\epsilon)$ space. We also show a
matching lower bound. Combined with the $(3/2+\epsilon)$-approximation for
insertion-only streams by Cabello and Pérez-Lantero [CP15], our result implies
a separation between the insertion-only and turnstile models. For unit-radius
disks, we obtain an $(8\sqrt{3}/\pi)$-approximation to $\alpha$ and $\alpha_w$
in $\mathrm{poly}(\log(n), 1/\epsilon)$ space, which is closely related to
the hexagonal circle packing constant.
We provide algorithms for estimating $\alpha$ for arbitrary-length intervals
under a bounded-intersection assumption and study the parameterized space
complexity of estimating $\alpha$ and $\alpha_w$, where the parameter is the
ratio of maximum to minimum interval length.

Comment: The lower bound for arbitrary-length intervals in the previous
version contains a bug; we are updating the submission to reflect this.