40 research outputs found

    m-tables: Representing Missing Data

    Representation systems have been widely used to capture different forms of incomplete data in various settings. However, existing representation systems are not expressive enough to handle the more complex scenarios of missing data that can occur in practice: these range from missing attribute values, to a known number of missing tuples, to an unknown number of missing tuples. In this work, we propose a new representation system called m-tables that can represent many different types of missing data. We show that m-tables form a closed, complete and strong representation system under both set and bag semantics and are strictly more expressive than conditional tables under both the closed and open world assumptions. We further study the complexity of computing certain and possible answers in m-tables. Finally, we discuss how to "interpret" m-tables through a novel labeling scheme that marks a type of generalized tuples as certain or possible.
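
To make the notions of certain and possible answers concrete: under the usual possible-worlds reading, a certain answer appears in the query result of every complete database that the incomplete one may represent, and a possible answer appears in at least one. The sketch below illustrates this for missing attribute values that range over a small, non-empty finite domain. It does not implement m-tables; the helper names (`worlds`, `answers`), the example relation, and the domain are all hypothetical.

```python
from itertools import product

NULL = None  # marker for a missing attribute value (illustrative choice)

def worlds(table, domain):
    """Yield every completion of `table` obtained by filling each NULL from `domain`."""
    slots = [(i, j) for i, row in enumerate(table)
             for j, v in enumerate(row) if v is NULL]
    for values in product(domain, repeat=len(slots)):
        rows = [list(r) for r in table]
        for (i, j), v in zip(slots, values):
            rows[i][j] = v
        yield {tuple(r) for r in rows}  # set semantics

def answers(table, domain, query):
    """Return (certain, possible) answers of `query` over all completions."""
    results = [query(w) for w in worlds(table, domain)]
    certain = set.intersection(*results)   # in every world
    possible = set.union(*results)         # in at least one world
    return certain, possible

# Hypothetical relation Employees(name, dept); bob's department is missing.
emp = [("alice", "db"), ("bob", NULL)]
in_db = lambda w: {name for (name, dept) in w if dept == "db"}
certain, possible = answers(emp, ["db", "ml"], in_db)
```

Enumerating all completions is exponential in the number of missing values, so this only serves to pin down the definitions on toy inputs.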

    Answering UCQs under Updates and in the Presence of Integrity Constraints

    We investigate the query evaluation problem for fixed queries over fully dynamic databases where tuples can be inserted or deleted. The task is to design a dynamic data structure that can immediately report the new result of a fixed query after every database update. We consider unions of conjunctive queries (UCQs) and focus on the query evaluation tasks testing (decide whether an input tuple belongs to the query result), enumeration (enumerate, without repetition, all tuples in the query result), and counting (output the number of tuples in the query result). We identify three increasingly restrictive classes of UCQs which we call t-hierarchical, q-hierarchical, and exhaustively q-hierarchical UCQs. Our main results provide the following dichotomies: If the query's homomorphic core is t-hierarchical (q-hierarchical, exhaustively q-hierarchical), then the testing (enumeration, counting) problem can be solved with constant update time and constant testing time (delay, counting time). Otherwise, it cannot be solved with sublinear update time and sublinear testing time (delay, counting time), unless the OV-conjecture and/or the OMv-conjecture fails. We also study the complexity of query evaluation in the dynamic setting in the presence of integrity constraints, and we obtain similar dichotomy results for the special case of small domain constraints (i.e., constraints which state that all values in a particular column of a relation belong to a fixed domain of constant size).

    Tight Bounds for Graph Problems in Insertion Streams

    Despite the large amount of work on solving graph problems in the data stream model, there do not exist tight space bounds for almost any of them, even in a stream with only edge insertions. For example, for testing connectivity, the upper bound is O(n * log(n)) bits, while the lower bound is only Omega(n) bits. We remedy this situation by providing the first tight Omega(n * log(n)) space lower bounds for randomized algorithms which succeed with constant probability in a stream of edge insertions for a number of graph problems. Our lower bounds apply to testing bipartiteness, connectivity, cycle-freeness, whether a graph is Eulerian, planarity, H-minor freeness, finding a minimum spanning tree of a connected graph, and testing if the diameter of a sparse graph is constant. We also give the first Omega(n * k * log(n)) space lower bounds for deterministic algorithms for k-edge connectivity and k-vertex connectivity; these are optimal in light of known deterministic upper bounds (for k-vertex connectivity we also need to allow edge duplications, which known upper bounds allow). Finally, we give an Omega(n * log^2(n)) lower bound for randomized algorithms approximating the minimum cut up to a constant factor with constant probability in a graph with integer weights between 1 and n, presented as a stream of insertions and deletions to its edges. This lower bound also holds for cut sparsifiers, and gives the first separation of maintaining a sparsifier in the data stream model versus the offline model.
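
The O(n * log(n))-bit upper bound for connectivity quoted above is typically achieved by maintaining the connected components of the graph seen so far, for instance with a union-find structure that keeps one parent pointer of about log(n) bits per vertex. A minimal sketch of that idea follows; the function name `stream_connected` and the representation of the stream as an iterable of vertex pairs are illustrative assumptions, not taken from the paper.

```python
def stream_connected(n, edge_stream):
    """Return True if the graph on vertices 0..n-1 is connected after all insertions."""
    parent = list(range(n))  # one pointer (about log n bits) per vertex
    components = n

    def find(v):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edge_stream:  # single pass; the edges themselves are never stored
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv   # merge the two components
            components -= 1
    return components == 1
```

For example, `stream_connected(4, [(0, 1), (2, 3)])` sees two separate components, while appending the edge `(1, 2)` to the stream joins them.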

    Streaming Algorithms with Large Approximation Factors


    An Asymptotically Optimal Algorithm for Maximum Matching in Dynamic Streams

    We present an algorithm for the maximum matching problem in dynamic (insertion-deletions) streams with *asymptotically optimal* space complexity: for any $n$-vertex graph, our algorithm with high probability outputs an $\alpha$-approximate matching in a single pass using $O(n^2/\alpha^3)$ bits of space. A long line of work on the dynamic streaming matching problem has reduced the gap between space upper and lower bounds first to $n^{o(1)}$ factors [Assadi-Khanna-Li-Yaroslavtsev; SODA 2016] and subsequently to $\text{polylog}(n)$ factors [Dark-Konrad; CCC 2020]. Our upper bound now matches the Dark-Konrad lower bound up to $O(1)$ factors, thus completing this research direction. Our approach consists of two main steps: we first (provably) identify a family of graphs, similar to the instances used in prior work to establish the lower bounds for this problem, as the only "hard" instances to focus on. These graphs include an induced subgraph which is both sparse and contains a large matching. We then design a dynamic streaming algorithm for this family of graphs which is more efficient than prior work. The key to this efficiency is a novel sketching method, which bypasses the typical loss of $\text{polylog}(n)$-factors in space compared to standard $L_0$-sampling primitives, and can be of independent interest in designing optimal algorithms for other streaming problems.
    Comment: Full version of the paper accepted to ITCS 2022. 42 pages, 5 figures.


    Weighted Maximum Independent Set of Geometric Objects in Turnstile Streams

    We study the Maximum Independent Set problem for geometric objects given in the data stream model. A set of geometric objects is said to be independent if the objects are pairwise disjoint. We consider geometric objects in one and two dimensions, i.e., intervals and disks. Let $\alpha$ be the cardinality of the largest independent set. Our goal is to estimate $\alpha$ in a small amount of space, given that the input is received as a one-pass stream. We also consider a generalization of this problem by assigning weights to each object and estimating $\beta$, the largest value of a weighted independent set. We initiate the study of this problem in the turnstile streaming model (insertions and deletions) and provide the first algorithms for estimating $\alpha$ and $\beta$. For unit-length intervals, we obtain a $(2+\epsilon)$-approximation to $\alpha$ and $\beta$ in $\text{poly}\left(\frac{\log(n)}{\epsilon}\right)$ space. We also show a matching lower bound. Combined with the $3/2$-approximation for insertion-only streams by Cabello and Perez-Lanterno [CP15], our result implies a separation between the insertion-only and turnstile model. For unit-radius disks, we obtain a $\left(\frac{8\sqrt{3}}{\pi}\right)$-approximation to $\alpha$ and $\beta$ in $\text{poly}(\log(n), \epsilon^{-1})$ space, which is closely related to the hexagonal circle packing constant. We provide algorithms for estimating $\alpha$ for arbitrary-length intervals under a bounded intersection assumption and study the parameterized space complexity of estimating $\alpha$ and $\beta$, where the parameter is the ratio of maximum to minimum interval length.
    Comment: The lower bound for arbitrary length intervals in the previous version contains a bug, we are updating the submission to reflect this.
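
For intervals, the quantity α (the size of a largest set of pairwise disjoint intervals) can be computed exactly offline with the classical greedy rule: scan the intervals by increasing right endpoint and keep each one that starts after the last kept interval ends. The sketch below is that offline baseline, useful for checking estimates on small inputs. It stores the whole input, so it is not the streaming algorithm from the paper. The function name and the treatment of intervals as closed (so touching intervals intersect) are assumptions made for illustration.

```python
def max_independent_intervals(intervals):
    """Size of a largest set of pairwise disjoint closed intervals (offline greedy)."""
    count = 0
    last_end = float("-inf")
    # Process intervals in order of increasing right endpoint.
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start > last_end:  # strictly after: closed intervals that touch are not disjoint
            count += 1
            last_end = end
    return count
```

For instance, on the unit-length intervals `[(0, 1), (0.5, 1.5), (2, 3)]` the greedy scan keeps `(0, 1)` and `(2, 3)`.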