101 research outputs found

    Weighted Maximum Independent Set of Geometric Objects in Turnstile Streams

    We study the Maximum Independent Set problem for geometric objects given in the data stream model. A set of geometric objects is said to be independent if the objects are pairwise disjoint. We consider geometric objects in one and two dimensions, i.e., intervals and disks. Let $\alpha$ be the cardinality of the largest independent set. Our goal is to estimate $\alpha$ in a small amount of space, given that the input is received as a one-pass stream. We also consider a generalization of this problem by assigning weights to each object and estimating $\beta$, the largest value of a weighted independent set. We initiate the study of this problem in the turnstile streaming model (insertions and deletions) and provide the first algorithms for estimating $\alpha$ and $\beta$. For unit-length intervals, we obtain a $(2+\epsilon)$-approximation to $\alpha$ and $\beta$ in $\mathrm{poly}\left(\frac{\log(n)}{\epsilon}\right)$ space. We also show a matching lower bound. Combined with the $3/2$-approximation for insertion-only streams by Cabello and Pérez-Lantero [CP15], our result implies a separation between the insertion-only and turnstile models. For unit-radius disks, we obtain a $\left(\frac{8\sqrt{3}}{\pi}\right)$-approximation to $\alpha$ and $\beta$ in $\mathrm{poly}(\log(n), \epsilon^{-1})$ space, which is closely related to the hexagonal circle packing constant. We provide algorithms for estimating $\alpha$ for arbitrary-length intervals under a bounded intersection assumption and study the parameterized space complexity of estimating $\alpha$ and $\beta$, where the parameter is the ratio of maximum to minimum interval length. Comment: The lower bound for arbitrary length intervals in the previous version contains a bug, we are updating the submission to reflect thi

    Optimal lower bounds for universal relation, and for samplers and finding duplicates in streams

    In the communication problem $\mathbf{UR}$ (universal relation) [KRW95], Alice and Bob respectively receive $x, y \in \{0,1\}^n$ with the promise that $x \neq y$. The last player to receive a message must output an index $i$ such that $x_i \neq y_i$. We prove that the randomized one-way communication complexity of this problem in the public coin model is exactly $\Theta\left(\min\left\{n, \log(1/\delta)\log^2\left(\frac{n}{\log(1/\delta)}\right)\right\}\right)$ for failure probability $\delta$. Our lower bound holds even if promised $\mathrm{support}(y) \subset \mathrm{support}(x)$. As a corollary, we obtain optimal lower bounds for $\ell_p$-sampling in strict turnstile streams for $0 \le p < 2$, as well as for the problem of finding duplicates in a stream. Our lower bounds do not need to use large weights, and hold even if promised $x \in \{0,1\}^n$ at all points in the stream. We give two different proofs of our main result. The first proof demonstrates that any algorithm $\mathcal{A}$ solving sampling problems in turnstile streams in low memory can be used to encode subsets of $[n]$ of certain sizes into a number of bits below the information theoretic minimum. Our encoder makes adaptive queries to $\mathcal{A}$ throughout its execution, but done carefully so as to not violate correctness. This is accomplished by injecting random noise into the encoder's interactions with $\mathcal{A}$, which is loosely motivated by techniques in differential privacy. Our second proof is via a novel randomized reduction from Augmented Indexing [MNSW98] which needs to interact with $\mathcal{A}$ adaptively. To handle the adaptivity we identify certain likely interaction patterns and union bound over them to guarantee correct interaction on all of them. To guarantee correctness, it is important that the interaction hides some of its randomness from $\mathcal{A}$ in the reduction. Comment: merge of arXiv:1703.08139 and of work of Kapralov, Woodruff, and Yahyazade

    Finding structure in data streams : correlations, independent sets, and matchings

    The streaming model supposes that, rather than being available all at once, the data is received in a piecemeal fashion. In a world of massive data sets, streaming algorithms give a complementary approach to distributed algorithms: with the data all being available in one place but at different times, rather than at the same time in different places. We examine three different single-pass streaming problems where existing results show limited feasibility. We consider realistic relaxations or restrictions of these problems which allow for more efficient algorithms. In the correlation outliers problem, we wish to identify pairs of unusually correlated signals from a streamed matrix of observations. We show that a simple application of existing techniques is space-optimal but has slow query time when the outlier threshold is small. We demonstrate how we can achieve faster query times at the cost of storing a larger data summary. In the maximum independent set problem, we wish to find an edge-less induced subgraph of maximum size. For arbitrary graphs, given as a stream of edges, it is known that no space-efficient algorithm exists. We consider a variant streaming model, where the graph is received vertex by vertex. While we show this model still does not admit efficient algorithms for general graphs, we demonstrate efficient approximation algorithms for various special graph classes. In the maximum matching problem, we wish to find a disjoint subset of edges of largest possible size. The greedy algorithm gives us an easy 2-approximation for streams of edges, but the problem becomes infeasible to solve if we allow unlimited edge deletions. We consider a model where, instead, a limited number of deletions are allowed. We describe several new approximation algorithms with complexity parameterised by the number of deletions. We also present new techniques which may lead to the development of corresponding tight lower bounds
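
    The greedy 2-approximation mentioned in this abstract can be illustrated with a minimal sketch for insertion-only edge streams. This is the textbook greedy maximal matching, not code from the thesis; the function name `greedy_matching` and the sample stream are illustrative assumptions.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def greedy_matching(edge_stream: Iterable[Edge]) -> List[Edge]:
    """One pass over an insertion-only edge stream.

    Keeps an edge only if both endpoints are still unmatched, so the
    result is a maximal matching. Any maximal matching has at least
    half as many edges as a maximum matching.
    """
    matched: Set[Hashable] = set()   # endpoints already used
    matching: List[Edge] = []
    for u, v in edge_stream:
        # Skip self-loops and edges touching an already matched vertex.
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


# Path a-b-c-d: if the middle edge arrives first, greedy keeps only it,
# while the maximum matching {a-b, c-d} has two edges.
bad_order = [("b", "c"), ("a", "b"), ("c", "d")]
good_order = [("a", "b"), ("c", "d"), ("b", "c")]
print(greedy_matching(bad_order))
print(greedy_matching(good_order))
```

    The sketch stores only matched endpoints and kept edges, and it has no way to undo a choice, which is why it does not extend to streams with deletions.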

    Dynamic Approximate Maximum Independent Set of Intervals, Hypercubes and Hyperrectangles

    Independent set is a fundamental problem in combinatorial optimization. While in general graphs the problem is essentially inapproximable, for many important graph classes there are approximation algorithms known in the offline setting. These graph classes include interval graphs and geometric intersection graphs, where vertices correspond to intervals/geometric objects and an edge indicates that the two corresponding objects intersect. We present dynamic approximation algorithms for independent set of intervals, hypercubes and hyperrectangles in $d$ dimensions. They work in the fully dynamic model where each update inserts or deletes a geometric object. All our algorithms are deterministic and have worst-case update times that are polylogarithmic for constant $d$ and $\epsilon > 0$, assuming that the coordinates of all input objects are in $[0, N]^d$ and each of their edges has length at least 1. We obtain the following results: - For weighted intervals, we maintain a $(1+\epsilon)$-approximate solution. - For $d$-dimensional hypercubes we maintain a $(1+\epsilon)2^d$-approximate solution in the unweighted case and an $O(2^d)$-approximate solution in the weighted case. Also, we show that for maintaining an unweighted $(1+\epsilon)$-approximate solution one needs polynomial update time for $d \ge 2$ if the ETH holds. - For weighted $d$-dimensional hyperrectangles we present a dynamic algorithm with approximation ratio $(1+\epsilon)\log^{d-1}N$
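
    As a point of reference for the offline setting mentioned in this abstract, the unweighted interval case can be solved exactly by the classic earliest-right-endpoint greedy. The sketch below is that textbook offline method, not the paper's dynamic algorithm; the function name `max_independent_intervals` and the assumption that intervals are closed (so intervals sharing an endpoint intersect) are illustrative choices.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start, end), assumed closed with start <= end


def max_independent_intervals(intervals: Sequence[Interval]) -> List[Interval]:
    """Offline greedy for unweighted intervals.

    Scans intervals by increasing right endpoint and keeps each one
    that starts strictly after the last kept interval ends. For
    unweighted intervals this yields a maximum independent set.
    """
    chosen: List[Interval] = []
    last_end = float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Closed intervals: touching at a single point counts as intersecting.
        if start > last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


sample = [(0, 3), (2, 5), (4, 7), (6, 9), (1, 2)]
print(max_independent_intervals(sample))
```

    The sort makes this inherently offline: a fully dynamic algorithm has to maintain a comparable solution under insertions and deletions without recomputing from scratch.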

    Algorithmic Techniques for Processing Data Streams

    We give a survey of some algorithmic techniques for processing data streams. After covering the basic methods of sampling and sketching, we present more advanced procedures that build on those basic ones. In particular, we examine algorithmic schemes for similarity mining, the concept of group testing, and techniques for clustering and summarizing data streams

    Coresets Meet EDCS: Algorithms for Matching and Vertex Cover on Massive Graphs

    As massive graphs become more prevalent, there is a rapidly growing need for scalable algorithms that solve classical graph problems, such as maximum matching and minimum vertex cover, on large datasets. For massive inputs, several different computational models have been introduced, including the streaming model, the distributed communication model, and the massively parallel computation (MPC) model that is a common abstraction of MapReduce-style computation. In each model, algorithms are analyzed in terms of resources such as space used or rounds of communication needed, in addition to the more traditional approximation ratio. In this paper, we give a single unified approach that yields better approximation algorithms for matching and vertex cover in all these models. The highlights include: * The first one pass, significantly-better-than-2-approximation for matching in random arrival streams that uses subquadratic space, namely a $(1.5+\epsilon)$-approximation streaming algorithm that uses $O(n^{1.5})$ space for constant $\epsilon > 0$. * The first 2-round, better-than-2-approximation for matching in the MPC model that uses subquadratic space per machine, namely a $(1.5+\epsilon)$-approximation algorithm with $O(\sqrt{mn} + n)$ memory per machine for constant $\epsilon > 0$. By building on our unified approach, we further develop parallel algorithms in the MPC model that give a $(1+\epsilon)$-approximation to matching and an $O(1)$-approximation to vertex cover in only $O(\log\log n)$ MPC rounds and $O(n/\mathrm{polylog}(n))$ memory per machine. These results settle multiple open questions posed in the recent paper of Czumaj et al. [STOC 2018]

    On Constructing Spanners from Random Gaussian Projections

    Graph sketching is a powerful paradigm for analyzing graph structure via linear measurements introduced by Ahn, Guha, and McGregor (SODA'12) that has since found numerous applications in streaming, distributed computing, and massively parallel algorithms, among others. Graph sketching has proven to be quite successful for various problems such as connectivity, minimum spanning trees, edge or vertex connectivity, and cut or spectral sparsifiers. Yet, the problem of approximating the shortest path metric of a graph, and specifically computing a spanner, is notably missing from the list of successes. This has turned the status of this fundamental problem into one of the most longstanding open questions in this area. We present a partial explanation of this lack of success by proving a strong lower bound for a large family of graph sketching algorithms that encompasses prior work on spanners and many (but, importantly, not all) related cut-based problems mentioned above. Our lower bound matches the algorithmic bounds of the recent result of Filtser, Kapralov, and Nouri (SODA'21), up to lower order terms, for constructing spanners via the same graph sketching family. This establishes near-optimality of these bounds, at least restricted to this family of graph sketching techniques, and makes progress on a conjecture posed in this latter work