3 research outputs found

    Almost Optimal Streaming Algorithms for Coverage Problems

    Maximum coverage and minimum set cover problems, collectively called coverage problems, have been studied extensively in streaming models. However, previous work not only achieves sub-optimal approximation factors and space complexities, but also studies a restricted set arrival model, which makes an explicit or implicit assumption of oracle access to the sets and ignores the complexity of reading and storing a whole set at once. In this paper, we address these shortcomings: we present algorithms with improved approximation factors and improved space complexity, and prove that our results are almost tight. Moreover, unlike most previous work, our results hold in the more general edge arrival model. More specifically, we present (almost) optimal approximation algorithms for maximum coverage and minimum set cover problems in the streaming model with an (almost) optimal space complexity of $\tilde{O}(n)$, i.e., the space is independent of the size of the sets or the size of the ground set of elements. These results not only improve over the best known algorithms for the set arrival model, but are also the first such algorithms for the more powerful edge arrival model. To achieve these results, we introduce a new general sketching technique for coverage functions: this sketching scheme can be applied to convert an $\alpha$-approximation algorithm for a coverage problem into a $(1-\varepsilon)\alpha$-approximation algorithm for the same problem in the streaming or RAM models. We show the significance of our sketching technique by ruling out the possibility of solving coverage problems via black-box access to a $(1 \pm \varepsilon)$-approximate oracle (e.g., a sketch function) that estimates the coverage function on any subfamily of the sets.
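To make the edge arrival setting concrete, here is a minimal Python sketch of one simple way to summarize a stream of (set, element) pairs: keep only the edges whose element hashes below a sampling rate `p`, then rescale the sampled coverage by `1/p`. This is an illustrative simplification and not the construction from the paper; the function names, the `salt` parameter, and the use of SHA-256 as the hash are all assumptions made for the example.

```python
import hashlib

def element_hash(element, salt="demo"):
    """Map an element to a deterministic pseudo-random value in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{element}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def build_sketch(edge_stream, p):
    """Single pass over (set_id, element) edges; keep edges of sampled elements only."""
    sketch = {}
    for set_id, element in edge_stream:
        # The decision depends on the element alone, so every edge of a
        # sampled element is kept, no matter which set it arrives with.
        if element_hash(element) < p:
            sketch.setdefault(set_id, set()).add(element)
    return sketch

def estimate_coverage(sketch, set_ids, p):
    """Estimate the size of the union of the chosen sets from the sketch."""
    covered = set()
    for set_id in set_ids:
        covered |= sketch.get(set_id, set())
    return len(covered) / p
```

With `p = 1.0` every element is kept and the estimate equals the exact coverage; smaller values of `p` trade accuracy for space.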

    Set Cover in Sub-linear Time

    We study the classic set cover problem from the perspective of sub-linear algorithms. Given access to a collection of $m$ sets over $n$ elements in the query model, we show that sub-linear algorithms derived from existing techniques have almost tight query complexities. On the one hand, we first show an adaptation of the streaming algorithm of Har-Peled et al. [2016] to the sub-linear query model that returns an $\alpha$-approximate cover using $\tilde{O}(m(n/k)^{1/(\alpha-1)} + nk)$ queries to the input, where $k$ denotes the size of a minimum set cover. We then complement this upper bound by proving that for lower values of $k$, the required number of queries is $\tilde{\Omega}(m(n/k)^{1/(2\alpha)})$, even for estimating the optimal cover size. Moreover, we prove that even checking whether a given collection of sets covers all the elements requires $\Omega(nk)$ queries. These two lower bounds provide strong evidence that the upper bound is almost tight for certain values of the parameter $k$. On the other hand, we show that this bound is not optimal for larger values of $k$, as there exists a $(1+\varepsilon)$-approximation algorithm with $\tilde{O}(mn/(k\varepsilon^2))$ queries. We show that this bound is essentially tight for sufficiently small constant $\varepsilon$ by establishing a lower bound of $\tilde{\Omega}(mn/k)$ on the query complexity.
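As a rough illustration of what counting queries means here, the following Python sketch wraps the input in a counter and checks, naively and exhaustively, whether a chosen collection covers the universe. The `QueryOracle` class and its `elt_of(i, j)` method are hypothetical stand-ins for a query model, assumed for this example only, and the checker is a baseline rather than an algorithm from the paper.

```python
class QueryOracle:
    """Hypothetical query access: elt_of(i, j) is the j-th element of set i, or None."""

    def __init__(self, sets):
        # Fix an arbitrary but stable order inside each set.
        self._sets = [sorted(s) for s in sets]
        self.queries = 0

    def elt_of(self, i, j):
        self.queries += 1  # every call counts as one query
        members = self._sets[i]
        return members[j] if j < len(members) else None

def covers_universe(oracle, chosen, n):
    """Naive check: read every chosen set fully and compare against n elements."""
    covered = set()
    for i in chosen:
        j = 0
        while True:
            element = oracle.elt_of(i, j)
            if element is None:
                break
            covered.add(element)
            j += 1
    # Assumes all elements come from a universe of exactly n distinct items.
    return len(covered) == n
```

The counter makes the cost visible: this check spends one query per element of each chosen set, plus one extra query per set to detect its end.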

    Tight Bounds on the Round Complexity of the Distributed Maximum Coverage Problem

    We study the maximum $k$-set coverage problem in the following distributed setting. A collection of sets $S_1,\ldots,S_m$ over a universe $[n]$ is partitioned across $p$ machines, and the goal is to find $k$ sets whose union covers the largest number of elements. The computation proceeds in synchronous rounds. In each round, all machines simultaneously send a message to a central coordinator, who then communicates back to all machines a summary to guide the computation in the next round. At the end, the coordinator outputs the answer. The main measures of efficiency in this setting are the approximation ratio of the returned solution, the communication cost of each machine, and the number of rounds of computation. Our main result is an asymptotically tight bound on the tradeoff between these measures for the distributed maximum coverage problem. We first show that any $r$-round protocol for this problem either incurs a communication cost of $k \cdot m^{\Omega(1/r)}$ or only achieves an approximation factor of $k^{\Omega(1/r)}$. This implies that any protocol that simultaneously achieves a good approximation ratio ($O(1)$-approximation) and a good communication cost ($\widetilde{O}(n)$ communication per machine) essentially requires a number of rounds logarithmic in $k$. We complement our lower bound by showing that there exists an $r$-round protocol that achieves an $\frac{e}{e-1}$-approximation (essentially best possible) with a communication cost of $k \cdot m^{O(1/r)}$, as well as an $r$-round protocol that achieves a $k^{O(1/r)}$-approximation with only $\widetilde{O}(n)$ communication per machine (essentially best possible). We further use our results in this distributed setting to obtain new bounds for the maximum coverage problem in two other main models of computation for massive datasets, namely the dynamic streaming model and the MapReduce model.
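For reference, the $\frac{e}{e-1}$ factor mentioned above is the guarantee of the classic centralized greedy algorithm for maximum $k$-coverage. The Python sketch below shows that single-machine baseline only; it is not the distributed protocol from the paper, and the function name and input layout (a list of Python sets) are assumptions made for illustration.

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets, each time taking the one with the largest marginal gain."""
    covered = set()
    chosen = []
    for _ in range(min(k, len(sets))):
        remaining = [i for i in range(len(sets)) if i not in chosen]
        # Marginal gain of a set = number of its elements not yet covered.
        best = max(remaining, key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds a new element
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered
```

For example, on `[{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1}]` with `k = 2`, the greedy rule first takes the four-element set and then `{1, 2, 3}`, covering all seven elements.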