3 research outputs found

Almost Optimal Streaming Algorithms for Coverage Problems

Authors: Cormode G., Kelner J. A., Mirzasoleiman B., Muthukrishnan S.
Publication date: 09/03/2017

Maximum coverage and minimum set cover problems, collectively called coverage problems, have been studied extensively in streaming models. However, previous research not only achieves sub-optimal approximation factors and space complexities, but also studies a restricted set arrival model, which makes an explicit or implicit assumption of oracle access to the sets, ignoring the complexity of reading and storing a whole set at once. In this paper, we address these shortcomings and present algorithms with improved approximation factors and improved space complexity, and we prove that our results are almost tight. Moreover, unlike most previous work, our results hold in the more general edge arrival model. More specifically, we present (almost) optimal approximation algorithms for maximum coverage and minimum set cover problems in the streaming model with an (almost) optimal space complexity of $\tilde{O}(n)$, i.e., the space is independent of the size of the sets or the size of the ground set of elements. These results not only improve over the best known algorithms for the set arrival model, but are also the first such algorithms for the more powerful edge arrival model. To achieve these results, we introduce a new general sketching technique for coverage functions: this sketching scheme can be applied to convert an $\alpha$-approximation algorithm for a coverage problem into a $(1-\varepsilon)\alpha$-approximation algorithm for the same problem in the streaming or RAM models. We show the significance of our sketching technique by ruling out the possibility of solving coverage problems via accessing (as a black box) a $(1 \pm \varepsilon)$-approximate oracle (e.g., a sketch function) that estimates the coverage function on any subfamily of the sets.
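As a rough illustration of the idea of running an offline approximation algorithm on a sketch built from an edge-arrival stream, the following minimal Python sketch subsamples elements by hashing and then runs the classic greedy algorithm on what remains. This is not the sketching scheme from the paper, whose details the abstract does not give; the function names (`keep_element`, `sketch_from_edges`, `greedy_max_cover`), the sampling `rate` parameter, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def keep_element(element, rate):
    """Hash-based sampling: a given element is always kept or always dropped."""
    digest = hashlib.sha256(str(element).encode()).digest()
    # Compare as numbers so that rate=1.0 keeps everything and rate=0.0 nothing.
    return int.from_bytes(digest[:8], "big") < rate * 2**64


def sketch_from_edges(edges, rate):
    """Build a subsampled instance from a stream of (set_id, element) pairs.

    Edges may arrive in any order (edge arrival); only sampled elements are stored.
    """
    sketch = defaultdict(set)
    for set_id, element in edges:
        if keep_element(element, rate):
            sketch[set_id].add(element)
    return dict(sketch)


def greedy_max_cover(sets, k):
    """Classic greedy: pick up to k sets, each with the largest marginal gain."""
    covered, chosen = set(), []
    for _ in range(k):
        remaining = [s for s in sorted(sets) if s not in chosen]
        best = max(remaining, key=lambda s: len(sets[s] - covered), default=None)
        if best is None or not sets[best] - covered:
            break  # nothing left that adds coverage
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


if __name__ == "__main__":
    # Hypothetical toy stream; edges of different sets are interleaved.
    stream = [("A", 1), ("C", 4), ("A", 2), ("B", 3), ("C", 5),
              ("A", 3), ("B", 4), ("C", 6), ("C", 7)]
    sampled = sketch_from_edges(stream, rate=0.5)
    print(greedy_max_cover(sampled, k=2))
```

Hashing the element rather than the edge means every set sees the same sampled sub-universe, so coverage values on the sketch scale roughly with `rate`. A set whose elements are all dropped simply never appears in the sketch.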

arXiv.org e-Print Archive

Crossref

Set Cover in Sub-linear Time

Authors: Indyk Piotr, Mahabadi Sepideh, Rubinfeld Ronitt, Vakilian Ali, Yodpinyanee Anak
Publication date: 01/01/2018

We study the classic set cover problem from the perspective of sub-linear algorithms. Given access to a collection of $m$ sets over $n$ elements in the query model, we show that sub-linear algorithms derived from existing techniques have almost tight query complexities. On one hand, first we show an adaptation of the streaming algorithm presented in Har-Peled et al. [2016] to the sub-linear query model, that returns an $\alpha$-approximate cover using $\tilde{O}(m(n/k)^{1/(\alpha-1)} + nk)$ queries to the input, where $k$ denotes the value of a minimum set cover. We then complement this upper bound by proving that for lower values of $k$, the required number of queries is $\tilde{\Omega}(m(n/k)^{1/(2\alpha)})$, even for estimating the optimal cover size. Moreover, we prove that even checking whether a given collection of sets covers all the elements would require $\Omega(nk)$ queries. These two lower bounds provide strong evidence that the upper bound is almost tight for certain values of the parameter $k$. On the other hand, we show that this bound is not optimal for larger values of the parameter $k$, as there exists a $(1+\varepsilon)$-approximation algorithm with $\tilde{O}(mn/k\varepsilon^2)$ queries. We show that this bound is essentially tight for sufficiently small constant $\varepsilon$, by establishing a lower bound of $\tilde{\Omega}(mn/k)$ query complexity.
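To make the notion of query complexity concrete, here is a small Python sketch of a naive cover verifier that counts how many queries it issues. The abstract does not specify the query interface, so the `QueryOracle` class and its `elt_of(i, j)` query ("return the j-th element of set i") are assumptions for illustration, as is the convention that the universe is `0..n-1`. The verifier is a plain baseline, not an algorithm from the paper.

```python
class QueryOracle:
    """Hypothetical query access to a set system; every call costs one query."""

    def __init__(self, sets):
        self._sets = [sorted(s) for s in sets]
        self.queries = 0

    def elt_of(self, i, j):
        """Return the j-th element of set i, or None if set i has fewer elements."""
        self.queries += 1
        members = self._sets[i]
        return members[j] if j < len(members) else None


def covers_universe(oracle, chosen, n):
    """Check whether the sets indexed by `chosen` cover the universe {0, ..., n-1}.

    Reads each chosen set element by element and stops as soon as all n
    elements have been seen.
    """
    covered = set()
    for i in chosen:
        j = 0
        while len(covered) < n:
            element = oracle.elt_of(i, j)
            if element is None:
                break  # set i is exhausted
            covered.add(element)
            j += 1
    return len(covered) == n


if __name__ == "__main__":
    oracle = QueryOracle([{0, 1}, {1, 2}, {3}])
    print(covers_universe(oracle, [0, 1, 2], n=4), oracle.queries)
```

The query count of this baseline grows with the total size of the chosen sets, which is the quantity the $\Omega(nk)$ lower bound in the abstract speaks to.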

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Tight Bounds on the Round Complexity of the Distributed Maximum Coverage Problem

Authors: Assadi Sepehr, Khanna Sanjeev
Publication date: 22/08/2018

We study the maximum $k$-set coverage problem in the following distributed setting. A collection of sets $S_1,\ldots,S_m$ over a universe $[n]$ is partitioned across $p$ machines and the goal is to find $k$ sets whose union covers the largest number of elements. The computation proceeds in synchronous rounds. In each round, all machines simultaneously send a message to a central coordinator, who then communicates back to all machines a summary to guide the computation for the next round. At the end, the coordinator outputs the answer. The main measures of efficiency in this setting are the approximation ratio of the returned solution, the communication cost of each machine, and the number of rounds of computation. Our main result is an asymptotically tight bound on the tradeoff between these measures for the distributed maximum coverage problem. We first show that any $r$-round protocol for this problem either incurs a communication cost of $k \cdot m^{\Omega(1/r)}$ or only achieves an approximation factor of $k^{\Omega(1/r)}$. This implies that any protocol that simultaneously achieves a good approximation ratio ($O(1)$ approximation) and good communication cost ($\widetilde{O}(n)$ communication per machine) essentially requires a logarithmic (in $k$) number of rounds. We complement our lower bound result by showing that there exists an $r$-round protocol that achieves an $\frac{e}{e-1}$-approximation (essentially best possible) with a communication cost of $k \cdot m^{O(1/r)}$, as well as an $r$-round protocol that achieves a $k^{O(1/r)}$-approximation with only $\widetilde{O}(n)$ communication per machine (essentially best possible). We further use our results in this distributed setting to obtain new bounds for the maximum coverage problem in two other main models of computation for massive datasets, namely, the dynamic streaming model and the MapReduce model.
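The coordinator model described above can be simulated in a few lines. The Python sketch below shows a simple one-round baseline in which each machine runs greedy on its own sets, sends its $k$ picks to the coordinator, and the coordinator runs greedy again on everything it received. This is an illustrative baseline only, not the $r$-round protocols from the paper; the function names and the representation of a machine as a dictionary of sets are assumptions.

```python
def greedy(sets, k):
    """Return up to k set ids chosen greedily by marginal coverage gain."""
    covered, chosen = set(), []
    for _ in range(k):
        remaining = [s for s in sorted(sets) if s not in chosen]
        best = max(remaining, key=lambda s: len(sets[s] - covered), default=None)
        if best is None or not sets[best] - covered:
            break  # no remaining set adds coverage
        chosen.append(best)
        covered |= sets[best]
    return chosen


def one_round_protocol(machines, k):
    """Simulate a single synchronous round with a central coordinator.

    `machines` is a list of dicts mapping set id -> set of elements,
    one dict per machine (set ids are assumed to be globally unique).
    """
    # Each machine's message: its k locally greedy sets, sent in full.
    messages = {}
    for local_sets in machines:
        for set_id in greedy(local_sets, k):
            messages[set_id] = local_sets[set_id]
    # The coordinator solves the problem on the union of all messages.
    chosen = greedy(messages, k)
    covered = set().union(*(messages[s] for s in chosen))
    return chosen, covered


if __name__ == "__main__":
    # Hypothetical instance: four sets split across two machines.
    machines = [
        {"S1": {1, 2, 3}, "S2": {3, 4}},
        {"S3": {4, 5, 6, 7}, "S4": {1}},
    ]
    print(one_round_protocol(machines, k=2))
```

In this baseline each machine sends at most $k$ whole sets, so communication is traded for a single round; multi-round protocols of the kind the abstract analyzes use the coordinator's feedback to reduce what each machine needs to send.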

arXiv.org e-Print Archive

Crossref