Almost Optimal Streaming Algorithms for Coverage Problems
Maximum coverage and minimum set cover problems, collectively called
coverage problems, have been studied extensively in streaming models. However,
previous research not only achieves sub-optimal approximation factors and space
complexities, but also studies a restricted set arrival model which makes an
explicit or implicit assumption on oracle access to the sets, ignoring the
complexity of reading and storing the whole set at once. In this paper, we
address the above shortcomings, and present algorithms with improved
approximation factor and improved space complexity, and prove that our results
are almost tight. Moreover, unlike most previous work, our results hold on a
more general edge arrival model. More specifically, we present (almost) optimal
approximation algorithms for maximum coverage and minimum set cover problems in
the streaming model with an (almost) optimal space complexity of
, i.e., the space is independent of the size of the sets or
the size of the ground set of elements. These results not only improve over
the best known algorithms for the set arrival model, but also are the first
such algorithms for the more powerful edge arrival model. In order to
achieve the above results, we introduce a new general sketching technique for
coverage functions: This sketching scheme can be applied to convert an
$\alpha$-approximation algorithm for a coverage problem to a
$(1-\epsilon)\alpha$-approximation algorithm for the same problem in streaming or
RAM models. We show the significance of our sketching technique by ruling out
the possibility of solving coverage problems via accessing (as a black box) a
$(1 \pm \epsilon)$-approximate oracle (e.g., a sketch function) that estimates the
coverage function on any subfamily of the sets.
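As a rough illustration of the edge arrival setting mentioned in this abstract, the following Python sketch consumes (set, element) pairs in arbitrary order, keeps only a hashed subsample of the elements, and then runs the classic greedy heuristic on what was kept. This is a generic element-subsampling idea, not the sketching scheme of the paper; the function names and the `rate` parameter are assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def keep_element(element, rate):
    """Deterministically keep roughly a `rate` fraction of all elements."""
    digest = hashlib.sha256(str(element).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < rate


def sketch_from_edges(edges, rate):
    """Read (set_id, element) pairs in any order; store sampled elements only."""
    sketch = defaultdict(set)
    for set_id, element in edges:
        if keep_element(element, rate):
            sketch[set_id].add(element)
    return sketch


def greedy_max_coverage(sets, k):
    """Classic greedy: pick the set with the largest marginal gain, k times."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(sets, key=lambda s: len(sets[s] - covered), default=None)
        if best is None or not (sets[best] - covered):
            break  # nothing left to gain
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


# Hypothetical stream: A = {1,2,3,4}, B = {4,5}, C = {5,6,7}, edges interleaved.
edges = [("A", 1), ("C", 5), ("B", 4), ("A", 2), ("C", 6),
         ("A", 3), ("B", 5), ("C", 7), ("A", 4)]
chosen, covered = greedy_max_coverage(sketch_from_edges(edges, rate=1.0), k=2)
```

With `rate=1.0` nothing is discarded, so the example reduces to plain greedy on the full input; lower rates trade accuracy for memory, and how much accuracy is lost is exactly the kind of question the abstract addresses.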
Set Cover in Sub-linear Time
We study the classic set cover problem from the perspective of sub-linear
algorithms. Given access to a collection of sets over elements in the
query model, we show that sub-linear algorithms derived from existing
techniques have almost tight query complexities.
On the one hand, we first show an adaptation of the streaming algorithm presented
in Har-Peled et al. [2016] to the sub-linear query model that returns an
-approximate cover using
queries to the input, where denotes the value of a minimum set cover. We
then complement this upper bound by proving that for lower values of , the
required number of queries is , even for
estimating the optimal cover size. Moreover, we prove that even checking
whether a given collection of sets covers all the elements would require
queries. These two lower bounds provide strong evidence that the
upper bound is almost tight for certain values of the parameter .
On the other hand, we show that this bound is not optimal for larger values
of the parameter , as there exists a -approximation
algorithm with queries. We show that this bound
is essentially tight for sufficiently small constant , by
establishing a lower bound of query complexity.
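To make the notion of query complexity in this abstract concrete, here is a minimal Python sketch of a query-counting oracle and a naive check of whether chosen sets cover the universe. The interface (`SetOracle`, `element_of`) is an assumption for illustration and may differ from the query model defined in the paper.

```python
class SetOracle:
    """Hypothetical oracle: the input is reachable only through counted queries."""

    def __init__(self, sets):
        self._sets = [list(s) for s in sets]
        self.queries = 0

    def element_of(self, set_index, position):
        """Return the element at `position` in the given set, or None past its end."""
        self.queries += 1
        members = self._sets[set_index]
        return members[position] if position < len(members) else None


def covers_universe(oracle, chosen, universe_size):
    """Naive check: scan the chosen sets until every element has been seen."""
    uncovered = set(range(universe_size))
    for set_index in chosen:
        position = 0
        while uncovered:
            element = oracle.element_of(set_index, position)
            if element is None:
                break  # reached the end of this set
            uncovered.discard(element)
            position += 1
    return not uncovered


oracle = SetOracle([[0, 1], [2, 3], [1, 4]])
is_cover = covers_universe(oracle, chosen=[0, 1, 2], universe_size=5)
```

The naive scan reads the chosen sets essentially in full, so `oracle.queries` grows with their total size; the lower bound stated in the abstract concerns how far any algorithm can improve on this kind of cost.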
Tight Bounds on the Round Complexity of the Distributed Maximum Coverage Problem
We study the maximum -set coverage problem in the following distributed
setting. A collection of sets over a universe is
partitioned across machines and the goal is to find sets whose union
covers the largest number of elements. The computation proceeds in synchronous
rounds. In each round, all machines simultaneously send a message to a central
coordinator who then communicates back to all machines a summary to guide the
computation for the next round. At the end, the coordinator outputs the answer.
The main measures of efficiency in this setting are the approximation ratio of
the returned solution, the communication cost of each machine, and the number
of rounds of computation.
Our main result is an asymptotically tight bound on the tradeoff between
these measures for the distributed maximum coverage problem. We first show that
any -round protocol for this problem either incurs a communication cost of or only achieves an approximation factor of
. This implies that any protocol that simultaneously achieves
good approximation ratio ( approximation) and good communication cost
( communication per machine), essentially requires
logarithmic (in ) number of rounds. We complement our lower bound result by
showing that there exists an -round protocol that achieves an
-approximation (essentially best possible) with a communication
cost of as well as an -round protocol that achieves a
-approximation with only communication per
machine (essentially best possible).
We further use our results in this distributed setting to obtain new bounds
for the maximum coverage problem in two other main models of computation for
massive datasets, namely, the dynamic streaming model and the MapReduce model.
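The coordinator setting described in this abstract can be illustrated with a simple one-round baseline in Python: each machine runs greedy on its own sets and sends its picks to the coordinator, which runs greedy again on the merged candidates. This is a common "local greedy, then merge" heuristic, not the protocol analyzed in the paper; all names below are assumptions made for this example.

```python
def greedy_cover(sets, k):
    """Pick up to k sets, each time taking the largest marginal gain."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(sets, key=lambda name: len(sets[name] - covered), default=None)
        if best is None or not (sets[best] - covered):
            break
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


def one_round_protocol(machines, k):
    """Each machine sends its local greedy picks; the coordinator re-runs greedy."""
    merged, sets_sent = {}, 0
    for local_sets in machines:
        local_choice, _ = greedy_cover(local_sets, k)
        sets_sent += len(local_choice)  # communication, counted in sets
        for name in local_choice:
            merged[name] = local_sets[name]
    chosen, covered = greedy_cover(merged, k)
    return chosen, covered, sets_sent


# Hypothetical input: four sets partitioned across two machines.
machines = [
    {"A": {1, 2, 3}, "B": {3}},
    {"C": {4, 5}, "D": {1}},
]
chosen, covered, sets_sent = one_round_protocol(machines, k=2)
```

Counting `sets_sent` alongside the quality of `covered` mirrors the tradeoff the abstract studies between communication, approximation ratio, and number of rounds; this baseline fixes the number of rounds at one.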