Randomized Composable Core-sets for Distributed Submodular Maximization
An effective technique for solving optimization problems over massive data
sets is to partition the data into smaller pieces, solve the problem on each
piece and compute a representative solution from it, and finally obtain a
solution inside the union of the representative solutions for all pieces. This
technique can be captured via the concept of {\em composable core-sets}, and
has been recently applied to solve diversity maximization problems as well as
several clustering problems. However, for coverage and submodular maximization
problems, impossibility bounds are known for this technique \cite{IMMM14}. In
this paper, we focus on efficient construction of a randomized variant of
composable core-sets where the above idea is applied on a {\em random
clustering} of the data. We employ this technique for the coverage, monotone
and non-monotone submodular maximization problems. Our results significantly
improve upon the hardness results for non-randomized core-sets, and imply
improved results for submodular maximization in distributed and streaming
settings.
In summary, we show that a simple greedy algorithm results in a
-approximate randomized composable core-set for submodular maximization
under a cardinality constraint. This is in contrast to a known impossibility result for (non-randomized) composable core-sets. Our
result also extends to non-monotone submodular functions, and leads to the
first 2-round MapReduce-based constant-factor approximation algorithm with
total communication complexity for either monotone or non-monotone
functions. Finally, using an improved analysis technique and a new algorithm
, we present an improved -approximation algorithm
for monotone submodular maximization, which is in turn the first
MapReduce-based algorithm beating factor in a constant number of rounds.
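The partition, solve, and merge pattern described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm or analysis: the objective is set coverage (one example of a monotone submodular function), and the names `coverage`, `greedy`, and `randomized_coreset_greedy`, the `num_machines` parameter, and the choice to return the best of the per-piece and merged solutions are all assumptions made for this example.

```python
import random


def coverage(sets, chosen):
    """Number of distinct elements covered by the chosen set ids."""
    covered = set()
    for s in chosen:
        covered |= sets[s]
    return len(covered)


def greedy(sets, candidates, k):
    """Standard greedy: repeatedly add the candidate with the largest marginal gain."""
    chosen = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        base = coverage(sets, chosen)
        best = max(remaining, key=lambda s: coverage(sets, chosen + [s]) - base)
        chosen.append(best)
        remaining.remove(best)
    return chosen


def randomized_coreset_greedy(sets, k, num_machines, seed=0):
    """Two-round sketch: random partition, greedy per piece, greedy on the union."""
    rng = random.Random(seed)
    # Round 1: assign every set to a uniformly random piece (hypothetical "machine").
    parts = [[] for _ in range(num_machines)]
    for s in sets:
        parts[rng.randrange(num_machines)].append(s)
    # Each piece reports the output of greedy as its representative solution.
    coresets = [greedy(sets, part, k) for part in parts]
    # Round 2: run greedy again inside the union of the representatives.
    union = [s for c in coresets for s in c]
    candidates = coresets + [greedy(sets, union, k)]
    # Assumption of this sketch: return whichever candidate covers the most.
    return max(candidates, key=lambda c: coverage(sets, c))
```

For example, `randomized_coreset_greedy({"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}, k=2, num_machines=2)` returns at most two set ids chosen from the union of the per-piece greedy outputs.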
MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension
Given a dataset of points in a metric space and an integer , a diversity
maximization problem requires determining a subset of points maximizing
some diversity objective measure, e.g., the minimum or the average distance
between two points in the subset. Diversity maximization is computationally
hard, hence only approximate solutions can be hoped for. Although its
applications are mainly in massive data analysis, most of the past research on
diversity maximization focused on the sequential setting. In this work we
present space and pass/round-efficient diversity maximization algorithms for
the Streaming and MapReduce models and analyze their approximation guarantees
for the relevant class of metric spaces of bounded doubling dimension. Like
other approaches in the literature, our algorithms rely on the determination of
high-quality core-sets, i.e., (much) smaller subsets of the input which contain
good approximations to the optimal solution for the whole input. For a variety
of diversity objective functions, our algorithms attain an
-approximation ratio, for any constant , where
is the best approximation ratio achieved by a polynomial-time,
linear-space sequential algorithm for the same diversity objective. This
improves substantially over the approximation ratios attainable in Streaming
and MapReduce by state-of-the-art algorithms for general metric spaces. We
provide extensive experimental evidence of the effectiveness of our algorithms
on both real world and synthetic datasets, scaling up to over a billion points.
Comment: Extended version of
http://www.vldb.org/pvldb/vol10/p469-ceccarello.pdf, PVLDB Volume 10, No. 5,
January 201
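To make the two objectives named in this abstract concrete (minimum and average pairwise distance), here is a small sketch for points in Euclidean space. The `farthest_first` routine is a classic greedy heuristic for the minimum-distance objective and is included only as an illustration of selecting a small, well-spread subset; it is an assumption of this example and is not taken from the abstract. All function names are hypothetical.

```python
import itertools
import math


def min_distance(points):
    """Minimum distance between any two points of the subset."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def avg_distance(points):
    """Average distance over all pairs of points of the subset."""
    pairs = list(itertools.combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def farthest_first(points, k):
    """Greedy farthest-first traversal: start anywhere, then keep adding the
    point whose distance to the already chosen points is largest."""
    chosen = [points[0]]
    while len(chosen) < min(k, len(points)):
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in chosen))
        chosen.append(nxt)
    return chosen
```

In a core-set style pipeline, a routine like `farthest_first` would be run on each partition (or on a stream buffer) and a sequential algorithm would then be run on the union of the selected points; the sizes and guarantees involved are those of the paper, not of this sketch.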
Almost Optimal Streaming Algorithms for Coverage Problems
Maximum coverage and minimum set cover problems --collectively called
coverage problems-- have been studied extensively in streaming models. However,
previous research not only achieves sub-optimal approximation factors and space
complexities, but also studies a restricted set arrival model which makes an
explicit or implicit assumption on oracle access to the sets, ignoring the
complexity of reading and storing the whole set at once. In this paper, we
address the above shortcomings, and present algorithms with improved
approximation factor and improved space complexity, and prove that our results
are almost tight. Moreover, unlike most previous work, our results hold on a
more general edge arrival model. More specifically, we present (almost) optimal
approximation algorithms for maximum coverage and minimum set cover problems in
the streaming model with an (almost) optimal space complexity of
, i.e., the space is {\em independent of the size of the sets or
the size of the ground set of elements}. These results not only improve over
the best known algorithms for the set arrival model, but also are the first
such algorithms for the more powerful {\em edge arrival} model. In order to
achieve the above results, we introduce a new general sketching technique for
coverage functions: This sketching scheme can be applied to convert an
$\alpha$-approximation algorithm for a coverage problem to a
$(1-\eps)\alpha$-approximation algorithm for the same problem in streaming or
RAM models. We show the significance of our sketching technique by ruling out
the possibility of solving coverage problems via accessing (as a black box) a
$(1 \pm \eps)$-approximate oracle (e.g., a sketch function) that estimates the
coverage function on any subfamily of the sets.
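As a rough illustration of what a sketch for coverage functions in the edge arrival model can look like, the following keeps only a hashed sample of the elements and rescales the sampled coverage. This is a generic subsampling idea written for this example; it is not the sketching scheme of the paper and makes no attempt to meet the space bound claimed above. The names `element_hash`, `sketch_edge_stream`, `estimated_coverage`, and the `sample_rate` parameter are hypothetical.

```python
import hashlib
from collections import defaultdict


def element_hash(element):
    """Deterministic hash of an element into [0, 1)."""
    digest = hashlib.sha256(repr(element).encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def sketch_edge_stream(edges, sample_rate):
    """Process (set_id, element) edges one at a time, in any order, and keep
    only the edges whose element falls in the hashed sample."""
    sketch = defaultdict(set)
    for set_id, element in edges:
        if element_hash(element) < sample_rate:
            sketch[set_id].add(element)
    return sketch


def estimated_coverage(sketch, family, sample_rate):
    """Estimate how many elements the given family of sets covers by
    rescaling the coverage measured on the sampled elements."""
    covered = set()
    for set_id in family:
        covered |= sketch.get(set_id, set())
    return len(covered) / sample_rate
```

Because an element is sampled based on its hash alone, every set sees the same sample, so an algorithm can be run on the sketch in place of the full input. With `sample_rate=1.0` the sketch stores every edge and the estimate is exact.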
Improved Diversity Maximization Algorithms for Matching and Pseudoforest
In this work we consider the diversity maximization problem, where given a
data set of elements, and a parameter , the goal is to pick a subset
of of size maximizing a certain diversity measure. [CH01] defined a
variety of diversity measures based on pairwise distances between the points. A
constant factor approximation algorithm was known for all those diversity
measures except ``remote-matching'', where only an approximation
was known. In this work we present an approximation for this remaining
notion. Further, we consider these notions from the perspective of composable
coresets. [IMMM14] provided composable coresets with a constant factor
approximation for all but ``remote-pseudoforest'' and ``remote-matching'',
which again they only obtained a approximation. Here we also close
the gap up to constants and present a constant factor composable coreset
algorithm for these two notions. For remote-matching, our coreset has size only
, and for remote-pseudoforest, our coreset has size
for any , for an
-approximate coreset.
Comment: 27 pages, 1 table. Accepted to APPROX, 202
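The abstract names the two diversity measures without defining them. Assuming the definitions commonly used in the diversity maximization literature (remote-pseudoforest: the sum, over the chosen points, of each point's distance to its nearest other chosen point; remote-matching: the cost of a minimum-weight perfect matching on the chosen points), the objectives can be evaluated on a small subset as sketched below. The function names are hypothetical, and the matching is computed by brute force, so this is suitable only for tiny examples.

```python
import math


def remote_pseudoforest(points):
    """Sum over points of the distance to the nearest other point."""
    return sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


def remote_matching(points):
    """Cost of a minimum-weight perfect matching, by exhaustive search.
    Requires an even number of points; exponential time."""
    if len(points) % 2 != 0:
        raise ValueError("a perfect matching needs an even number of points")
    if not points:
        return 0.0
    first, rest = points[0], points[1:]
    # Pair the first point with every possible partner and recurse on the rest.
    return min(
        math.dist(first, rest[i]) + remote_matching(rest[:i] + rest[i + 1:])
        for i in range(len(rest))
    )
```

A diversity maximization algorithm for either measure would search for the size-constrained subset that maximizes the corresponding function; these helpers only score a given subset.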