A New Framework for Distributed Submodular Maximization
A wide variety of problems in machine learning, including exemplar
clustering, document summarization, and sensor placement, can be cast as
constrained submodular maximization problems. A lot of recent effort has been
devoted to developing distributed algorithms for these problems. However, these
results suffer from a high number of rounds, suboptimal approximation ratios, or
both. We develop a framework for bringing existing algorithms from the
sequential setting to the distributed setting, achieving near-optimal
approximation ratios for many settings in only a constant number of MapReduce
rounds. Our techniques also give a fast sequential algorithm for non-monotone
maximization subject to a matroid constraint.
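As background for the sequential algorithms such frameworks lift to the
distributed setting, the following is a minimal sketch of the classical greedy
rule for monotone submodular maximization under a cardinality constraint; the
function names and the toy coverage objective are illustrative assumptions,
not taken from the paper.

    # Minimal sketch (illustrative names, toy objective): classical greedy for
    # monotone submodular maximization under a cardinality constraint k.
    def greedy_max(f, ground_set, k):
        """Repeatedly add the element with the largest marginal gain f(S+e) - f(S)."""
        selected = set()
        for _ in range(min(k, len(ground_set))):
            best = max(ground_set - selected,
                       key=lambda e: f(selected | {e}) - f(selected))
            selected.add(best)
        return selected

    # Toy submodular objective: number of points covered by the chosen sets.
    sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 5, 6}}
    coverage = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
    print(greedy_max(coverage, set(sets), k=2))  # picks {0, 3} on this toy instance

Distributed frameworks such as the one above orchestrate many runs of a
routine like this across machines; the contribution lies in how the runs are
combined, not in the greedy step itself.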
Randomized Composable Core-sets for Distributed Submodular Maximization
An effective technique for solving optimization problems over massive data
sets is to partition the data into smaller pieces, solve the problem on each
piece and compute a representative solution from it, and finally obtain a
solution inside the union of the representative solutions for all pieces. This
technique can be captured via the concept of {\em composable core-sets}, and
has been recently applied to solve diversity maximization problems as well as
several clustering problems. However, for coverage and submodular maximization
problems, impossibility bounds are known for this technique \cite{IMMM14}. In
this paper, we focus on efficient construction of a randomized variant of
composable core-sets where the above idea is applied on a {\em random
clustering} of the data. We employ this technique for coverage, monotone,
and non-monotone submodular maximization problems. Our results significantly
improve upon the hardness results for non-randomized core-sets and imply
improved results for submodular maximization in distributed and streaming
settings.
In summary, we show that a simple greedy algorithm results in a
1/3-approximate randomized composable core-set for submodular maximization
under a cardinality constraint. This is in contrast to a known impossibility
result for (non-randomized) composable core-sets. Our result also extends to
non-monotone submodular functions, and leads to the first 2-round
MapReduce-based constant-factor approximation algorithm with O(n) total
communication complexity for either monotone or non-monotone functions.
Finally, using an improved analysis technique and a new algorithm, we present
an improved 0.545-approximation algorithm for monotone submodular
maximization, which is in turn the first MapReduce-based algorithm beating
factor 1/2 in a constant number of rounds.
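The following is a minimal structural sketch of the randomized composable
core-set pattern described in the abstract above: randomly partition the data,
build a core-set on each piece with some single-machine routine, and then
solve the problem again inside the union of the core-sets. The helper names
are illustrative assumptions, and the core-set routine is a plug-in parameter
(for instance, a list-based variant of the greedy sketched earlier).

    import random

    # Sketch (assumed names) of the randomized composable core-set pattern:
    # random partition -> per-piece core-set -> solve again on the union.
    def random_partition(items, num_pieces, seed=0):
        """Assign every element to one of num_pieces pieces uniformly at random."""
        rng = random.Random(seed)
        pieces = [[] for _ in range(num_pieces)]
        for x in items:
            pieces[rng.randrange(num_pieces)].append(x)
        return pieces

    def coreset_maximize(f, items, k, num_pieces, coreset_alg):
        """coreset_alg(f, piece, k) is any single-machine routine returning a list."""
        # Round 1: every piece independently computes a small representative set.
        coresets = [coreset_alg(f, piece, k)
                    for piece in random_partition(items, num_pieces)]
        # Round 2: solve the original problem inside the union of the core-sets.
        union = [x for cs in coresets for x in cs]
        return coreset_alg(f, union, k)

The randomization of the partition is what the paper leverages; with an
adversarial partition, the impossibility results cited above apply.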
Distributed Submodular Maximization
Many large-scale machine learning problems--clustering, non-parametric
learning, kernel machines, etc.--require selecting a small yet representative
subset from a large dataset. Such problems can often be reduced to maximizing a
submodular set function subject to various constraints. Classical approaches to
submodular optimization require centralized access to the full dataset, which
is impractical for truly large-scale problems. In this paper, we consider the
problem of submodular function maximization in a distributed fashion. We
develop a simple, two-stage protocol, GreeDi, that is easily implemented using
MapReduce-style computations. We theoretically analyze our approach and show
that under certain natural conditions, performance close to the centralized
approach can be achieved. We begin with monotone submodular maximization
subject to a cardinality constraint, and then extend this approach to obtain
approximation guarantees for (not necessarily monotone) submodular maximization
subject to more general constraints including matroid or knapsack constraints.
In our extensive experiments, we demonstrate the effectiveness of our approach
on several applications, including sparse Gaussian process inference and
exemplar-based clustering on tens of millions of examples using Hadoop.
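Read as pseudocode, the two-stage structure of a protocol like GreeDi can be
sketched as below for the cardinality-constrained case; the local greedy
subroutine and the final "best candidate" selection are assumptions of this
sketch, with the precise protocol and its analysis given in the paper.

    # Two-stage, MapReduce-style sketch in the spirit of GreeDi (cardinality case).
    # Assumptions: f takes a list of elements; shards is the data split across machines.
    def local_greedy(f, items, k):
        chosen = []
        for _ in range(min(k, len(items))):
            best = max((x for x in items if x not in chosen),
                       key=lambda x: f(chosen + [x]) - f(chosen))
            chosen.append(best)
        return chosen

    def greedi(f, shards, k):
        # Stage 1 ("map"): each machine greedily picks k elements from its shard.
        local = [local_greedy(f, shard, k) for shard in shards]
        # Stage 2 ("reduce"): one machine runs greedy on the union of all picks.
        merged = local_greedy(f, [x for sol in local for x in sol], k)
        # Return the best candidate solution produced in either stage.
        return max(local + [merged], key=f)

In a MapReduce deployment, stage 1 is the map phase over shards and stage 2 a
single reducer; the paper's analysis describes the natural conditions under
which this two-stage output is close to the centralized solution.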
Distributed Submodular Maximization with Parallel Execution
The submodular maximization problem is widely applicable in many engineering problems where objectives exhibit diminishing returns. While this problem is known to be NP-hard for certain subclasses of objective functions, there is a greedy algorithm that guarantees an approximation of at least 1/2 of the optimal solution. This greedy algorithm can be implemented with a set of agents, each making a decision sequentially based on the choices of all prior agents. In this paper, we consider a generalization of the greedy algorithm in which agents can make decisions in parallel, rather than strictly in sequence. In particular, we are interested in partitioning the agents, where a set of agents in the partition all make a decision simultaneously based on the choices of prior agents, so that the algorithm terminates in a limited number of iterations. We provide bounds on the performance of this parallelized version of the greedy algorithm and show that dividing the agents evenly among the sets in the partition yields an optimal structure. It is shown that such an optimal structure holds even under very relaxed information constraints. We additionally show that this optimal structure is still near-optimal even when additional information (i.e., total curvature) is known about the objective function.
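A minimal sketch of the parallelized greedy described above (names and the
per-agent choice rule are illustrative assumptions): agents are grouped into
blocks, every agent in a block decides simultaneously against the choices of
earlier blocks, and the number of blocks is the number of iterations.

    # Sketch of the parallelized greedy: blocks of agents decide simultaneously,
    # each seeing only the choices made by agents in earlier blocks.
    def parallel_greedy(f, agent_choices, blocks):
        """agent_choices[i]: the elements agent i may pick from;
        blocks: list of lists of agent indices, processed in order."""
        decided = []                          # choices visible to later blocks
        for block in blocks:
            snapshot = list(decided)          # every agent in the block sees this
            picks = [max(agent_choices[agent],
                         key=lambda x: f(snapshot + [x]) - f(snapshot))
                     for agent in block]
            decided.extend(picks)             # reveal the block's picks afterwards
        return decided

With a single block the algorithm is fully parallel (one iteration); with
singleton blocks it recovers the sequential greedy; the bounds in the paper
quantify the loss in between and show that evenly sized blocks are the best
way to spend a fixed number of iterations.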
Distributed Submodular Maximization with Parallel Execution
The submodular maximization problem is widely applicable in many engineering
problems where objectives exhibit diminishing returns. While this problem is
known to be NP-hard for certain subclasses of objective functions, there is a
greedy algorithm that guarantees an approximation of at least 1/2 of the optimal
solution. This greedy algorithm can be implemented with a set of agents, each
making a decision sequentially based on the choices of all prior agents. In
this paper, we consider a generalization of the greedy algorithm in which
agents can make decisions in parallel, rather than strictly in sequence. In
particular, we are interested in partitioning the agents, where a set of agents
in the partition all make a decision simultaneously based on the choices of
prior agents, so that the algorithm terminates in a limited number of iterations. We
provide bounds on the performance of this parallelized version of the greedy
algorithm and show that dividing the agents evenly among the sets in the
partition yields an optimal structure. We additionally show that this optimal
structure is still near-optimal when the objective function exhibits a certain
monotone property. Lastly, we show that the same performance guarantees can be
achieved in the parallelized greedy algorithm even when agents can only observe
the decisions of a subset of prior agents.
The Power of Randomization: Distributed Submodular Maximization on Massive Datasets
A wide variety of problems in machine learning, including exemplar
clustering, document summarization, and sensor placement, can be cast as
constrained submodular maximization problems. Unfortunately, the resulting
submodular optimization problems are often too large to be solved on a single
machine. We develop a simple distributed algorithm that is embarrassingly
parallel and achieves provable, constant-factor, worst-case approximation
guarantees. In our experiments, we demonstrate its efficiency on large
problems with different kinds of constraints, with objective values always
close to what is achievable in the centralized setting.
Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity
Submodular maximization is a general optimization problem with a wide range
of applications in machine learning (e.g., active learning, clustering, and
feature selection). In large-scale optimization, the parallel running time of
an algorithm is governed by its adaptivity, which measures the number of
sequential rounds needed if the algorithm can execute polynomially-many
independent oracle queries in parallel. While low adaptivity is ideal, it is
not sufficient for an algorithm to be efficient in practice---there are many
applications of distributed submodular optimization where the number of
function evaluations becomes prohibitively expensive. Motivated by these
applications, we study the adaptivity and query complexity of submodular
maximization. In this paper, we give the first constant-factor approximation
algorithm for maximizing a non-monotone submodular function subject to a
cardinality constraint k that runs in O(log n) adaptive rounds and makes
O(n log k) oracle queries in expectation. In our empirical study, we use
three real-world applications to compare our algorithm with several benchmarks
for non-monotone submodular maximization. The results demonstrate that our
algorithm finds competitive solutions using significantly fewer rounds and
queries.
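To make the adaptivity measure concrete, the toy sketch below shows a single
adaptive round (an illustration of the definition, not the paper's algorithm):
all candidate marginal-gain queries are independent of one another, so they
can be issued in parallel and together count as one round. The function and
parameter names are assumptions for illustration.

    from concurrent.futures import ThreadPoolExecutor

    # One adaptive round, illustrating the adaptivity measure (not the paper's
    # algorithm): the marginal-gain queries are mutually independent, so they
    # can all be evaluated in parallel; sequential dependence appears only when
    # the results are combined to decide what happens in the next round.
    def one_adaptive_round(f, current, candidates, threshold):
        """current: set of already chosen elements; candidates: ordered list."""
        with ThreadPoolExecutor() as pool:
            gains = list(pool.map(lambda x: f(current | {x}) - f(current), candidates))
        return [x for x, g in zip(candidates, gains) if g >= threshold]

An algorithm's adaptivity is the number of such rounds it needs; its query
complexity is the total number of f evaluations across all of them, which is
the second quantity this paper drives down.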