Submodular Optimization in the MapReduce Model
Submodular optimization has received significant attention in both practice and theory, as a wide array of problems in machine learning, auction theory, and combinatorial optimization have submodular structure. In practice, these problems often involve large amounts of data and must be solved in a distributed way. One popular framework for running such distributed algorithms is MapReduce. In this paper, we present two simple algorithms for cardinality-constrained submodular optimization in the MapReduce model: the first is a (1/2-o(1))-approximation in 2 MapReduce rounds, and the second is a (1-1/e-epsilon)-approximation in (1+o(1))/epsilon MapReduce rounds.
Randomized Composable Core-sets for Distributed Submodular Maximization
An effective technique for solving optimization problems over massive data
sets is to partition the data into smaller pieces, solve the problem on each
piece and compute a representative solution from it, and finally obtain a
solution inside the union of the representative solutions for all pieces. This
technique can be captured via the concept of composable core-sets, and
has been recently applied to solve diversity maximization problems as well as
several clustering problems. However, for coverage and submodular maximization
problems, impossibility bounds are known for this technique [IMMM14]. In
this paper, we focus on efficient construction of a randomized variant of
composable core-sets where the above idea is applied on a random
clustering of the data. We employ this technique for the coverage, monotone
and non-monotone submodular maximization problems. Our results significantly
improve upon the hardness results for non-randomized core-sets, and imply
improved results for submodular maximization in distributed and streaming
settings.
In summary, we show that a simple greedy algorithm results in a
-approximate randomized composable core-set for submodular maximization
under a cardinality constraint. This is in contrast to a known impossibility result for (non-randomized) composable core-sets. Our
result also extends to non-monotone submodular functions, and leads to the
first 2-round MapReduce-based constant-factor approximation algorithm with
total communication complexity for either monotone or non-monotone
functions. Finally, using an improved analysis technique and a new algorithm,
we present an improved -approximation algorithm
for monotone submodular maximization, which is in turn the first
MapReduce-based algorithm beating factor in a constant number of rounds.
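
The partition-then-merge idea described in this abstract can be illustrated with a minimal sketch. It assumes a coverage objective and simulates the machines sequentially on one process; the names `coverage`, `greedy`, and `two_round_greedy` are hypothetical, and returning the better of the merged and local solutions is a common variant assumed here, not something the abstract states.

```python
import random


def coverage(sets, chosen):
    """Number of distinct elements covered by the chosen set indices."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def greedy(sets, candidates, k):
    """Pick up to k indices, each time taking the largest marginal coverage gain."""
    chosen, covered = [], set()
    remaining = set(candidates)
    for _ in range(k):
        best, best_gain = None, 0
        for i in sorted(remaining):  # sorted only to make ties deterministic
            gain = len(sets[i] - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining candidate adds anything
            break
        chosen.append(best)
        covered |= sets[best]
        remaining.remove(best)
    return chosen


def two_round_greedy(sets, k, num_machines, seed=0):
    """Simulate a 2-round scheme: random partition, local greedy, greedy on the union."""
    rng = random.Random(seed)
    # Assign every item to a uniformly random machine (the "random clustering").
    parts = [[] for _ in range(num_machines)]
    for i in range(len(sets)):
        parts[rng.randrange(num_machines)].append(i)
    # Round 1: each machine runs greedy on its own part and emits a small core-set.
    core_sets = [greedy(sets, part, k) for part in parts]
    # Round 2: a single machine runs greedy on the union of the core-sets.
    union = [i for core in core_sets for i in core]
    merged = greedy(sets, union, k)
    # Assumed variant: keep whichever candidate solution covers the most.
    return max(core_sets + [merged], key=lambda s: coverage(sets, s))
```

Each machine sends at most k items to the second round, which is what keeps the communication small in this kind of scheme.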
Scalable Methods for Adaptively Seeding a Social Network
In recent years, social networking platforms have developed into
extraordinary channels for spreading and consuming information. Along with the
rise of such infrastructure, there is continuous progress on techniques for
spreading information effectively through influential users. In many
applications, one is restricted to select influencers from a set of users who
engaged with the topic being promoted, and due to the structure of social
networks, these users often rank low in terms of their influence potential. An
alternative approach one can consider is an adaptive method which selects users
in a manner which targets their influential neighbors. The advantage of such an
approach is that it leverages the friendship paradox in social networks: while
users are often not influential, they often know someone who is.
Despite the various complexities in such optimization problems, we show that
scalable adaptive seeding is achievable. In particular, we develop algorithms
for linear influence models with provable approximation guarantees that can be
gracefully parallelized. To show the effectiveness of our methods we collected
data on various verticals that social network users follow. For each vertical, we
collected data on the users who responded to a certain post as well as their
neighbors, and applied our methods on this data. Our experiments show that
adaptive seeding is scalable, and importantly, that it obtains dramatic
improvements over standard approaches of information dissemination.
Comment: Full version of the paper appearing in WWW 201
A New Framework for Distributed Submodular Maximization
A wide variety of problems in machine learning, including exemplar
clustering, document summarization, and sensor placement, can be cast as
constrained submodular maximization problems. A lot of recent effort has been
devoted to developing distributed algorithms for these problems. However, these
results suffer from a high number of rounds, suboptimal approximation ratios, or
both. We develop a framework for bringing existing algorithms in the sequential
setting to the distributed setting, achieving near optimal approximation ratios
for many settings in only a constant number of MapReduce rounds. Our techniques
also give a fast sequential algorithm for non-monotone maximization subject to
a matroid constraint.
The Power of Randomization: Distributed Submodular Maximization on Massive Datasets
A wide variety of problems in machine learning, including exemplar
clustering, document summarization, and sensor placement, can be cast as
constrained submodular maximization problems. Unfortunately, the resulting
submodular optimization problems are often too large to be solved on a single
machine. We develop a simple distributed algorithm that is embarrassingly
parallel and achieves provable, constant-factor, worst-case approximation
guarantees. In our experiments, we demonstrate its efficiency on large problems
with different kinds of constraints, with objective values always close to what
is achievable in the centralized setting.
Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity
Submodular maximization is a general optimization problem with a wide range
of applications in machine learning (e.g., active learning, clustering, and
feature selection). In large-scale optimization, the parallel running time of
an algorithm is governed by its adaptivity, which measures the number of
sequential rounds needed if the algorithm can execute polynomially-many
independent oracle queries in parallel. While low adaptivity is ideal, it is
not sufficient for an algorithm to be efficient in practice---there are many
applications of distributed submodular optimization where the number of
function evaluations becomes prohibitively expensive. Motivated by these
applications, we study the adaptivity and query complexity of submodular
maximization. In this paper, we give the first constant-factor approximation
algorithm for maximizing a non-monotone submodular function subject to a
cardinality constraint that runs in adaptive rounds and makes
oracle queries in expectation. In our empirical study, we use
three real-world applications to compare our algorithm with several benchmarks
for non-monotone submodular maximization. The results demonstrate that our
algorithm finds competitive solutions using significantly fewer rounds and
queries.
Comment: 12 pages, 8 figures
Adversarially Robust Submodular Maximization under Knapsack Constraints
We propose the first adversarially robust algorithm for monotone submodular
maximization under single and multiple knapsack constraints with scalable
implementations in distributed and streaming settings. For a single knapsack
constraint, our algorithm outputs a robust summary of almost optimal (up to
polylogarithmic factors) size, from which a constant-factor approximation to
the optimal solution can be constructed. For multiple knapsack constraints, our
approximation is within a constant factor of the best known non-robust
solution.
We evaluate the performance of our algorithms by comparison to natural
robustifications of existing non-robust algorithms under two objectives: 1)
dominating set for large social network graphs from Facebook and Twitter
collected by the Stanford Network Analysis Project (SNAP), 2) movie
recommendations on a dataset from MovieLens. Experimental results show that our
algorithms give the best objective for a majority of the inputs and show strong
performance even compared to offline algorithms that are given the set of
removals in advance.
Comment: To appear in KDD 201
Fast Distributed Approximation for Max-Cut
Finding a maximum cut is a fundamental task in many computational settings.
Surprisingly, it has been insufficiently studied in the classic distributed
settings, where vertices communicate by synchronously sending messages to their
neighbors according to the underlying graph, known as the or
models. We amend this by obtaining almost optimal
algorithms for Max-Cut on a wide class of graphs in these models. In
particular, for any , we develop randomized approximation
algorithms achieving a ratio of to the optimum for Max-Cut on
bipartite graphs in the model, and on general graphs in the
model.
We further present efficient deterministic algorithms, including a
-approximation for Max-Dicut in our models, thus improving the best known
(randomized) ratio of . Our algorithms make non-trivial use of the greedy
approach of Buchbinder et al. (SIAM Journal on Computing, 2015) for maximizing
an unconstrained (non-monotone) submodular function, which may be of
independent interest.
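
For orientation, the greedy approach of Buchbinder et al. mentioned above can be sketched in its sequential, deterministic "double greedy" form, which is known to give a 1/3-approximation for unconstrained non-negative submodular maximization (the randomized variant gives 1/2 in expectation). This is only an illustrative single-machine sketch applied to an unweighted cut function; it is not the distributed algorithm of this paper, and the names `cut_value` and `double_greedy` are hypothetical.

```python
def cut_value(edges, side):
    """Number of edges with exactly one endpoint in `side` (an unweighted cut)."""
    return sum(1 for u, v in edges if (u in side) != (v in side))


def double_greedy(ground, f):
    """Deterministic double greedy for unconstrained submodular maximization.

    Maintains a growing set x (starting empty) and a shrinking set y (starting
    as the whole ground set). Each element is either added to x or removed
    from y, whichever has the larger marginal gain, so x == y at the end.
    """
    x, y = set(), set(ground)
    for u in ground:
        gain_add = f(x | {u}) - f(x)      # gain from adding u to x
        gain_remove = f(y - {u}) - f(y)   # gain from removing u from y
        if gain_add >= gain_remove:
            x.add(u)
        else:
            y.remove(u)
    return x


# Usage on a path with four vertices, 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]
side = double_greedy([0, 1, 2, 3], lambda s: cut_value(edges, s))
```

On this small path the procedure returns the side {0, 2}, which cuts all three edges.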