Locally Adaptive Optimization: Adaptive Seeding for Monotone Submodular Functions
The Adaptive Seeding problem is an algorithmic challenge motivated by
influence maximization in social networks: One seeks to select among certain
accessible nodes in a network, and then select, adaptively, among neighbors of
those nodes as they become accessible in order to maximize a global objective
function. More generally, adaptive seeding is a stochastic optimization
framework where the choices in the first stage affect the realizations in the
second stage, over which we aim to optimize.
Our main result is a constant-factor approximation for the adaptive seeding
problem for any monotone submodular function. While adaptive policies are often
approximated via non-adaptive policies, our algorithm is based on a novel
method we call \emph{locally-adaptive} policies. These policies combine a
non-adaptive global structure, with local adaptive optimizations. This method
enables the constant-factor approximation for general monotone submodular functions
and circumvents some of the impossibilities associated with non-adaptive
policies.
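The second stage of such policies reduces to monotone submodular maximization under a cardinality constraint, for which the classic greedy algorithm is the standard building block. A minimal sketch on a toy coverage objective (all names illustrative, not the paper's algorithm):

```python
def greedy_max(ground, f, k):
    """Greedy for monotone submodular maximization under a cardinality
    constraint: repeatedly add the element of largest marginal gain.
    Classic guarantee: a (1 - 1/e)-approximation."""
    S = set()
    for _ in range(k):
        gains = {e: f(S | {e}) - f(S) for e in sorted(ground) if e not in S}
        if not gains or max(gains.values()) <= 0:
            break
        S.add(max(gains, key=gains.get))
    return S

# Toy coverage function: the value of S is the number of items it covers.
COVERS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {4, 5, 6}}

def coverage(S):
    return len(set().union(*(COVERS[e] for e in S))) if S else 0
```

With budget k = 2 the greedy picks "a" (gain 3) and then "d" (gain 3), covering all six items.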
We also introduce a fundamental problem in submodular optimization that may
be of independent interest: given a ground set of elements where every element
appears with some small probability, find a set of expected size at most k
that has the highest expected value over the realization of the elements. We
show a surprising result: there are classes of monotone submodular functions
(including coverage) that can be approximated almost optimally as the
probability vanishes. For general monotone submodular functions we show via a
reduction from \textsc{Planted-Clique} that approximations for this problem are
not likely to be obtainable. This optimization problem is an important tool for
adaptive seeding via non-adaptive policies, and its hardness motivates the
introduction of the \emph{locally-adaptive} policies we use in the main result.
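Evaluating the objective of this problem, the expected value of a set whose elements each survive independently with probability p, is straightforward by Monte Carlo sampling. A small sketch (f = cardinality as a stand-in monotone submodular function; names illustrative):

```python
import random

def expected_value(S, f, p, trials=5000, seed=0):
    """Monte Carlo estimate of E[f(R)], where R keeps each element of S
    independently with probability p."""
    rng = random.Random(seed)
    return sum(f({e for e in S if rng.random() < p})
               for _ in range(trials)) / trials
```

For f = len the true value is exactly p * |S|, which makes the estimator easy to sanity-check.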
Combining Traditional Marketing and Viral Marketing with Amphibious Influence Maximization
In this paper, we propose the amphibious influence maximization (AIM) model
that combines traditional marketing via content providers and viral marketing
to consumers in social networks in a single framework. In AIM, a set of content
providers and consumers form a bipartite network while consumers also form
their social network, and influence propagates from the content providers to
consumers and among consumers in the social network following the independent
cascade model. An advertiser needs to select a subset of seed content providers
and a subset of seed consumers, such that the influence from the seed providers
passing through the seed consumers could reach a large number of consumers in
the social network in expectation.
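The expected influence being maximized can be estimated by simulating the two hops the model describes: provider-to-consumer edges may only fire into seed consumers, after which an ordinary independent cascade runs among consumers. A toy simulator under these assumptions (illustrative, not the paper's code):

```python
import random

def simulate_aim(prov_edges, soc_edges, seed_providers, seed_consumers,
                 trials=1000, seed=0):
    """Estimate expected consumer activations in a toy AIM instance.
    prov_edges: {(provider, consumer): prob}; soc_edges: {(u, v): prob}.
    Provider influence may only enter the social network through seed
    consumers; cascades then follow the independent cascade model."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        active = set()
        # Stage 1: seed providers try to activate seed consumers directly.
        for (p, c), q in prov_edges.items():
            if p in seed_providers and c in seed_consumers and rng.random() < q:
                active.add(c)
        # Stage 2: independent cascade among consumers.
        frontier = list(active)
        while frontier:
            u = frontier.pop()
            for (a, b), q in soc_edges.items():
                if a == u and b not in active and rng.random() < q:
                    active.add(b)
                    frontier.append(b)
        total += len(active)
    return total / trials
```

With all probabilities set to 1, a single provider seeding c1 reaches c2 through the social edge, for an expected spread of 2; if c1 is not a seed consumer, the provider's influence cannot enter the network at all.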
We prove that the AIM problem is NP-hard to approximate to within any
constant factor via a reduction from Feige's k-prover proof system for 3-SAT5.
We also give evidence that even when the social network graph is trivial (i.e.,
has no edges), a polynomial-time constant-factor approximation for AIM is
unlikely. However, when we assume that the weighted bi-adjacency matrix that
describes the influence of content providers on consumers is of constant rank,
a common assumption in recommender systems, we provide a polynomial-time
algorithm that achieves a constant approximation ratio, up to an arbitrarily
small (polynomially bounded) loss. Our
algorithmic results still hold for a more general model where cascades in the
social network follow a general monotone and submodular function.

Comment: An extended abstract appeared in the Proceedings of the 16th ACM
Conference on Economics and Computation (EC), 2015.
Resolution of ranking hierarchies in directed networks
Identifying hierarchies and rankings of nodes in directed graphs is
fundamental in many applications such as social network analysis, biology,
economics, and finance. A recently proposed method identifies the hierarchy by
finding the ordered partition of nodes which minimises a score function, termed
agony. This function penalises the links violating the hierarchy in a way
depending on the strength of the violation. To investigate the resolution of
ranking hierarchies we introduce an ensemble of random graphs, the Ranked
Stochastic Block Model. We find that agony may fail to identify hierarchies
when the structure is not strong enough and the size of the classes is small
with respect to the whole network. We analytically characterise the resolution
threshold and we show that an iterated version of agony can partly overcome
this resolution limit.

Comment: 27 pages, 9 figures.
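For concreteness, the agony of a given ranking is easy to state: under the usual convention, an edge u -> v pointing up the hierarchy (rank[u] < rank[v]) costs nothing, while a violating edge costs rank[u] - rank[v] + 1. A minimal sketch of the score (not the minimization algorithm itself):

```python
def agony(edges, rank):
    """Agony of a ranking: edges that respect the hierarchy
    (rank[u] < rank[v]) cost 0; a violating edge u -> v costs
    rank[u] - rank[v] + 1, growing with the severity of the violation."""
    return sum(max(0, rank[u] - rank[v] + 1) for u, v in edges)
```

A 3-cycle ranked 0, 1, 2 has agony 3: only the back edge c -> a violates the hierarchy, at cost 2 - 0 + 1 = 3.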
Relax, no need to round: integrality of clustering formulations
We study exact recovery conditions for convex relaxations of point cloud
clustering problems, focusing on two of the most common optimization problems
for unsupervised clustering: k-means and k-median clustering. Motivations
for focusing on convex relaxations are: (a) they come with a certificate of
optimality, and (b) they are generic tools which are relatively parameter-free,
not tailored to specific assumptions over the input. More precisely, we
consider the distributional setting where there are k clusters, and the data
from each cluster consist of points sampled from a symmetric distribution
within a ball of unit radius. We ask: what is the
minimal separation distance between cluster centers needed for convex
relaxations to exactly recover these clusters as the optimal integral
solution? For the k-median linear programming relaxation we show a tight
bound: exact recovery is obtained given arbitrarily small pairwise separation
between the balls; in other words, a pairwise center separation just above
twice the unit radius suffices. Under the same distributional model, the
k-means LP relaxation fails to recover such clusters even at substantially
larger separation. Yet, if we enforce PSD constraints on the k-means LP, we
again get exact cluster recovery at a small constant center separation.
In contrast, common heuristics such as Lloyd's algorithm (a.k.a. the k-means
algorithm) can fail to recover clusters in this setting; even with arbitrarily
large cluster separation, k-means++ with overseeding by any constant factor
fails with high probability at exact cluster recovery. To complement the
theoretical analysis, we provide an experimental study of the recovery
guarantees for these various methods, and discuss several open problems which
these experiments suggest.

Comment: 30 pages, ITCS 2015.
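The Lloyd failure mode mentioned above is easy to reproduce even in one dimension: if two of the initial centers land in the same cluster, two well-separated clusters can end up permanently merged. A minimal sketch with toy data (not the paper's construction):

```python
def lloyd(points, centers, iters=50):
    """Plain 1-D Lloyd iterations: assign each point to its nearest
    center, then move each center to the mean of its assigned points."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            clusters[i].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Three well-separated clusters; two initial centers land in the first one.
data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2, 20.0, 20.1, 20.2]
bad_init = [0.0, 0.1, 10.0]  # converges with the clusters at 10 and 20 merged
```

Run from bad_init, the largest converged center sits near 15, between the two merged clusters, and no center ever reaches the cluster at 20; a correct initialization (one center per true cluster) recovers all three.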
Theoretical analysis for convex and non-convex clustering algorithms
Clustering is one of the most important unsupervised learning problems in the machine learning and statistics communities. Given a set of observations, the goal is to find the latent cluster assignment of the data points. The observations can be either covariates corresponding to each data point, or a relational network representing the affinity between pairs of nodes. We study the problem of community detection in stochastic block models and the clustering of mixture models. The two kinds of problems bear a strong resemblance, and similar techniques can be applied to solve them.
It is common practice to assume some underlying model for the data generating process in order to analyze it properly. With a pre-defined partition of all data points, generative models can be defined to represent those two types of data observations. For the covariates, the mixture model is one of the most flexible and widely-used models, where each cluster i comes from some distribution D_i, and the entire distribution is a convex combination of the cluster distributions, D = sum_i w_i D_i with non-negative weights w_i summing to one. We assume that the data is Gaussian or sub-Gaussian, and analyze two algorithms: 1) the Expectation-Maximization algorithm, which is notoriously non-convex and sensitive to local optima, and 2) a convex relaxation of the k-means algorithm. We show both methods are consistent under certain conditions when the signal-to-noise ratio is relatively high, and we obtain upper bounds on the error rate when the signal-to-noise ratio is low. When there are outliers in the data set, we show that the semi-definite relaxation yields more robust results than spectral methods.
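As a concrete instance of the first algorithm, here is EM specialized to the simplest case: a two-component 1-D Gaussian mixture with unit variances and equal weights, so only the means are re-estimated (a toy sketch, not the thesis's general analysis):

```python
import math

def em_gmm_1d(xs, init_means, iters=100):
    """EM for a 1-D two-component Gaussian mixture with unit variances
    and equal mixing weights; only the means are updated."""
    m0, m1 = init_means
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point.
        r = [math.exp(-0.5 * (x - m1) ** 2) /
             (math.exp(-0.5 * (x - m0) ** 2) + math.exp(-0.5 * (x - m1) ** 2))
             for x in xs]
        # M-step: responsibility-weighted means.
        m0 = sum((1 - ri) * x for ri, x in zip(r, xs)) / sum(1 - ri for ri in r)
        m1 = sum(ri * x for ri, x in zip(r, xs)) / sum(r)
    return m0, m1
```

Started between two well-separated true means, the iterates converge to them; a symmetric initialization (both means equal) is a fixed point of the updates, illustrating the sensitivity to local optima mentioned above.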
For the networks, we consider the Stochastic Block Model (SBM), in which the probability of edge presence is fully determined by the cluster assignments of the pair of nodes. We use a semi-definite programming (SDP) relaxation to learn the clustering matrix, and discuss the role of the model parameters. Most SDP relaxations of the SBM require the number of communities as an input, which is a strong requirement for many real-world applications. In this thesis, we propose to add a regularization based on the nuclear norm, which is shown to exactly recover both the number of communities and the cluster memberships even when the number of communities is unknown.
In many real-world networks, it is common to observe both network structure and node covariates simultaneously. In this case, we present a regularization-based method to effectively combine the two sources of information. The proposed method works especially well when the covariates and the network contain complementary information.