Locally Adaptive Optimization: Adaptive Seeding for Monotone Submodular Functions
The Adaptive Seeding problem is an algorithmic challenge motivated by
influence maximization in social networks: One seeks to select among certain
accessible nodes in a network, and then select, adaptively, among neighbors of
those nodes as they become accessible in order to maximize a global objective
function. More generally, adaptive seeding is a stochastic optimization
framework where the choices in the first stage affect the realizations in the
second stage, over which we aim to optimize.
Our main result is a constant-factor approximation for the adaptive seeding
problem for any monotone submodular function. While adaptive policies are often
approximated via non-adaptive policies, our algorithm is based on a novel
method we call \emph{locally-adaptive} policies. These policies combine a
non-adaptive global structure, with local adaptive optimizations. This method
enables the constant-factor approximation for general monotone submodular functions
and circumvents some of the impossibilities associated with non-adaptive
policies.
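The second stage of such policies reduces to monotone submodular maximization under a cardinality constraint, for which the classic greedy algorithm is the standard building block. A minimal sketch on a toy coverage objective (all names illustrative, not the paper's algorithm):

```python
def greedy_max(ground, f, k):
    """Greedy for monotone submodular maximization under a cardinality
    constraint: repeatedly add the element of largest marginal gain.
    Classic guarantee: a (1 - 1/e)-approximation."""
    S = set()
    for _ in range(k):
        gains = {e: f(S | {e}) - f(S) for e in sorted(ground) if e not in S}
        if not gains or max(gains.values()) <= 0:
            break
        S.add(max(gains, key=gains.get))
    return S

# Toy coverage function: the value of S is the number of items it covers.
COVERS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {4, 5, 6}}

def coverage(S):
    return len(set().union(*(COVERS[e] for e in S))) if S else 0
```

With budget k = 2 the greedy picks "a" (gain 3) and then "d" (gain 3), covering all six items.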
We also introduce a fundamental problem in submodular optimization that may
be of independent interest: given a ground set of elements where every element
appears with some small probability, find a set of expected size at most k
that has the highest expected value over the realization of the elements. We
show a surprising result: there are classes of monotone submodular functions
(including coverage) that can be approximated almost optimally as the
probability vanishes. For general monotone submodular functions we show via a
reduction from \textsc{Planted-Clique} that approximations for this problem are
not likely to be obtainable. This optimization problem is an important tool for
adaptive seeding via non-adaptive policies, and its hardness motivates the
introduction of the \emph{locally-adaptive} policies we use in the main result.
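Evaluating the objective of this problem, the expected value of a set whose elements each survive independently with probability p, is straightforward by Monte Carlo sampling. A small sketch (f = cardinality as a stand-in monotone submodular function; names illustrative):

```python
import random

def expected_value(S, f, p, trials=5000, seed=0):
    """Monte Carlo estimate of E[f(R)], where R keeps each element of S
    independently with probability p."""
    rng = random.Random(seed)
    return sum(f({e for e in S if rng.random() < p})
               for _ in range(trials)) / trials
```

For f = len the true value is exactly p * |S|, which makes the estimator easy to sanity-check.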
Combining Traditional Marketing and Viral Marketing with Amphibious Influence Maximization
In this paper, we propose the amphibious influence maximization (AIM) model
that combines traditional marketing via content providers and viral marketing
to consumers in social networks in a single framework. In AIM, a set of content
providers and consumers form a bipartite network while consumers also form
their social network, and influence propagates from the content providers to
consumers and among consumers in the social network following the independent
cascade model. An advertiser needs to select a subset of seed content providers
and a subset of seed consumers, such that the influence from the seed providers
passing through the seed consumers could reach a large number of consumers in
the social network in expectation.
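The expected influence being maximized can be estimated by simulating the two hops the model describes: provider-to-consumer edges may only fire into seed consumers, after which an ordinary independent cascade runs among consumers. A toy simulator under these assumptions (illustrative, not the paper's code):

```python
import random

def simulate_aim(prov_edges, soc_edges, seed_providers, seed_consumers,
                 trials=1000, seed=0):
    """Estimate expected consumer activations in a toy AIM instance.
    prov_edges: {(provider, consumer): prob}; soc_edges: {(u, v): prob}.
    Provider influence may only enter the social network through seed
    consumers; cascades then follow the independent cascade model."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        active = set()
        # Stage 1: seed providers try to activate seed consumers directly.
        for (p, c), q in prov_edges.items():
            if p in seed_providers and c in seed_consumers and rng.random() < q:
                active.add(c)
        # Stage 2: independent cascade among consumers.
        frontier = list(active)
        while frontier:
            u = frontier.pop()
            for (a, b), q in soc_edges.items():
                if a == u and b not in active and rng.random() < q:
                    active.add(b)
                    frontier.append(b)
        total += len(active)
    return total / trials
```

With all probabilities set to 1, a single provider seeding c1 reaches c2 through the social edge, for an expected spread of 2; if c1 is not a seed consumer, the provider's influence cannot enter the network at all.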
We prove that the AIM problem is NP-hard to approximate to within any
constant factor via a reduction from Feige's k-prover proof system for 3-SAT5.
We also give evidence that even when the social network graph is trivial (i.e.,
has no edges), a polynomial-time constant-factor approximation for AIM is
unlikely. However, when we assume that the weighted bi-adjacency matrix that
describes the influence of content providers on consumers is of constant rank,
a common assumption in recommender systems, we provide a polynomial-time
algorithm that achieves a constant approximation ratio, up to an arbitrarily
small (polynomially bounded) loss. Our
algorithmic results still hold for a more general model where cascades in the
social network follow a general monotone and submodular function.

Comment: An extended abstract appeared in the Proceedings of the 16th ACM
Conference on Economics and Computation (EC), 2015.
Resolution of ranking hierarchies in directed networks
Identifying hierarchies and rankings of nodes in directed graphs is
fundamental in many applications such as social network analysis, biology,
economics, and finance. A recently proposed method identifies the hierarchy by
finding the ordered partition of nodes which minimises a score function, termed
agony. This function penalises the links violating the hierarchy in a way
depending on the strength of the violation. To investigate the resolution of
ranking hierarchies we introduce an ensemble of random graphs, the Ranked
Stochastic Block Model. We find that agony may fail to identify hierarchies
when the structure is not strong enough and the size of the classes is small
with respect to the whole network. We analytically characterise the resolution
threshold and we show that an iterated version of agony can partly overcome
this resolution limit.

Comment: 27 pages, 9 figures.
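For concreteness, the agony of a given ranking is easy to state: under the usual convention, an edge u -> v pointing up the hierarchy (rank[u] < rank[v]) costs nothing, while a violating edge costs rank[u] - rank[v] + 1. A minimal sketch of the score (not the minimization algorithm itself):

```python
def agony(edges, rank):
    """Agony of a ranking: edges that respect the hierarchy
    (rank[u] < rank[v]) cost 0; a violating edge u -> v costs
    rank[u] - rank[v] + 1, growing with the severity of the violation."""
    return sum(max(0, rank[u] - rank[v] + 1) for u, v in edges)
```

A 3-cycle ranked 0, 1, 2 has agony 3: only the back edge c -> a violates the hierarchy, at cost 2 - 0 + 1 = 3.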
Relax, no need to round: integrality of clustering formulations
We study exact recovery conditions for convex relaxations of point cloud
clustering problems, focusing on two of the most common optimization problems
for unsupervised clustering: k-means and k-median clustering. Motivations
for focusing on convex relaxations are: (a) they come with a certificate of
optimality, and (b) they are generic tools which are relatively parameter-free,
not tailored to specific assumptions over the input. More precisely, we
consider the distributional setting where there are k clusters, and the data
from each cluster consist of points sampled from a symmetric distribution
within a ball of unit radius. We ask: what is the
minimal separation distance between cluster centers needed for convex
relaxations to exactly recover these clusters as the optimal integral
solution? For the k-median linear programming relaxation we show a tight
bound: exact recovery is obtained given arbitrarily small pairwise separation
between the balls; in other words, a pairwise center separation just above
twice the unit radius suffices. Under the same distributional model, the
k-means LP relaxation fails to recover such clusters even at substantially
larger separation. Yet, if we enforce PSD constraints on the k-means LP, we
again get exact cluster recovery at a small constant center separation.
In contrast, common heuristics such as Lloyd's algorithm (a.k.a. the k-means
algorithm) can fail to recover clusters in this setting; even with arbitrarily
large cluster separation, k-means++ with overseeding by any constant factor
fails with high probability at exact cluster recovery. To complement the
theoretical analysis, we provide an experimental study of the recovery
guarantees for these various methods, and discuss several open problems which
these experiments suggest.

Comment: 30 pages, ITCS 2015.
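The Lloyd failure mode mentioned above is easy to reproduce even in one dimension: if two of the initial centers land in the same cluster, two well-separated clusters can end up permanently merged. A minimal sketch with toy data (not the paper's construction):

```python
def lloyd(points, centers, iters=50):
    """Plain 1-D Lloyd iterations: assign each point to its nearest
    center, then move each center to the mean of its assigned points."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            clusters[i].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Three well-separated clusters; two initial centers land in the first one.
data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2, 20.0, 20.1, 20.2]
bad_init = [0.0, 0.1, 10.0]  # converges with the clusters at 10 and 20 merged
```

Run from bad_init, the largest converged center sits near 15, between the two merged clusters, and no center ever reaches the cluster at 20; a correct initialization (one center per true cluster) recovers all three.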
Theoretical analysis for convex and non-convex clustering algorithms
Clustering is one of the most important unsupervised learning problems in the machine learning and statistics communities. Given a set of observations, the goal is to find the latent cluster assignment of the data points. The observations can be either covariates corresponding to each data point, or a relational network representing the affinity between pairs of nodes. We study the problem of community detection in stochastic block models and the clustering of mixture models. The two kinds of problems bear a strong resemblance, and similar techniques can be applied to solve them.
It is common practice to assume some underlying model for the data generating process in order to analyze it properly. With a pre-defined partition of all data points, generative models can be defined to represent those two types of data observations. For the covariates, the mixture model is one of the most flexible and widely-used models, where each cluster i comes from some distribution D_i, and the entire distribution is a convex combination of the cluster distributions, D = sum_i w_i D_i with non-negative weights w_i summing to one. We assume that the data is Gaussian or sub-Gaussian, and analyze two algorithms: 1) the Expectation-Maximization algorithm, which is notoriously non-convex and sensitive to local optima, and 2) a convex relaxation of the k-means algorithm. We show both methods are consistent under certain conditions when the signal-to-noise ratio is relatively high, and we obtain upper bounds on the error rate when the signal-to-noise ratio is low. When there are outliers in the data set, we show that the semi-definite relaxation yields more robust results than spectral methods.
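As a concrete instance of the first algorithm, here is EM specialized to the simplest case: a two-component 1-D Gaussian mixture with unit variances and equal weights, so only the means are re-estimated (a toy sketch, not the thesis's general analysis):

```python
import math

def em_gmm_1d(xs, init_means, iters=100):
    """EM for a 1-D two-component Gaussian mixture with unit variances
    and equal mixing weights; only the means are updated."""
    m0, m1 = init_means
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point.
        r = [math.exp(-0.5 * (x - m1) ** 2) /
             (math.exp(-0.5 * (x - m0) ** 2) + math.exp(-0.5 * (x - m1) ** 2))
             for x in xs]
        # M-step: responsibility-weighted means.
        m0 = sum((1 - ri) * x for ri, x in zip(r, xs)) / sum(1 - ri for ri in r)
        m1 = sum(ri * x for ri, x in zip(r, xs)) / sum(r)
    return m0, m1
```

Started between two well-separated true means, the iterates converge to them; a symmetric initialization (both means equal) is a fixed point of the updates, illustrating the sensitivity to local optima mentioned above.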
For the networks, we consider the Stochastic Block Model (SBM), in which the probability of edge presence is fully determined by the cluster assignments of the pair of nodes. We use a semi-definite programming (SDP) relaxation to learn the clustering matrix, and discuss the role of the model parameters. Most SDP relaxations of the SBM require the number of communities as an input, which is a strong requirement for many real-world applications. In this thesis, we propose to add a regularization based on the nuclear norm, which is shown to exactly recover both the number of communities and the cluster memberships even when the number of communities is unknown.
In many real-world networks, it is common to observe both network structure and node covariates simultaneously. In this case, we present a regularization-based method to effectively combine the two sources of information. The proposed method works especially well when the covariates and the network contain complementary information.