Guaranteed clustering and biclustering via semidefinite programming

Author: Brendan P. W. Ames
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Identifying clusters of similar objects in data plays a significant role in a wide range of applications. As a model problem for clustering, we consider the densest k-disjoint-clique problem, whose goal is to identify the collection of k disjoint cliques of a given weighted complete graph maximizing the sum of the densities of the complete subgraphs induced by these cliques. In this paper, we establish conditions ensuring exact recovery of the densest k cliques of a given graph from the optimal solution of a particular semidefinite program. In particular, the semidefinite relaxation is exact for input graphs corresponding to data consisting of k large, distinct clusters and a smaller number of outliers. This approach also yields a semidefinite relaxation for the biclustering problem with similar recovery guarantees. Given a set of objects and a set of features exhibited by these objects, biclustering seeks to simultaneously group the objects and features according to their expression levels. This problem may be posed as partitioning the nodes of a weighted bipartite complete graph such that the sum of the densities of the resulting bipartite complete subgraphs is maximized. As in our analysis of the densest k-disjoint-clique problem, we show that the correct partition of the objects and features can be recovered from the optimal solution of a semidefinite program when the given data consists of several disjoint sets of objects exhibiting similar features. Empirical evidence from numerical experiments supporting these theoretical guarantees is also provided.
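The objective described in this abstract can be illustrated with a small sketch. It assumes, for illustration only, that the density of a cluster C is v^T W v / v^T v for its indicator vector v (the sum of the weights over ordered pairs in C, divided by |C|); the function name `partition_density` and the toy weight matrix are hypothetical and do not come from the paper. The sketch only evaluates the objective for a given partition; it does not solve the semidefinite program.

```python
def partition_density(weights, clusters):
    """Sum of cluster densities for a symmetric weight matrix.

    Assumption: the density of a cluster C is the sum of weights[i][j]
    over ordered pairs (i, j) in C, divided by |C|.
    """
    total = 0.0
    for cluster in clusters:
        if not cluster:
            continue  # an empty cluster contributes nothing
        inside = sum(weights[i][j] for i in cluster for j in cluster)
        total += inside / len(cluster)
    return total


# Toy graph (hypothetical): planted cliques {0, 1, 2} and {3, 4},
# weight 1.0 inside a clique and 0.1 across cliques.
label = [0, 0, 0, 1, 1]
W = [[1.0 if label[i] == label[j] else 0.1 for j in range(5)] for i in range(5)]

planted = partition_density(W, [[0, 1, 2], [3, 4]])
mixed = partition_density(W, [[0, 1, 3], [2, 4]])
print(planted, mixed)
```

On this toy input the planted partition scores higher than the mixed one, which is the behaviour the objective is meant to reward.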

arXiv.org e-Print Archive

CiteSeerX

Crossref

Caltech Authors

Tightness of the maximum likelihood semidefinite relaxation for angular synchronization

Author: Bandeira Afonso S.
Boumal Nicolas
Singer Amit
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Maximum likelihood estimation problems are, in general, intractable optimization problems. As a result, it is common to approximate the maximum likelihood estimator (MLE) using convex relaxations. In some cases, the relaxation is tight: it recovers the true MLE. Most tightness proofs only apply to situations where the MLE exactly recovers a planted solution (known to the analyst). It is then sufficient to establish that the optimality conditions hold at the planted signal. In this paper, we study an estimation problem (angular synchronization) for which the MLE is not a simple function of the planted solution, yet for which the convex relaxation is tight. To establish tightness in this context, the proof is less direct because the point at which to verify optimality conditions is not known explicitly. Angular synchronization consists in estimating a collection of n phases, given noisy measurements of the pairwise relative phases. The MLE for angular synchronization is the solution of a (hard) non-bipartite Grothendieck problem over the complex numbers. We consider a stochastic model for the data: a planted signal (that is, a ground truth set of phases) is corrupted with non-adversarial random noise. Even though the MLE does not coincide with the planted signal, we show that the classical semidefinite relaxation for it is tight, with high probability. This holds even for high levels of noise. Comment: 2 figures
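The relaxation named in this abstract can be written out as a short sketch. The symbols are assumptions introduced here for illustration: C denotes the n-by-n Hermitian matrix of noisy relative-phase measurements, x the vector of unit-modulus phases, and X the lifted matrix variable; the listing itself does not fix this notation.

```latex
% Non-bipartite Grothendieck problem over the complex numbers (the hard problem):
\max_{x \in \mathbb{C}^n} \; x^{*} C x
\quad \text{subject to} \quad |x_i| = 1, \quad i = 1, \dots, n.

% Classical semidefinite relaxation, obtained by replacing x x^{*} with X:
\max_{X \in \mathbb{C}^{n \times n}} \; \operatorname{Tr}(C X)
\quad \text{subject to} \quad X_{ii} = 1, \quad i = 1, \dots, n, \qquad X \succeq 0.
```

In this notation, the relaxation is tight when its optimal solution has rank one, X = x x^{*}, in which case x solves the first problem.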

arXiv.org e-Print Archive

Princeton University Open Access Repository

CiteSeerX

INRIA a CCSD electronic archive server

Exact Clustering of Weighted Graphs via Semidefinite Programming

Author: Ames Brendan
Pirinen Aleksis
Publication date: 01/01/2019

As a model problem for clustering, we consider the densest k-disjoint-clique problem of partitioning a weighted complete graph into k disjoint subgraphs such that the sum of the densities of these subgraphs is maximized. We establish that such subgraphs can be recovered from the solution of a particular semidefinite relaxation with high probability if the input graph is sampled from a distribution of clusterable graphs. Specifically, the semidefinite relaxation is exact if the graph consists of k large disjoint subgraphs, corresponding to clusters, with weight concentrated within these subgraphs, plus a moderate number of outliers. Further, we establish that if noise only weakly obscures these clusters, i.e., the between-cluster edges are assigned very small weights, then we can recover significantly smaller clusters. For example, we show that in approximately sparse graphs, where the between-cluster weights tend to zero as the size n of the graph tends to infinity, we can recover clusters of size polylogarithmic in n. Empirical evidence from numerical simulations is also provided to support these theoretical phase transitions to perfect recovery of the cluster structure.

arXiv.org e-Print Archive

Lund University Publications

Relax, no need to round: integrality of clustering formulations

Author: Awasthi Pranjal
Bandeira Afonso S.
Charikar Moses
Krishnaswamy Ravishankar
Villar Soledad
Ward Rachel
Publication date: 14/04/2015

We study exact recovery conditions for convex relaxations of point cloud clustering problems, focusing on two of the most common optimization problems for unsupervised clustering: k-means and k-median clustering. Motivations for focusing on convex relaxations are: (a) they come with a certificate of optimality, and (b) they are generic tools which are relatively parameter-free, not tailored to specific assumptions over the input. More precisely, we consider the distributional setting where there are k clusters in $\mathbb{R}^m$ and data from each cluster consists of n points sampled from a symmetric distribution within a ball of unit radius. We ask: what is the minimal separation distance between cluster centers needed for convex relaxations to exactly recover these k clusters as the optimal integral solution? For the k-median linear programming relaxation we show a tight bound: exact recovery is obtained given arbitrarily small pairwise separation $\epsilon > 0$ between the balls. In other words, the pairwise center separation is $\Delta > 2+\epsilon$. Under the same distributional model, the k-means LP relaxation fails to recover such clusters at separation as large as $\Delta = 4$. Yet, if we enforce PSD constraints on the k-means LP, we get exact cluster recovery at center separation $\Delta > 2\sqrt{2}(1+\sqrt{1/m})$. In contrast, common heuristics such as Lloyd's algorithm (a.k.a. the k-means algorithm) can fail to recover clusters in this setting; even with arbitrarily large cluster separation, k-means++ with overseeding by any constant factor fails with high probability at exact cluster recovery. To complement the theoretical analysis, we provide an experimental study of the recovery guarantees for these various methods, and discuss several open problems which these experiments suggest. Comment: 30 pages, ITCS 201
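The separation thresholds quoted in this abstract are easy to compare numerically. The sketch below simply evaluates the stated k-means SDP bound, 2√2(1 + √(1/m)), for a few dimensions m; the function and constant names are hypothetical and chosen here for illustration, and the code does not implement any of the relaxations.

```python
import math


def kmeans_sdp_separation(m):
    """Center separation 2*sqrt(2)*(1 + sqrt(1/m)) quoted for the k-means SDP.

    m is the ambient dimension of the point cloud (unit-radius balls assumed,
    as in the abstract).
    """
    if m < 1:
        raise ValueError("dimension m must be at least 1")
    return 2.0 * math.sqrt(2.0) * (1.0 + math.sqrt(1.0 / m))


# Constants quoted in the abstract (names are illustrative assumptions).
KMEDIAN_LP_SEPARATION = 2.0  # k-median LP: exact recovery for any separation > 2
KMEANS_LP_FAILURE = 4.0      # k-means LP: can fail at separation as large as 4

for m in (1, 4, 100):
    print(m, round(kmeans_sdp_separation(m), 3))
```

The bound decreases toward 2√2 as m grows, so under this formula the SDP guarantee is weakest in low dimensions.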

arXiv.org e-Print Archive

CiteSeerX