
    Approximate kernel clustering

    In the kernel clustering problem we are given a large $n\times n$ positive semi-definite matrix $A=(a_{ij})$ with $\sum_{i,j=1}^n a_{ij}=0$ and a small $k\times k$ positive semi-definite matrix $B=(b_{ij})$. The goal is to find a partition $S_1,\ldots,S_k$ of $\{1,\ldots,n\}$ which maximizes the quantity $\sum_{i,j=1}^k \big(\sum_{(p,q)\in S_i\times S_j} a_{pq}\big)\, b_{ij}$. We study the computational complexity of this generic clustering problem, which originates in the theory of machine learning. We design a constant-factor polynomial-time approximation algorithm for this problem, answering a question posed by Song, Smola, Gretton and Borgwardt. In some cases we manage to compute the sharp approximation threshold for this problem assuming the Unique Games Conjecture (UGC). In particular, when $B$ is the $3\times 3$ identity matrix the UGC hardness threshold of this problem is exactly $\frac{16\pi}{27}$. We present and study a geometric conjecture of independent interest which we show would imply that the UGC threshold when $B$ is the $k\times k$ identity matrix is $\frac{8\pi}{9}(1-\frac{1}{k})$ for every $k\ge 3$.
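    As a concrete reading of the objective, the sketch below evaluates the quantity being maximized for a fixed partition. The function name and toy data are illustrative only; this is the objective evaluation, not the paper's approximation algorithm.

```python
import numpy as np

def kernel_clustering_objective(A, B, labels):
    """Value of the kernel clustering objective for a fixed partition.

    A: (n, n) PSD matrix whose entries sum to zero.
    B: (k, k) PSD matrix.
    labels: length-n array, labels[p] = index i of the part S_i containing point p.
    Returns sum_{i,j} ( sum_{(p,q) in S_i x S_j} a_pq ) * b_ij.
    """
    k = B.shape[0]
    M = np.eye(k)[labels]          # n x k indicator matrix of the partition
    C = M.T @ A @ M                # C[i, j] = sum of A over the block S_i x S_j
    return float(np.sum(C * B))

# Toy usage: a doubly-centred PSD matrix (entries sum to zero) and B = 3x3 identity.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
H = np.eye(6) - np.ones((6, 6)) / 6          # double-centring keeps A PSD and zero-sum
A = H @ (X @ X.T) @ H
B = np.eye(3)
print(kernel_clustering_objective(A, B, labels=np.array([0, 0, 1, 1, 2, 2])))
```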

    Approximate Clustering via Metric Partitioning

    In this paper we consider two metric covering/clustering problems: the Minimum Cost Covering Problem (MCC) and $k$-clustering. In the MCC problem, we are given two point sets $X$ (clients) and $Y$ (servers), and a metric on $X \cup Y$. We would like to cover the clients by balls centered at the servers. The objective function to minimize is the sum of the $\alpha$-th power of the radii of the balls. Here $\alpha \geq 1$ is a parameter of the problem (but not of a problem instance). MCC is closely related to the $k$-clustering problem. The main difference between $k$-clustering and MCC is that in $k$-clustering one needs to select $k$ balls to cover the clients. For any $\varepsilon > 0$, we describe quasi-polynomial time $(1 + \varepsilon)$-approximation algorithms for both of the problems. However, in the case of $k$-clustering the algorithm uses $(1 + \varepsilon)k$ balls. Prior to our work, a $3^{\alpha}$ and a $c^{\alpha}$ approximation were achieved by polynomial-time algorithms for MCC and $k$-clustering, respectively, where $c > 1$ is an absolute constant. These two problems are thus interesting examples of metric covering/clustering problems that admit a $(1 + \varepsilon)$-approximation (using $(1+\varepsilon)k$ balls in the case of $k$-clustering) if one is willing to settle for quasi-polynomial time. In contrast, for the variant of MCC where $\alpha$ is part of the input, we show under standard assumptions that no polynomial-time algorithm can achieve an approximation factor better than $O(\log |X|)$ for $\alpha \geq \log |X|$.
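    A minimal sketch of the MCC objective for a fixed assignment of clients to servers, assuming Euclidean distances as the metric. The nearest-server assignment in the usage example is for illustration only and is not the paper's quasi-polynomial-time algorithm.

```python
import numpy as np

def mcc_cost(clients, servers, assignment, alpha=2.0):
    """Cost of a covering in which each client is assigned to one server.

    clients: (m, d) client coordinates X
    servers: (s, d) server coordinates Y
    assignment: length-m array, assignment[c] = index of the server covering client c
    A server's ball must reach its farthest assigned client, so its radius is that
    maximum distance; the cost is the sum of radius**alpha over the servers used.
    """
    cost = 0.0
    for srv in np.unique(assignment):
        assigned = clients[assignment == srv]
        radius = np.linalg.norm(assigned - servers[srv], axis=1).max()
        cost += radius ** alpha
    return cost

# Toy usage: assign every client to its nearest server (not optimal in general).
rng = np.random.default_rng(1)
X, Y = rng.normal(size=(20, 2)), rng.normal(size=(4, 2))
nearest = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2).argmin(axis=1)
print(mcc_cost(X, Y, nearest, alpha=1.0))
```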

    Large Scale Spectral Clustering Using Approximate Commute Time Embedding

    Spectral clustering is a clustering method that can detect complex shapes of data clusters. However, it requires the eigendecomposition of the graph Laplacian matrix, which costs $O(n^3)$ time and is thus not suitable for large-scale systems. Recently, many methods have been proposed to accelerate the computation of spectral clustering. These approximate methods usually involve sampling techniques, through which a lot of information in the original data may be lost. In this work, we propose a fast and accurate spectral clustering approach using an approximate commute time embedding, which is similar to the spectral embedding. The method does not require any sampling technique and does not compute any eigenvector at all. Instead, it uses random projection and a linear time solver to find the approximate embedding. Experiments on several synthetic and real datasets show that the proposed approach has better clustering quality and is faster than the state-of-the-art approximate spectral clustering methods.
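    The sketch below illustrates the kind of embedding the abstract describes: a random projection of the weighted edge-incidence matrix combined with Laplacian solves, so no eigenvectors are computed. scipy's conjugate gradient stands in for a nearly-linear-time Laplacian solver, and all names and default parameters are assumptions, not the authors' implementation.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def approx_commute_time_embedding(W, n_dims=24, seed=0):
    """Approximate commute-time embedding of a graph with sparse symmetric weights W."""
    W = sp.csr_matrix(W)
    n = W.shape[0]
    rows, cols = sp.triu(W, k=1).nonzero()
    weights = np.asarray(W[rows, cols]).ravel()
    m = len(rows)

    # m x n edge-incidence matrix: one row per edge, +1 at one endpoint, -1 at the other
    B = sp.csr_matrix((np.r_[np.ones(m), -np.ones(m)],
                       (np.r_[np.arange(m), np.arange(m)], np.r_[rows, cols])),
                      shape=(m, n))
    L = B.T @ sp.diags(weights) @ B                  # graph Laplacian

    # Random +/- 1/sqrt(n_dims) projection of W^{1/2} B: n_dims signals of length n
    rng = np.random.default_rng(seed)
    Q = rng.choice([-1.0, 1.0], size=(n_dims, m)) / np.sqrt(n_dims)
    Y = np.asarray((B.T @ sp.diags(np.sqrt(weights)) @ Q.T).T)   # each row sums to zero

    # One Laplacian solve per projected signal (conjugate gradient as a stand-in solver)
    Z = np.vstack([cg(L, Y[i])[0] for i in range(n_dims)])
    return Z.T                                       # one n_dims-dimensional point per vertex

# Toy usage (commented): cluster the embedding, e.g. with k-means.
# from sklearn.cluster import KMeans
# labels = KMeans(n_clusters=3, n_init=10).fit_predict(approx_commute_time_embedding(W))
```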

    Fast Approximate Spectral Clustering for Dynamic Networks

    Spectral clustering is a widely studied problem, yet its complexity is prohibitive for dynamic graphs of even modest size. We claim that it is possible to reuse information of past cluster assignments to expedite computation. Our approach builds on a recent idea of sidestepping the main bottleneck of spectral clustering, i.e., computing the graph eigenvectors, by using fast Chebyshev graph filtering of random signals. We show that the proposed algorithm achieves clustering assignments with quality approximating that of spectral clustering and that it can yield significant complexity benefits when the graph dynamics are appropriately bounded.
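    A hedged sketch of the filtering idea this abstract builds on: an ideal low-pass filter of the normalized Laplacian is approximated by a Chebyshev polynomial and applied to random signals using only sparse matrix-vector products, after which the filtered features are clustered with k-means. The reuse of past cluster assignments for dynamic graphs, which is the paper's contribution, is not shown; parameter names and defaults are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from numpy.polynomial import chebyshev as cheb
from sklearn.cluster import KMeans

def chebyshev_spectral_clustering(W, n_clusters, lam_cut, n_signals=30, degree=50, lam_max=2.0):
    """Approximate spectral clustering without eigendecomposition (illustrative sketch).

    lam_cut should sit near the n_clusters-th smallest Laplacian eigenvalue;
    estimating it is a separate step not shown here.
    """
    W = sp.csr_matrix(W)
    n = W.shape[0]
    d = np.asarray(W.sum(axis=1)).ravel()
    D_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = sp.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt          # normalized Laplacian, spectrum in [0, 2]

    # Chebyshev approximation of the ideal low-pass step function on [0, lam_max]
    h = cheb.Chebyshev.interpolate(lambda x: (x <= lam_cut).astype(float),
                                   deg=degree, domain=[0.0, lam_max])
    c = h.coef

    # Apply h(L) to random signals via the three-term recurrence on the rescaled operator
    rng = np.random.default_rng(0)
    R = rng.normal(size=(n, n_signals)) / np.sqrt(n_signals)
    A = (2.0 / lam_max) * L - sp.eye(n)                  # maps [0, lam_max] to [-1, 1]
    T_prev, T_curr = R, A @ R
    filtered = c[0] * T_prev + c[1] * T_curr
    for k in range(2, len(c)):
        T_prev, T_curr = T_curr, 2.0 * (A @ T_curr) - T_prev
        filtered += c[k] * T_curr

    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(filtered)
```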

    ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly

    Matrix completion and approximation are popular tools to capture a user's preferences for recommendation and to approximate missing data. Instead of using low-rank factorization we take a drastically different approach, based on the simple insight that an additive model of co-clusterings allows one to approximate matrices efficiently. This allows us to build a concise model that, per bit of model learned, significantly beats all factorization approaches to matrix approximation. Even more surprisingly, we find that summing over small co-clusterings is more effective in modeling matrices than classic co-clustering, which uses just one large partitioning of the matrix. Occam's razor suggests that the simple structure induced by our model better captures the latent preferences and decision-making processes present in the real world than classic co-clustering or matrix factorization. We provide an iterative minimization algorithm, a collapsed Gibbs sampler, theoretical guarantees for matrix approximation, and excellent empirical evidence for the efficacy of our approach. We achieve state-of-the-art results on the Netflix problem with a fraction of the model complexity.
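    To make the additive co-clustering idea concrete, here is a simplified backfitting sketch: each "stencil" co-clusters the current residual with k-means on rows and columns and stores only block means, and the stencils are summed. This illustrates the additive model only; the paper's Bayesian formulation, collapsed Gibbs sampler, and handling of missing ratings are not reproduced, and all names and defaults are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def additive_coclustering(M, n_stencils=3, k_rows=4, k_cols=4, seed=0):
    """Approximate M by a sum of block-constant co-clusterings (illustrative sketch)."""
    residual = M.astype(float).copy()
    approx = np.zeros_like(residual)
    for s in range(n_stencils):
        # Cluster rows and columns of the current residual
        row_lab = KMeans(n_clusters=k_rows, n_init=10, random_state=seed + s).fit_predict(residual)
        col_lab = KMeans(n_clusters=k_cols, n_init=10, random_state=seed + s).fit_predict(residual.T)
        # Block-constant stencil: every (row-cluster, column-cluster) block gets its mean
        stencil = np.zeros_like(residual)
        for i in range(k_rows):
            for j in range(k_cols):
                block = np.ix_(row_lab == i, col_lab == j)
                if residual[block].size:
                    stencil[block] = residual[block].mean()
        approx += stencil
        residual -= stencil          # the next stencil fits what is left
    return approx

# Toy usage: the additive model recovers a low-complexity matrix from a few block means.
rng = np.random.default_rng(0)
M = rng.normal(size=(60, 1)) @ rng.normal(size=(1, 40)) + 0.1 * rng.normal(size=(60, 40))
M_hat = additive_coclustering(M, n_stencils=4)
print(np.linalg.norm(M - M_hat) / np.linalg.norm(M))
```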