Approximation Algorithms for Bregman Co-clustering and Tensor Clustering

Authors: Arindam Banerjee, Stefanie Jegelka, Suvrit Sra
Publication date: 01/01/2008

In the past few years, powerful generalizations of the Euclidean k-means problem have been proposed, such as Bregman clustering [7], co-clustering (i.e., simultaneous clustering of the rows and columns of an input matrix) [9,18], and tensor clustering [8,34]. Like k-means, these more general problems are NP-hard to optimize. Researchers have developed approximation algorithms of varying degrees of sophistication for k-means, k-medians, and more recently also for Bregman clustering [2]. However, there seem to be no approximation algorithms for Bregman co-clustering and tensor clustering. In this paper we derive the first (to our knowledge) guaranteed methods for these increasingly important clustering settings. Going beyond Bregman divergences, we also prove an approximation factor for tensor clustering with arbitrary separable metrics. Through extensive experiments we evaluate the characteristics of our method, and show that it also has practical impact.

Comment: 18 pages; improved metric cas

arXiv.org e-Print Archive

CiteSeerX

MPG.PuRe

A simple D^2-sampling based PTAS for k-means and other Clustering Problems

Authors: Ragesh Jaiswal, Amit Kumar, Sandeep Sen
Publication date: 20/01/2012

Given a set of points $P \subset \mathbb{R}^d$, the $k$-means clustering problem is to find a set of $k$ {\em centers} $C = \{c_1, \ldots, c_k\}$, $c_i \in \mathbb{R}^d$, such that the objective function $\sum_{x \in P} d(x,C)^2$, where $d(x,C)$ denotes the distance between $x$ and the closest center in $C$, is minimized. This is one of the most prominent objective functions that have been studied with respect to clustering.
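
As a small illustration of the objective just defined, the following sketch evaluates $\sum_{x \in P} d(x,C)^2$ for a given set of centers. It assumes Euclidean distance and points given as tuples of numbers; the function name `kmeans_cost` is hypothetical and does not come from the paper.

```python
def kmeans_cost(points, centers):
    """Return the sum over x in points of d(x, centers)^2.

    Assumption: d is the Euclidean distance, and d(x, centers) is the
    distance from x to its closest center.
    """
    total = 0.0
    for x in points:
        # Squared Euclidean distance from x to each center; keep the smallest.
        total += min(
            sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            for c in centers
        )
    return total


if __name__ == "__main__":
    P = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    C = [(1.0, 0.0), (10.0, 0.0)]
    print(kmeans_cost(P, C))
```

The $k$-means problem asks for the set $C$ of size $k$ that minimizes this value; the sketch only evaluates the cost of a given $C$ and does not search for the minimizer.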

$D^2$-sampling \cite{ArthurV07} is a simple non-uniform sampling technique for choosing points from a set of points. It works as follows: given a set of points $P \subseteq \mathbb{R}^d$, the first point is chosen uniformly at random from $P$. Subsequently, a point from $P$ is chosen as the next sample with probability proportional to the square of its distance to the nearest previously sampled point.
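
The sampling procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: distances are Euclidean, the names `sq_dist` and `d2_sample` are hypothetical, and the uniform fallback used when every point already coincides with a sample is an assumption the abstract does not address.

```python
import random


def sq_dist(x, y):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def d2_sample(points, k, rng=None):
    """Draw k points from `points` by D^2-sampling.

    The first point is uniform at random; each later point is drawn with
    probability proportional to its squared distance to the nearest
    previously sampled point.
    """
    rng = rng or random.Random()
    samples = [rng.choice(points)]
    # nearest[i] holds the squared distance from points[i] to its closest sample.
    nearest = [sq_dist(p, samples[0]) for p in points]
    while len(samples) < k:
        if sum(nearest) == 0:
            # Assumption: if all points coincide with a sample, fall back to uniform.
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=nearest, k=1)[0]
        samples.append(nxt)
        # Update each point's distance to the nearest sample.
        nearest = [min(d, sq_dist(p, nxt)) for d, p in zip(nearest, points)]
    return samples


if __name__ == "__main__":
    P = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    print(d2_sample(P, 2, random.Random(0)))
```

Because already-sampled points have weight zero, they cannot be drawn again unless the fallback is triggered, and points far from the current samples are favoured.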

$D^2$-sampling has been shown to have nice properties with respect to the $k$-means clustering problem. Arthur and Vassilvitskii \cite{ArthurV07} show that $k$ points chosen as centers from $P$ using $D^2$-sampling give an $O(\log{k})$ approximation in expectation. Ailon et al. \cite{AJMonteleoni09} and Aggarwal et al. \cite{AggarwalDK09} extended results of \cite{ArthurV07} to show that