Approximation Algorithms for Bregman Co-clustering and Tensor Clustering
In the past few years, powerful generalizations of the Euclidean k-means
problem have been made, such as Bregman clustering [7], co-clustering (i.e.,
simultaneous clustering of rows and columns of an input matrix) [9,18], and
tensor clustering [8,34]. Like k-means, these more general problems also suffer
from the NP-hardness of the associated optimization. Researchers have developed
approximation algorithms of varying degrees of sophistication for k-means,
k-medians, and more recently also for Bregman clustering [2]. However, there
seem to be no approximation algorithms for Bregman co- and tensor clustering.
In this paper we derive the first (to our knowledge) guaranteed methods for
these increasingly important clustering settings. Going beyond Bregman
divergences, we also prove an approximation factor for tensor clustering with
arbitrary separable metrics. Through extensive experiments we evaluate the
characteristics of our method, and show that it also has practical impact.
Comment: 18 pages; improved metric cas
A simple D^2-sampling based PTAS for k-means and other Clustering Problems
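Given a set of points $P \subset \mathbb{R}^d$, the $k$-means clustering
problem is to find a set of $k$ {\em centers} $C = \{c_1, \ldots, c_k\}$, $c_i \in \mathbb{R}^d$,
such that the objective function $\sum_{x \in P} d(x, C)^2$,
where $d(x, C)$ denotes the distance between $x$ and the closest center in $C$,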
is minimized. This is one of the most prominent objective functions that have
been studied with respect to clustering.
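As a concrete reading of the objective $\sum_{x \in P} d(x, C)^2$, the following Python sketch evaluates it for a finite point set under Euclidean distance. This is a minimal illustration only. The helper names `sq_dist` and `kmeans_cost` are hypothetical and do not come from the paper.

```python
from typing import Sequence

Point = Sequence[float]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means objective: sum over x in P of d(x, C)^2, where d(x, C) is the
    distance from x to its closest center in C (hypothetical helper)."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


if __name__ == "__main__":
    P = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    C = [(0.5, 0.0), (10.5, 0.0)]
    # Each point is at distance 0.5 from its closest center: 4 * 0.25.
    print(kmeans_cost(P, C))  # 1.0
```

Squaring inside `sq_dist` avoids computing a square root only to square it again. The minimization over `centers` is exactly the "closest center in $C$" in the definition.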
$D^2$-sampling \cite{ArthurV07} is a simple non-uniform sampling technique
for choosing points from a set of points. It works as follows: given a set of
points $P \subseteq \mathbb{R}^d$, the first point is chosen uniformly at
random from $P$. Subsequently, a point from $P$ is chosen as the next sample
with probability proportional to the square of the distance of this point to
the nearest previously sampled point.
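The sampling procedure described above can be sketched in Python as follows. The sketch assumes Euclidean distance and takes an explicit `random.Random` instance so runs are reproducible. The names `sq_dist` and `d2_sample` are hypothetical. One rule does not come from the text and is an assumption: falling back to a uniform draw when every remaining squared distance is zero, a case the description leaves undefined.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def d2_sample(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Choose k points from `points` by D^2-sampling (hypothetical helper).

    The first point is drawn uniformly at random; each later point is drawn
    with probability proportional to its squared distance to the nearest
    previously sampled point.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(points)")
    # First sample: uniform over P.
    chosen = [points[rng.randrange(n)]]
    # d2[i] holds the squared distance from points[i] to its nearest sampled point.
    d2 = [sq_dist(p, chosen[0]) for p in points]
    while len(chosen) < k:
        if sum(d2) == 0.0:
            # Assumption (not from the text): every point coincides with a
            # sampled point, so all D^2 weights are zero; fall back to uniform.
            idx = rng.randrange(n)
        else:
            # Draw an index with probability proportional to d2[i].
            idx = rng.choices(range(n), weights=d2, k=1)[0]
        nxt = points[idx]
        chosen.append(nxt)
        # Only the newly added point can shrink a nearest-sample distance.
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return chosen


if __name__ == "__main__":
    P = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0)]
    print(d2_sample(P, 2, random.Random(42)))
```

The example uses two well-separated pairs of points. After the first sample, the second one lands in the other pair with probability above 0.9999, which gives the intuition for why $D^2$-sampling spreads its samples across clusters. Keeping the `d2` array and updating it only against the newest sample costs $O(nd)$ per draw, so $O(nkd)$ overall for $k$ samples.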
$D^2$-sampling has been shown to have nice properties with respect to the
$k$-means clustering problem. Arthur and Vassilvitskii \cite{ArthurV07} show
that $k$ points chosen as centers from $P$ using $D^2$-sampling give an
$O(\log k)$ approximation in expectation. Ailon et al. \cite{AJMonteleoni09}
and Aggarwal et al. \cite{AggarwalDK09} extended the results of \cite{ArthurV07}
to show that $O(k)$ points chosen as centers using $D^2$-sampling give an
$O(1)$ approximation to the $k$-means objective function with high probability. In
this paper, we further demonstrate the power of $D^2$-sampling by giving a
simple randomized $(1 + \epsilon)$-approximation algorithm that uses
$D^2$-sampling at its core.