9 research outputs found

    Approximation Algorithms for Bregman Co-clustering and Tensor Clustering

    In the past few years powerful generalizations of the Euclidean k-means problem have been made, such as Bregman clustering [7], co-clustering (i.e., simultaneous clustering of rows and columns of an input matrix) [9,18], and tensor clustering [8,34]. Like k-means, these more general problems also suffer from the NP-hardness of the associated optimization. Researchers have developed approximation algorithms of varying degrees of sophistication for k-means, k-medians, and more recently also for Bregman clustering [2]. However, there seem to be no approximation algorithms for Bregman co- and tensor clustering. In this paper we derive the first (to our knowledge) guaranteed methods for these increasingly important clustering settings. Going beyond Bregman divergences, we also prove an approximation factor for tensor clustering with arbitrary separable metrics. Through extensive experiments we evaluate the characteristics of our method, and show that it also has practical impact. Comment: 18 pages; improved metric cas

    A simple D^2-sampling based PTAS for k-means and other Clustering Problems

    Given a set of points $P \subset \mathbb{R}^d$, the $k$-means clustering problem is to find a set of $k$ {\em centers} $C = \{c_1,...,c_k\}$, $c_i \in \mathbb{R}^d$, such that the objective function $\sum_{x \in P} d(x,C)^2$, where $d(x,C)$ denotes the distance between $x$ and the closest center in $C$, is minimized. This is one of the most prominent objective functions that have been studied with respect to clustering. $D^2$-sampling \cite{ArthurV07} is a simple non-uniform sampling technique for choosing points from a set of points. It works as follows: given a set of points $P \subseteq \mathbb{R}^d$, the first point is chosen uniformly at random from $P$. Subsequently, a point from $P$ is chosen as the next sample with probability proportional to the square of the distance of this point to the nearest previously sampled point. $D^2$-sampling has been shown to have nice properties with respect to the $k$-means clustering problem. Arthur and Vassilvitskii \cite{ArthurV07} show that $k$ points chosen as centers from $P$ using $D^2$-sampling give an $O(\log{k})$ approximation in expectation. Ailon et al. \cite{AJMonteleoni09} and Aggarwal et al. \cite{AggarwalDK09} extended the results of \cite{ArthurV07} to show that $O(k)$ points chosen as centers using $D^2$-sampling give an $O(1)$ approximation to the $k$-means objective function with high probability. In this paper, we further demonstrate the power of $D^2$-sampling by giving a simple randomized $(1 + \epsilon)$-approximation algorithm that uses the $D^2$-sampling in its core
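
The $D^2$-sampling procedure described in this abstract can be sketched in a few lines of Python. This is only a minimal illustration of the sampling step and of the $k$-means objective as stated above, not the paper's $(1 + \epsilon)$-approximation algorithm. The names `sq_dist`, `d2_sample`, and `kmeans_cost` are hypothetical, and the uniform fallback when every point already coincides with a chosen center is an assumption added here so the sketch never divides by a zero total weight.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d2_sample(points, k, rng=None):
    """Pick k centers from `points` by D^2-sampling (hypothetical helper).

    The first center is uniform at random; each later center is drawn with
    probability proportional to its squared distance to the nearest center
    chosen so far.
    """
    rng = rng or random.Random()
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Assumption: every point sits on a chosen center, so the
            # weights are all zero; fall back to a uniform draw.
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum over x in P of d(x, C)^2."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

A call such as `d2_sample(points, k)` followed by `kmeans_cost(points, centers)` gives the cost of the sampled centers; the $O(\log k)$ guarantee quoted above concerns the expectation of that cost, so a single run may be better or worse.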

    Coresets and Approximate Clustering for Bregman Divergences
