On Variants of k-means Clustering
\textit{Clustering problems} often arise in fields like data mining and
machine learning, where a collection of objects must be grouped into similar groups with
respect to a similarity (or dissimilarity) measure. Among clustering
problems, \textit{k-means} clustering in particular has received much attention
from researchers. Although k-means is a very well-studied
problem, its status in the plane is still open. In particular, it is
unknown whether it admits a PTAS in the plane. The best known approximation
bound in polynomial time is $9+\epsilon$.
In this paper, we consider the following variant of k-means. Given a set $C$
of points in $\mathbb{R}^d$ and a real $f > 0$, find a finite set $F$ of
points in $\mathbb{R}^d$ that minimizes the quantity $f \cdot |F| + \sum_{p \in C} \min_{q \in F} \|p - q\|^2$. For any fixed dimension $d$, we design a local
search PTAS for this problem. We also give a "bi-criterion" local search
algorithm for k-means which uses $(1+\epsilon)k$ centers and yields a solution
whose cost is at most $(1+\epsilon)$ times the cost of an optimal k-means
solution. The algorithm runs in polynomial time for any fixed dimension.
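To make the local search idea concrete, here is a minimal sketch of the facility-cost variant. It assumes the objective charges a fixed cost f for every opened center plus the sum of squared distances from each point to its nearest center. It also assumes, as a simplification, that centers are drawn from a finite candidate set such as the input points. The single add/drop/swap scheme is a generic textbook local search, not the paper's PTAS, which relies on larger multi-swaps and a different analysis. The names `facility_kmeans_cost` and `local_search` are hypothetical.

```python
from itertools import product


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def facility_kmeans_cost(points, centers, f):
    """Assumed objective: f * |F| + sum over points of squared distance to the nearest center."""
    if not centers:
        return float("inf")
    return f * len(centers) + sum(min(sq_dist(p, c) for c in centers) for p in points)


def local_search(points, candidates, f, max_rounds=1000):
    """Single add/drop/swap local search over a finite candidate set (hypothetical helper)."""
    current = frozenset([candidates[0]])
    best = facility_kmeans_cost(points, current, f)
    for _ in range(max_rounds):
        moves = [current | {c} for c in candidates if c not in current]   # add one center
        moves += [current - {c} for c in current]                          # drop one center
        moves += [(current - {out}) | {inn}                                # swap one center
                  for out, inn in product(current, candidates) if inn not in current]
        for cand in moves:
            cost = facility_kmeans_cost(points, cand, f)
            if cost < best - 1e-12:  # accept the first strictly improving move
                current, best = cand, cost
                break
        else:
            return current, best  # no improving move exists: local optimum
    return current, best


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    centers, cost = local_search(pts, pts, f=1.0)
    print(sorted(centers), round(cost, 4))
```

On the two well-separated pairs above, the search should settle on one center per pair. Restricting centers to input points is a deliberate simplification. For any cluster, some input point is within a factor of 2 of the centroid's squared-distance cost, so the distance term loses at most that factor.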
The contribution of this paper is twofold. On the one hand, we are
able to handle squared distances in an elegant manner, which yields a near-optimal
approximation bound. This leads us towards a better understanding of
the k-means problem. On the other hand, our analysis of local search might
also be useful for other geometric problems. This is important considering that
very little is known about the local search method for geometric approximation.
Comment: 15 pages
Randomized Dimensionality Reduction for k-means Clustering
We study the topic of dimensionality reduction for k-means clustering.
Dimensionality reduction encompasses two approaches: \emph{feature
selection} and \emph{feature extraction}. A feature selection based algorithm
for k-means clustering selects a small subset of the input features and then
applies k-means clustering on the selected features. A feature extraction
based algorithm for k-means clustering constructs a small set of new
artificial features and then applies k-means clustering on the constructed
features. Despite the significance of k-means clustering as well as the
wealth of heuristic methods addressing it, provably accurate feature selection
methods for k-means clustering are not known. On the other hand, two provably
accurate feature extraction methods for k-means clustering are known in the
literature; one is based on random projections and the other is based on the
singular value decomposition (SVD).
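The difference between the two approaches can be sketched in a few lines. Feature selection keeps some of the original coordinates. Feature extraction builds new coordinates as combinations of all of them. In this sketch the selected indices are given by hand, so it does not reproduce the paper's provably accurate selection rule. The random ±1/√r matrix is one standard random-projection construction, assumed here for illustration. `select_features` and `random_sign_projection` are hypothetical names.

```python
import math
import random


def select_features(points, indices):
    """Feature selection: keep only the chosen original coordinates."""
    return [tuple(p[i] for i in indices) for p in points]


def random_sign_projection(points, r, seed=0):
    """Feature extraction: r new features, each a random +/-1/sqrt(r) combination of all inputs."""
    d = len(points[0])
    rng = random.Random(seed)
    R = [[rng.choice((-1.0, 1.0)) / math.sqrt(r) for _ in range(r)] for _ in range(d)]
    return [tuple(sum(p[i] * R[i][j] for i in range(d)) for j in range(r)) for p in points]


if __name__ == "__main__":
    data = [(1.0, 2.0, 3.0, 4.0), (4.0, 3.0, 2.0, 1.0)]
    print(select_features(data, [0, 3]))      # [(1.0, 4.0), (4.0, 1.0)]
    print(random_sign_projection(data, r=2))  # two new artificial features per point
```

In either case, k-means is then run on the reduced points instead of the originals.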
This paper makes further progress towards a better understanding of
dimensionality reduction for k-means clustering. Namely, we present the first
provably accurate feature selection method for k-means clustering and, in
addition, we present two feature extraction methods. The first feature
extraction method is based on random projections and it improves upon the
existing results in terms of time complexity and number of features needed to
be extracted. The second feature extraction method is based on fast approximate
SVD factorizations and it also improves upon the existing results in terms of
time complexity. The proposed algorithms are randomized and provide
constant-factor approximation guarantees with respect to the optimal k-means
objective value.
Comment: IEEE Transactions on Information Theory, to appear
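Guarantees of this kind rest on a simple observation. If a projection approximately preserves the k-means cost of every partition, then the optimal partition of the reduced data is also near-optimal for the original data. The sketch below checks that property empirically for a single fixed partition. It assumes a dense Gaussian projection with i.i.d. N(0, 1/r) entries, which is not necessarily the construction used in the paper. `partition_cost`, `gaussian_projection`, and `projected_cost_ratio` are hypothetical names.

```python
import math
import random


def partition_cost(points, labels):
    """k-means cost of a fixed partition: sum of squared distances to each cluster's centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        total += sum(sum((m[i] - centroid[i]) ** 2 for i in range(dim)) for m in members)
    return total


def gaussian_projection(points, r, seed=0):
    """Map d-dimensional points to r dimensions with an i.i.d. N(0, 1/r) matrix."""
    d = len(points[0])
    rng = random.Random(seed)
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(r) for _ in range(r)] for _ in range(d)]
    return [tuple(sum(p[i] * R[i][j] for i in range(d)) for j in range(r)) for p in points]


def projected_cost_ratio(points, labels, r, seed=0):
    """Cost of the same partition after projection, divided by its original cost."""
    return partition_cost(gaussian_projection(points, r, seed), labels) / partition_cost(points, labels)


if __name__ == "__main__":
    rng = random.Random(1)
    d, n = 400, 40
    pts = [tuple(rng.gauss(3.0 * (k % 2), 1.0) for _ in range(d)) for k in range(n)]
    labels = [k % 2 for k in range(n)]
    print(f"cost ratio with r=100: {projected_cost_ratio(pts, labels, r=100):.3f}")
```

Going from 400 to 100 dimensions should give a ratio close to 1 for this synthetic data. The paper's analysis bounds the distortion for all partitions simultaneously, which a single empirical check cannot show.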
Approximation Algorithms for Bregman Co-clustering and Tensor Clustering
In the past few years, powerful generalizations of the Euclidean k-means
problem have been proposed, such as Bregman clustering [7], co-clustering (i.e.,
simultaneous clustering of rows and columns of an input matrix) [9,18], and
tensor clustering [8,34]. Like k-means, these more general problems also suffer
from the NP-hardness of the associated optimization. Researchers have developed
approximation algorithms of varying degrees of sophistication for k-means,
k-medians, and more recently also for Bregman clustering [2]. However, there
seem to be no approximation algorithms for Bregman co- and tensor clustering.
In this paper we derive the first (to our knowledge) guaranteed methods for
these increasingly important clustering settings. Going beyond Bregman
divergences, we also prove an approximation factor for tensor clustering with
arbitrary separable metrics. Through extensive experiments we evaluate the
characteristics of our method, and show that it also has practical impact.
Comment: 18 pages; improved metric case
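Bregman clustering replaces the squared Euclidean distance of k-means with a Bregman divergence D_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩ for a strictly convex φ. The sketch below shows only that building block, not the paper's co- or tensor-clustering approximation algorithms. It includes two standard choices of φ. It also checks a known property of Bregman divergences: the arithmetic mean of a cluster minimizes the total divergence from the cluster's points. This is what lets k-means-style centroid updates carry over. All function names are hypothetical.

```python
import math


def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    g = grad_phi(y)
    return phi(x) - phi(y) - sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))


# phi(v) = ||v||^2 gives the squared Euclidean distance, i.e. ordinary k-means.
def sq_norm(v):
    return sum(t * t for t in v)


def sq_norm_grad(v):
    return [2.0 * t for t in v]


# phi(v) = sum v_i log v_i (negative entropy, v_i > 0) gives the generalized KL divergence.
def neg_entropy(v):
    return sum(t * math.log(t) for t in v)


def neg_entropy_grad(v):
    return [math.log(t) + 1.0 for t in v]


def cluster_cost(points, center, phi, grad_phi):
    """Total divergence from the points of one cluster to a candidate representative."""
    return sum(bregman(phi, grad_phi, p, center) for p in points)


def mean(points):
    """Coordinate-wise arithmetic mean of a list of tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


if __name__ == "__main__":
    pts = [(0.2, 0.8), (0.5, 0.5), (0.7, 0.3)]
    for center in [mean(pts), (0.5, 0.5), (0.4, 0.6)]:
        print(center, cluster_cost(pts, center, neg_entropy, neg_entropy_grad))
```

The cost at the mean should print as the smallest of the three. This follows from the identity Σ D(p, c) = Σ D(p, μ) + n·D(μ, c), which holds for every Bregman divergence.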
A bi-criteria approximation algorithm for k Means
We consider the classical k-means clustering problem in the setting of
bi-criteria approximation, in which an algorithm is allowed to output $\beta k > k$ clusters, and must produce a clustering with cost at most $\alpha$ times the
cost of the optimal set of $k$ clusters. We argue that this approach is
natural in many settings, for which the exact number of clusters is a priori
unknown, or unimportant up to a constant factor. We give new bi-criteria
approximation algorithms, based on linear programming and local search,
respectively, which attain a guarantee $\alpha(\beta)$ depending on the number $\beta k$
of clusters that may be opened. Our guarantee $\alpha(\beta)$ is
always at most $9+\epsilon$ and improves rapidly with $\beta$ (for example:
$\alpha(2) < 2.59$ and $\alpha(3) < 1.4$). Moreover, our algorithms have only
polynomial dependence on the dimension of the input data, and so are applicable
in high-dimensional settings.
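The bi-criteria trade-off can be illustrated with a different, well-known technique: opening more than k centers by D² (k-means++ style) sampling. This is not the paper's linear-programming or local-search algorithm, and the sketch makes no claim about α(β). It shows only that allowing ⌈βk⌉ centers can only lower the cost. The sketch takes nested prefixes of a single sampled sequence, so the costs are non-increasing in β. `d2_sample_centers` and `bicriteria_costs` are hypothetical names.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def d2_sample_centers(points, m, seed=0):
    """Open up to m centers by D^2 (k-means++ style) sampling."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < m:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point already coincides with a center
            break
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def bicriteria_costs(points, k, betas, seed=0):
    """Cost of opening ceil(beta * k) centers for each beta, using prefixes of one sample."""
    m_max = max(math.ceil(b * k) for b in betas)
    centers = d2_sample_centers(points, m_max, seed)
    return {b: kmeans_cost(points, centers[: math.ceil(b * k)]) for b in betas}


if __name__ == "__main__":
    rng = random.Random(7)
    pts = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
           for cx, cy in [(0, 0), (6, 0), (0, 6)] for _ in range(50)]
    for beta, cost in bicriteria_costs(pts, k=3, betas=[1, 2, 3]).items():
        print(f"beta={beta}: {math.ceil(beta * 3)} centers, cost {cost:.2f}")
```

Reusing one sampled sequence keeps the center sets for increasing β nested, which makes the comparison across β fair.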