
    On Variants of k-means Clustering

    \textit{Clustering problems} often arise in fields like data mining and machine learning, where a collection of objects must be grouped into similar groups with respect to a similarity (or dissimilarity) measure. Among clustering problems, \textit{$k$-means} in particular has received much attention from researchers. Although $k$-means is a very well studied problem, its status in the plane is still open: it is unknown whether it admits a PTAS in the plane, and the best known polynomial-time approximation bound is $9+\epsilon$. In this paper, we consider the following variant of $k$-means. Given a set $C$ of points in $\mathcal{R}^d$ and a real $f > 0$, find a finite set $F$ of points in $\mathcal{R}^d$ that minimizes the quantity $f\cdot|F| + \sum_{p\in C} \min_{q\in F} \|p-q\|^2$. For any fixed dimension $d$, we design a local search PTAS for this problem. We also give a "bi-criterion" local search algorithm for $k$-means which uses $(1+\epsilon)k$ centers and yields a solution whose cost is at most $(1+\epsilon)$ times the cost of an optimal $k$-means solution; the algorithm runs in polynomial time for any fixed dimension. The contribution of this paper is twofold. On the one hand, we are able to handle the square of distances in an elegant manner, which yields a near-optimal approximation bound and leads us towards a better understanding of the $k$-means problem. On the other hand, our analysis of local search might also be useful for other geometric problems; this is important considering that very little is known about the local search method for geometric approximation.
    Comment: 15 pages
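
    To make the objective concrete, the following minimal numpy sketch (our illustration; the function name is not from the paper) evaluates the quantity $f\cdot|F| + \sum_{p\in C}\min_{q\in F}\|p-q\|^2$ for a candidate facility set. The local search described in the abstract would repeatedly try swapping a small number of facilities and keep any swap that lowers this value.

    import numpy as np

    def facility_kmeans_cost(points, facilities, f):
        # f * |F| plus, for each point p in C, the squared distance to its nearest facility
        d2 = ((points[:, None, :] - facilities[None, :, :]) ** 2).sum(axis=2)
        return f * len(facilities) + d2.min(axis=1).sum()

    # toy usage: 100 random points in the plane, three candidate facilities
    C = np.random.default_rng(0).random((100, 2))
    F = np.array([[0.2, 0.2], [0.5, 0.8], [0.8, 0.3]])
    print(facility_kmeans_cost(C, F, f=0.5))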

    Randomized Dimensionality Reduction for k-means Clustering

    We study the topic of dimensionality reduction for $k$-means clustering. Dimensionality reduction encompasses the union of two approaches: \emph{feature selection} and \emph{feature extraction}. A feature selection based algorithm for $k$-means clustering selects a small subset of the input features and then applies $k$-means clustering on the selected features. A feature extraction based algorithm for $k$-means clustering constructs a small set of new artificial features and then applies $k$-means clustering on the constructed features. Despite the significance of $k$-means clustering as well as the wealth of heuristic methods addressing it, provably accurate feature selection methods for $k$-means clustering are not known. On the other hand, two provably accurate feature extraction methods for $k$-means clustering are known in the literature; one is based on random projections and the other is based on the singular value decomposition (SVD). This paper makes further progress towards a better understanding of dimensionality reduction for $k$-means clustering. Namely, we present the first provably accurate feature selection method for $k$-means clustering and, in addition, we present two feature extraction methods. The first feature extraction method is based on random projections and it improves upon the existing results in terms of time complexity and number of features needed to be extracted. The second feature extraction method is based on fast approximate SVD factorizations and it also improves upon the existing results in terms of time complexity. The proposed algorithms are randomized and provide constant-factor approximation guarantees with respect to the optimal $k$-means objective value.
    Comment: IEEE Transactions on Information Theory, to appear
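
    As an illustration of the random-projection route, here is a minimal sketch (our code, not the paper's; the target dimension is left as a free parameter, whereas the paper's guarantee fixes it as a function of $k$ and the desired accuracy). It sketches the data with a scaled Gaussian map and clusters the low-dimensional points with scikit-learn's KMeans:

    import numpy as np
    from sklearn.cluster import KMeans

    def random_projection_kmeans(X, k, target_dim, seed=0):
        # scaled Gaussian sketch: map points from X.shape[1] down to target_dim dimensions
        rng = np.random.default_rng(seed)
        R = rng.standard_normal((X.shape[1], target_dim)) / np.sqrt(target_dim)
        # cluster the sketched points; the labels are then reused on the original data
        return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X @ R)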

    Approximation Algorithms for Bregman Co-clustering and Tensor Clustering

    In the past few years powerful generalizations to the Euclidean k-means problem have been made, such as Bregman clustering [7], co-clustering (i.e., simultaneous clustering of rows and columns of an input matrix) [9,18], and tensor clustering [8,34]. Like k-means, these more general problems also suffer from the NP-hardness of the associated optimization. Researchers have developed approximation algorithms of varying degrees of sophistication for k-means, k-medians, and more recently also for Bregman clustering [2]. However, there seem to be no approximation algorithms for Bregman co- and tensor clustering. In this paper we derive the first (to our knowledge) guaranteed methods for these increasingly important clustering settings. Going beyond Bregman divergences, we also prove an approximation factor for tensor clustering with arbitrary separable metrics. Through extensive experiments we evaluate the characteristics of our method, and show that it also has practical impact.
    Comment: 18 pages; improved metric case
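
    For readers unfamiliar with Bregman clustering, the sketch below shows plain Bregman hard clustering in the style of [7] (not the co-/tensor-clustering algorithms of this paper), using the generalized KL divergence and assuming strictly positive data. The key fact it relies on is that the arithmetic mean is the optimal Bregman centroid of a cluster.

    import numpy as np

    def generalized_kl(p, q):
        # generalized KL divergence, the Bregman divergence of phi(x) = sum_i x_i log x_i
        return np.sum(p * np.log(p / q) - p + q, axis=-1)

    def bregman_kmeans(X, k, iters=50, seed=0):
        # Lloyd-style alternation between assignment and (mean) centroid updates
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            labels = generalized_kl(X[:, None, :], centers[None, :, :]).argmin(axis=1)
            centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        return labels, centers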

    A bi-criteria approximation algorithm for k-means

    We consider the classical $k$-means clustering problem in the setting of bi-criteria approximation, in which an algorithm is allowed to output $\beta k > k$ clusters and must produce a clustering with cost at most $\alpha$ times the cost of the optimal set of $k$ clusters. We argue that this approach is natural in many settings in which the exact number of clusters is a priori unknown, or unimportant up to a constant factor. We give new bi-criteria approximation algorithms, based on linear programming and local search, respectively, which attain a guarantee $\alpha(\beta)$ depending on the number $\beta k$ of clusters that may be opened. Our guarantee $\alpha(\beta)$ is always at most $9+\epsilon$ and improves rapidly with $\beta$ (for example: $\alpha(2) < 2.59$ and $\alpha(3) < 1.4$). Moreover, our algorithms have only polynomial dependence on the dimension of the input data, and so are applicable in high-dimensional settings.
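
    The bi-criteria interface is easy to exercise with off-the-shelf tools. The sketch below is ours: it runs Lloyd's heuristic via scikit-learn rather than the paper's LP-based or local-search algorithms, so it carries no $\alpha(\beta)$ guarantee. It simply opens $\lceil \beta k \rceil$ centers and reports the resulting cost, which can then be compared against a baseline $k$-center solution.

    import numpy as np
    from sklearn.cluster import KMeans

    def bicriteria_kmeans(X, k, beta=2.0, seed=0):
        # open ceil(beta * k) centers; allowing extra centers can only reduce the cost
        m = int(np.ceil(beta * k))
        km = KMeans(n_clusters=m, n_init=10, random_state=seed).fit(X)
        return km.cluster_centers_, km.inertia_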