
    On Variants of k-means Clustering

    Clustering problems often arise in fields such as data mining and machine learning, where a collection of objects must be grouped into clusters of similar objects with respect to a similarity (or dissimilarity) measure. Among clustering problems, $k$-means clustering in particular has received much attention from researchers. Although $k$-means is a very well-studied problem, its status in the plane is still open; in particular, it is unknown whether it admits a PTAS in the plane. The best known approximation bound achievable in polynomial time is $9+\varepsilon$. In this paper, we consider the following variant of $k$-means. Given a set $C$ of points in $\mathbb{R}^d$ and a real $f > 0$, find a finite set $F$ of points in $\mathbb{R}^d$ that minimizes the quantity $f \cdot |F| + \sum_{p\in C} \min_{q \in F} \|p-q\|^2$. For any fixed dimension $d$, we design a local search PTAS for this problem. We also give a "bi-criterion" local search algorithm for $k$-means which uses $(1+\varepsilon)k$ centers and yields a solution whose cost is at most $(1+\varepsilon)$ times the cost of an optimal $k$-means solution. The algorithm runs in polynomial time for any fixed dimension. The contribution of this paper is twofold. On the one hand, we are able to handle the square of distances in an elegant manner, which yields a near-optimal approximation bound. This leads us towards a better understanding of the $k$-means problem. On the other hand, our analysis of local search might also be useful for other geometric problems. This is important considering that very little is known about the local search method for geometric approximation. Comment: 15 pages
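
    The objective $f \cdot |F| + \sum_{p\in C} \min_{q \in F} \|p-q\|^2$ can be made concrete with a small local search. The sketch below is a hypothetical illustration, not the paper's PTAS: it assumes candidate centers are restricted to a given finite set (here, the input points themselves) and uses only single add, drop, and swap moves, whereas a PTAS of this kind exchanges larger groups of centers. The names `cost`, `neighbors`, and `local_search` are our own.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance ||p - q||^2."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost(C: Sequence[Point], F: Sequence[Point], f: float) -> float:
    """f * |F| + sum over p in C of min_{q in F} ||p - q||^2."""
    if not F:
        return math.inf  # no centers at all: treat as infeasible
    return f * len(F) + sum(min(sq_dist(p, q) for q in F) for p in C)


def neighbors(F: List[Point], candidates: Sequence[Point]):
    """All solutions reachable by adding, dropping, or swapping one center."""
    outside = [c for c in candidates if c not in F]
    for c in outside:                      # add one candidate center
        yield F + [c]
    if len(F) > 1:
        for i in range(len(F)):            # drop one center
            yield F[:i] + F[i + 1:]
    for i in range(len(F)):                # swap one center for a candidate
        for c in outside:
            yield F[:i] + [c] + F[i + 1:]


def local_search(C, candidates, f, eps=1e-9):
    """Hypothetical single-move local search (not the paper's PTAS)."""
    F = [candidates[0]]
    best = cost(C, F, f)
    improved = True
    while improved:
        improved = False
        for G in neighbors(F, candidates):
            g = cost(C, G, f)
            if g < best - eps:             # first-improvement rule
                F, best, improved = G, g, True
                break
    return F, best
```

    On two well-separated pairs of points with $f = 2$, for example, the search settles on one center per pair. The threshold `eps` only guards against floating-point noise; polynomial running-time bounds for local search usually require accepting only moves that improve the cost by a sufficient factor.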

    Improvements on the k-center problem for uncertain data

    In real applications, there are situations where we need to model problems based on uncertain data. This leads us to define an uncertain model for some classical geometric optimization problems and to propose algorithms to solve them. In this paper, we study the $k$-center problem for uncertain input. In our setting, each uncertain point $P_i$ is located, independently of the other points, at one of several possible locations $\{P_{i,1},\dots, P_{i,z_i}\}$ in a metric space with metric $d$, with specified probabilities, and the goal is to compute $k$ centers $\{c_1,\dots, c_k\}$ that minimize the expected cost $\mathrm{Ecost}(c_1,\dots, c_k)=\sum_{R\in \Omega} \mathrm{prob}(R)\max_{i=1,\dots, n}\min_{j=1,\dots, k} d(\hat{P}_i,c_j)$, where $\Omega$ is the probability space of all realizations $R=\{\hat{P}_1,\dots, \hat{P}_n\}$ of the given uncertain points and $\mathrm{prob}(R)=\prod_{i=1}^n \mathrm{prob}(\hat{P}_i)$. In the restricted assigned version of this problem, an assignment $A:\{P_1,\dots, P_n\}\rightarrow \{c_1,\dots, c_k\}$ is given for any choice of centers, and the goal is to minimize $\mathrm{Ecost}_A(c_1,\dots, c_k)=\sum_{R\in \Omega} \mathrm{prob}(R)\max_{i=1,\dots, n} d(\hat{P}_i,A(P_i))$. In the unrestricted version, the assignment is not specified, and the goal is to compute $k$ centers $\{c_1,\dots, c_k\}$ and an assignment $A$ that minimize the above expected cost. We give several improved constant-factor approximation algorithms for the assigned versions of this problem in Euclidean space and in general metric spaces. Our results significantly improve the results of \cite{guh} and generalize the results of \cite{wang} to any dimension. Our approach is to replace each uncertain point with a certain (deterministic) center point and to study the properties of these certain points. The proposed algorithms are efficient and simple to implement.
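
    For tiny instances, $\mathrm{Ecost}$ and its assigned variant $\mathrm{Ecost}_A$ can be evaluated exactly by enumerating every realization $R \in \Omega$. The sketch below assumes each uncertain point is given as a list of `(location, probability)` pairs, that the metric $d$ is passed in as a function `dist`, and that an assignment maps point indices to center indices; all function names are hypothetical. Since $|\Omega| = \prod_i z_i$, this brute force is exponential in $n$ and only serves to make the definitions concrete.

```python
import math
from itertools import product


def _realizations(uncertain):
    """Yield (prob(R), locations of R) for every realization R in Omega."""
    for choice in product(*uncertain):
        prob = math.prod(p for _, p in choice)  # points are independent
        yield prob, [loc for loc, _ in choice]


def ecost(uncertain, centers, dist):
    """Ecost = sum_R prob(R) * max_i min_j d(P_i_hat, c_j)."""
    return sum(
        prob * max(min(dist(x, c) for c in centers) for x in locs)
        for prob, locs in _realizations(uncertain)
    )


def ecost_assigned(uncertain, centers, assignment, dist):
    """Ecost_A: uncertain point i is always served by centers[assignment[i]]."""
    return sum(
        prob * max(dist(x, centers[assignment[i]]) for i, x in enumerate(locs))
        for prob, locs in _realizations(uncertain)
    )
```

    With points on a line and `dist = lambda a, b: abs(a - b)`, a point that sits at 0 or 2 with probability 1/2 each and a second point fixed at 10 give an expected cost of 1 for centers `[1, 10]` under the nearest-center rule, but 9 if both points are assigned to the center at 10.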

    Fast Clustering with Lower Bounds: No Customer too Far, No Shop too Small

    We study the Lower-Bounded Center (LBC) problem, a clustering problem that can be viewed as a variant of the $k$-Center problem. In the LBC problem, we are given a set $P$ of points in a metric space and a lower bound $\lambda$, and the goal is to select a set $C \subseteq P$ of centers and an assignment that maps each point in $P$ to a center of $C$ such that each center of $C$ is assigned at least $\lambda$ points. The price of an assignment is the maximum distance between a point and the center it is assigned to, and the goal is to find a set of centers and an assignment of minimum price. We give a constant-factor approximation algorithm for the LBC problem that runs in $O(n \log n)$ time when the input points lie in the $d$-dimensional Euclidean space $\mathbb{R}^d$, where $d$ is a constant. We also prove that this problem cannot be approximated within a factor of $1.8-\epsilon$ unless P = NP, even if the input points lie in the Euclidean plane $\mathbb{R}^2$. Comment: 14 pages
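
    To make the price definition concrete, the sketch below computes, for a fixed set of centers, the minimum price of a valid assignment. It is not the paper's $O(n \log n)$ approximation algorithm. It rests on the observation that a radius $r$ is feasible exactly when every point has some center within distance $r$ and the centers, each duplicated $\lambda$ times, can be matched to distinct points within distance $r$; any unmatched point then goes to some nearby center. The matching step uses Kuhn's augmenting-path algorithm. Centers are given as indices into `points` (since $C \subseteq P$), and `feasible` and `min_price` are hypothetical names.

```python
import math


def feasible(points, centers, lam, r):
    """Is there an assignment of price <= r giving every center >= lam points?"""
    near = [[j for j, c in enumerate(centers) if math.dist(p, points[c]) <= r]
            for p in points]
    if any(not nb for nb in near):
        return False  # some point has no center within distance r
    # Center j owns lam "slots"; every slot needs a distinct point within r.
    slots_of = [[j * lam + s for s in range(lam)] for j in range(len(centers))]
    owner = [-1] * (lam * len(centers))  # point currently matched to each slot

    def augment(i, seen):
        # Kuhn's algorithm: find a slot for point i, re-routing others if needed.
        for j in near[i]:
            for s in slots_of[j]:
                if s not in seen:
                    seen.add(s)
                    if owner[s] == -1 or augment(owner[s], seen):
                        owner[s] = i
                        return True
        return False

    for i in range(len(points)):
        augment(i, set())
    return all(o != -1 for o in owner)  # every slot filled


def min_price(points, centers, lam):
    """Smallest feasible price for fixed centers (indices into points)."""
    # The optimal price is always one of the point-to-center distances.
    for r in sorted({math.dist(p, points[c]) for p in points for c in centers}):
        if feasible(points, centers, lam, r):
            return r
    return math.inf
```

    Because feasibility is monotone in $r$, the linear scan over candidate distances could be replaced by a binary search; the scan is kept here for readability.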

    The Bane of Low-Dimensionality Clustering

    In this paper, we give a conditional lower bound of $n^{\Omega(k)}$ on the running time for the classic $k$-median and $k$-means clustering objectives (where $n$ is the size of the input), even in low-dimensional Euclidean space of dimension four, assuming the Exponential Time Hypothesis (ETH). We also consider $k$-median (and $k$-means) with penalties, where each point need not be assigned to a center, in which case it must pay a penalty, and extend our lower bound to at least three-dimensional Euclidean space. This stands in stark contrast to many other geometric problems, such as the traveling salesman problem or computing an independent set of unit spheres. While these problems benefit from the so-called (limited) blessing of dimensionality, as they can be solved in time $n^{O(k^{1-1/d})}$ or $2^{n^{1-1/d}}$ in $d$ dimensions, our work shows that widely used clustering objectives have a lower bound of $n^{\Omega(k)}$, even in dimension four. We complete the picture by considering the two-dimensional case: we show that there is no algorithm that solves the penalized version in time $n^{o(\sqrt{k})}$, and provide a matching upper bound of $n^{O(\sqrt{k})}$. The main tool we use to establish these lower bounds is the placement of points on the moment curve, which takes its inspiration from constructions of point sets yielding Delaunay complexes of high complexity.
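
    For comparison with the $n^{\Omega(k)}$ bound, the trivial exact algorithm for the discrete setting in which centers are restricted to the input points (an assumption of this sketch) tries every $k$-subset, taking $O\big(\binom{n}{k}\, n k\big) = n^{O(k)}$ time; the lower bound indicates that, under ETH, an exponent of this order cannot be avoided in general, even in dimension four. The helper `moment_curve` only produces the standard parametrization $t \mapsto (t, t^2, \dots, t^d)$ and is not the paper's reduction; both function names are hypothetical.

```python
import math
from itertools import combinations


def moment_curve(ts, d):
    """Points (t, t^2, ..., t^d) on the moment curve in R^d."""
    return [tuple(t ** i for i in range(1, d + 1)) for t in ts]


def brute_force_clustering(points, k, squared=False):
    """Exact discrete k-median (k-means if squared=True) over all k-subsets.

    Runs in O(C(n, k) * n * k) = n^{O(k)} time.
    """
    best_cost, best_centers = math.inf, None
    for centers in combinations(points, k):
        total = 0.0
        for p in points:
            dist = min(math.dist(p, c) for c in centers)
            total += dist * dist if squared else dist
        if total < best_cost:
            best_cost, best_centers = total, centers
    return best_cost, best_centers
```

    For instance, `brute_force_clustering(moment_curve(range(6), 4), 2)` clusters six points on the moment curve in $\mathbb{R}^4$; the number of subsets examined grows as $\binom{n}{k}$.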

    The Container Selection Problem

    We introduce and study a network resource management problem that is a special case of non-metric $k$-median, naturally arising in cross-platform scheduling and cloud computing. In the continuous $d$-dimensional container selection problem, we are given a set $C$ of input points in $d$-dimensional Euclidean space, for some $d \geq 2$, and a budget $k$. An input point $p$ can be assigned to a "container point" $c$ only if $c$ dominates $p$ in every dimension. The assignment cost is then equal to the $L_1$-norm of the container point. The goal is to find $k$ container points in the $d$-dimensional space such that the total assignment cost for all input points is minimized. The discrete variant of the problem has one key distinction, namely, the container points must be chosen from a given set $F$ of points. For the continuous version, we obtain a polynomial-time approximation scheme for any fixed dimension $d \geq 2$. On the negative side, we show that the problem is NP-hard for any $d \geq 3$. We further show that the discrete version is significantly harder, as it is NP-hard to approximate without violating the budget $k$ in any dimension $d \geq 3$. Thus, we focus on obtaining bi-approximation algorithms. For $d=2$, the bi-approximation guarantee is $(1+\epsilon, 3)$, i.e., for any $\epsilon>0$, our scheme outputs a solution of size $3k$ and cost at most $(1+\epsilon)$ times the optimum. For fixed $d>2$, we present a $(1+\epsilon, O(\frac{1}{\epsilon}\log k))$ bi-approximation algorithm.
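
    The assignment rule of the discrete variant can be sketched directly: each input point goes to the cheapest chosen container that dominates it in every coordinate, and an exhaustive search over $k$-subsets of $F$ gives the exact optimum on tiny instances. This is a hypothetical illustration with names of our own choosing, not the paper's PTAS or bi-approximation; it reports an infinite cost when some input point has no dominating container.

```python
import math
from itertools import combinations


def dominates(c, p):
    """True if container c is at least p in every coordinate."""
    return all(ci >= pi for ci, pi in zip(c, p))


def assignment_cost(containers, points):
    """Total cost when each point uses its cheapest dominating container."""
    total = 0.0
    for p in points:
        costs = [sum(abs(x) for x in c) for c in containers if dominates(c, p)]
        if not costs:
            return math.inf  # p cannot be assigned to any chosen container
        total += min(costs)  # cost = L1 norm of the container
    return total


def discrete_brute_force(F, C, k):
    """Exact discrete container selection on tiny inputs.

    Choosing exactly min(k, |F|) containers suffices: adding a container
    never increases the cost.
    """
    best_cost, best_set = math.inf, None
    for S in combinations(F, min(k, len(F))):
        c = assignment_cost(S, C)
        if c < best_cost:
            best_cost, best_set = c, S
    return best_cost, best_set
```
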

    Separating a Voronoi Diagram via Local Search

    Given a set $P$ of $n$ points in $\mathbb{R}^d$, we show how to insert a set $Z$ of $O(n^{1-1/d})$ additional points such that $P$ can be broken into two sets $P_1$ and $P_2$ of roughly equal size, such that in the Voronoi diagram $\mathcal{V}(P \cup Z)$, the cells of $P_1$ do not touch the cells of $P_2$; that is, $Z$ separates $P_1$ from $P_2$ in the Voronoi diagram (and also in the dual Delaunay triangulation). In addition, given such a partition $(P_1, P_2)$ of $P$, we present an approximation algorithm to compute a minimum-size separator realizing this partition. We also present a simple local search algorithm that is a PTAS for approximating the optimal Voronoi partition.
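
    The separation condition can be checked directly in the plane. The closed Voronoi cells of sites $p$ and $q$ touch exactly when some point on their perpendicular bisector is at least as close to $p$ (and hence to $q$) as to every other site, and each such condition is a linear inequality in the bisector's parameter. The sketch below is restricted to $d = 2$ (in $\mathbb{R}^d$ the same test becomes a small linear program), uses exact `Fraction` arithmetic, assumes all sites are distinct, and counts cells meeting at a single point as touching. `cells_touch` and `separates` are hypothetical names, and the code only verifies a given $Z$ rather than constructing one.

```python
from fractions import Fraction


def cells_touch(p, q, sites):
    """True if the closed Voronoi cells of sites p and q (in the plane) meet."""
    px, py = map(Fraction, p)
    qx, qy = map(Fraction, q)
    mx, my = (px + qx) / 2, (py + qy) / 2   # midpoint of p and q
    ux, uy = -(qy - py), qx - px            # bisector: x(t) = m + t * u
    lo, hi = None, None                     # feasible interval for t
    for r in sites:
        if r == p or r == q:
            continue
        rx, ry = map(Fraction, r)
        dx, dy = rx - px, ry - py
        # |x - p|^2 <= |x - r|^2  <=>  2 x.(r - p) <= |r|^2 - |p|^2, i.e. a*t <= b
        a = 2 * (ux * dx + uy * dy)
        b = (rx * rx + ry * ry) - (px * px + py * py) - 2 * (mx * dx + my * dy)
        if a == 0:
            if b < 0:
                return False                # r is closer on the whole bisector
        elif a > 0:
            hi = b / a if hi is None else min(hi, b / a)
        else:
            lo = b / a if lo is None else max(lo, b / a)
    return lo is None or hi is None or lo <= hi


def separates(P1, P2, Z):
    """True if no cell of P1 touches a cell of P2 in V(P1 + P2 + Z)."""
    sites = list(P1) + list(P2) + list(Z)
    return not any(cells_touch(p, q, sites) for p in P1 for q in P2)
```

    For example, the sites $(0,0)$ and $(2,0)$ have touching cells on their own, but inserting $(1,0)$ between them separates them.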