415 research outputs found

On Variants of k-means Clustering

Authors: Bandyapadhyay Sayan; Varadarajan Kasturi
Publication date: 09/12/2015

Clustering problems often arise in fields such as data mining and machine learning, where the task is to group a collection of objects into similar groups with respect to a similarity (or dissimilarity) measure. Among clustering problems, $k$-means clustering has received particular attention from researchers. Although $k$-means is a very well studied problem, its status in the plane is still open. In particular, it is unknown whether it admits a PTAS in the plane. The best known polynomial-time approximation bound is $9+\epsilon$. In this paper, we consider the following variant of $k$-means. Given a set $C$ of points in $\mathcal{R}^d$ and a real $f > 0$, find a finite set $F$ of points in $\mathcal{R}^d$ that minimizes the quantity $f \cdot |F| + \sum_{p \in C} \min_{q \in F} \|p-q\|^2$. For any fixed dimension $d$, we design a local search PTAS for this problem. We also give a "bi-criterion" local search algorithm for $k$-means which uses $(1+\epsilon)k$ centers and yields a solution whose cost is at most $(1+\epsilon)$ times the cost of an optimal $k$-means solution. The algorithm runs in polynomial time for any fixed dimension. The contribution of this paper is twofold. On the one hand, we are able to handle the square of distances in an elegant manner, which yields a near-optimal approximation bound. This leads us towards a better understanding of the $k$-means problem. On the other hand, our analysis of local search might also be useful for other geometric problems. This is important considering that very little is known about the local search method for geometric approximation.

Comment: 15 pages
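As a rough illustration of the objective stated in the abstract above, the following sketch evaluates f times |F| plus the sum of squared distances from each point of C to its nearest point of F. The function and variable names (`facility_kmeans_cost`, `clients`, `centers`) are hypothetical and do not come from the paper, and the sketch only evaluates a given solution; it is not the paper's local search algorithm.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance ||p - q||^2."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def facility_kmeans_cost(clients: Sequence[Point],
                         centers: Sequence[Point],
                         f: float) -> float:
    """Evaluate f * |F| + sum over p in C of min over q in F of ||p - q||^2.

    `clients` plays the role of C and `centers` the role of F
    (illustrative names, assumed for this sketch).
    """
    if not centers:
        raise ValueError("F must contain at least one center")
    opening_cost = f * len(centers)
    service_cost = sum(
        min(squared_distance(p, q) for q in centers) for p in clients
    )
    return opening_cost + service_cost


clients = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
one_center = facility_kmeans_cost(clients, [(1.0, 0.0)], f=5.0)
two_centers = facility_kmeans_cost(clients, [(1.0, 0.0), (10.0, 0.0)], f=5.0)
print(one_center, two_centers)
```

In this toy instance, opening a second center costs an extra f = 5 but removes a squared distance of 81, which is the trade-off the parameter f controls.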

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Improvements on the k-center problem for uncertain data

Authors: Alipour Sharareh; Jafari Amir
Publication date: 30/08/2017

In real applications, there are situations where we need to model some problems based on uncertain data. This leads us to define an uncertain model for some classical geometric optimization problems and propose algorithms to solve them. In this paper, we study the $k$-center problem for uncertain input. In our setting, each uncertain point $P_i$ is located, independently of the other points, in one of several possible locations $\{P_{i,1},\dots,P_{i,z_i}\}$ in a metric space with metric $d$, with specified probabilities, and the goal is to compute $k$ centers $\{c_1,\dots,c_k\}$ that minimize the following expected cost:

$$Ecost(c_1,\dots,c_k)=\sum_{R\in \Omega} prob(R)\max_{i=1,\dots,n}\min_{j=1,\dots,k} d(\hat{P}_i,c_j),$$

where $\Omega$ is the probability space of all realizations $R=\{\hat{P}_1,\dots,\hat{P}_n\}$ of the given uncertain points and $prob(R)=\prod_{i=1}^n prob(\hat{P}_i)$.

In the restricted assigned version of this problem, an assignment $A:\{P_1,\dots,P_n\}\rightarrow \{c_1,\dots,c_k\}$ is given for any choice of centers and the goal is to minimize

$$Ecost_A(c_1,\dots,c_k)=\sum_{R\in \Omega} prob(R)\max_{i=1,\dots,n} d(\hat{P}_i,A(P_i)).$$

In the unrestricted version, the assignment is not specified and the goal is to compute $k$ centers $\{c_1,\dots,c_k\}$ and an assignment $A$ that minimize the above expected cost. We give several improved constant-factor approximation algorithms for the assigned versions of this problem in a Euclidean space and in a general metric space. Our results significantly improve the results of \cite{guh} and generalize the results of \cite{wang} to any dimension. Our approach is to substitute a certain center point for each uncertain point and study the properties of these certain points. The proposed algorithms are efficient and simple to implement.
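To make the two expected-cost definitions above concrete, here is a minimal brute-force sketch that enumerates every realization R in Omega and weights its cost by prob(R). It assumes Euclidean distance and a small number of possible locations per point, since the enumeration is exponential in n. The names `expected_kcenter_cost` and `expected_assigned_cost` and the input layout are hypothetical, and this is not one of the approximation algorithms from the paper.

```python
from itertools import product
from math import dist, prod

# Each uncertain point is a list of (location, probability) pairs whose
# probabilities sum to 1. This input layout is an assumption of the sketch.


def expected_kcenter_cost(uncertain_points, centers):
    """Ecost: expectation over realizations of max_i min_j d(P_i, c_j)."""
    total = 0.0
    for realization in product(*uncertain_points):
        probability = prod(p for _, p in realization)
        cost = max(min(dist(loc, c) for c in centers) for loc, _ in realization)
        total += probability * cost
    return total


def expected_assigned_cost(uncertain_points, assignment):
    """Ecost_A: assignment[i] is the fixed center A(P_i) of uncertain point i."""
    total = 0.0
    for realization in product(*uncertain_points):
        probability = prod(p for _, p in realization)
        cost = max(dist(loc, assignment[i])
                   for i, (loc, _) in enumerate(realization))
        total += probability * cost
    return total


# Two uncertain points on a line: P_1 is at 0 or 4 with equal probability,
# P_2 is at 1 with certainty.
points = [[((0.0,), 0.5), ((4.0,), 0.5)], [((1.0,), 1.0)]]
unassigned = expected_kcenter_cost(points, [(0.0,)])
assigned = expected_assigned_cost(points, [(4.0,), (1.0,)])
print(unassigned, assigned)
```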

arXiv.org e-Print Archive

Crossref

Fast Clustering with Lower Bounds: No Customer too Far, No Shop too Small

Authors: Ene Alina; Har-Peled Sariel; Raichel Benjamin
Publication date: 26/04/2013

We study the Lower Bounded Center (LBC) problem, a clustering problem that can be viewed as a variant of the k-center problem. In the LBC problem, we are given a set of points P in a metric space and a lower bound \lambda, and the goal is to select a set C \subseteq P of centers and an assignment that maps each point in P to a center of C such that each center of C is assigned at least \lambda points. The price of an assignment is the maximum distance between a point and the center it is assigned to, and the goal is to find a set of centers and an assignment of minimum price. We give a constant-factor approximation algorithm for the LBC problem that runs in O(n \log n) time when the input points lie in the d-dimensional Euclidean space R^d, where d is a constant. We also prove that this problem cannot be approximated within a factor of 1.8-\epsilon unless P = NP, even if the input points lie in the Euclidean plane R^2.

Comment: 14 pages
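The feasibility condition and the price defined in the abstract above can be sketched as a small checker. It assumes Euclidean distance and represents the assignment as a list of indices into P, so centers are automatically members of P. The name `lbc_price` and this representation are hypothetical, and the code only evaluates a given assignment rather than computing an approximate solution.

```python
from collections import Counter
from math import dist


def lbc_price(points, assignment, lower_bound):
    """Return the price of an assignment for the lower-bounded center problem.

    points:      list of coordinate tuples (the set P)
    assignment:  assignment[i] is the index in `points` of the center that
                 serves points[i], so every center belongs to P
    lower_bound: the parameter lambda; each used center needs at least this
                 many assigned points
    """
    load = Counter(assignment)
    for center, count in load.items():
        if count < lower_bound:
            raise ValueError(
                f"center {center} serves {count} points, fewer than {lower_bound}"
            )
    # Price: the maximum distance between a point and its assigned center.
    return max(dist(points[i], points[c]) for i, c in enumerate(assignment))


P = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
price = lbc_price(P, assignment=[0, 0, 2, 2], lower_bound=2)
print(price)
```

With a lower bound of 3, the same assignment would be rejected, because each of the two centers serves only two points.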

arXiv.org e-Print Archive

CiteSeerX

The Bane of Low-Dimensionality Clustering

Authors: Cohen-Addad Vincent; de Mesmay Arnaud; Rotenberg Eva; Roytman Alan
Publication date: 03/11/2017

In this paper, we give a conditional lower bound of $n^{\Omega(k)}$ on running time for the classic k-median and k-means clustering objectives (where n is the size of the input), even in low-dimensional Euclidean space of dimension four, assuming the Exponential Time Hypothesis (ETH). We also consider k-median (and k-means) with penalties, where each point need not be assigned to a center, in which case it must pay a penalty, and extend our lower bound to at least three-dimensional Euclidean space. This stands in stark contrast to many other geometric problems such as the traveling salesman problem, or computing an independent set of unit spheres. While these problems benefit from the so-called (limited) blessing of dimensionality, as they can be solved in time $n^{O(k^{1-1/d})}$ or $2^{n^{1-1/d}}$ in d dimensions, our work shows that widely-used clustering objectives have a lower bound of $n^{\Omega(k)}$, even in dimension four. We complete the picture by considering the two-dimensional case: we show that there is no algorithm that solves the penalized version in time less than $n^{o(\sqrt{k})}$, and provide a matching upper bound of $n^{O(\sqrt{k})}$. The main tool we use to establish these lower bounds is the placement of points on the moment curve, which takes its inspiration from constructions of point sets yielding Delaunay complexes of high complexity.

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

Copenhagen University Research Information System

Online Research Database In Technology

The Container Selection Problem

Authors: Nagarajan Viswanath; Sarpatwar Kanthi K.; Schieber Baruch; Shachnai Hadas; Wolf Joel L.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015)
Publication date: 01/01/2015

We introduce and study a network resource management problem that is a special case of non-metric k-median, naturally arising in cross-platform scheduling and cloud computing. In the continuous d-dimensional container selection problem, we are given a set C of input points in d-dimensional Euclidean space, for some d >= 2, and a budget k. An input point p can be assigned to a "container point" c only if c dominates p in every dimension. The assignment cost is then equal to the L1-norm of the container point. The goal is to find k container points in the d-dimensional space, such that the total assignment cost for all input points is minimized. The discrete variant of the problem has one key distinction, namely, the container points must be chosen from a given set F of points. For the continuous version, we obtain a polynomial time approximation scheme for any fixed dimension d >= 2. On the negative side, we show that the problem is NP-hard for any d >= 3. We further show that the discrete version is significantly harder, as it is NP-hard to approximate without violating the budget k in any dimension d >= 3. Thus, we focus on obtaining bi-approximation algorithms. For d = 2, the bi-approximation guarantee is (1+epsilon, 3), i.e., for any epsilon > 0, our scheme outputs a solution of size 3k and cost at most (1+epsilon) times the optimum. For fixed d > 2, we present a (1+epsilon, O((1/epsilon) log k)) bi-approximation algorithm.
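The assignment rule in the abstract above (a container must dominate a point in every dimension, and the cost is the container's L1-norm) can be illustrated with a short evaluator. The sketch assumes each point is served by its cheapest dominating container. The names `dominates` and `container_selection_cost` are hypothetical, and the code only scores a given set of containers; it is not the approximation scheme from the paper.

```python
def dominates(container, point):
    """True if the container is at least as large as the point in every dimension."""
    return all(c >= p for c, p in zip(container, point))


def container_selection_cost(points, containers):
    """Total assignment cost when each point uses its cheapest dominating container.

    The cost of assigning a point to a container is the L1-norm of the
    container, as in the problem statement. Raises ValueError if some
    point is dominated by no container.
    """
    total = 0.0
    for point in points:
        feasible_costs = [
            sum(abs(x) for x in container)
            for container in containers
            if dominates(container, point)
        ]
        if not feasible_costs:
            raise ValueError(f"no container dominates point {point}")
        total += min(feasible_costs)
    return total


input_points = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
chosen = [(2.0, 2.0), (3.0, 3.0)]  # k = 2 container points
total_cost = container_selection_cost(input_points, chosen)
print(total_cost)
```

Here the first two points fit in the container (2, 2) at cost 4 each, while (3, 3) needs the container (3, 3) at cost 6.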

Dagstuhl Research Online Publication Server

Separating a Voronoi Diagram via Local Search

Authors: Bhattiprolu Vijay V. S. P.; Har-Peled Sariel
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 32nd International Symposium on Computational Geometry (SoCG 2016)
Publication date: 01/01/2016

Given a set P of n points in R^d, we show how to insert a set Z of O(n^(1-1/d)) additional points, such that P can be broken into two sets P1 and P2 of roughly equal size, such that in the Voronoi diagram V(P ∪ Z), the cells of P1 do not touch the cells of P2; that is, Z separates P1 from P2 in the Voronoi diagram (and also in the dual Delaunay triangulation). In addition, given such a partition (P1, P2) of P, we present an approximation algorithm to compute a minimum size separator realizing this partition. We also present a simple local search algorithm that is a PTAS for approximating the optimal Voronoi partition.

Dagstuhl Research Online Publication Server