
    Exact algorithms for minimum sum-of-squares clustering

    NP-Hardness of Euclidean sum-of-squares clustering -- Computational complexity -- An incorrect reduction from the K-section problem -- A new proof by reduction from the densest cut problem -- Evaluating a branch-and-bound RLT-based algorithm for minimum sum-of-squares clustering -- Reformulation-Linearization Technique for the MSSC -- Branch-and-bound for the MSSC -- An attempt at reproducing computational results -- Breaking symmetry and convex hull inequalities -- A branch-and-cut SDP-based algorithm for minimum sum-of-squares clustering -- Equivalence of MSSC to 0-1 SDP -- A branch-and-cut algorithm for the 0-1 SDP formulation -- Computational experiments -- An improved column generation algorithm for minimum sum-of-squares clustering -- Column generation algorithm revisited -- A geometric approach -- Generalization to the Euclidean space -- Computational results
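
    Every chapter above targets the same objective: the minimum sum-of-squares clustering (MSSC) criterion, i.e. the total squared Euclidean distance from each point to the centroid of its cluster. A minimal sketch of that objective, assuming points are rows of a NumPy array and the clustering is given as integer labels (the function name and array layout are mine, not from the thesis):

        import numpy as np

        def mssc_objective(points, labels):
            """Sum over clusters of squared Euclidean distances
            from each point to its cluster centroid."""
            points = np.asarray(points, dtype=float)
            labels = np.asarray(labels)
            total = 0.0
            for c in np.unique(labels):
                cluster = points[labels == c]
                centroid = cluster.mean(axis=0)
                total += float(np.sum((cluster - centroid) ** 2))
            return total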

    The Hardness of Approximation of Euclidean k-means

    The Euclidean k-means problem is a classical problem that has been extensively studied in the theoretical computer science, machine learning, and computational geometry communities. In this problem, we are given a set of n points in Euclidean space R^d, and the goal is to choose k centers in R^d so that the sum of squared distances of each point to its nearest center is minimized. The best approximation algorithms for this problem include a polynomial-time constant-factor approximation for general k and a (1+\epsilon)-approximation which runs in time poly(n) 2^{O(k/\epsilon)}. At the other extreme, the only known computational complexity result for this problem is NP-hardness [ADHP'09]. The main difficulty in obtaining hardness results stems from the Euclidean nature of the problem and the fact that any point in R^d can be a potential center. This gap in understanding left open the intriguing possibility that the problem might admit a PTAS for all k, d. In this paper we provide the first hardness of approximation result for the Euclidean k-means problem. Concretely, we show that there exists a constant \epsilon > 0 such that it is NP-hard to approximate the k-means objective to within a factor of (1+\epsilon). We show this via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the minimum number of vertices that are incident on all the edges. Additionally, we give a proof that the current best hardness results for vertex cover can be carried over to triangle-free graphs. To show this we transform G, a known hard vertex cover instance, by taking a graph product with a suitably chosen graph H, and we show that the size of the (normalized) maximum independent set is almost exactly preserved in the product graph using a spectral analysis, which might be of independent interest.
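
    For reference, the objective whose approximability is studied here can be evaluated directly for any candidate set of centers. A minimal sketch, assuming points and centers are rows of NumPy arrays (the helper name is illustrative, not from the paper):

        import numpy as np

        def kmeans_cost(points, centers):
            """Euclidean k-means objective: sum over points of the
            squared distance to the nearest of the k chosen centers."""
            points = np.asarray(points, dtype=float)
            centers = np.asarray(centers, dtype=float)
            diffs = points[:, None, :] - centers[None, :, :]
            sq_dists = np.sum(diffs ** 2, axis=-1)   # shape (n, k)
            return float(np.sum(sq_dists.min(axis=1)))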

    Squarepants in a Tree: Sum of Subtree Clustering and Hyperbolic Pants Decomposition

    We provide efficient constant-factor approximation algorithms for the problems of finding a hierarchical clustering of a point set in any metric space, minimizing the sum of minimum spanning tree lengths within each cluster, and, in the hyperbolic or Euclidean planes, minimizing the sum of cluster perimeters. Our algorithms for the hyperbolic and Euclidean planes can also be used to provide a pants decomposition, that is, a set of disjoint simple closed curves partitioning the plane minus the input points into subsets with exactly three boundary components, with approximately minimum total length. In the Euclidean case, these curves are squares; in the hyperbolic case, they combine our Euclidean square pants decomposition with our tree clustering method for general metric spaces. Comment: 22 pages, 14 figures. This version replaces the proof of what is now Lemma 5.2, as the previous proof was erroneous.
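
    A minimal sketch of the first objective above, the sum of minimum spanning tree lengths within each cluster, assuming a single flat clustering given as labels over Euclidean points (the function name is mine; SciPy's MST routine is used purely for illustration):

        import numpy as np
        from scipy.spatial.distance import pdist, squareform
        from scipy.sparse.csgraph import minimum_spanning_tree

        def sum_of_cluster_mst_lengths(points, labels):
            """Sum, over clusters, of the total edge length of a minimum
            spanning tree on that cluster's points."""
            points = np.asarray(points, dtype=float)
            labels = np.asarray(labels)
            total = 0.0
            for c in np.unique(labels):
                cluster = points[labels == c]
                if len(cluster) < 2:
                    continue  # singleton clusters contribute no edges
                dists = squareform(pdist(cluster))
                total += minimum_spanning_tree(dists).sum()
            return float(total)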

    Minimum-Cost Coverage of Point Sets by Disks

    We consider a class of geometric facility location problems in which the goal is to determine a set X of disks given by their centers (t_j) and radii (r_j) that cover a given set of demand points Y in the plane at the smallest possible cost. We consider cost functions of the form sum_j f(r_j), where f(r)=r^alpha is the cost of transmission to radius r. Special cases arise for alpha=1 (sum of radii) and alpha=2 (total area); power consumption models in wireless network design often use an exponent alpha>2. Different scenarios arise according to possible restrictions on the transmission centers t_j, which may be constrained to belong to a given discrete set or to lie on a line, etc. We obtain several new results, including (a) exact and approximation algorithms for selecting transmission points t_j on a given line in order to cover demand points Y in the plane; (b) approximation algorithms (and an algebraic intractability result) for selecting an optimal line on which to place transmission points to cover Y; (c) a proof of NP-hardness for a discrete set of transmission points in the plane and any fixed alpha>1; and (d) a polynomial-time approximation scheme for the problem of computing a minimum cost covering tour (MCCT), in which the total cost is a linear combination of the transmission cost for the set of disks and the length of a tour/path that connects the centers of the disks. Comment: 10 pages, 4 figures, LaTeX, to appear in ACM Symposium on Computational Geometry 200
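
    The cost model and feasibility condition described above are easy to state in code. A minimal sketch, assuming disks are given by center coordinates and radii and demand points Y are rows of an array (names are illustrative, not from the paper):

        import numpy as np

        def coverage_cost(radii, alpha):
            """Cost of a set of disks: sum_j f(r_j) with f(r) = r^alpha."""
            return float(np.sum(np.asarray(radii, dtype=float) ** alpha))

        def covers_all(centers, radii, demand_points):
            """True if every demand point lies in at least one disk."""
            centers = np.asarray(centers, dtype=float)
            radii = np.asarray(radii, dtype=float)
            return all(
                np.any(np.linalg.norm(centers - y, axis=1) <= radii)
                for y in np.asarray(demand_points, dtype=float)
            )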

    Fast Clustering with Lower Bounds: No Customer too Far, No Shop too Small

    Full text link
    We study the Lower-Bounded Center (LBC) problem, which is a clustering problem that can be viewed as a variant of the k-center problem. In the LBC problem, we are given a set of points P in a metric space and a lower bound \lambda, and the goal is to select a set C \subseteq P of centers and an assignment that maps each point in P to a center of C such that each center of C is assigned at least \lambda points. The price of an assignment is the maximum distance between a point and the center it is assigned to, and the goal is to find a set of centers and an assignment of minimum price. We give a constant-factor approximation algorithm for the LBC problem that runs in O(n \log n) time when the input points lie in the d-dimensional Euclidean space R^d, where d is a constant. We also prove that this problem cannot be approximated within a factor of 1.8 - \epsilon unless P = NP, even if the input points are points in the Euclidean plane R^2. Comment: 14 pages.
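
    As an illustration of the definitions above (the helper name is mine, not from the paper), the price of an assignment is the maximum point-to-center distance, and an assignment is feasible only if every chosen center receives at least \lambda points:

        import numpy as np

        def assignment_price(points, centers, assignment, lam):
            """Price of a lower-bounded center solution: the maximum distance
            from a point to its assigned center, or None if some center
            is assigned fewer than lam points."""
            points = np.asarray(points, dtype=float)
            centers = np.asarray(centers, dtype=float)
            assignment = np.asarray(assignment)
            counts = np.bincount(assignment, minlength=len(centers))
            if np.any(counts < lam):
                return None  # lower-bound constraint violated
            dists = np.linalg.norm(points - centers[assignment], axis=1)
            return float(dists.max())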