Exact algorithms for minimum sum-of-squares clustering
NP-Hardness of Euclidean sum-of-squares clustering -- Computational complexity -- An incorrect reduction from the K-section problem -- A new proof by reduction from the densest cut problem -- Evaluating a branch-and-bound RLT-based algorithm for minimum sum-of-squares clustering -- Reformulation-Linearization technique for the MSSC -- Branch-and-bound for the MSSC -- An attempt at reproducing computational results -- Breaking symmetry and convex hull inequalities -- A branch-and-cut SDP-based algorithm for minimum sum-of-squares clustering -- Equivalence of MSSC to 0-1 SDP -- A branch-and-cut algorithm for the 0-1 SDP formulation -- Computational experiments -- An improved column generation algorithm for minimum sum-of-squares clustering -- Column generation algorithm revisited -- A geometric approach -- Generalization to the Euclidean space -- Computational results
The Hardness of Approximation of Euclidean k-means
The Euclidean k-means problem is a classical problem that has been
extensively studied in the theoretical computer science, machine learning and
the computational geometry communities. In this problem, we are given a set of
n points in Euclidean space R^d, and the goal is to choose k centers in R^d
so that the sum of squared distances of each point to its nearest center
is minimized. The best approximation algorithms for this problem include a
polynomial time constant factor approximation for general k and a
(1+\epsilon)-approximation which runs in time poly(n) 2^{O(k/\epsilon)}. At
the other extreme, the only known computational complexity result for this
problem is NP-hardness [ADHP'09]. The main difficulty in obtaining hardness
results stems from the Euclidean nature of the problem, and the fact that any
point in R^d can be a potential center. This gap in understanding left open
the intriguing possibility that the problem might admit a PTAS for all k, d.
In this paper we provide the first hardness of approximation for the
Euclidean k-means problem. Concretely, we show that there exists a constant
\epsilon > 0 such that it is NP-hard to approximate the k-means objective
to within a factor of (1+\epsilon). We show this via an efficient reduction
from the vertex cover problem on triangle-free graphs: given a triangle-free
graph, the goal is to choose the fewest number of vertices which are incident
on all the edges. Additionally, we give a proof that the current best hardness
results for vertex cover can be carried over to triangle-free graphs. To show
this we transform G, a known hard vertex cover instance, by taking a graph
product with a suitably chosen graph H, and showing that the size of the
(normalized) maximum independent set is almost exactly preserved in the product
graph using a spectral analysis, which might be of independent interest.
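The objective described in this abstract can be made concrete with a small sketch. The function names `squared_distance` and `kmeans_cost` are illustrative choices, not taken from the paper; the code only evaluates the cost of a given set of centers and does not attempt to find good ones.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum, over all points, of the squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Four points on a line, two centers placed midway between each pair.
points = [(0, 0), (2, 0), (10, 0), (12, 0)]
centers = [(1, 0), (11, 0)]
cost = kmeans_cost(points, centers)
```

Because any point of R^d may serve as a center, the centers here need not be input points, which is the feature the abstract identifies as the main obstacle to hardness proofs.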
Squarepants in a Tree: Sum of Subtree Clustering and Hyperbolic Pants Decomposition
We provide efficient constant factor approximation algorithms for the
problems of finding a hierarchical clustering of a point set in any metric
space, minimizing the sum of minimum spanning tree lengths within each
cluster, and in the hyperbolic or Euclidean planes, minimizing the sum of
cluster perimeters. Our algorithms for the hyperbolic and Euclidean planes can
also be used to provide a pants decomposition, that is, a set of disjoint
simple closed curves partitioning the plane minus the input points into subsets
with exactly three boundary components, with approximately minimum total
length. In the Euclidean case, these curves are squares; in the hyperbolic
case, they combine our Euclidean square pants decomposition with our tree
clustering method for general metric spaces.
Comment: 22 pages, 14 figures. This version replaces the proof of what is now
Lemma 5.2, as the previous proof was erroneous.
Minimum-Cost Coverage of Point Sets by Disks
We consider a class of geometric facility location problems in which the goal
is to determine a set X of disks given by their centers (t_j) and radii (r_j)
that cover a given set of demand points Y in the plane at the smallest possible
cost. We consider cost functions of the form sum_j f(r_j), where f(r)=r^alpha
is the cost of transmission to radius r. Special cases arise for alpha=1 (sum
of radii) and alpha=2 (total area); power consumption models in wireless
network design often use an exponent alpha>2. Different scenarios arise
according to possible restrictions on the transmission centers t_j, which may
be constrained to belong to a given discrete set or to lie on a line, etc. We
obtain several new results, including (a) exact and approximation algorithms
for selecting transmission points t_j on a given line in order to cover demand
points Y in the plane; (b) approximation algorithms (and an algebraic
intractability result) for selecting an optimal line on which to place
transmission points to cover Y; (c) a proof of NP-hardness for a discrete set
of transmission points in the plane and any fixed alpha>1; and (d) a
polynomial-time approximation scheme for the problem of computing a minimum
cost covering tour (MCCT), in which the total cost is a linear combination of
the transmission cost for the set of disks and the length of a tour/path that
connects the centers of the disks.
Comment: 10 pages, 4 figures, LaTeX, to appear in ACM Symposium on
Computational Geometry 200
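As a rough illustration of the cost model in this abstract, the sketch below checks whether a set of disks covers the demand points and evaluates sum_j r_j^alpha. The helper names `covers` and `transmission_cost` and the sample data are assumptions made for the example; none of the paper's algorithms are implemented here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Disk = Tuple[Point, float]  # (center t_j, radius r_j)


def covers(disks: Sequence[Disk], demand_points: Sequence[Point]) -> bool:
    """True if every demand point lies inside at least one disk."""
    return all(
        any(math.dist(center, y) <= radius for center, radius in disks)
        for y in demand_points
    )


def transmission_cost(disks: Sequence[Disk], alpha: float) -> float:
    """Cost sum_j f(r_j) with f(r) = r^alpha."""
    return sum(radius ** alpha for _, radius in disks)


disks = [((0.0, 0.0), 1.0), ((5.0, 0.0), 2.0)]
demand = [(0.5, 0.0), (4.0, 0.0), (6.5, 0.0)]
feasible = covers(disks, demand)
sum_of_radii = transmission_cost(disks, alpha=1)  # alpha = 1 case
area_like = transmission_cost(disks, alpha=2)     # alpha = 2, area up to a factor of pi
```

Raising alpha penalizes large disks more heavily, so with alpha>2 a cover made of many small disks tends to be cheaper than one made of a few large ones.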
Fast Clustering with Lower Bounds: No Customer too Far, No Shop too Small
We study the Lower Bounded Center (LBC) problem, which is a clustering
problem that can be viewed as a variant of the k-Center problem. In the LBC
problem, we are given a set of points P in a metric space and a lower bound
\lambda, and the goal is to select a set C \subseteq P of centers and an
assignment that maps each point in P to a center of C such that each center of
C is assigned at least \lambda points. The price of an assignment is the
maximum distance between a point and the center it is assigned to, and the goal
is to find a set of centers and an assignment of minimum price. We give a
constant factor approximation algorithm for the LBC problem that runs in O(n
\log n) time when the input points lie in the d-dimensional Euclidean space
R^d, where d is a constant. We also prove that this problem cannot be
approximated within a factor of 1.8-\epsilon unless P = NP even if the input
points are points in the Euclidean plane R^2.
Comment: 14 pages
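The definition of a feasible assignment and its price can be sketched as follows. The function name `lbc_price` and the index-based encoding of the assignment are assumptions for illustration; the sketch only evaluates a given assignment and is not the approximation algorithm from the paper.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


def lbc_price(
    points: Sequence[Point], assignment: Sequence[int], lam: int
) -> Optional[float]:
    """Price of an assignment, or None if some center gets fewer than lam points.

    assignment[i] is the index into `points` of the center serving point i,
    so the centers are always a subset of the input points.
    """
    loads = Counter(assignment)
    if any(load < lam for load in loads.values()):
        return None  # the lower bound is violated for at least one center
    return max(
        math.dist(points[i], points[c]) for i, c in enumerate(assignment)
    )


points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
assignment = [0, 0, 0, 3, 3, 3]  # two centers: points[0] and points[3]
price = lbc_price(points, assignment, lam=3)
```

With these six points, each of the two centers serves exactly three points, so the assignment is feasible for a lower bound of 3 but not for 4.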