Approximation Schemes for Min-Sum k-Clustering
We consider the Min-Sum k-Clustering (k-MSC) problem. Given a set of points in a metric space represented by an edge-weighted graph G = (V, E) and a parameter k, the goal is to partition the points V into k clusters such that the sum of distances between all pairs of points within the same cluster is minimized.
The k-MSC problem is known to be APX-hard on general metrics. The best known approximation algorithms for the problem, obtained by Behsaz, Friggstad, Salavatipour and Sivakumar [Algorithmica 2019], achieve an approximation ratio of O(log |V|) in polynomial time for general metrics and an approximation ratio of 2+ε in quasi-polynomial time for metrics with bounded doubling dimension. No approximation scheme for k-MSC (when k is part of the input) was known for any non-trivial metric prior to our work. In fact, most previous works rely on the simple fact that there is a 2-approximate reduction from k-MSC to the balanced k-median problem and design approximation algorithms for the latter to obtain an approximation for k-MSC.
In this paper, we obtain the first Quasi-Polynomial Time Approximation Schemes (QPTAS) for the problem on metrics induced by graphs of bounded treewidth, graphs of bounded highway dimension, graphs of bounded doubling dimension (including fixed-dimensional Euclidean metrics), and planar and minor-free graphs. We bypass the barrier of 2 for k-MSC by introducing a new clustering problem, which we call min-hub clustering, which is a generalization of balanced k-median and is a trade-off between center-based clustering problems (such as balanced k-median) and pairwise clustering (such as Min-Sum k-Clustering). We then show how one can find approximation schemes for min-hub clustering on certain classes of metrics.
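The min-sum objective described above can be sketched in a few lines; the 1-D points and absolute-difference metric below are illustrative assumptions, not from the paper:

```python
from itertools import combinations

def min_sum_cost(clusters, dist):
    """Sum of distances over all unordered pairs of points within each cluster."""
    return sum(dist(p, q)
               for cluster in clusters
               for p, q in combinations(cluster, 2))

# Example on 1-D points with |p - q| as the metric.
clusters = [[0.0, 1.0, 2.0], [10.0, 11.0]]
cost = min_sum_cost(clusters, lambda p, q: abs(p - q))  # (1 + 2 + 1) + 1 = 5.0
```

Note that the objective is evaluated on a given partition; the hard part, which the paper addresses, is finding a near-optimal partition.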
Approximation Algorithms for Min-Sum k-Clustering and Balanced k-Median
We consider two closely related fundamental clustering problems in this paper. In Min-Sum k-Clustering, one is given a metric space and has to partition the points into k clusters while minimizing the sum of pairwise distances between points within the same cluster. In the Balanced k-Median problem the instance is the same and one has to obtain a clustering into k clusters C1, ..., Ck, where each cluster Ci has a center ci, while minimizing the total assignment cost for the points in the metric; here the cost of assigning a point j to a cluster Ci is equal to |Ci| times the distance between j and ci in the metric. In this paper, we present an O(log n)-approximation for both these problems, where n is the number of points in the metric that are to be served. This is an improvement over the O(ε^{-1} log^{1+ε} n)-approximation (for any constant ε > 0) obtained by Bartal, Charikar, and Raz [STOC '01]. We also obtain a quasi-PTAS for Balanced k-Median in metrics with constant doubling dimension. As in the work of Bartal et al., our approximation for general metrics uses embeddings into tree metrics. The main technical contribution in this paper is an O(1)-approximation for Balanced k-Median in hierarchically separated trees (HSTs). Our improvement comes from a more direct dynamic programming approach that heavily exploits properties of standard HSTs. In this way, we avoid the reduction to the special types of HSTs that were considered by Bartal et al., thereby avoiding an additional O(ε^{-1} log n) loss.
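The Balanced k-Median cost described above (each point pays its cluster's size times its distance to the cluster center) can be sketched as follows; the 1-D points, the chosen centers, and the metric are illustrative assumptions:

```python
def balanced_k_median_cost(clusters, centers, dist):
    """Each point p in cluster C_i pays |C_i| * dist(p, c_i)."""
    return sum(len(cluster) * dist(p, center)
               for cluster, center in zip(clusters, centers)
               for p in cluster)

# Example: two clusters of 1-D points with centers 2.0 and 10.0.
clusters = [[0.0, 2.0, 4.0], [9.0, 11.0]]
centers = [2.0, 10.0]
cost = balanced_k_median_cost(clusters, centers,
                              lambda p, q: abs(p - q))  # 3*(2+0+2) + 2*(1+1) = 16.0
```

The weighting by cluster size is what makes this objective comparable, within a factor of 2, to the pairwise min-sum objective when centers are chosen among the cluster's own points, which is the reduction the previous abstract mentions.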
The Bane of Low-Dimensionality Clustering
In this paper, we give a conditional lower bound of 2^{Ω(n^{1-1/d})} on
running time for the classic k-median and k-means clustering objectives (where
n is the size of the input), even in low-dimensional Euclidean space of
dimension four, assuming the Exponential Time Hypothesis (ETH). We also
consider k-median (and k-means) with penalties where each point need not be
assigned to a center, in which case it must pay a penalty, and extend our lower
bound to at least three-dimensional Euclidean space.
This stands in stark contrast to many other geometric problems such as the
traveling salesman problem, or computing an independent set of unit spheres.
While these problems benefit from the so-called (limited) blessing of
dimensionality, as they can be solved in time 2^{O(n^{1-1/d})} or
n^{O(n^{1-1/d})} in d dimensions, our work shows that widely-used clustering
objectives have a lower bound of 2^{Ω(n^{1-1/d})}, even in dimension four.
We complete the picture by considering the two-dimensional case: we show that
there is no algorithm that solves the penalized version in time less than
2^{o(√n)}, and provide a matching upper bound of 2^{O(√n log n)}.
The main tool we use to establish these lower bounds is the placement of
points on the moment curve, which takes its inspiration from constructions of
point sets yielding Delaunay complexes of high complexity.
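The moment curve used in the lower-bound construction is the curve t ↦ (t, t², ..., t^d); a minimal sketch of placing points on it (the parameter values are illustrative, not the paper's construction):

```python
def moment_curve_point(t, d=4):
    """Point on the moment curve in R^d: (t, t^2, ..., t^d)."""
    return tuple(t ** i for i in range(1, d + 1))

# A few points on the moment curve in dimension four.
points = [moment_curve_point(t) for t in range(1, 6)]
# e.g. moment_curve_point(2) == (2, 4, 8, 16)
```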
The Unreasonable Success of Local Search: Geometric Optimization
What is the effectiveness of local search algorithms for geometric problems
in the plane? We prove that local search with neighborhoods of magnitude
1/ε^c is an approximation scheme for the following problems in the
Euclidean plane: TSP with random inputs, Steiner tree with random inputs,
facility location (with worst case inputs), and bicriteria k-median (also
with worst case inputs). The randomness assumption is necessary for TSP.
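The swap-based local search paradigm analyzed above can be illustrated with a generic single-swap heuristic for k-median; this is only a sketch of the paradigm, not the paper's algorithm (whose neighborhood size depends on the approximation parameter), and the 1-D instance is an illustrative assumption:

```python
from itertools import combinations

def local_search_k_median(points, k, dist, swaps=1):
    """Start from the first k points as centers and repeatedly swap up to
    `swaps` centers for non-centers while the assignment cost improves."""
    def cost(centers):
        return sum(min(dist(p, c) for c in centers) for p in points)

    centers = list(points[:k])
    improved = True
    while improved:
        improved = False
        for out in combinations(centers, swaps):
            outsiders = [p for p in points if p not in centers]
            for into in combinations(outsiders, swaps):
                candidate = [c for c in centers if c not in out] + list(into)
                if cost(candidate) < cost(centers):
                    centers, improved = candidate, True
                    break
            if improved:
                break
    return centers

# Two well-separated groups; the search settles on one center per group.
centers = local_search_k_median([0, 1, 2, 10, 11, 12], k=2,
                                dist=lambda p, q: abs(p - q))
```

Each iteration only inspects a polynomial number of candidate swaps, which is why bounding the quality of such local optima is the crux of the analysis.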
On Variants of k-means Clustering
Clustering problems often arise in fields like data mining and machine
learning, where a collection of objects must be grouped into similar groups
with respect to a similarity (or dissimilarity) measure. Among clustering
problems, k-means in particular has received much attention from researchers.
Despite the fact that k-means is a very well-studied problem, its status in
the plane is still open. In particular, it is unknown whether it admits a
PTAS in the plane. The best known approximation bound in polynomial time is
9+ε.
In this paper, we consider the following variant of k-means. Given a set of
points in R^d and a real f > 0, find a finite set of centers in R^d that
minimizes f times the number of centers plus the sum, over the input points,
of the squared distance from each point to its nearest center. For any fixed
dimension d, we design a local search PTAS for this problem. We also give a
"bi-criterion" local search algorithm for k-means which uses (1+ε)k centers
and yields a solution whose cost is at most (1+ε) times the cost of an
optimal k-means solution. The algorithm runs in polynomial time for any
fixed dimension.
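The variant objective (a per-center opening cost plus squared distances to the nearest center) can be sketched as follows; the 1-D points, the chosen centers, and the value of f are illustrative assumptions:

```python
def variant_cost(points, centers, f):
    """f * (number of centers) plus the sum of squared distances from each
    point to its nearest center, for 1-D points."""
    return f * len(centers) + sum(min((p - c) ** 2 for c in centers)
                                  for p in points)

# Example: two groups of points, one center near each group.
points = [0.0, 1.0, 5.0, 6.0]
centers = [0.5, 5.5]
cost = variant_cost(points, centers, f=2.0)  # 2*2 + 4*0.25 = 5.0
```

Charging f per center removes the hard cardinality constraint of k-means, which is one way such facility-location-style relaxations become amenable to local search.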
The contribution of this paper is twofold. On the one hand, we are able to
handle the square of distances in an elegant manner, which yields a
near-optimal approximation bound. This leads us towards a better
understanding of the k-means problem. On the other hand, our analysis of
local search might also be useful for other geometric problems. This is
important considering that very little is known about the local search
method for geometric approximation.