Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction

Authors: Ding Hu; Wang Zixiu; Yu Haikuo
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual European Symposium on Algorithms (ESA 2019)
Publication date: 01/01/2019

We study the problem of k-center clustering with outliers in arbitrary metrics and Euclidean space. Though a number of methods have been developed in the past decades, it is still quite challenging to design quality-guaranteed algorithms with low complexity for this problem. Our idea is inspired by the greedy method, Gonzalez's algorithm, for solving the problem of ordinary k-center clustering. Based on some novel observations, we show that this greedy strategy can in fact handle k-center clustering with outliers efficiently, in terms of both clustering quality and time complexity. We further show that the greedy approach yields a small coreset for the problem in doubling metrics, which reduces the time complexity significantly. Our algorithms are easy to implement in practice. We test our method on both synthetic and real datasets. The experimental results suggest that our algorithms can achieve near-optimal solutions and yield lower running times compared with existing methods.
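For orientation, the greedy method the abstract refers to can be sketched as follows. This is the classic Gonzalez farthest-first traversal for ordinary k-center, not the paper's outlier-aware variant or its coreset construction. The function names, the Euclidean distance, and the choice of the first center are illustrative assumptions. The second helper only evaluates the usual outlier objective (the radius after ignoring the z farthest points) for a given set of centers.

```python
import math


def gonzalez_k_center(points, k, first=0):
    """Farthest-first traversal (Gonzalez) for ordinary k-center.

    Returns the indices of the chosen centers and the covering radius.
    Assumption: points are coordinate tuples under Euclidean distance.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [first]
    # nearest[i] = distance from point i to its closest chosen center
    nearest = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        # Greedy step: the next center is the point farthest from all centers.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < nearest[i]:
                nearest[i] = d
    return centers, max(nearest)


def radius_with_outliers(points, centers, z):
    """Radius needed to cover all but the z farthest points (hypothetical helper)."""
    dists = sorted(min(math.dist(p, points[c]) for c in centers) for p in points)
    if z >= len(dists):
        return 0.0
    return dists[len(dists) - 1 - z]
```

With k centers this traversal is known to give a 2-approximation for ordinary k-center; the abstract's claim is that a greedy strategy of this kind can also be made to work when some points may be discarded as outliers.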

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Squarepants in a Tree: Sum of Subtree Clustering and Hyperbolic Pants Decomposition

Authors: Alstrup S.; Aluru S.; Bern M. W.; David Eppstein; Erickson J.; Saitou N.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 24/02/2008

We provide efficient constant-factor approximation algorithms for the problems of finding a hierarchical clustering of a point set in any metric space, minimizing the sum of minimum spanning tree lengths within each cluster, and in the hyperbolic or Euclidean planes, minimizing the sum of cluster perimeters. Our algorithms for the hyperbolic and Euclidean planes can also be used to provide a pants decomposition, that is, a set of disjoint simple closed curves partitioning the plane minus the input points into subsets with exactly three boundary components, with approximately minimum total length. In the Euclidean case, these curves are squares; in the hyperbolic case, they combine our Euclidean square pants decomposition with our tree clustering method for general metric spaces. Comment: 22 pages, 14 figures. This version replaces the proof of what is now Lemma 5.2, as the previous proof was erroneous.
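To make the first objective concrete, the sketch below evaluates the sum of minimum spanning tree lengths over a given family of clusters. It only computes the cost of a clustering that is supplied to it; it does not implement the approximation algorithm from the abstract. Euclidean distance and the function names are assumptions made for illustration.

```python
import math


def mst_length(points):
    """Total edge length of a minimum spanning tree (Prim's algorithm, O(n^2)).

    Assumption: points are coordinate tuples under Euclidean distance.
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def sum_of_subtree_cost(clusters):
    """Sum of MST lengths over every cluster in the given family."""
    return sum(mst_length(cluster) for cluster in clusters)
```

For a hierarchical clustering, the family passed to `sum_of_subtree_cost` would contain every cluster of the hierarchy, not just the clusters at one level.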

arXiv.org e-Print Archive

Crossref

Clustering with diversity

Authors: Li Jian; Yi Ke; Zhang Qin
Publication date: 01/01/2010

We consider the clustering with diversity problem: given a set of colored points in a metric space, partition them into clusters such that each cluster has at least $\ell$ points, all of which have distinct colors. We give a 2-approximation to this problem for any $\ell$ when the objective is to minimize the maximum radius of any cluster. We show that the approximation ratio is optimal unless $\mathbf{P} = \mathbf{NP}$, by providing a matching lower bound. Several extensions to our algorithm have also been developed for handling outliers. This problem is mainly motivated by applications in privacy-preserving data publication. Comment: Extended abstract accepted in ICALP 2010. Keywords: Approximation algorithm, k-center, k-anonymity, l-diversity
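The constraint and objective stated in this abstract can be checked for a candidate clustering with a few lines of code. The sketch below is a feasibility and cost checker only, not the 2-approximation algorithm. It assumes Euclidean points paired with a color label, and it assumes each cluster's center is chosen among its own points; both are illustrative choices, and all function names are hypothetical.

```python
import math


def is_diverse(cluster, ell):
    """True if the cluster has at least ell points and no color repeats.

    Each cluster element is a (point, color) pair.
    """
    colors = [color for _, color in cluster]
    return len(colors) >= ell and len(set(colors)) == len(colors)


def cluster_radius(cluster):
    """Smallest radius of a ball centered at a cluster point that covers the cluster.

    Assumption: centers are restricted to points of the cluster itself.
    """
    pts = [p for p, _ in cluster]
    return min(max(math.dist(c, q) for q in pts) for c in pts)


def max_radius_if_feasible(clusters, ell):
    """Maximum cluster radius, or None if any cluster violates the diversity constraint."""
    if not all(is_diverse(cluster, ell) for cluster in clusters):
        return None
    return max(cluster_radius(cluster) for cluster in clusters)
```

In the privacy setting mentioned in the abstract, a color would typically stand for the value of a sensitive attribute, so that every published group mixes at least the required number of distinct values.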

arXiv.org e-Print Archive

CiteSeerX

Hong Kong University of Science and Technology Institutional Repository

Next Generation Cluster Editing

Authors: Bellitto Thomas; Klau Gunnar W.; Marschall Tobias; Schönhuth Alexander
Publication date: 01/01/2013

This work aims at improving the quality of structural variant prediction from the mapped reads of a sequenced genome. We suggest a new model based on cluster editing in weighted graphs and introduce a new heuristic algorithm that solves this problem quickly and with a good approximation on the huge graphs that arise from biological datasets.
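As background, the usual weighted cluster editing objective can be written down in a few lines. The sketch assumes the standard formulation, in which a positive pair weight means an edge that costs that much to delete and a negative weight means a missing edge that costs its absolute value to insert. The abstract does not spell out the paper's own model, so this may differ from it. The function name and the dictionary-based input are hypothetical.

```python
from itertools import combinations


def cluster_editing_cost(weights, clusters):
    """Cost of turning a weighted graph into the disjoint cliques given by clusters.

    weights maps unordered vertex pairs (u, v) to a signed weight; pairs that
    are absent are treated as weight 0 (an assumption made for this sketch).
    """
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    cost = 0.0
    for u, v in combinations(sorted(label), 2):
        w = weights.get((u, v), weights.get((v, u), 0.0))
        same_cluster = label[u] == label[v]
        if same_cluster and w < 0:
            cost += -w  # pay to insert a missing edge inside a cluster
        elif not same_cluster and w > 0:
            cost += w  # pay to delete an edge between clusters
    return cost
```

A heuristic such as the one the abstract describes would search over partitions to keep this cost low; the function above only scores one given partition.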

arXiv.org e-Print Archive

CWI's Institutional Repository