
    Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction

    We study the problem of k-center clustering with outliers in arbitrary metrics and Euclidean space. Though a number of methods have been developed in the past decades, it is still quite challenging to design a quality-guaranteed algorithm with low complexity for this problem. Our idea is inspired by the greedy method, Gonzalez's algorithm, for solving the problem of ordinary k-center clustering. Based on some novel observations, we show that this greedy strategy can in fact handle k-center clustering with outliers efficiently, in terms of both clustering quality and time complexity. We further show that the greedy approach yields a small coreset for the problem in doubling metrics, which reduces the time complexity significantly. Our algorithms are easy to implement in practice. We test our method on both synthetic and real datasets. The experimental results suggest that our algorithms can achieve near-optimal solutions and yield lower running times compared with existing methods.
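For orientation, the greedy method the abstract refers to can be sketched as follows. This is a minimal illustration of the classic Gonzalez farthest-first traversal for ordinary k-center (no outliers), not the paper's outlier-aware variant or its coreset construction. The function name `gonzalez_k_center`, the `first` parameter, and the use of Euclidean distance are assumptions made for the example.

```python
import math

def gonzalez_k_center(points, k, first=0):
    """Farthest-first traversal (hypothetical helper, Euclidean distance assumed).

    Returns the indices of the k chosen centers and the resulting covering
    radius, i.e. the largest distance from any point to its nearest center.
    """
    centers = [first]
    # dist[i] = distance from point i to its nearest center chosen so far
    dist = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        # Greedy step: the next center is the point farthest from all centers.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return centers, max(dist)

centers, radius = gonzalez_k_center([(0, 0), (1, 0), (10, 0), (11, 0)], k=2)
```

For ordinary k-center, this traversal is known to give a 2-approximation of the optimal radius; a single far-away outlier would be picked as a center immediately, which is why the outlier setting needs the extra care the abstract describes.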

    Squarepants in a Tree: Sum of Subtree Clustering and Hyperbolic Pants Decomposition

    We provide efficient constant factor approximation algorithms for the problems of finding a hierarchical clustering of a point set in any metric space, minimizing the sum of minimum spanning tree lengths within each cluster, and in the hyperbolic or Euclidean planes, minimizing the sum of cluster perimeters. Our algorithms for the hyperbolic and Euclidean planes can also be used to provide a pants decomposition, that is, a set of disjoint simple closed curves partitioning the plane minus the input points into subsets with exactly three boundary components, with approximately minimum total length. In the Euclidean case, these curves are squares; in the hyperbolic case, they combine our Euclidean square pants decomposition with our tree clustering method for general metric spaces. Comment: 22 pages, 14 figures. This version replaces the proof of what is now Lemma 5.2, as the previous proof was erroneous

    Next Generation Cluster Editing

    This work aims at improving the quality of structural variant prediction from the mapped reads of a sequenced genome. We suggest a new model based on cluster editing in weighted graphs and introduce a new heuristic algorithm that solves this problem quickly and with a good approximation on the huge graphs that arise from biological datasets.

    The Non-Uniform k-Center Problem

    In this paper, we introduce and study the Non-Uniform k-Center problem (NUkC). Given a finite metric space $(X,d)$ and a collection of balls of radii $\{r_1\geq \cdots \geq r_k\}$, the NUkC problem is to find a placement of their centers on the metric space and find the minimum dilation $\alpha$, such that the union of balls of radius $\alpha\cdot r_i$ around the $i$th center covers all the points in $X$. This problem naturally arises as a min-max vehicle routing problem with fleets of different speeds. The NUkC problem generalizes the classic $k$-center problem when all the $k$ radii are the same (which can be assumed to be $1$ after scaling). It also generalizes the $k$-center with outliers (kCwO) problem when there are $k$ balls of radius $1$ and $\ell$ balls of radius $0$. There are $2$-approximation and $3$-approximation algorithms known for these problems respectively; the former is best possible unless P=NP and the latter remains unimproved for 15 years. We first observe that no $O(1)$-approximation to the optimal dilation is possible unless P=NP, implying that the NUkC problem is more non-trivial than the above two problems. Our main algorithmic result is an $(O(1),O(1))$-bi-criteria approximation result: we give an $O(1)$-approximation to the optimal dilation; however, we may open $\Theta(1)$ centers of each radius. Our techniques also allow us to prove a simple (uni-criteria), optimal $2$-approximation to the kCwO problem, improving upon the long-standing $3$-factor. Our main technical contribution is a connection between the NUkC problem and the so-called firefighter problems on trees which have been studied recently in the TCS community. Comment: Adjusted the figur
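To make the NUkC objective concrete, the sketch below evaluates the minimum dilation for one fixed placement of centers. It does not solve NUkC or implement the paper's approximation algorithm, and it only scores a placement that is already given. The function name `min_dilation`, the Euclidean default for `dist`, and the convention that a radius-0 ball covers only its own center are assumptions made for the example.

```python
import math

def min_dilation(points, centers, radii, dist=math.dist):
    """Smallest alpha such that balls of radius alpha * radii[i] around
    centers[i] cover every point (hypothetical helper, fixed placement)."""
    alpha = 0.0
    for x in points:
        best = math.inf  # cheapest dilation at which some ball reaches x
        for c, r in zip(centers, radii):
            d = dist(x, c)
            if d == 0:
                need = 0.0          # x is a center, covered at any dilation
            elif r > 0:
                need = d / r        # ball i reaches x once alpha >= d / r
            else:
                need = math.inf     # a radius-0 ball never reaches other points
            best = min(best, need)
        alpha = max(alpha, best)    # every point must be covered
    return alpha

pts = [(0, 0), (2, 0), (10, 0)]
alpha = min_dilation(pts, centers=[(0, 0), (10, 0)], radii=[2, 1])
```

In other words, for a fixed placement the dilation is the maximum over points of the minimum over centers of $d(x, c_i)/r_i$. With all radii equal to $1$ this reduces to the ordinary $k$-center covering radius.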