652 research outputs found

    Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction

    Get PDF
    We study the problem of k-center clustering with outliers in arbitrary metrics and Euclidean space. Though a number of methods have been developed in the past decades, it is still quite challenging to design quality guaranteed algorithm with low complexity for this problem. Our idea is inspired by the greedy method, Gonzalez\u27s algorithm, for solving the problem of ordinary k-center clustering. Based on some novel observations, we show that this greedy strategy actually can handle k-center clustering with outliers efficiently, in terms of clustering quality and time complexity. We further show that the greedy approach yields small coreset for the problem in doubling metrics, so as to reduce the time complexity significantly. Our algorithms are easy to implement in practice. We test our method on both synthetic and real datasets. The experimental results suggest that our algorithms can achieve near optimal solutions and yield lower running times comparing with existing methods

    Small Space Stream Summary for Matroid Center

    Get PDF
    In the matroid center problem, which generalizes the k-center problem, we need to pick a set of centers that is an independent set of a matroid with rank r. We study this problem in streaming, where elements of the ground set arrive in the stream. We first show that any randomized one-pass streaming algorithm that computes a better than Delta-approximation for partition-matroid center must use Omega(r^2) bits of space, where Delta is the aspect ratio of the metric and can be arbitrarily large. This shows a quadratic separation between matroid center and k-center, for which the Doubling algorithm [Charikar et al., 1997] gives an 8-approximation using O(k)-space and one pass. To complement this, we give a one-pass algorithm for matroid center that stores at most O(r^2 log(1/epsilon)/epsilon) points (viz., stream summary) among which a (7+epsilon)-approximate solution exists, which can be found by brute force, or a (17+epsilon)-approximation can be found with an efficient algorithm. If we are allowed a second pass, we can compute a (3+epsilon)-approximation efficiently. We also consider the problem of matroid center with z outliers and give a one-pass algorithm that outputs a set of O((r^2+rz)log(1/epsilon)/epsilon) points that contains a (15+epsilon)-approximate solution. Our techniques extend to knapsack center and knapsack center with z outliers in a straightforward way, and we get algorithms that use space linear in the size of a largest feasible set (as opposed to quadratic space for matroid center)

    Robust hierarchical k-center clustering

    Get PDF
    One of the most popular and widely used methods for data clustering is hierarchical clustering. This clustering technique has proved useful to reveal interesting structure in the data in several applications ranging from computational biology to computer vision. Robustness is an important feature of a clustering technique if we require the clustering to be stable against small perturbations in the input data. In most applications, getting a clustering output that is robust against adversarial outliers or stochastic noise is a necessary condition for the applicability and effectiveness of the clustering technique. This is even more critical in hierarchical clustering where a small change at the bottom of the hierarchy may propagate all the way through to the top. Despite all the previous work [2, 3, 6, 8], our theoretical understanding of robust hierarchical clustering is still limited and several hierarchical clustering algorithms are not known to satisfy such robustness properties. In this paper, we study the limits of robust hierarchical k-center clustering by introducing the concept of universal hierarchical clustering and provide (almost) tight lower and upper bounds for the robust hierarchical k-center clustering problem with outliers and variants of the stochastic clustering problem. Most importantly we present a constant-factor approximation for optimal hierarchical k-center with at most z outliers using a universal set of at most O(z2) set of outliers and show that this result is tight. Moreover we show the necessity of using a universal set of outliers in order to compute an approximately optimal hierarchical k-center with a diffierent set of outliers for each k

    Matroid and Knapsack Center Problems

    Full text link
    In the classic kk-center problem, we are given a metric graph, and the objective is to open kk nodes as centers such that the maximum distance from any vertex to its closest center is minimized. In this paper, we consider two important generalizations of kk-center, the matroid center problem and the knapsack center problem. Both problems are motivated by recent content distribution network applications. Our contributions can be summarized as follows: 1. We consider the matroid center problem in which the centers are required to form an independent set of a given matroid. We show this problem is NP-hard even on a line. We present a 3-approximation algorithm for the problem on general metrics. We also consider the outlier version of the problem where a given number of vertices can be excluded as the outliers from the solution. We present a 7-approximation for the outlier version. 2. We consider the (multi-)knapsack center problem in which the centers are required to satisfy one (or more) knapsack constraint(s). It is known that the knapsack center problem with a single knapsack constraint admits a 3-approximation. However, when there are at least two knapsack constraints, we show this problem is not approximable at all. To complement the hardness result, we present a polynomial time algorithm that gives a 3-approximate solution such that one knapsack constraint is satisfied and the others may be violated by at most a factor of 1+ϵ1+\epsilon. We also obtain a 3-approximation for the outlier version that may violate the knapsack constraint by 1+ϵ1+\epsilon.Comment: A preliminary version of this paper is accepted to IPCO 201

    The Non-Uniform k-Center Problem

    Get PDF
    In this paper, we introduce and study the Non-Uniform k-Center problem (NUkC). Given a finite metric space (X,d)(X,d) and a collection of balls of radii {r1≥⋯≥rk}\{r_1\geq \cdots \ge r_k\}, the NUkC problem is to find a placement of their centers on the metric space and find the minimum dilation α\alpha, such that the union of balls of radius α⋅ri\alpha\cdot r_i around the iith center covers all the points in XX. This problem naturally arises as a min-max vehicle routing problem with fleets of different speeds. The NUkC problem generalizes the classic kk-center problem when all the kk radii are the same (which can be assumed to be 11 after scaling). It also generalizes the kk-center with outliers (kCwO) problem when there are kk balls of radius 11 and ℓ\ell balls of radius 00. There are 22-approximation and 33-approximation algorithms known for these problems respectively; the former is best possible unless P=NP and the latter remains unimproved for 15 years. We first observe that no O(1)O(1)-approximation is to the optimal dilation is possible unless P=NP, implying that the NUkC problem is more non-trivial than the above two problems. Our main algorithmic result is an (O(1),O(1))(O(1),O(1))-bi-criteria approximation result: we give an O(1)O(1)-approximation to the optimal dilation, however, we may open Θ(1)\Theta(1) centers of each radii. Our techniques also allow us to prove a simple (uni-criteria), optimal 22-approximation to the kCwO problem improving upon the long-standing 33-factor. Our main technical contribution is a connection between the NUkC problem and the so-called firefighter problems on trees which have been studied recently in the TCS community.Comment: Adjusted the figur

    k-Center Clustering with Outliers in the Sliding-Window Model

    Get PDF
    • …
    corecore