Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction
We study the problem of k-center clustering with outliers in arbitrary metrics and in Euclidean space. Although a number of methods have been developed over the past decades, it remains quite challenging to design a quality-guaranteed algorithm with low complexity for this problem. Our idea is inspired by the greedy method, Gonzalez's algorithm, for solving the ordinary k-center clustering problem. Based on some novel observations, we show that this greedy strategy can in fact handle k-center clustering with outliers efficiently, in terms of both clustering quality and time complexity. We further show that the greedy approach yields a small coreset for the problem in doubling metrics, which reduces the time complexity significantly. Our algorithms are easy to implement in practice. We test our method on both synthetic and real datasets. The experimental results suggest that our algorithms achieve near-optimal solutions and lower running times compared with existing methods.
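For context, the ordinary k-center greedy that this abstract builds on, Gonzalez's farthest-point traversal, can be sketched as below. This is only the standard 2-approximation for the outlier-free problem; the outlier handling and coreset construction are the paper's contributions and are not reproduced here. All names are illustrative.

```python
def gonzalez_k_center(points, k, dist):
    """Gonzalez's farthest-point traversal for ordinary k-center.

    Repeatedly adds the point farthest from the current centers;
    the resulting radius is at most twice the optimal k-center radius.
    """
    centers = [points[0]]                      # arbitrary first center
    d = [dist(p, centers[0]) for p in points]  # distance to nearest center
    for _ in range(k - 1):
        far = max(range(len(points)), key=d.__getitem__)
        centers.append(points[far])
        d = [min(d[i], dist(p, centers[-1])) for i, p in enumerate(points)]
    return centers, max(d)  # chosen centers and the clustering radius
```

For example, on the line metric (dist = absolute difference) with points [0, 1, 10, 11] and k = 2, the traversal picks 0 and 11 and reports radius 1.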
Small Space Stream Summary for Matroid Center
In the matroid center problem, which generalizes the k-center problem, we need to pick a set of centers that forms an independent set of a matroid with rank r. We study this problem in the streaming setting, where elements of the ground set arrive in a stream. We first show that any randomized one-pass streaming algorithm that computes a better-than-Delta approximation for partition-matroid center must use Omega(r^2) bits of space, where Delta is the aspect ratio of the metric and can be arbitrarily large. This shows a quadratic separation between matroid center and k-center, for which the Doubling algorithm [Charikar et al., 1997] gives an 8-approximation using O(k) space and one pass. To complement this, we give a one-pass algorithm for matroid center that stores at most O(r^2 log(1/epsilon)/epsilon) points (i.e., a stream summary) among which a (7+epsilon)-approximate solution exists; it can be found by brute force, or a (17+epsilon)-approximation can be found with an efficient algorithm. If we are allowed a second pass, we can compute a (3+epsilon)-approximation efficiently.
We also consider the problem of matroid center with z outliers and give a one-pass algorithm that outputs a set of O((r^2+rz)log(1/epsilon)/epsilon) points that contains a (15+epsilon)-approximate solution. Our techniques extend to knapsack center and knapsack center with z outliers in a straightforward way, and we get algorithms that use space linear in the size of a largest feasible set (as opposed to quadratic space for matroid center).
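To make the one-pass setting concrete, here is a rough sketch in the spirit of the Doubling algorithm referenced above for plain k-center: keep at most k centers and a working radius r, and whenever more than k centers would be needed, double r and merge centers that fall within 2r of each other. This is an illustrative simplification, not the exact procedure of Charikar et al. or the matroid-center summary, which is more involved.

```python
def _merge(centers, r, dist):
    """Keep a maximal subset of centers that are pairwise > 2r apart."""
    kept = []
    for c in centers:
        if all(dist(c, m) > 2 * r for m in kept):
            kept.append(c)
    return kept

def doubling_k_center(stream, k, dist):
    """One-pass k-center sketch in the doubling style: at most k
    centers are stored at any time, plus a single working radius r."""
    stream = iter(stream)
    centers = []
    for p in stream:                      # seed with k+1 distinct points
        if all(dist(p, c) > 0 for c in centers):
            centers.append(p)
        if len(centers) == k + 1:
            break
    if len(centers) <= k:                 # fewer than k+1 distinct points
        return centers, 0.0
    r = min(dist(a, b) for i, a in enumerate(centers)
            for b in centers[i + 1:]) / 2
    while len(centers) > k:               # shrink the seed set to k centers
        r *= 2
        centers = _merge(centers, r, dist)
    for p in stream:                      # process the rest of the stream
        if all(dist(p, c) > 2 * r for c in centers):
            centers.append(p)
        while len(centers) > k:
            r *= 2
            centers = _merge(centers, r, dist)
    return centers, r                     # every point seen is within 2r of a center
```

The point of the sketch is the space bound: only k centers and one radius are ever stored, matching the O(k)-space regime the lower bound above separates matroid center from.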
Robust hierarchical k-center clustering
One of the most popular and widely used methods for data clustering is hierarchical clustering. This clustering technique has proved useful for revealing interesting structure in the data in several applications ranging from computational biology to computer vision. Robustness is an important feature of a clustering technique if we require the clustering to be stable against small perturbations in the input data. In most applications, getting a clustering output that is robust against adversarial outliers or stochastic noise is a necessary condition for the applicability and effectiveness of the clustering technique. This is even more critical in hierarchical clustering, where a small change at the bottom of the hierarchy may propagate all the way through to the top. Despite all the previous work [2, 3, 6, 8], our theoretical understanding of robust hierarchical clustering is still limited, and several hierarchical clustering algorithms are not known to satisfy such robustness properties. In this paper, we study the limits of robust hierarchical k-center clustering by introducing the concept of universal hierarchical clustering and provide (almost) tight lower and upper bounds for the robust hierarchical k-center clustering problem with outliers and variants of the stochastic clustering problem. Most importantly, we present a constant-factor approximation for optimal hierarchical k-center with at most z outliers using a universal set of at most O(z^2) outliers, and show that this result is tight. Moreover, we show the necessity of using a universal set of outliers in order to compute an approximately optimal hierarchical k-center with a different set of outliers for each k.
Matroid and Knapsack Center Problems
In the classic k-center problem, we are given a metric graph, and the
objective is to open k nodes as centers such that the maximum distance from
any vertex to its closest center is minimized. In this paper, we consider two
important generalizations of k-center, the matroid center problem and the
knapsack center problem. Both problems are motivated by recent content
distribution network applications. Our contributions can be summarized as
follows:
1. We consider the matroid center problem in which the centers are required
to form an independent set of a given matroid. We show this problem is NP-hard
even on a line. We present a 3-approximation algorithm for the problem on
general metrics. We also consider the outlier version of the problem, in
which a given number of vertices can be excluded as outliers from the
solution. We present a 7-approximation for the outlier version.
2. We consider the (multi-)knapsack center problem in which the centers are
required to satisfy one (or more) knapsack constraint(s). It is known that the
knapsack center problem with a single knapsack constraint admits a
3-approximation. However, when there are at least two knapsack constraints, we
show this problem is not approximable at all. To complement the hardness
result, we present a polynomial-time algorithm that gives a 3-approximate
solution such that one knapsack constraint is satisfied and the others may be
violated by at most a factor of 1 + epsilon. We also obtain a 3-approximation
for the outlier version that may violate the knapsack constraint by
1 + epsilon.
Comment: A preliminary version of this paper was accepted to IPCO 201
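To illustrate the threshold technique underlying such results, below is a minimal sketch of the folklore Hochbaum-Shmoys-style 3-approximation for knapsack center with a single constraint: guess the optimal radius r among the pairwise distances, then greedily pick a minimum-weight representative for each 2r-separated cluster. This is background for the abstract, not the paper's own algorithm, and all names are illustrative.

```python
def knapsack_center_3approx(points, weights, budget, dist):
    """Threshold-based 3-approximation sketch for knapsack center.

    For each candidate radius r, pick for every uncovered pivot p the
    cheapest point within r of p; distinct pivots are > 2r apart, so in
    a feasible guess their representatives weigh no more than an optimal
    center set, and every point lands within 3r of a chosen center.
    """
    radii = sorted({dist(a, b) for a in points for b in points})
    for r in radii:
        uncovered = list(points)
        chosen, total = [], 0.0
        while uncovered:
            p = uncovered[0]
            # cheapest point within r of p serves p's whole 2r-neighborhood
            ball = [q for q in points if dist(p, q) <= r]
            c = min(ball, key=lambda q: weights[q])
            chosen.append(c)
            total += weights[c]
            uncovered = [q for q in uncovered if dist(q, p) > 2 * r]
        if total <= budget:
            return chosen, r  # clustering radius is at most 3r
    return None
```

For instance, on the line points [0, 1, 10, 11] with weights {0: 5, 1: 1, 10: 5, 11: 1} and budget 2, the guess r = 1 succeeds with centers [1, 11].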
The Non-Uniform k-Center Problem
In this paper, we introduce and study the Non-Uniform k-Center problem
(NUkC). Given a finite metric space and a collection of balls of radii
, the NUkC problem is to find a placement of their
centers on the metric space and find the minimum dilation , such that
the union of balls of radius around the th center covers
all the points in . This problem naturally arises as a min-max vehicle
routing problem with fleets of different speeds.
The NUkC problem generalizes the classic -center problem when all the
radii are the same (which can be assumed to be after scaling). It also
generalizes the -center with outliers (kCwO) problem when there are
balls of radius and balls of radius . There are -approximation
and -approximation algorithms known for these problems respectively; the
former is best possible unless P=NP and the latter remains unimproved for 15
years.
We first observe that no -approximation is to the optimal dilation is
possible unless P=NP, implying that the NUkC problem is more non-trivial than
the above two problems. Our main algorithmic result is an
-bi-criteria approximation result: we give an -approximation
to the optimal dilation, however, we may open centers of each
radii. Our techniques also allow us to prove a simple (uni-criteria), optimal
-approximation to the kCwO problem improving upon the long-standing
-factor. Our main technical contribution is a connection between the NUkC
problem and the so-called firefighter problems on trees which have been studied
recently in the TCS community.Comment: Adjusted the figur
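To pin down the NUkC objective, here is a brute-force reference implementation for tiny instances (exponential time, and with centers restricted to the input points for simplicity; this defines the objective only and is not an algorithm from the paper).

```python
from itertools import combinations, permutations

def nukc_dilation(points, radii, dist):
    """Exhaustive optimal dilation for a tiny NUkC instance.

    Tries every choice of k = len(radii) centers among the points and
    every assignment of the k radii to those centers, returning the
    smallest alpha such that balls of radius alpha * r_i cover all points.
    """
    best = float("inf")
    for centers in combinations(points, len(radii)):
        for assigned in permutations(radii):
            # smallest alpha covering every point under this placement
            alpha = max(
                min((dist(p, c) / r) if r > 0 else
                    (0.0 if dist(p, c) == 0 else float("inf"))
                    for c, r in zip(centers, assigned))
                for p in points)
            best = min(best, alpha)
    return best
```

Setting some radii to 0 recovers the k-center-with-outliers special case described above: a radius-0 ball can only "cover" the point it sits on, i.e., discard it as an outlier.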