Coresets for Clustering with General Assignment Constraints
Designing small-sized \emph{coresets}, which approximately preserve the costs
of the solutions for large datasets, has been an important research direction
for the past decade. We consider coreset construction for a variety of general
constrained clustering problems. We significantly extend and generalize the
results of a very recent paper (Braverman et al., FOCS'22), by demonstrating
that the idea of hierarchical uniform sampling (Chen, SICOMP'09; Braverman et
al., FOCS'22) can be applied to efficiently construct coresets for a very
general class of constrained clustering problems with general assignment
constraints, including capacity constraints on cluster centers, and assignment
structure constraints for data points (modeled by a convex body $\mathcal{B}$).
Our main theorem shows that a small-sized $\epsilon$-coreset exists as long
as a complexity measure $\mathsf{Lip}(\mathcal{B})$ of the structure
constraint and the \emph{covering exponent} of the metric space are bounded.
The complexity measure $\mathsf{Lip}(\mathcal{B})$ for a convex body
$\mathcal{B}$ is the Lipschitz constant of a certain transportation problem
constrained in $\mathcal{B}$,
called the \emph{optimal assignment transportation problem}. We prove nontrivial
upper bounds of $\mathsf{Lip}(\mathcal{B})$ for various polytopes, including
general matroid basis polytopes and laminar matroid polytopes (with a better
bound). As an application of our general theorem, we construct the first
coreset for the fault-tolerant clustering problem (with or without capacity
upper/lower bounds) for the above metric spaces, in which the fault-tolerance
requirement is captured by a uniform matroid basis polytope.
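To make the sampling idea concrete, here is a minimal single-level sketch of Chen-style uniform sampling (all names and the 1-D setting are illustrative; the hierarchical construction in the paper further splits each part into rings by distance to the center):

```python
import random

def uniform_sampling_coreset(points, centers, m, seed=0):
    """Illustrative single-level Chen-style step: partition points by
    their nearest center, draw a uniform sample of size m from each
    part, and weight each sampled point by |part| / m so that total
    weight is preserved."""
    rng = random.Random(seed)
    parts = [[] for _ in centers]
    for p in points:
        # assign each point to its nearest center (1-D for simplicity)
        i = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
        parts[i].append(p)
    coreset = []  # list of (point, weight) pairs
    for part in parts:
        if not part:
            continue
        k = min(m, len(part))
        for q in rng.sample(part, k):
            coreset.append((q, len(part) / k))
    return coreset
```

The reweighting keeps the total weight equal to the number of input points, which is why uniform per-part sampling preserves clustering costs in expectation.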
Near-linear time approximation schemes for clustering in doubling metrics
We consider the classic Facility Location, k-Median, and k-Means problems in metric spaces of doubling dimension d. We give nearly linear-time approximation schemes for each problem. The complexity of our algorithms is Õ(2^((1/ε)^O(d²)) · n), making a significant improvement over the state-of-the-art algorithms that run in time n^((d/ε)^O(d)).
Moreover, we show how to extend these techniques to obtain the first efficient approximation schemes for prize-collecting k-Median and k-Means, and efficient bicriteria approximation schemes for k-Median with outliers, k-Means with outliers, and k-Center.
ε-Coresets for Clustering (with Outliers) in Doubling Metrics
We study the problem of constructing ε-coresets for the (k, z)-clustering problem in a doubling metric M(X, d). An ε-coreset
is a weighted subset S ⊆ X with weight function w : S → ℝ≥0, such that for any k-subset C ∈ [X]^k, it holds that
cost_z(S, C) ∈ (1 ± ε) · cost_z(X, C).
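The guarantee can be spot-checked numerically. Below is a minimal sketch (1-D points, z = 2, and illustrative function names; note the real definition quantifies over all k-subsets C, while this only checks a supplied list):

```python
def cost_z(weighted_points, centers, z=2):
    # sum of w * d(p, C)^z, where d(p, C) is the distance from p
    # to its nearest center (1-D points for simplicity)
    return sum(w * min(abs(p - c) for c in centers) ** z
               for p, w in weighted_points)

def is_eps_coreset(X, S, center_sets, eps, z=2):
    # spot-check the (1 +/- eps) guarantee on a few candidate center
    # sets; the actual definition requires it for all k-subsets
    full = [(p, 1.0) for p in X]
    return all(
        (1 - eps) * cost_z(full, C, z)
        <= cost_z(S, C, z)
        <= (1 + eps) * cost_z(full, C, z)
        for C in center_sets)
```

For example, the two-point weighted set {(0.5, 2.0), (2.5, 2.0)} approximates X = {0, 1, 2, 3} within a factor 1 ± 0.25 for the centers checked, but fails at ε = 0.1.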
We present an efficient algorithm that constructs an ε-coreset
for the (k, z)-clustering problem in M(X, d), where the size of the coreset
depends only on the parameters k, z, ε and the doubling dimension
ddim(M). To the best of our knowledge, this is the first efficient
ε-coreset construction of size independent of |X| for general
clustering problems in doubling metrics.
To this end, we establish the first relation between the doubling dimension
of M(X, d) and the shattering dimension (or VC-dimension) of the range space
induced by the distance function d. Such a relation was not known before, since one
can easily construct instances in which neither quantity can be bounded by (some
function of) the other. Surprisingly, we show that if we allow a small
(1 ± ε)-distortion of the distance function d, and consider the
notion of τ-error probabilistic shattering dimension, we can prove an
upper bound of O(ddim(M) · log(1/ε) + log log(1/τ)) for the probabilistic shattering dimension,
even for weighted doubling metrics. We believe this new relation is of independent
interest and may find other applications.
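As a toy illustration of the range space induced by a distance function, the following sketch (1-D points; names are illustrative) enumerates the distinct subsets cut out by metric balls, whose growth rate is what the shattering dimension measures:

```python
def induced_ranges(points):
    # all distinct subsets of `points` of the form {p : |p - c| <= r},
    # with centers c taken from `points` and radii r from the set of
    # pairwise distances (1-D illustration)
    radii = sorted({abs(a - b) for a in points for b in points})
    ranges = set()
    for c in points:
        for r in radii:
            ranges.add(frozenset(p for p in points if abs(p - c) <= r))
    return ranges
```

In 1-D every ball is an interval, so a set like {0, 3} that skips the middle point of {0, 1, 3} can never be cut out; bounding how fast the number of such ranges grows is exactly what a shattering-dimension bound provides.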
We also study robust coresets and centroid sets in doubling metrics. Our
robust coreset construction leads to new results in clustering and property
testing, and the centroid sets can be used to accelerate local search
algorithms for clustering problems.

Comment: Appeared in FOCS 2018; this is the full version.