5 research outputs found
Tight FPT Approximations for k-Median and k-Means
We investigate the fine-grained complexity of approximating the classical k-Median/k-Means clustering problems in general metric spaces. We show how to improve the approximation factors to (1+2/e+epsilon) and (1+8/e+epsilon) respectively, using algorithms that run in fixed-parameter time. Moreover, we show that we cannot do better in FPT time, modulo recent complexity-theoretic conjectures.
Structural Iterative Rounding for Generalized k-Median Problems
This paper considers approximation algorithms for generalized k-median problems. This class of problems can be informally described as k-median with a constant number of extra constraints, and includes k-median with outliers, and knapsack median. Our first contribution is a pseudo-approximation algorithm for generalized k-median that outputs a 6.387-approximate solution with a constant number of fractional variables. The algorithm is based on iteratively rounding linear programs, and the main technical innovation comes from understanding the rich structure of the resulting extreme points.
Using our pseudo-approximation algorithm, we give improved approximation algorithms for k-median with outliers and knapsack median. This involves combining our pseudo-approximation with pre- and post-processing steps to round a constant number of fractional variables at a small increase in cost. Our algorithms achieve approximation ratios 6.994 + epsilon and 6.387 + epsilon for k-median with outliers and knapsack median, respectively. These both improve on the best known approximations.
Connected k-Center and k-Diameter Clustering
Motivated by an application from geodesy, we introduce a novel clustering
problem which is a k-center (or k-diameter) problem with a side constraint.
For the side constraint, we are given an undirected connectivity graph G on
the input points, and a clustering is now only feasible if every cluster
induces a connected subgraph in G. We call the resulting problems the
connected k-center problem and the connected k-diameter problem.
We prove several results on the complexity and approximability of these
problems. Our main result is an O(log^2 k)-approximation algorithm for the
connected k-center and the connected k-diameter problem. For Euclidean
metrics and metrics with constant doubling dimension, the approximation factor
of this algorithm improves to O(1). We also consider the special cases in which
the connectivity graph is a line or a tree. For the line we give optimal
polynomial-time algorithms and for the case that the connectivity graph is a
tree, we either give an optimal polynomial-time algorithm or a
2-approximation algorithm for all variants of our model. We complement our
upper bounds with several lower bounds.
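The feasibility condition in this abstract (every cluster must induce a connected subgraph of the connectivity graph) can be illustrated with a small sketch. This is not the paper's approximation algorithm; it only checks whether a given clustering is feasible and evaluates a k-center style objective. The function names, the adjacency-dictionary representation of the graph, and the convention that an empty cluster counts as connected are all assumptions made for illustration.

```python
from collections import deque


def induces_connected_subgraph(cluster, adjacency):
    """Return True if the points in `cluster` induce a connected subgraph.

    `adjacency` maps each point to an iterable of its neighbours in the
    connectivity graph (a hypothetical representation chosen for this sketch).
    An empty cluster is treated as trivially connected (an assumption).
    """
    members = set(cluster)
    if not members:
        return True
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    # Breadth-first search that only follows edges staying inside the cluster.
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v in members and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == members


def is_feasible_connected_clustering(clusters, adjacency):
    """A clustering is feasible if every cluster induces a connected subgraph."""
    return all(induces_connected_subgraph(c, adjacency) for c in clusters)


def k_center_cost(clusters, centers, dist):
    """Largest distance from any point to the center of its own cluster."""
    return max(
        dist(p, c)
        for cluster, c in zip(clusters, centers)
        for p in cluster
    )


# Connectivity graph: the path 0 - 1 - 2 - 3; points are numbers on a line.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
feasible = is_feasible_connected_clustering([[0, 1], [2, 3]], path)
infeasible = is_feasible_connected_clustering([[0, 2], [1, 3]], path)
cost = k_center_cost([[0, 1], [2, 3]], [0, 3], lambda a, b: abs(a - b))
```

On the path example, the clustering {0, 1}, {2, 3} is feasible, while {0, 2}, {1, 3} is not, because neither of its clusters is connected in the path.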
Coresets for Clustering with General Assignment Constraints
Designing small-sized coresets, which approximately preserve the costs
of the solutions for large datasets, has been an important research direction
for the past decade. We consider coreset construction for a variety of general
constrained clustering problems. We significantly extend and generalize the
results of a very recent paper (Braverman et al., FOCS'22), by demonstrating
that the idea of hierarchical uniform sampling (Chen, SICOMP'09; Braverman et
al., FOCS'22) can be applied to efficiently construct coresets for a very
general class of constrained clustering problems with general assignment
constraints, including capacity constraints on cluster centers, and assignment
structure constraints for data points (modeled by a convex body B).
Our main theorem shows that a small-sized epsilon-coreset exists as long
as a complexity measure Lip(B) of the structure constraint and the
covering exponent Lambda_epsilon(X) for the metric space (X, d) are
bounded. The complexity measure Lip(B) for a convex body B is the
Lipschitz constant of a certain transportation problem constrained in B,
called the optimal assignment transportation problem. We prove nontrivial
upper bounds on Lip(B) for various polytopes, including
the general matroid basis polytopes and laminar matroid polytopes (with a better
bound). As an application of our general theorem, we construct the first
coreset for the fault-tolerant clustering problem (with or without capacity
upper/lower bounds) for the above metric spaces, in which the fault-tolerance
requirement is captured by a uniform matroid basis polytope.