    Tight FPT Approximations for k-Median and k-Means

    We investigate the fine-grained complexity of approximating the classical k-Median/k-Means clustering problems in general metric spaces. We show how to improve the approximation factors to (1+2/e+epsilon) and (1+8/e+epsilon) respectively, using algorithms that run in fixed-parameter time. Moreover, we show that we cannot do better in FPT time, modulo recent complexity-theoretic conjectures.

    Structural Iterative Rounding for Generalized k-Median Problems

    This paper considers approximation algorithms for generalized k-median problems. This class of problems can be informally described as k-median with a constant number of extra constraints, and includes k-median with outliers and knapsack median. Our first contribution is a pseudo-approximation algorithm for generalized k-median that outputs a 6.387-approximate solution with a constant number of fractional variables. The algorithm is based on iteratively rounding linear programs, and the main technical innovation comes from understanding the rich structure of the resulting extreme points. Using our pseudo-approximation algorithm, we give improved approximation algorithms for k-median with outliers and knapsack median. This involves combining our pseudo-approximation with pre- and post-processing steps to round a constant number of fractional variables at a small increase in cost. Our algorithms achieve approximation ratios 6.994 + epsilon and 6.387 + epsilon for k-median with outliers and knapsack median, respectively. These both improve on the best known approximations.

    Connected k-Center and k-Diameter Clustering

    Motivated by an application from geodesy, we introduce a novel clustering problem which is a $k$-center (or $k$-diameter) problem with a side constraint. For the side constraint, we are given an undirected connectivity graph $G$ on the input points, and a clustering is now only feasible if every cluster induces a connected subgraph in $G$. We call the resulting problems the connected $k$-center problem and the connected $k$-diameter problem. We prove several results on the complexity and approximability of these problems. Our main result is an $O(\log^2{k})$-approximation algorithm for the connected $k$-center and the connected $k$-diameter problem. For Euclidean metrics and metrics with constant doubling dimension, the approximation factor of this algorithm improves to $O(1)$. We also consider the special cases that the connectivity graph is a line or a tree. For the line we give optimal polynomial-time algorithms and for the case that the connectivity graph is a tree, we either give an optimal polynomial-time algorithm or a $2$-approximation algorithm for all variants of our model. We complement our upper bounds by several lower bounds.
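The feasibility condition in this abstract (every cluster must induce a connected subgraph of the connectivity graph) can be illustrated with a small check. This is only a sketch of the problem definition, not the paper's approximation algorithm. The function names `is_connected_clustering` and `k_center_radius`, and the edge-list input format, are assumptions made for illustration.

```python
from collections import deque


def is_connected_clustering(clusters, edges):
    """Return True if every cluster induces a connected subgraph of G.

    clusters: iterable of collections of point ids (hypothetical format).
    edges: iterable of undirected (u, v) pairs of the connectivity graph G.
    """
    # Build an adjacency map for the undirected graph G.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for cluster in clusters:
        members = set(cluster)
        if not members:
            continue
        # Breadth-first search restricted to the cluster's own points.
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v in members and v not in seen:
                    seen.add(v)
                    queue.append(v)
        if seen != members:
            return False
    return True


def k_center_radius(clusters, centers, dist):
    """k-center objective: largest distance from any point to its cluster's center."""
    return max(
        dist(p, c)
        for cluster, c in zip(clusters, centers)
        for p in cluster
    )
```

For example, on a path graph with edges `[(0, 1), (1, 2), (2, 3)]`, the clustering `[[0, 1], [2, 3]]` is feasible, while `[[0, 2], [1, 3]]` is not, because neither of its clusters is connected in the graph.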

    Coresets for Clustering with General Assignment Constraints

    Designing small-sized \emph{coresets}, which approximately preserve the costs of the solutions for large datasets, has been an important research direction for the past decade. We consider coreset construction for a variety of general constrained clustering problems. We significantly extend and generalize the results of a very recent paper (Braverman et al., FOCS'22), by demonstrating that the idea of hierarchical uniform sampling (Chen, SICOMP'09; Braverman et al., FOCS'22) can be applied to efficiently construct coresets for a very general class of constrained clustering problems with general assignment constraints, including capacity constraints on cluster centers, and assignment structure constraints for data points (modeled by a convex body $\mathcal{B}$). Our main theorem shows that a small-sized $\epsilon$-coreset exists as long as a complexity measure $\mathsf{Lip}(\mathcal{B})$ of the structure constraint, and the \emph{covering exponent} $\Lambda_\epsilon(\mathcal{X})$ for metric space $(\mathcal{X},d)$ are bounded. The complexity measure $\mathsf{Lip}(\mathcal{B})$ for convex body $\mathcal{B}$ is the Lipschitz constant of a certain transportation problem constrained in $\mathcal{B}$, called \emph{optimal assignment transportation problem}. We prove nontrivial upper bounds of $\mathsf{Lip}(\mathcal{B})$ for various polytopes, including the general matroid basis polytopes, and laminar matroid polytopes (with better bound). As an application of our general theorem, we construct the first coreset for the fault-tolerant clustering problem (with or without capacity upper/lower bound) for the above metric spaces, in which the fault-tolerance requirement is captured by a uniform matroid basis polytope.