61 research outputs found

    On the Fixed-Parameter Tractability of Capacitated Clustering

    We study the complexity of the classic capacitated k-median and k-means problems parameterized by the number of centers, k. These problems are notoriously difficult since the best known approximation bound for high-dimensional Euclidean space and general metric space is Theta(log k) and it remains a major open problem whether a constant-factor approximation exists. We show that there exists a (3+epsilon)-approximation algorithm for the capacitated k-median and a (9+epsilon)-approximation algorithm for the capacitated k-means problem in general metric spaces whose running times are f(epsilon,k) n^{O(1)}. For Euclidean inputs of arbitrary dimension, we give a (1+epsilon)-approximation algorithm for both problems with a similar running time. This is a significant improvement over the (7+epsilon)-approximation of Adamczyk et al. for k-median in general metric spaces and the (69+epsilon)-approximation of Xu et al. for Euclidean k-means.

    Improved Bounds for Metric Capacitated Covering Problems

    In the Metric Capacitated Covering (MCC) problem, given a set of balls ℬ in a metric space P with metric d and a capacity parameter U, the goal is to find a minimum sized subset ℬ' ⊆ ℬ and an assignment of the points in P to the balls in ℬ' such that each point is assigned to a ball that contains it and each ball is assigned at most U points. MCC admits an O(log |P|)-approximation using a greedy algorithm. On the other hand, it is hard to approximate within a factor of o(log |P|) even with a factor β < 3 expansion of the balls. Bandyapadhyay et al. [SoCG 2018, DCG 2019] showed that one can obtain an O(1)-approximation for the problem with 6.47 factor expansion of the balls. An open question left by their work is to reduce the gap between the lower bound 3 and the upper bound 6.47. In this work, we show that it is possible to obtain an O(1)-approximation with only 4.24 factor expansion of the balls. We also show a similar upper bound of 5 for a more generalized version of MCC for which the best previously known bound was 9.
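
The feasibility conditions stated in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm; it only checks a given solution. The names `Ball`, `is_feasible`, and the `expansion` parameter are assumptions introduced here, and the Euclidean metric is used purely for the example.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Ball:
    center: tuple
    radius: float


def is_feasible(points, balls, assignment, capacity, expansion=1.0):
    """Check an MCC solution; assignment[i] indexes the ball serving points[i].

    `expansion` scales every radius, mirroring the "factor expansion of the
    balls" relaxation (hypothetical parameter name).
    """
    if len(assignment) != len(points):
        return False
    # Capacity: no ball may serve more than `capacity` points.
    load = Counter(assignment)
    if any(count > capacity for count in load.values()):
        return False
    # Containment: each point must lie inside its (expanded) ball.
    for p, b in zip(points, assignment):
        if dist(p, balls[b].center) > expansion * balls[b].radius:
            return False
    return True
```

The number of balls actually used, `len(set(assignment))`, is the quantity the problem asks to minimize.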

    FPT Approximations for Capacitated/Fair Clustering with Outliers

    Clustering problems such as $k$-Median and $k$-Means are motivated by applications such as location planning and unsupervised learning, among others. In such applications, it is important to find a clustering of points that is not ``skewed'' in terms of the number of points, i.e., no cluster should contain too many points. This is modeled by capacity constraints on the sizes of clusters. In an orthogonal direction, another important consideration in clustering is how to handle the presence of outliers in the data. Indeed, these clustering problems have been generalized in the literature to separately handle capacity constraints and outliers. To the best of our knowledge, there has been very little work on studying the approximability of clustering problems that can simultaneously handle both capacities and outliers. We initiate the study of the Capacitated $k$-Median with Outliers (C$k$MO) problem. Here, we want to cluster all except $m$ outlier points into at most $k$ clusters, such that (i) the clusters respect the capacity constraints, and (ii) the cost of clustering, defined as the sum of distances of each non-outlier point to its assigned cluster-center, is minimized. We design the first constant-factor approximation algorithms for C$k$MO. In particular, our algorithm returns a $(3+\epsilon)$-approximation for C$k$MO in general metric spaces, and a $(1+\epsilon)$-approximation in Euclidean spaces of constant dimension, that runs in time $f(k, m, \epsilon) \cdot |I_m|^{O(1)}$, where $|I_m|$ denotes the input size. We can also extend these results to a broader class of problems, including Capacitated $k$-Means/$k$-Facility Location with Outliers, and Size-Balanced Fair Clustering problems with Outliers. For each of these problems, we obtain an approximation ratio that matches the best known guarantee of the corresponding outlier-free problem. Comment: Abstract shortened to meet arXiv requirement
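
To make the objective in (i) and (ii) concrete, here is a minimal sketch that evaluates a candidate solution with at most k centers and at most m outliers. It is only an evaluator, not the approximation algorithm from the paper. The function name `ckmo_cost`, its signature, and the use of `None` to mark outliers and infeasible solutions are assumptions made for illustration.

```python
def ckmo_cost(points, centers, assignment, capacities, k, m, metric):
    """Return the clustering cost, or None if the solution is infeasible.

    assignment[i] is an index into `centers`, or None if points[i] is
    declared an outlier (hypothetical convention).
    """
    if len(centers) > k or len(assignment) != len(points):
        return None
    loads = [0] * len(centers)
    outliers = 0
    total = 0
    for p, c in zip(points, assignment):
        if c is None:
            outliers += 1          # outliers contribute no cost
            continue
        loads[c] += 1
        total += metric(p, centers[c])
    if outliers > m:               # at most m points may be discarded
        return None
    if any(load > cap for load, cap in zip(loads, capacities)):
        return None                # condition (i): capacities respected
    return total                   # condition (ii): sum of distances
```

For example, with points `[0, 1, 10, 100]` on a line, centers `[0, 10]`, and the last point marked as an outlier, the cost is the sum of the three remaining distances.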

    The Capacitated Matroid Median Problem

    In this thesis, we study the capacitated generalization of the Matroid Median Problem, which is itself a generalization of the classical clustering problem called the k-median problem. In the capacitated matroid median problem, we are given a set F of facilities, a set D of clients and a common metric defined on F ∪ D, where the cost of connecting client j to facility i is denoted as c_{ij}. Each client j ∈ D has a demand of d_j, and each facility i ∈ F has an opening cost of f_i and a capacity u_i which limits the amount of demand that can be assigned to facility i. Moreover, there is a matroid M = (F,I) defined on the set of facilities. A solution to the capacitated matroid median problem involves opening a set of facilities F' ⊆ F such that F' ∈ I, and choosing an assignment i(j) ∈ F' for every j ∈ D such that each facility i ∈ F' is assigned at most u_i demand. The cost of such a solution is Σ_{i∈F'} f_i + Σ_{j∈D} d_j c_{i(j)j}. Our goal is to find a solution of minimum cost. As the Matroid Median Problem generalizes the classical NP-Hard problem called k-median, it is also NP-Hard. We provide a bi-criteria approximation algorithm for the capacitated Matroid Median Problem with uniform capacities based on rounding the natural LP for the problem. Our algorithm achieves an approximation guarantee of 76 and violates the capacities by a factor of at most 6. We complement this result by providing two integrality gap results for the natural LP for capacitated matroid median.
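
The solution cost defined in this abstract can be sketched as a small evaluator. Two assumptions are made here and are not taken from the thesis: opening costs are charged only for the opened facilities, and the matroid is a uniform matroid of rank `max_open` (a set is independent if and only if it has at most `max_open` elements), which stands in for a general independence oracle. The function name and argument names are hypothetical.

```python
def solution_cost(open_facilities, assign, opening_cost, demand, conn_cost,
                  capacity, max_open):
    """Return opening cost plus connection cost, or None if infeasible.

    assign maps each client j to its facility i(j);
    conn_cost[i, j] is the connection cost c_{ij}.
    """
    # Independence oracle for the assumed uniform matroid.
    if len(open_facilities) > max_open:
        return None
    # Every client must be assigned.
    if set(assign) != set(demand):
        return None
    load = {i: 0 for i in open_facilities}
    for j, i in assign.items():
        if i not in load:          # clients may only use open facilities
            return None
        load[i] += demand[j]
    # Hard capacities: total assigned demand at i is at most u_i.
    if any(load[i] > capacity[i] for i in load):
        return None
    opening = sum(opening_cost[i] for i in open_facilities)
    connection = sum(demand[j] * conn_cost[i, j] for j, i in assign.items())
    return opening + connection
```

Swapping the first check for a different independence test would model other matroids without changing the rest of the evaluator.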

    FPT Constant-Approximations for Capacitated Clustering to Minimize the Sum of Cluster Radii

    Clustering with capacity constraints is a fundamental problem that has attracted significant attention throughout the years. In this paper, we give the first FPT constant-factor approximation algorithm for the problem of clustering points in a general metric into $k$ clusters to minimize the sum of cluster radii, subject to non-uniform hard capacity constraints. In particular, we give a $(15+\epsilon)$-approximation algorithm that runs in $2^{O(k^2\log k)}\cdot n^3$ time. When capacities are uniform, we obtain the following improved approximation bounds: a $(4+\epsilon)$-approximation with running time $2^{O(k\log(k/\epsilon))}n^3$, which significantly improves over the FPT 28-approximation of Inamdar and Varadarajan [ESA 2020]; a $(2+\epsilon)$-approximation with running time $2^{O(k/\epsilon^2 \cdot\log(k/\epsilon))}dn^3$ and a $(1+\epsilon)$-approximation with running time $2^{O(kd\log(k/\epsilon))}n^3$ in the Euclidean space; and a $(1+\epsilon)$-approximation in the Euclidean space with running time $2^{O(k/\epsilon^2 \cdot\log(k/\epsilon))}dn^3$ if we are allowed to violate the capacities by a $(1+\epsilon)$-factor. We complement this result by showing that there is no $(1+\epsilon)$-approximation algorithm running in time $f(k)\cdot n^{O(1)}$ if no capacity violation is allowed. Comment: Full version of a paper accepted to SoCG 202
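
As a minimal sketch of the objective only (not any of the algorithms above), the sum of cluster radii under hard capacities can be evaluated as follows. The function name `sum_of_radii`, its arguments, and the convention of returning `None` for a capacity violation are assumptions for illustration.

```python
def sum_of_radii(points, centers, assignment, capacities, metric):
    """Return the sum of cluster radii, or None if a capacity is exceeded.

    assignment[i] is the index of the center serving points[i]; the radius
    of a cluster is the largest distance from its center to an assigned point.
    """
    radii = [0.0] * len(centers)
    loads = [0] * len(centers)
    for p, c in zip(points, assignment):
        loads[c] += 1
        radii[c] = max(radii[c], metric(p, centers[c]))
    # Hard (possibly non-uniform) capacities: one bound per center.
    if any(load > cap for load, cap in zip(loads, capacities)):
        return None
    return sum(radii)
```

Unlike the k-median objective, each cluster pays only for its farthest point, so moving a point between clusters changes the cost only if it changes a maximum.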

    Capacitated Sum-Of-Radii Clustering: An FPT Approximation

    Coresets for Clustering with General Assignment Constraints

    Designing small-sized \emph{coresets}, which approximately preserve the costs of the solutions for large datasets, has been an important research direction for the past decade. We consider coreset construction for a variety of general constrained clustering problems. We significantly extend and generalize the results of a very recent paper (Braverman et al., FOCS'22), by demonstrating that the idea of hierarchical uniform sampling (Chen, SICOMP'09; Braverman et al., FOCS'22) can be applied to efficiently construct coresets for a very general class of constrained clustering problems with general assignment constraints, including capacity constraints on cluster centers, and assignment structure constraints for data points (modeled by a convex body $\mathcal{B}$). Our main theorem shows that a small-sized $\epsilon$-coreset exists as long as a complexity measure $\mathsf{Lip}(\mathcal{B})$ of the structure constraint, and the \emph{covering exponent} $\Lambda_\epsilon(\mathcal{X})$ for metric space $(\mathcal{X},d)$ are bounded. The complexity measure $\mathsf{Lip}(\mathcal{B})$ for convex body $\mathcal{B}$ is the Lipschitz constant of a certain transportation problem constrained in $\mathcal{B}$, called \emph{optimal assignment transportation problem}. We prove nontrivial upper bounds of $\mathsf{Lip}(\mathcal{B})$ for various polytopes, including the general matroid basis polytopes, and laminar matroid polytopes (with better bound). As an application of our general theorem, we construct the first coreset for the fault-tolerant clustering problem (with or without capacity upper/lower bound) for the above metric spaces, in which the fault-tolerance requirement is captured by a uniform matroid basis polytope.