On the Fixed-Parameter Tractability of Capacitated Clustering
We study the complexity of the classic capacitated k-median and k-means problems parameterized by the number of centers, k. These problems are notoriously difficult since the best known approximation bound for high dimensional Euclidean space and general metric space is Theta(log k) and it remains a major open problem whether a constant factor exists.
We show that there exists a (3+epsilon)-approximation algorithm for the capacitated k-median and a (9+epsilon)-approximation algorithm for the capacitated k-means problem in general metric spaces whose running times are f(epsilon,k) n^{O(1)}. For Euclidean inputs of arbitrary dimension, we give a (1+epsilon)-approximation algorithm for both problems with a similar running time. This is a significant improvement over the (7+epsilon)-approximation of Adamczyk et al. for k-median in general metric spaces and the (69+epsilon)-approximation of Xu et al. for Euclidean k-means.
Improved Bounds for Metric Capacitated Covering Problems
In the Metric Capacitated Covering (MCC) problem, given a set of balls B in a metric space P with metric d and a capacity parameter U, the goal is to find a minimum-sized subset B' ⊆ B and an assignment of the points in P to the balls in B' such that each point is assigned to a ball that contains it and each ball is assigned at most U points. MCC admits an O(log |P|)-approximation using a greedy algorithm. On the other hand, it is hard to approximate within a factor of o(log |P|) even with β < 3 factor expansion of the balls. Bandyapadhyay et al. [SoCG 2018, DCG 2019] showed that one can obtain an O(1)-approximation for the problem with 6.47 factor expansion of the balls. An open question left by their work is to reduce the gap between the lower bound 3 and the upper bound 6.47. In this work, we show that it is possible to obtain an O(1)-approximation with only 4.24 factor expansion of the balls. We also show a similar upper bound of 5 for a more general version of MCC for which the best previously known bound was 9.
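As an illustration of the MCC constraints described above, the following is a minimal sketch of a feasibility check, not the authors' algorithm. It assumes points on the real line with distance |x - y| unless another metric is passed in; the function name `is_feasible_mcc`, the data layout, and the `expansion` parameter are hypothetical choices made for this example.

```python
from collections import Counter


def is_feasible_mcc(points, balls, assignment, capacity, expansion=1.0, dist=None):
    """Check an MCC assignment (illustrative helper, not from the paper).

    points:     list of points
    balls:      list of (center, radius) pairs
    assignment: assignment[p] is the index of the ball serving point p
    capacity:   the uniform capacity U
    expansion:  factor by which every radius may be enlarged
    """
    if dist is None:
        dist = lambda x, y: abs(x - y)  # assumed metric: the real line

    load = Counter()
    for p_idx, b_idx in enumerate(assignment):
        center, radius = balls[b_idx]
        # Each point must lie inside its (possibly expanded) ball.
        if dist(points[p_idx], center) > expansion * radius:
            return False
        load[b_idx] += 1

    # No ball may serve more than `capacity` points.
    return all(count <= capacity for count in load.values())
```

With points `[0, 1, 2, 5]`, balls `[(1, 1), (4, 1)]`, and U = 2, the assignment `[0, 0, 1, 1]` is infeasible without expansion (point 2 is at distance 2 from center 4) but becomes feasible when the radii are expanded by a factor of 2.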
FPT Approximations for Capacitated/Fair Clustering with Outliers
Clustering problems such as k-Median and k-Means are motivated by
applications such as location planning and unsupervised learning, among others. In
such applications, it is important to find the clustering of points that is not
"skewed" in terms of the number of points, i.e., no cluster should contain
too many points. This is modeled by capacity constraints on the sizes of
clusters. In an orthogonal direction, another important consideration in
clustering is how to handle the presence of outliers in the data. Indeed, these
clustering problems have been generalized in the literature to separately
handle capacity constraints and outliers. To the best of our knowledge, there
has been very little work on studying the approximability of clustering
problems that can simultaneously handle both capacities and outliers.
We initiate the study of the Capacitated k-Median with Outliers (CMO)
problem. Here, we want to cluster all except m outlier points into at most k
clusters, such that (i) the clusters respect the capacity constraints, and
(ii) the cost of clustering, defined as the sum of distances of each
non-outlier point to its assigned cluster-center, is minimized.
We design the first constant-factor approximation algorithms for CMO. In
particular, our algorithm returns a (3+\epsilon)-approximation for CMO in
general metric spaces, and a (1+\epsilon)-approximation in Euclidean spaces of
constant dimension, that runs in FPT time, with polynomial dependence on the input size. We can also extend these
results to a broader class of problems, including Capacitated
k-Means/k-Facility Location with Outliers, and Size-Balanced Fair Clustering
problems with Outliers. For each of these problems, we obtain an approximation
ratio that matches the best known guarantee of the corresponding outlier-free
problem.
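The CMO objective defined in this abstract can be made concrete with a small evaluator. This is a minimal sketch under stated assumptions, not the paper's algorithm: it writes k for the cluster budget and m for the outlier budget, marks outliers with `None`, and defaults to the real line with distance |x - y|. The function name `cmo_cost` and the input layout are hypothetical.

```python
def cmo_cost(points, centers, assignment, capacities, k, m, dist=None):
    """Return the cost of a CMO solution, or None if it is infeasible.

    assignment[p] is the index of the center serving point p,
    or None when point p is declared an outlier.
    """
    if dist is None:
        dist = lambda x, y: abs(x - y)  # assumed metric: the real line

    used = {c for c in assignment if c is not None}
    outliers = sum(1 for c in assignment if c is None)
    # At most k clusters and at most m outliers are allowed.
    if len(used) > k or outliers > m:
        return None

    load = {c: 0 for c in used}
    cost = 0.0
    for p, c in zip(points, assignment):
        if c is None:
            continue  # outliers do not contribute to the cost
        load[c] += 1
        cost += dist(p, centers[c])

    # Every used center must respect its capacity.
    if any(load[c] > capacities[c] for c in used):
        return None
    return cost
```

For example, with points `[0, 1, 2, 10, 100]`, centers `[1, 10]`, capacities `[3, 1]`, k = 2 and m = 1, declaring the point 100 an outlier gives cost 1 + 0 + 1 + 0 = 2.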
The Capacitated Matroid Median Problem
In this thesis, we study the capacitated generalization of the Matroid Median Problem, which is itself a generalization of the classical clustering problem called the k-Median problem. In the capacitated matroid median problem, we are given a set F of facilities, a set D of clients and a common metric defined on F ∪ D, where the cost of connecting client j to facility i is denoted as c_{ij}. Each client j ∈ D has a demand of d_j, and each facility i ∈ F has an opening cost of f_i and a capacity u_i which limits the amount of demand that can be assigned to facility i. Moreover, there is a matroid M = (F,I) defined on the set of facilities. A solution to the capacitated matroid median problem involves opening a set of facilities F' ⊆ F such that F' ∈ I, and finding an assignment i(j) ∈ F' for every j ∈ D such that each facility i ∈ F' is assigned at most u_i demand. The cost associated with such a solution is Σ_{i∈F'} f_i + Σ_{j∈D} d_j c_{i(j)j}. Our goal is to find a solution of minimum cost.
As the Matroid Median Problem generalizes the classical NP-hard k-median problem, it is also NP-hard. We provide a bi-criteria approximation algorithm for the capacitated Matroid Median Problem with uniform capacities based on rounding the natural LP for the problem. Our algorithm achieves an approximation guarantee of 76 and violates the capacities by a factor of at most 6. We complement this result by providing two integrality gap results for the natural LP for capacitated matroid median.
FPT Constant-Approximations for Capacitated Clustering to Minimize the Sum of Cluster Radii
Clustering with capacity constraints is a fundamental problem that attracted
significant attention throughout the years. In this paper, we give the first
FPT constant-factor approximation algorithm for the problem of clustering
points in a general metric into k clusters to minimize the sum of cluster
radii, subject to non-uniform hard capacity constraints. In particular, we give
a -approximation algorithm that runs in time. When capacities are uniform, we obtain the following improved
approximation bounds: a (4 + \epsilon)-approximation with running time
, which significantly improves over the FPT
28-approximation of Inamdar and Varadarajan [ESA 2020]; a (2 +
\epsilon)-approximation with running time and a -approximation with running
time in the Euclidean space; and a (1 +
\epsilon)-approximation in the Euclidean space with running time
if we are allowed to violate
the capacities by a (1 + \epsilon)-factor. We complement this result by showing
that there is no (1 + \epsilon)-approximation algorithm running in time
, if no capacity violation is allowed. Comment: Full version of a paper accepted to SoCG 202
Coresets for Clustering with General Assignment Constraints
Designing small-sized coresets, which approximately preserve the costs
of the solutions for large datasets, has been an important research direction
for the past decade. We consider coreset construction for a variety of general
constrained clustering problems. We significantly extend and generalize the
results of a very recent paper (Braverman et al., FOCS'22), by demonstrating
that the idea of hierarchical uniform sampling (Chen, SICOMP'09; Braverman et
al., FOCS'22) can be applied to efficiently construct coresets for a very
general class of constrained clustering problems with general assignment
constraints, including capacity constraints on cluster centers, and assignment
structure constraints for data points (modeled by a convex body).
Our main theorem shows that a small-sized \epsilon-coreset exists as long
as a complexity measure of the structure
constraint and the covering exponent
of the metric space are bounded. The complexity measure
of the convex body is the Lipschitz
constant of a certain transportation problem constrained to that body,
called the optimal assignment transportation problem. We prove nontrivial
upper bounds on this measure for various polytopes, including
the general matroid basis polytopes, and laminar matroid polytopes (with better
bound). As an application of our general theorem, we construct the first
coreset for the fault-tolerant clustering problem (with or without capacity
upper/lower bound) for the above metric spaces, in which the fault-tolerance
requirement is captured by a uniform matroid basis polytope
- …