83 research outputs found
Constant-Factor FPT Approximation for Capacitated k-Median
Capacitated k-median is one of the few outstanding optimization problems for which the existence of a polynomial-time constant-factor approximation algorithm remains an open problem. In a series of recent papers, algorithms producing solutions that violate either the number of facilities or the capacity by a multiplicative factor were obtained. However, producing solutions without violations appears to be hard and potentially requires different algorithmic techniques. Notably, if parameterized by the number of facilities k, the problem is also W[2]-hard, making the existence of an exact FPT algorithm unlikely. In this work we provide an FPT-time constant-factor approximation algorithm preserving both cardinality and capacity of the facilities. The algorithm runs in time 2^{O(k log k)} n^{O(1)} and achieves an approximation ratio of 7+epsilon.
FPT Approximations for Capacitated/Fair Clustering with Outliers
Clustering problems such as k-Median and k-Means are motivated by
applications such as location planning and unsupervised learning, among others. In
such applications, it is important to find a clustering of points that is not
"skewed" in terms of the number of points, i.e., no cluster should contain
too many points. This is modeled by capacity constraints on the sizes of
clusters. In an orthogonal direction, another important consideration in
clustering is how to handle the presence of outliers in the data. Indeed, these
clustering problems have been generalized in the literature to separately
handle capacity constraints and outliers. To the best of our knowledge, there
has been very little work on studying the approximability of clustering
problems that can simultaneously handle both capacities and outliers.
We initiate the study of the Capacitated k-Median with Outliers (CMO)
problem. Here, we want to cluster all except outlier points into at most k
clusters, such that (i) the clusters respect the capacity constraints, and
(ii) the cost of clustering, defined as the sum of distances of each
non-outlier point to its assigned cluster-center, is minimized.
We design the first constant-factor approximation algorithms for CMO. In
particular, our algorithm returns a (3+\epsilon)-approximation for CMO in
general metric spaces, and a (1+\epsilon)-approximation in Euclidean spaces of
constant dimension, that runs in time , where denotes the input size. We can also extend these
results to a broader class of problems, including Capacitated
k-Means/k-Facility Location with Outliers, and Size-Balanced Fair Clustering
problems with Outliers. For each of these problems, we obtain an approximation
ratio that matches the best known guarantee of the corresponding outlier-free
problem. Comment: Abstract shortened to meet arXiv requirement
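To illustrate the CMO objective defined in the abstract above, here is a minimal Python sketch that solves tiny instances by exhaustive search. It is not the paper's FPT algorithm. The function name `cmo_brute_force`, the parameter names (including `num_outliers`, since the abstract's own symbol is missing from this listing), the restriction of candidate centers to the input points, and the uniform capacity are all assumptions made for this example.

```python
from itertools import combinations, product

def cmo_brute_force(points, k, num_outliers, capacity, dist):
    """Exhaustively minimize the capacitated k-median cost with outliers.

    Hypothetical helper for tiny inputs only: centers are chosen among the
    input points and every center has the same capacity (both assumptions).
    Returns (cost, center indices, outlier indices).
    """
    n = len(points)
    best = (float("inf"), None, None)
    for outliers in combinations(range(n), num_outliers):
        kept = [i for i in range(n) if i not in outliers]
        for centers in combinations(range(n), min(k, n)):
            # Try every assignment of the kept points to the chosen centers.
            for assign in product(centers, repeat=len(kept)):
                if any(assign.count(c) > capacity for c in centers):
                    continue  # capacity constraint violated
                cost = sum(dist(points[i], points[c])
                           for i, c in zip(kept, assign))
                if cost < best[0]:
                    best = (cost, centers, outliers)
    return best
```

For example, on the line points `[0, 1, 10, 11, 100]` with `k=2`, one outlier, and capacity 2, the search discards the point 100 and pairs up the remaining points. The running time is exponential in the input size, so this only serves to make the definition concrete.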
A Unified Framework of FPT Approximation Algorithms for Clustering Problems
In this paper, we present a framework for designing FPT approximation algorithms for many k-clustering problems. Our results are based on a new technique for reducing search spaces. A reduced search space is a small subset of the input data that is guaranteed to contain k clients close to the facilities opened in an optimal solution, for any clustering problem we consider. We show, somewhat surprisingly, that greedily sampling O(k) clients yields the desired reduced search space, based on which we obtain FPT(k)-time algorithms with improved approximation guarantees for problems such as capacitated clustering, lower-bounded clustering, clustering with service installation costs, fault-tolerant clustering, and priority clustering.
On the Fixed-Parameter Tractability of Capacitated Clustering
We study the complexity of the classic capacitated k-median and k-means problems parameterized by the number of centers, k. These problems are notoriously difficult since the best known approximation bound for high-dimensional Euclidean space and general metric space is Theta(log k), and it remains a major open problem whether a constant-factor approximation exists.
We show that there exists a (3+epsilon)-approximation algorithm for the capacitated k-median and a (9+epsilon)-approximation algorithm for the capacitated k-means problem in general metric spaces whose running times are f(epsilon,k) n^{O(1)}. For Euclidean inputs of arbitrary dimension, we give a (1+epsilon)-approximation algorithm for both problems with a similar running time. This is a significant improvement over the (7+epsilon)-approximation of Adamczyk et al. for k-median in general metric spaces and the (69+epsilon)-approximation of Xu et al. for Euclidean k-means.
Multivariate Analysis of Clustering Problems with Constraints
Doctoral thesis
FPT Constant-Approximations for Capacitated Clustering to Minimize the Sum of Cluster Radii
Clustering with capacity constraints is a fundamental problem that attracted
significant attention throughout the years. In this paper, we give the first
FPT constant-factor approximation algorithm for the problem of clustering
points in a general metric into k clusters to minimize the sum of cluster
radii, subject to non-uniform hard capacity constraints. In particular, we give
a -approximation algorithm that runs in time. When capacities are uniform, we obtain the following improved
approximation bounds: a (4 + \epsilon)-approximation with running time
, which significantly improves over the FPT
28-approximation of Inamdar and Varadarajan [ESA 2020]; a (2 +
\epsilon)-approximation with running time and a -approximation with running
time in the Euclidean space; and a (1 +
\epsilon)-approximation in the Euclidean space with running time
if we are allowed to violate
the capacities by a (1 + \epsilon)-factor. We complement this result by showing
that there is no (1 + \epsilon)-approximation algorithm running in time
, if no capacity violation is allowed. Comment: Full version of a paper accepted to SoCG 202
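The objective in the abstract above, the sum of cluster radii under hard capacity constraints, can be made concrete with a small Python sketch. It only evaluates a given clustering and does not implement any of the approximation algorithms mentioned. The function name `sum_of_radii` and the dictionary-based input layout are hypothetical choices for this example.

```python
def sum_of_radii(clusters, capacities, dist):
    """Return the sum of cluster radii, or None if a capacity is exceeded.

    clusters:   dict mapping each center to the list of points assigned to it
    capacities: dict mapping each center to its maximum cluster size
                (values may differ, modelling non-uniform capacities)
    dist:       metric taking two points and returning a number
    """
    total = 0.0
    for center, members in clusters.items():
        if len(members) > capacities[center]:
            return None  # hard capacity constraint violated
        # The radius of a cluster is the largest distance from its center.
        total += max((dist(center, p) for p in members), default=0.0)
    return total
```

With points on a line and `dist = lambda a, b: abs(a - b)`, the clustering `{0: [0, 1, 3], 10: [10, 12]}` has radii 3 and 2, provided the capacities of the two centers are at least 3 and 2.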
On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications
Fair clustering is a constrained variant of clustering where the goal is to
partition a set of colored points, such that the fraction of points of any
color in every cluster is more or less equal to the fraction of points of this
color in the dataset. This variant was recently introduced by Chierichetti et
al. [NeurIPS, 2017] in a seminal work and became widely popular in the
clustering literature. In this paper, we propose a new construction of coresets
for fair clustering based on random sampling. The new construction allows us to
obtain the first coreset for fair clustering in general metric spaces. For
Euclidean spaces, we obtain the first coreset whose size does not depend
exponentially on the dimension. Our coreset results solve open questions
proposed by Schmidt et al. [WAOA, 2019] and Huang et al. [NeurIPS, 2019]. The
new coreset construction helps to design several new approximation and
streaming algorithms. In particular, we obtain the first true
constant-approximation algorithm for metric fair clustering, whose running time
is fixed-parameter tractable (FPT). In the Euclidean case, we derive the first
-approximation algorithm for fair clustering whose time
complexity is near-linear and does not depend exponentially on the dimension of
the space. Besides, our coreset construction scheme is fairly general and gives
rise to coresets for a wide range of constrained clustering problems. This
leads to improved constant-approximations for these problems in general metrics
and near-linear time -approximations in the Euclidean metric
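The fairness notion described at the start of the abstract above (each color's fraction in every cluster should be close to its fraction in the whole dataset) can be checked with a short Python sketch. The function name `max_color_deviation` and the use of the largest absolute difference as the measure of "more or less equal" are assumptions for illustration; the paper's formal definition may differ.

```python
from collections import Counter

def max_color_deviation(colors, labels):
    """Largest gap between a color's share in a cluster and in the dataset.

    colors: list with the color of each point
    labels: list with the cluster label of each point (same length)
    A value of 0.0 means every cluster mirrors the dataset's color mix.
    """
    n = len(colors)
    global_frac = {c: cnt / n for c, cnt in Counter(colors).items()}
    cluster_sizes = Counter(labels)
    pair_counts = Counter(zip(labels, colors))
    worst = 0.0
    for cluster, size in cluster_sizes.items():
        for color, frac in global_frac.items():
            local = pair_counts[(cluster, color)] / size
            worst = max(worst, abs(local - frac))
    return worst
```

For four points colored `["r", "b", "r", "b"]`, the labels `[0, 0, 1, 1]` give a perfectly balanced clustering, while `[0, 1, 0, 1]` puts each color in its own cluster.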
A Survey on Approximation in Parameterized Complexity: Hardness and Algorithms
Parameterization and approximation are two popular ways of coping with
NP-hard problems. More recently, the two have also been combined to derive many
interesting results. We survey developments in the area both from the
algorithmic and hardness perspectives, with emphasis on new techniques and
potential future research directions