
400 research outputs found

Constant Factor Approximation for Capacitated k-Center with Outliers

Authors: Cygan Marek; Kociumaka Tomasz
Publication date: 01/01/2014

The $k$-center problem is a classic facility location problem: given an edge-weighted graph $G = (V,E)$, one is to find a subset $S$ of $k$ vertices such that each vertex in $V$ is "close" to some vertex in $S$. The approximation status of this basic problem is well understood, as a simple 2-approximation algorithm is known to be tight. Consequently, different extensions have been studied. In the capacitated version of the problem, each vertex is assigned a capacity, which is a strict upper bound on the number of clients a facility can serve when located at this vertex. A constant factor approximation for the capacitated $k$-center was obtained last year by Cygan, Hajiaghayi and Khuller [FOCS'12], which was recently improved to a 9-approximation by An, Bhaskara and Svensson [arXiv'13]. In a different generalization of the problem, some clients (denoted as outliers) may be disregarded. Here we are additionally given an integer $p$, and the goal is to serve exactly $p$ clients, which the algorithm is free to choose. In 2001 Charikar et al. [SODA'01] presented a 3-approximation for the $k$-center problem with outliers. In this paper we consider a common generalization of the two extensions previously studied separately, i.e. we work with the capacitated $k$-center with outliers. We present the first constant factor approximation algorithm, with an approximation ratio of 25, even for the case of non-uniform hard capacities.

Comment: 15 pages, 3 figures, accepted to STACS 201
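The abstract refers to a simple 2-approximation for the basic (uncapacitated, outlier-free) $k$-center problem without naming it. One well-known algorithm with that guarantee is farthest-first traversal; the sketch below illustrates it under assumptions that are not from the paper: points live in the Euclidean plane rather than in a graph metric, `k` does not exceed the number of points, and the function name `greedy_k_center` is hypothetical.

```python
import math

def greedy_k_center(points, k):
    """Farthest-first traversal (assumed illustration, not the paper's algorithm).

    Start from an arbitrary point, then repeatedly open the point that is
    farthest from all centers opened so far. Returns the chosen center
    indices and the resulting covering radius.
    """
    centers = [0]  # arbitrary first center: the point at index 0
    # dist[i] = distance from point i to its nearest open center
    dist = [math.dist(points[0], p) for p in points]
    while len(centers) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        dist = [min(d, math.dist(points[nxt], p)) for d, p in zip(dist, points)]
    return centers, max(dist)

points = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 8)]
centers, radius = greedy_k_center(points, 3)
print(centers, radius)
```

The sketch handles neither capacities nor outliers; those are exactly the extensions the paper addresses with different techniques.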

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Capacitated Center Problems with Two-Sided Bounds and Outliers

Authors: CG Fernandes; DS Hochbaum; DZ Chen; G Aggarwal; HC An; J Barilan; J Li; K Jain; L Sweeney; M Charikar; MR Korupolu; S Guha; S Khuller; S Li; S Li; TF Gonzalez; V Arya
Publication date: 23/02/2017

In recent years, capacitated center problems have attracted a lot of research interest. Given a set of vertices $V$, we want to find a subset of vertices $S$, called centers, such that the maximum cluster radius is minimized. Moreover, each center in $S$ should satisfy some capacity constraint, which could be an upper or lower bound on the number of vertices it can serve. Capacitated $k$-center problems with one-sided bounds (upper or lower) have been well studied in previous work, and constant factor approximations were obtained. We are the first to study the capacitated center problem with both capacity lower and upper bounds (with or without outliers). We assume each vertex has a uniform lower bound and a non-uniform upper bound. For the case of opening exactly $k$ centers, we note that a generalization of a recent LP approach can achieve constant factor approximation algorithms for our problems. Our main contribution is a simple combinatorial algorithm for the case where there is no cardinality constraint on the number of open centers. Our combinatorial algorithm is simpler and achieves a better constant approximation factor than the LP approach.
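To make the two-sided constraint concrete, the following minimal sketch checks a given assignment against a uniform lower bound and per-center upper bounds, and reports the maximum cluster radius. It is only a feasibility checker, not the paper's algorithm; the names `check_two_sided`, `position`, `lower` and `upper`, and the use of points on a line as the metric, are assumptions made for illustration.

```python
from collections import Counter

def check_two_sided(assignment, position, lower, upper):
    """Check an assignment against two-sided capacity bounds (assumed sketch).

    assignment[v] = center serving vertex v (a center serves itself here)
    position[v]   = coordinate of v on a line (stand-in for a general metric)
    lower         = uniform lower bound on the load of every open center
    upper[c]      = upper bound on the load of center c
    Returns (feasible, max_radius).
    """
    load = Counter(assignment.values())  # number of vertices served per open center
    feasible = all(lower <= load[c] <= upper[c] for c in load)
    radius = max(abs(position[v] - position[c]) for v, c in assignment.items())
    return feasible, radius

position = {"a": 0, "b": 1, "c": 2, "d": 10, "e": 11}
assignment = {"a": "b", "b": "b", "c": "b", "d": "d", "e": "d"}
ok, radius = check_two_sided(assignment, position, lower=2, upper={"b": 3, "d": 2})
tight_ok, _ = check_two_sided(assignment, position, lower=2, upper={"b": 2, "d": 2})
print(ok, radius, tight_ok)
```

Lowering the upper bound of center `b` from 3 to 2 makes the same assignment infeasible, which is the kind of interaction between the two bounds that the abstract describes.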

arXiv.org e-Print Archive

Crossref

FPT Approximations for Capacitated/Fair Clustering with Outliers

Authors: Dabas Rajni; Gupta Neelima; Inamdar Tanmay
Publication date: 02/05/2023

Clustering problems such as $k$-Median and $k$-Means are motivated by applications such as location planning and unsupervised learning, among others. In such applications, it is important to find a clustering of points that is not "skewed" in terms of the number of points, i.e., no cluster should contain too many points. This is modeled by capacity constraints on the sizes of clusters. In an orthogonal direction, another important consideration in clustering is how to handle the presence of outliers in the data. Indeed, these clustering problems have been generalized in the literature to separately handle capacity constraints and outliers. To the best of our knowledge, there has been very little work on studying the approximability of clustering problems that can simultaneously handle both capacities and outliers. We initiate the study of the Capacitated $k$-Median with Outliers (C$k$MO) problem. Here, we want to cluster all except $m$ outlier points into at most $k$ clusters, such that (i) the clusters respect the capacity constraints, and (ii) the cost of clustering, defined as the sum of distances of each non-outlier point to its assigned cluster-center, is minimized. We design the first constant-factor approximation algorithms for C$k$MO. In particular, our algorithm returns a $(3+\epsilon)$-approximation for C$k$MO in general metric spaces, and a $(1+\epsilon)$-approximation in Euclidean spaces of constant dimension, and runs in time $f(k, m, \epsilon) \cdot |I_m|^{O(1)}$, where $|I_m|$ denotes the input size. We can also extend these results to a broader class of problems, including Capacitated $k$-Means/$k$-Facility Location with Outliers, and Size-Balanced Fair Clustering problems with Outliers. For each of these problems, we obtain an approximation ratio that matches the best known guarantee of the corresponding outlier-free problem.

Comment: Abstract shortened to meet arxiv requirement
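The C$k$MO objective described in the abstract can be written as an optimization problem, sketched below. The symbols $P$ (input points), $F$ (candidate centers), $d$ (the metric), $u_c$ (capacity of center $c$), $O$ (outlier set) and $\sigma$ (assignment) are notation assumed here for illustration and are not taken from the paper. The abstract's "all except $m$ outlier points" is read as "at most $m$ outliers"; this does not change the optimum, since discarding additional points never increases the cost or violates a capacity.

```latex
\begin{aligned}
\min_{O \subseteq P,\ \sigma : P \setminus O \to F} \quad
  & \sum_{p \in P \setminus O} d\bigl(p, \sigma(p)\bigr) \\
\text{s.t.} \quad
  & |O| \le m, \\
  & |\sigma(P \setminus O)| \le k, \\
  & |\sigma^{-1}(c)| \le u_c \quad \text{for all } c \in F.
\end{aligned}
```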

arXiv.org e-Print Archive

Approximation algorithms for clustering and facility location problems

Author: Gupta Shalmoli
Publication date: 01/12/2018

In this thesis we design and analyze algorithms for various facility location and clustering problems. The problems we study are NP-hard and therefore, assuming P is not equal to NP, there do not exist polynomial-time algorithms to solve them optimally. One approach to cope with the intractability of these problems is to design approximation algorithms which run in polynomial time and output a near-optimal solution for all instances of the problem. However, these algorithms do not always work well in practice. Often heuristics with no explicit approximation guarantee perform quite well. To bridge this gap between theory and practice, and to design algorithms that are tuned for instances arising in practice, there is an increasing emphasis on beyond worst-case analysis. In this thesis we consider both these approaches.

In the first part we design worst-case approximation algorithms for the Uniform Submodular Facility Location (USFL) and Capacitated k-center (CapKCenter) problems. USFL is a generalization of the well-known Uncapacitated Facility Location problem. In USFL the cost of opening a facility is a submodular function of the clients assigned to it (the function is identical for all facilities). We show that a natural greedy algorithm (which gives a constant factor approximation for Uncapacitated Facility Location and other facility location problems) has a lower bound of log(n), where n is the number of clients. We present an O(log^2 k) approximation algorithm, where k is the number of facilities. The algorithm is based on rounding a convex relaxation. We further consider several special cases of the problem and give improved approximation bounds for them. The CapKCenter problem is an extension of the well-known k-center problem: each facility has a maximum capacity on the number of clients that can be assigned to it. We obtain a 9-approximation for this problem via a linear programming (LP) rounding procedure. Our result, combined with previously known lower bounds, almost settles the integrality gap for a natural LP relaxation.

In the second part we consider several well-known clustering problems like k-center, k-median, k-means and their corresponding outlier variants. We use beyond worst-case analysis due to the practical relevance of these problems. In particular we show that when the input instances are 2-perturbation resilient (i.e. the optimal solution does not change when the distances change by a multiplicative factor of 2), the LP integrality gap for k-center (and also asymmetric k-center) is 1. We further introduce a model of perturbation resilience for clustering with outliers. Under this new model, we show that previous results (including our LP integrality result) known for clustering under perturbation resilience also extend to clustering with outliers. This leads to a dynamic programming based heuristic for k-means with outliers (k-means-outlier) which gives an optimal solution when the instance is 2-perturbation resilient. We propose two more algorithms for k-means-outlier: a sampling based algorithm which gives an O(1) approximation when the optimal clusters are not "too small", and an LP rounding algorithm which gives an O(1) approximation at the expense of violating the number of clusters and outliers by a small constant. We empirically study our proposed algorithms on several clustering datasets.
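As a small illustration of the k-means-with-outliers objective mentioned above, the sketch below evaluates the cost of a fixed set of centers when up to `z` points may be discarded. It relies on the fact that, once the centers are fixed, the best points to discard are the `z` points farthest from their nearest center. The function name `kmeans_outlier_cost` and the parameter `z` are assumptions for illustration; this is an objective evaluator, not one of the thesis's algorithms.

```python
def kmeans_outlier_cost(points, centers, z):
    """k-means cost of fixed centers after discarding the z farthest points.

    Each point contributes the squared Euclidean distance to its nearest
    center; the z largest contributions are treated as outliers and dropped.
    """
    sq_dists = sorted(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )
    return sum(sq_dists[: len(sq_dists) - z])

points = [(0, 0), (1, 0), (10, 0), (100, 0)]
centers = [(0, 0), (10, 0)]
cost_no_outliers = kmeans_outlier_cost(points, centers, 0)
cost_one_outlier = kmeans_outlier_cost(points, centers, 1)
print(cost_no_outliers, cost_one_outlier)
```

In this toy instance a single far-away point dominates the cost, and allowing one outlier removes its contribution entirely.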

Illinois Digital Environment for Access to Learning and Scholarship Repository

On the Cost of Essentially Fair Clusterings

Authors: Bercea Ioana O.; Khuller Samir; Kumar Aounon; Schmidt Daniel R.; Schmidt Melanie
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 26/11/2018

Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters, especially if the data is already biased. At NIPS 2017, Chierichetti et al. proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair $k$-center problem and an $O(t)$-approximation for the fair $k$-median problem, where $t$ is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair $k$-center. We extend and improve the known results. Firstly, we give a 5-approximation for the fair $k$-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives $k$-center, $k$-supplier, $k$-median, $k$-means and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.
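The fairness model in the abstract asks that each cluster approximately preserve the global fraction of every protected class. One simple way to quantify how far a given clustering is from that ideal is sketched below; the deviation measure and the name `max_fraction_deviation` are assumptions for illustration and are not the formal definition used by Chierichetti et al. or by this paper.

```python
from collections import Counter

def max_fraction_deviation(labels, classes):
    """Largest gap between a class's share in a cluster and its global share.

    labels[i]  = cluster id of point i
    classes[i] = protected class of point i
    A value of 0.0 means every cluster mirrors the global class proportions.
    """
    n = len(labels)
    global_frac = {c: cnt / n for c, cnt in Counter(classes).items()}
    sizes = Counter(labels)                # points per cluster
    joint = Counter(zip(labels, classes))  # points per (cluster, class) pair
    return max(
        abs(joint[(cl, c)] / sizes[cl] - global_frac[c])
        for cl in sizes
        for c in global_frac
    )

balanced = max_fraction_deviation([0, 0, 1, 1], ["a", "b", "a", "b"])
segregated = max_fraction_deviation([0, 0, 1, 1], ["a", "a", "b", "b"])
print(balanced, segregated)
```

A clustering that separates the two classes completely scores 0.5 here, while one that mixes them evenly scores 0.0.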

arXiv.org e-Print Archive

Kölner UniversitätsPublikationsServer

Dagstuhl Research Online Publication Server

Privacy Preserving Clustering with Constraints

Author: Schmidt Melanie
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)
Publication date: 01/01/2018

The k-center problem is a classical combinatorial optimization problem which asks to find k centers such that the maximum distance of any input point in a set P to its assigned center is minimized. The problem allows for elegant 2-approximations. However, the situation becomes significantly more difficult when constraints are added to the problem. We raise the question of whether general methods can be derived to turn an approximation algorithm for a clustering problem with some constraints into an approximation algorithm that respects one more constraint. Our constraint of choice is privacy: here, we are asked to open a center only when at least l clients will be assigned to it. We show how to combine privacy with several other constraints.

Dagstuhl Research Online Publication Server