Constant Approximation for $k$-Median and $k$-Means with Outliers via Iterative Rounding

Author: Arthur David
Charikar M.
Chawla Sanjay
Chen Ke
Cohen-Addad Vincent
Guha Sudipto
Korupolu Madhukar R.
Ott Lionel
Shi Li. A
Publication date: 06/04/2018

In this paper, we present a new iterative rounding framework for many clustering problems. Using this, we obtain an $(\alpha_1 + \epsilon \leq 7.081 + \epsilon)$-approximation algorithm for $k$-median with outliers, greatly improving upon the large implicit constant approximation ratio of Chen [Chen, SODA 2018]. For $k$-means with outliers, we give an $(\alpha_2 + \epsilon \leq 53.002 + \epsilon)$-approximation, which is the first $O(1)$-approximation for this problem. The iterative algorithm framework is very versatile; we show how it can be used to give $\alpha_1$- and $(\alpha_1 + \epsilon)$-approximation algorithms for matroid and knapsack median problems respectively, improving upon the previous best approximation ratios of $8$ [Swamy, ACM Trans. Algorithms] and $17.46$ [Byrka et al, ESA 2015]. The natural LP relaxation for the $k$-median/$k$-means with outliers problem has an unbounded integrality gap. In spite of this negative result, our iterative rounding framework shows that we can round an LP solution to an almost-integral solution of small cost, in which we have at most two fractionally open facilities. Thus, the LP integrality gap arises due to the gap between almost-integral and fully-integral solutions. Then, using a pre-processing procedure, we show how to convert an almost-integral solution to a fully-integral solution losing only a constant-factor in the approximation ratio. By further using a sparsification technique, the additive factor loss incurred by the conversion can be reduced to any $\epsilon > 0$

arXiv.org e-Print Archive

Crossref

A Constant Approximation for Colorful k-Center

Author: Bandyapadhyay Sayan
Inamdar Tanmay
Pai Shreyas
Varadarajan Kasturi
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual European Symposium on Algorithms (ESA 2019)
Publication date: 01/01/2019

In this paper, we consider the colorful k-center problem, which is a generalization of the well-known k-center problem. Here, we are given red and blue points in a metric space, and a coverage requirement for each color. The goal is to find the smallest radius rho, such that with k balls of radius rho, the desired number of points of each color can be covered. We obtain a constant approximation for this problem in the Euclidean plane. We obtain this result by combining a "pseudo-approximation" algorithm that works in any metric space, and an approximation algorithm that works for a special class of instances in the plane. The latter algorithm uses a novel connection to a certain matching problem in graphs
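
The coverage requirement in the abstract above can be illustrated with a small feasibility check. This is only a sketch of the problem definition, not the paper's algorithm: the function names `covered_counts` and `is_feasible`, the use of Euclidean distance in the plane, and the sample data are all assumptions made for illustration.

```python
import math

def covered_counts(points, colors, centers, rho):
    """Count, per color, the points lying within distance rho of some center."""
    counts = {}
    for point, color in zip(points, colors):
        # A point is covered if at least one ball of radius rho contains it.
        if any(math.dist(point, center) <= rho for center in centers):
            counts[color] = counts.get(color, 0) + 1
    return counts

def is_feasible(points, colors, centers, rho, requirements):
    """Check whether the balls meet the coverage requirement of every color."""
    counts = covered_counts(points, colors, centers, rho)
    return all(counts.get(color, 0) >= need for color, need in requirements.items())

# Hypothetical instance: two tight red/blue pairs and one far-away blue point.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (50, 50)]
colors = ["red", "blue", "red", "blue", "blue"]
centers = [(0.5, 0), (10.5, 0)]  # k = 2 balls

print(is_feasible(points, colors, centers, 0.5, {"red": 2, "blue": 2}))
print(is_feasible(points, colors, centers, 0.5, {"red": 2, "blue": 3}))
```

The colorful k-center problem asks for the smallest rho for which some choice of k centers passes this check; the sketch only verifies a given choice.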

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Structural Iterative Rounding for Generalized k-Median Problems

Author: Gupta Anupam
Moseley Benjamin
Zhou Rudy
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)
Publication date: 02/09/2020

This paper considers approximation algorithms for generalized k-median problems. This class of problems can be informally described as k-median with a constant number of extra constraints, and includes k-median with outliers, and knapsack median. Our first contribution is a pseudo-approximation algorithm for generalized k-median that outputs a 6.387-approximate solution with a constant number of fractional variables. The algorithm is based on iteratively rounding linear programs, and the main technical innovation comes from understanding the rich structure of the resulting extreme points. Using our pseudo-approximation algorithm, we give improved approximation algorithms for k-median with outliers and knapsack median. This involves combining our pseudo-approximation with pre- and post-processing steps to round a constant number of fractional variables at a small increase in cost. Our algorithms achieve approximation ratios 6.994 + ε and 6.387 + ε for k-median with outliers and knapsack median, respectively. These both improve on the best known approximations

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Approximation algorithms for stochastic clustering

Author: Harris David G.
Li Shi
Pensyl Thomas
Srinivasan Aravind
Trinh Khoa
Publication date: 10/09/2019

We consider stochastic settings for clustering, and develop provably-good approximation algorithms for a number of these notions. These algorithms yield better approximation ratios compared to the usual deterministic clustering setting. Additionally, they offer a number of advantages including clustering which is fairer and has better long-term behavior for each user. In particular, they ensure that *every user* is guaranteed to get good service (on average). We also complement some of these with impossibility results

arXiv.org e-Print Archive

Robust Correlation Clustering

Author: Devvrit
Krishnaswamy Ravishankar
Rajaraman Nived
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 01/01/2019

In this paper, we introduce and study the Robust-Correlation-Clustering problem: given a graph G = (V,E) where every edge is either labeled + or - (denoting similar or dissimilar pairs of vertices), and a parameter m, the goal is to delete a set D of m vertices, and partition the remaining vertices V \ D into clusters to minimize the cost of the clustering, which is the sum of the number of + edges with end-points in different clusters and the number of - edges with end-points in the same cluster. This generalizes the classical Correlation-Clustering problem which is the special case when m = 0. Correlation clustering is useful when we have (only) qualitative information about the similarity or dissimilarity of pairs of points, and Robust-Correlation-Clustering equips this model with the capability to handle noise in datasets. In this work, we present a constant-factor bi-criteria algorithm for Robust-Correlation-Clustering on complete graphs (where our solution is O(1)-approximate w.r.t. the cost while however discarding O(1) m points as outliers), and also complement this by showing that no finite approximation is possible if we do not violate the outlier budget. Our algorithm is very simple in that it first does a simple LP-based pre-processing to delete O(m) vertices, and subsequently runs a particular Correlation-Clustering algorithm ACNAlg [Ailon et al., 2005] on the residual instance. We then consider general graphs, and show (O(log n), O(log^2 n)) bi-criteria algorithms while also showing a hardness of alpha_MC on both the cost and the outlier violation, where alpha_MC is the lower bound for the Minimum-Multicut problem
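
The cost function defined in the abstract above can be written out directly. The following is a minimal sketch of that objective only, not of the paper's LP-based algorithm; the function name `robust_cc_cost`, the edge-list representation, and the toy graph are assumptions made for illustration.

```python
def robust_cc_cost(plus_edges, minus_edges, deleted, cluster_of):
    """Cost of a clustering of the vertices that remain after deleting `deleted`.

    plus_edges / minus_edges: lists of (u, v) pairs labeled + or -.
    deleted: the set D of removed (outlier) vertices.
    cluster_of: dict mapping each remaining vertex to a cluster id.
    """
    deleted = set(deleted)

    def survives(u, v):
        # Edges touching a deleted vertex no longer contribute to the cost.
        return u not in deleted and v not in deleted

    # A + edge is violated when its end-points land in different clusters.
    cut_plus = sum(
        1 for u, v in plus_edges
        if survives(u, v) and cluster_of[u] != cluster_of[v]
    )
    # A - edge is violated when its end-points land in the same cluster.
    joined_minus = sum(
        1 for u, v in minus_edges
        if survives(u, v) and cluster_of[u] == cluster_of[v]
    )
    return cut_plus + joined_minus

# Hypothetical complete graph on {a, b, c, d}.
plus_edges = [("a", "b"), ("b", "c")]
minus_edges = [("a", "c"), ("a", "d"), ("b", "d"), ("c", "d")]

# m = 0 (classical correlation clustering): clusters {a, b, c} and {d}.
print(robust_cc_cost(plus_edges, minus_edges, set(), {"a": 0, "b": 0, "c": 0, "d": 1}))

# m = 1: delete vertex a, then cluster {b, c} and {d}.
print(robust_cc_cost(plus_edges, minus_edges, {"a"}, {"b": 0, "c": 0, "d": 1}))
```

In this toy instance, deleting the single vertex `a` removes the only conflicting edge, which is the kind of noise handling the robust variant is meant to model.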

Dagstuhl Research Online Publication Server

On the Cost of Essentially Fair Clusterings

Author: Bercea Ioana O.
Khuller Samir
Kumar Aounon
Schmidt Daniel R.
Schmidt Melanie
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 26/11/2018

Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters -- especially if the data is already biased. At NIPS 2017, Chierichetti et al. proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair $k$-center problem and a $O(t)$-approximation for the fair $k$-median problem, where $t$ is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair $k$-center. We extend and improve the known results. Firstly, we give a 5-approximation for the fair $k$-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives $k$-center, $k$-supplier, $k$-median, $k$-means and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed

arXiv.org e-Print Archive

Kölner UniversitätsPublikationsServer

Dagstuhl Research Online Publication Server

Ordered k-Median with Outliers

Author: Deng Shichuan
Zhang Qianfan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)
Publication date: 01/01/2022

We study a natural generalization of the celebrated ordered k-median problem, named robust ordered k-median, also known as ordered k-median with outliers. We are given facilities $\mathcal{F}$ and clients $\mathcal{C}$ in a metric space $(\mathcal{F} \cup \mathcal{C}, d)$, parameters $k, m \in \mathbb{Z}_+$ and a non-increasing non-negative vector $w \in \mathbb{R}_+^m$. We seek to open k facilities $F \subseteq \mathcal{F}$ and serve m clients $C \subseteq \mathcal{C}$, inducing a service cost vector $c = \{d(j,F) : j \in C\}$; the goal is to minimize the ordered objective $w^\top c^\downarrow$, where $d(j,F) = \min_{i \in F} d(j,i)$ is the minimum distance between client j and facilities in F, and $c^\downarrow \in \mathbb{R}_+^m$ is the non-increasingly sorted version of c. Robust ordered k-median captures many interesting clustering problems recently studied in the literature, e.g., robust k-median, ordered k-median, etc. We obtain the first polynomial-time constant-factor approximation algorithm for robust ordered k-median, achieving an approximation guarantee of 127. The main difficulty comes from the presence of outliers, which already causes an unbounded integrality gap in the natural LP relaxation for robust k-median. This appears to invalidate previous methods in approximating the highly non-linear ordered objective. To overcome this issue, we introduce a novel yet very simple reduction framework that enables linear analysis of the non-linear objective. We also devise the first constant-factor approximations for ordered matroid median and ordered knapsack median using the same framework, and the approximation factors are 19.8 and 41.6, respectively
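
The ordered objective described in the abstract above can be evaluated for a fixed solution as follows. This is a sketch of the objective only, not of the paper's reduction framework; the function name `ordered_cost`, the nested-dict distance table, and the sample instance are assumptions made for illustration.

```python
def ordered_cost(dist, open_facilities, served_clients, w):
    """Ordered objective: weights w applied to the served clients' costs, largest first.

    dist: dict of dicts, dist[j][i] = distance from client j to facility i.
    open_facilities: the k opened facilities.
    served_clients: the m clients that are served (all others are outliers).
    w: non-increasing, non-negative weight vector of length m.
    """
    if len(w) != len(served_clients):
        raise ValueError("w must have one weight per served client")
    # Each served client pays the distance to its nearest open facility.
    costs = [min(dist[j][i] for i in open_facilities) for j in served_clients]
    # Sort the costs in non-increasing order before applying the weights.
    costs.sort(reverse=True)
    return sum(weight * cost for weight, cost in zip(w, costs))

# Hypothetical instance: three clients, two facilities, client "c" is far from both.
dist = {
    "a": {"f1": 1, "f2": 5},
    "b": {"f1": 4, "f2": 2},
    "c": {"f1": 9, "f2": 9},
}

# Serve m = 2 clients and treat "c" as the outlier.
print(ordered_cost(dist, ["f1", "f2"], ["a", "b"], [2, 1]))
# With w = (1, 1) the objective is the robust k-median cost.
print(ordered_cost(dist, ["f1", "f2"], ["a", "b"], [1, 1]))
# With w = (1, 0) only the largest served cost counts, as in k-center with outliers.
print(ordered_cost(dist, ["f1", "f2"], ["a", "b"], [1, 0]))
```

The different weight vectors show how one objective covers several of the clustering problems named in the abstract.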

Dagstuhl Research Online Publication Server