    Constant Approximation for k-Median and k-Means with Outliers via Iterative Rounding

    In this paper, we present a new iterative rounding framework for many clustering problems. Using this, we obtain an $(\alpha_1 + \epsilon \leq 7.081 + \epsilon)$-approximation algorithm for k-median with outliers, greatly improving upon the large implicit constant approximation ratio of Chen [Chen, SODA 2018]. For k-means with outliers, we give an $(\alpha_2 + \epsilon \leq 53.002 + \epsilon)$-approximation, which is the first $O(1)$-approximation for this problem. The iterative algorithm framework is very versatile; we show how it can be used to give $\alpha_1$- and $(\alpha_1 + \epsilon)$-approximation algorithms for the matroid and knapsack median problems, respectively, improving upon the previous best approximation ratios of 8 [Swamy, ACM Trans. Algorithms] and 17.46 [Byrka et al., ESA 2015]. The natural LP relaxation for the k-median/k-means with outliers problem has an unbounded integrality gap. In spite of this negative result, our iterative rounding framework shows that we can round an LP solution to an almost-integral solution of small cost, in which at most two facilities are fractionally open. Thus, the LP integrality gap arises from the gap between almost-integral and fully integral solutions. Then, using a pre-processing procedure, we show how to convert an almost-integral solution to a fully integral solution, losing only a constant factor in the approximation ratio. By further using a sparsification technique, the additive loss incurred by the conversion can be reduced to any $\epsilon > 0$.
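
As a concrete reference point for the objectives above, the following Python sketch evaluates the k-median and k-means with outliers cost of a fixed set of open centers. It assumes points in the Euclidean plane, and the function name `outlier_clustering_cost` and its parameters are illustrative, not taken from the paper. It does not implement the iterative rounding framework, only the objective that the framework approximates.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def outlier_clustering_cost(clients: Sequence[Point], centers: Sequence[Point],
                            m: int, power: int = 1) -> float:
    """Cost of a fixed set of open centers when up to m clients may be discarded.

    power=1 gives the k-median objective, power=2 the k-means objective.
    """
    if not 0 <= m <= len(clients):
        raise ValueError("m must be between 0 and the number of clients")
    if not centers:
        raise ValueError("at least one open center is required")
    # Distance from every client to its nearest open center, in ascending order.
    dists = sorted(min(math.dist(p, f) for f in centers) for p in clients)
    # For fixed centers, the best outliers are simply the m farthest clients.
    served = dists[:len(dists) - m]
    return sum(d ** power for d in served)


clients = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
print(outlier_clustering_cost(clients, [(0.0, 0.0)], m=1))           # k-median: 1.0
print(outlier_clustering_cost(clients, [(0.0, 0.0)], m=0, power=2))  # k-means: 101.0
```

Discarding the single far client at (10, 0) removes almost all of the cost, which is exactly the behavior the outlier budget is meant to allow.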

    A Constant Approximation for Colorful k-Center

    In this paper, we consider the colorful k-center problem, which is a generalization of the well-known k-center problem. Here, we are given red and blue points in a metric space, and a coverage requirement for each color. The goal is to find the smallest radius $\rho$ such that, with k balls of radius $\rho$, the desired number of points of each color can be covered. We obtain a constant approximation for this problem in the Euclidean plane. We obtain this result by combining a "pseudo-approximation" algorithm that works in any metric space with an approximation algorithm that works for a special class of instances in the plane. The latter algorithm uses a novel connection to a certain matching problem in graphs.
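
A minimal Python sketch of the problem statement: `covers` tests whether k balls of radius $\rho$ meet both color requirements, and `colorful_k_center_bruteforce` finds the smallest such radius by exhaustive search. Both names are hypothetical. The search is exponential in k and assumes that centers are restricted to input points, which may differ from the setting studied in the paper, so it is only useful for checking small examples.

```python
import math
from itertools import combinations
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def covers(red: Sequence[Point], blue: Sequence[Point], centers: Sequence[Point],
           rho: float, need_red: int, need_blue: int) -> bool:
    """Do balls of radius rho around `centers` cover enough points of each color?"""
    def count_covered(points: Sequence[Point]) -> int:
        return sum(1 for p in points if any(math.dist(p, c) <= rho for c in centers))
    return count_covered(red) >= need_red and count_covered(blue) >= need_blue


def colorful_k_center_bruteforce(red: Sequence[Point], blue: Sequence[Point], k: int,
                                 need_red: int, need_blue: int) -> Optional[float]:
    """Smallest feasible radius when centers are restricted to input points (assumption)."""
    points = list(red) + list(blue)
    # With centers at input points, the optimal radius is some point-to-point distance.
    radii = sorted({math.dist(p, q) for p in points for q in points})
    for rho in radii:
        if any(covers(red, blue, cs, rho, need_red, need_blue)
               for cs in combinations(points, k)):
            return rho
    return None  # requirements cannot be met with k balls


red = [(0.0, 0.0), (1.0, 0.0)]
blue = [(5.0, 0.0), (6.0, 0.0)]
print(colorful_k_center_bruteforce(red, blue, k=1, need_red=1, need_blue=1))  # 4.0
print(colorful_k_center_bruteforce(red, blue, k=2, need_red=2, need_blue=2))  # 1.0
```

With one ball, covering a point of each color forces the ball to bridge the gap between the red and blue groups, which is why the radius jumps from 1 to 4.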

    Structural Iterative Rounding for Generalized k-Median Problems

    This paper considers approximation algorithms for generalized k-median problems. This class of problems can be informally described as k-median with a constant number of extra constraints, and includes k-median with outliers and knapsack median. Our first contribution is a pseudo-approximation algorithm for generalized k-median that outputs a 6.387-approximate solution with a constant number of fractional variables. The algorithm is based on iteratively rounding linear programs, and the main technical innovation comes from understanding the rich structure of the resulting extreme points. Using our pseudo-approximation algorithm, we give improved approximation algorithms for k-median with outliers and knapsack median. This involves combining our pseudo-approximation with pre- and post-processing steps that round a constant number of fractional variables at a small increase in cost. Our algorithms achieve approximation ratios $6.994 + \epsilon$ and $6.387 + \epsilon$ for k-median with outliers and knapsack median, respectively. Both improve on the best known approximation ratios.
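
Knapsack median is one instance of the "k-median plus extra constraints" class described above: each facility has a weight, and the open facilities must fit within a budget. The exhaustive Python sketch below (function name hypothetical, points assumed to lie in the Euclidean plane) makes that constraint concrete for tiny instances. It is not the paper's iterative rounding algorithm, only a brute-force baseline against which such an algorithm could be checked.

```python
import math
from itertools import combinations
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def knapsack_median_bruteforce(clients: Sequence[Point], facilities: Sequence[Point],
                               weights: Sequence[float], budget: float
                               ) -> Tuple[float, Optional[Tuple[int, ...]]]:
    """Exact knapsack median by enumeration: open facilities of total weight <= budget."""
    best_cost: float = math.inf
    best_set: Optional[Tuple[int, ...]] = None
    for r in range(1, len(facilities) + 1):
        for subset in combinations(range(len(facilities)), r):
            if sum(weights[i] for i in subset) > budget:
                continue  # violates the knapsack (extra linear) constraint
            # Every client is connected to its nearest open facility.
            cost = sum(min(math.dist(c, facilities[i]) for i in subset) for c in clients)
            if cost < best_cost:
                best_cost, best_set = cost, subset
    return best_cost, best_set


clients = [(0.0, 0.0), (10.0, 0.0)]
facilities = [(0.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
print(knapsack_median_bruteforce(clients, facilities, [2, 2, 1], budget=3))  # (5.0, (0, 2))
print(knapsack_median_bruteforce(clients, facilities, [2, 2, 1], budget=4))  # (0.0, (0, 1))
```

Setting every weight to 1 and the budget to k recovers plain k-median, which is why knapsack median is described as k-median with one extra constraint.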

    Approximation algorithms for stochastic clustering

    We consider stochastic settings for clustering, and develop provably good approximation algorithms for a number of these notions. These algorithms yield better approximation ratios than those achievable in the usual deterministic clustering setting. Additionally, they offer a number of advantages, including clustering that is fairer and has better long-term behavior for each user. In particular, they ensure that *every user* is guaranteed to get good service (on average). We also complement some of these results with impossibility results.

    Robust Correlation Clustering

    In this paper, we introduce and study the Robust-Correlation-Clustering problem: given a graph G = (V,E) where every edge is labeled either + or - (denoting similar or dissimilar pairs of vertices), and a parameter m, the goal is to delete a set D of m vertices and partition the remaining vertices $V \setminus D$ into clusters to minimize the cost of the clustering, which is the number of + edges with endpoints in different clusters plus the number of - edges with endpoints in the same cluster. This generalizes the classical Correlation-Clustering problem, which is the special case m = 0. Correlation clustering is useful when we have (only) qualitative information about the similarity or dissimilarity of pairs of points, and Robust-Correlation-Clustering equips this model with the capability to handle noise in datasets. In this work, we present a constant-factor bi-criteria algorithm for Robust-Correlation-Clustering on complete graphs (our solution is O(1)-approximate with respect to the cost while discarding $O(1) \cdot m$ points as outliers), and complement this by showing that no finite approximation is possible if we do not violate the outlier budget. Our algorithm is simple: it first performs an LP-based pre-processing step to delete O(m) vertices, and then runs a particular Correlation-Clustering algorithm, ACNAlg [Ailon et al., 2005], on the residual instance. We then consider general graphs and give $(O(\log n), O(\log^2 n))$ bi-criteria algorithms, while also showing a hardness of $\alpha_{MC}$ on both the cost and the outlier violation, where $\alpha_{MC}$ is the lower bound for the Minimum-Multicut problem.
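
The cost function in the problem statement can be written down directly. In this Python sketch (names hypothetical), edge labels are stored in a dictionary keyed by vertex pairs, and edges touching a deleted vertex are ignored, which is our reading of "partition the remaining vertices". The example shows why deletions help: a triangle with two + edges and one - edge costs at least 1 under any clustering of all three vertices, but deleting one vertex brings the cost to 0.

```python
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def robust_cc_cost(labels: Dict[Edge, str], cluster_of: Dict[Hashable, int],
                   deleted: Set[Hashable]) -> int:
    """Disagreements among surviving vertices: + edges cut plus - edges kept together."""
    cost = 0
    for (u, v), sign in labels.items():
        if u in deleted or v in deleted:
            continue  # edges touching a deleted vertex are not charged
        same_cluster = cluster_of[u] == cluster_of[v]
        if (sign == "+" and not same_cluster) or (sign == "-" and same_cluster):
            cost += 1
    return cost


# A "bad triangle": no clustering of all three vertices has zero cost.
triangle = {("a", "b"): "+", ("b", "c"): "+", ("a", "c"): "-"}
print(robust_cc_cost(triangle, {"a": 0, "b": 0, "c": 0}, deleted=set()))  # 1
print(robust_cc_cost(triangle, {"a": 0, "c": 1}, deleted={"b"}))          # 0
```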

    On the Cost of Essentially Fair Clusterings

    Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters, especially if the data is already biased. At NIPS 2017, Chierichetti et al. proposed a model for fair clustering that requires the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair k-center problem and an $O(t)$-approximation for the fair k-median problem, where $t$ is a parameter of the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair k-center. We extend and improve the known results. First, we give a 5-approximation for the fair k-center problem with multiple protected classes. Second, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives: k-center, k-supplier, k-median, k-means and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.
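
To make "preserving the global fraction of each protected class" concrete, the Python sketch below measures the largest additive deviation between a class's share of some cluster and its share of the whole data set. This additive gap is a simplified, hypothetical measure chosen for illustration; it is not the balance notion of Chierichetti et al. or the relaxed notion proposed in this paper, and the function name is our own.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def max_representation_gap(classes: Sequence[Hashable], assignment: Sequence[int]) -> float:
    """Largest |share of a class in a cluster - its global share| over all pairs."""
    n = len(classes)
    global_frac = {c: cnt / n for c, cnt in Counter(classes).items()}
    members: Dict[int, List[Hashable]] = {}
    for cls, cluster in zip(classes, assignment):
        members.setdefault(cluster, []).append(cls)
    gap = 0.0
    for labs in members.values():
        counts = Counter(labs)  # missing classes count as 0
        for c, g in global_frac.items():
            gap = max(gap, abs(counts[c] / len(labs) - g))
    return gap


classes = ["red", "red", "blue", "blue"]
print(max_representation_gap(classes, [0, 1, 0, 1]))  # 0.0: each cluster mirrors the data
print(max_representation_gap(classes, [0, 0, 1, 1]))  # 0.5: clusters are monochromatic
```

A perfectly fair clustering in this simplified sense has gap 0; an approximate fairness requirement would instead allow the gap to stay below some tolerance.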

    Ordered k-Median with Outliers

    We study a natural generalization of the celebrated ordered k-median problem, named robust ordered k-median, also known as ordered k-median with outliers. We are given facilities $\mathcal{F}$ and clients $\mathcal{C}$ in a metric space $(\mathcal{F} \cup \mathcal{C}, d)$, parameters $k, m \in \mathbb{Z}_+$ and a non-increasing non-negative vector $w \in \mathbb{R}_+^m$. We seek to open k facilities $F \subseteq \mathcal{F}$ and serve m clients $C \subseteq \mathcal{C}$, inducing a service cost vector $c = \{d(j, F) : j \in C\}$; the goal is to minimize the ordered objective $w^\top c^\downarrow$, where $d(j, F) = \min_{i \in F} d(j, i)$ is the minimum distance between client j and facilities in F, and $c^\downarrow \in \mathbb{R}_+^m$ is the non-increasingly sorted version of c. Robust ordered k-median captures many interesting clustering problems recently studied in the literature, e.g., robust k-median and ordered k-median. We obtain the first polynomial-time constant-factor approximation algorithm for robust ordered k-median, achieving an approximation guarantee of 127. The main difficulty comes from the presence of outliers, which already cause an unbounded integrality gap in the natural LP relaxation for robust k-median. This appears to invalidate previous methods for approximating the highly non-linear ordered objective. To overcome this issue, we introduce a novel yet very simple reduction framework that enables linear analysis of the non-linear objective. We also devise the first constant-factor approximations for ordered matroid median and ordered knapsack median using the same framework, with approximation factors 19.8 and 41.6, respectively.
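
The ordered objective $w^\top c^\downarrow$ is straightforward to evaluate once the facilities F are fixed. The Python sketch below (names hypothetical, points assumed to lie in the Euclidean plane) serves the m = len(w) closest clients: for fixed F, their sorted cost vector is coordinate-wise no larger than that of any other choice of m clients, so with non-negative w this choice is optimal. It only evaluates the objective; choosing F is the hard part the paper addresses.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def robust_ordered_cost(clients: Sequence[Point], facilities: Sequence[Point],
                        w: Sequence[float]) -> float:
    """w^T c_desc for fixed open facilities F, serving m = len(w) clients."""
    m = len(w)
    if m > len(clients) or not facilities:
        raise ValueError("need at least m clients and one open facility")
    if any(x < 0 for x in w) or any(a < b for a, b in zip(w, w[1:])):
        raise ValueError("w must be non-negative and non-increasing")
    # d(j, F) for every client, in ascending order.
    dists = sorted(min(math.dist(p, f) for f in facilities) for p in clients)
    # Serve the m closest clients; their costs, sorted non-increasingly, form c_desc.
    c_desc = sorted(dists[:m], reverse=True)
    return sum(wi * ci for wi, ci in zip(w, c_desc))


clients = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (100.0, 0.0)]
F = [(0.0, 0.0)]
print(robust_ordered_cost(clients, F, [2, 1, 1]))  # 7.0
print(robust_ordered_cost(clients, F, [1, 1, 1]))  # 4.0: robust k-median
print(robust_ordered_cost(clients, F, [1, 0, 0]))  # 3.0: k-center with outliers
```

Different weight vectors recover familiar special cases: all-ones weights give robust k-median, and a single leading 1 gives the largest served distance, i.e. k-center with outliers.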