119 research outputs found

    Approximating k-Median via Pseudo-Approximation

    We present a novel approximation algorithm for $k$-median that achieves an approximation guarantee of $1+\sqrt{3}+\epsilon$, improving upon the decade-old ratio of $3+\epsilon$. Our approach is based on two components, each of which, we believe, is of independent interest. First, we show that in order to give an $\alpha$-approximation algorithm for $k$-median, it is sufficient to give a \emph{pseudo-approximation algorithm} that finds an $\alpha$-approximate solution by opening $k+O(1)$ facilities. This is a rather surprising result, as there exist instances for which opening $k+1$ facilities may lead to a significantly smaller cost than if only $k$ facilities were opened. Second, we give such a pseudo-approximation algorithm with $\alpha = 1+\sqrt{3}+\epsilon$. Prior to our work, it was not even known whether opening $k+o(k)$ facilities would help improve the approximation ratio. Comment: 18 pages
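
    A minimal sketch, not from the paper: it evaluates the $k$-median objective on a toy one-dimensional metric and illustrates the phenomenon mentioned above, that opening $k+1$ facilities can be dramatically cheaper than the best $k$ facilities. The instance and helper names are my own.

```python
# Toy k-median instance (my own), not the paper's algorithm.
from itertools import combinations

def kmedian_cost(points, centers):
    """Sum over all points of the distance to the nearest open center."""
    return sum(min(abs(p - c) for c in centers) for p in points)

def best_cost(points, k):
    """Brute-force optimum when centers must be chosen among the input points."""
    return min(kmedian_cost(points, C) for C in combinations(points, k))

# Three well-separated groups of clients.
points = [0.0] * 5 + [100.0] * 5 + [200.0] * 5
print(best_cost(points, k=2))  # two centers must merge two groups: cost 500.0
print(best_cost(points, k=3))  # one extra center covers every group: cost 0.0
```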

    Fault Tolerant Clustering Revisited

    In discrete $k$-center and $k$-median clustering, we are given a set of points $P$ in a metric space $M$, and the task is to output a set $C \subseteq P$, $|C| = k$, such that the cost of clustering $P$ using $C$ is as small as possible. For $k$-center, the cost is the furthest any point has to travel to its nearest center, whereas for $k$-median, the cost is the sum of all point-to-nearest-center distances. In the fault-tolerant versions of these problems, we are given an additional parameter $1 \leq \ell \leq k$, such that when computing the cost of clustering, points are assigned to their $\ell$-th nearest neighbor in $C$, instead of their nearest neighbor. We provide constant-factor approximation algorithms for these problems that are both conceptually simple and highly practical from an implementation standpoint.
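
    A minimal sketch of the fault-tolerant objectives described above, with hypothetical helper names and toy data of my own: each point is charged the distance to its $\ell$-th nearest open center, and the charges are aggregated by a maximum ($k$-center) or a sum ($k$-median).

```python
# Fault-tolerant clustering costs: each point pays the distance to its
# ell-th nearest open center rather than its nearest one.
def fault_tolerant_costs(points, centers, ell, dist):
    """Return (k-center cost, k-median cost) under ell-th nearest assignment."""
    assert 1 <= ell <= len(centers)
    per_point = []
    for p in points:
        d = sorted(dist(p, c) for c in centers)
        per_point.append(d[ell - 1])          # distance to the ell-th nearest center
    return max(per_point), sum(per_point)

# Toy usage in the plane with Euclidean distance.
euclid = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
P = [(0, 0), (1, 0), (5, 5)]
C = [(0, 0), (5, 5)]
print(fault_tolerant_costs(P, C, ell=2, dist=euclid))
```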

    Certified Algorithms: Worst-Case Analysis and Beyond

    In this paper, we introduce the notion of a certified algorithm. Certified algorithms provide worst-case and beyond-worst-case performance guarantees. First, a $\gamma$-certified algorithm is also a $\gamma$-approximation algorithm - it finds a $\gamma$-approximation no matter what the input is. Second, it exactly solves $\gamma$-perturbation-resilient instances ($\gamma$-perturbation-resilient instances model real-life instances). Additionally, certified algorithms have a number of other desirable properties: they solve both maximization and minimization versions of a problem (e.g. Max Cut and Min Uncut), solve weakly perturbation-resilient instances, and solve optimization problems with hard constraints. In the paper, we define certified algorithms, describe their properties, present a framework for designing certified algorithms, and provide examples of certified algorithms for Max Cut/Min Uncut, Minimum Multiway Cut, $k$-medians, and $k$-means. We also present some negative results.

    Fair Clustering Through Fairlets

    We study the question of fair clustering under the {\em disparate impact} doctrine, where each protected class must have approximately equal representation in every cluster. We formulate the fair clustering problem under both the $k$-center and the $k$-median objectives, and show that even with two protected classes the problem is challenging, as the optimum solution can violate common conventions---for instance, a point may no longer be assigned to its nearest cluster center! En route we introduce the concept of fairlets, which are minimal sets that satisfy fair representation while approximately preserving the clustering objective. We show that any fair clustering problem can be decomposed into first finding good fairlets and then using existing machinery for traditional clustering algorithms. While finding good fairlets can be NP-hard, we proceed to obtain efficient approximation algorithms based on minimum cost flow. We empirically quantify the value of fair clustering on real-world datasets with sensitive attributes.
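
    A minimal sketch under a strong simplifying assumption: two protected classes of equal size and the fully balanced (1,1) setting, where fairlets become red-blue pairs and can be found by a minimum-cost perfect matching (the paper's general construction goes through minimum cost flow). The function name and toy data are hypothetical.

```python
# (1,1)-fairlets as red-blue pairs via min-cost perfect matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

def fairlets_by_matching(red, blue):
    """Pair each red point with a blue point so the total pair distance is small."""
    red, blue = np.asarray(red, float), np.asarray(blue, float)
    cost = np.linalg.norm(red[:, None, :] - blue[None, :, :], axis=-1)
    r_idx, b_idx = linear_sum_assignment(cost)   # optimal red-to-blue assignment
    return [(int(i), int(j)) for i, j in zip(r_idx, b_idx)]

red = [(0, 0), (10, 0)]
blue = [(0, 1), (10, 1)]
print(fairlets_by_matching(red, blue))  # [(0, 0), (1, 1)]
```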

    Constant-Factor FPT Approximation for Capacitated k-Median

    Capacitated $k$-median is one of the few outstanding optimization problems for which the existence of a polynomial time constant factor approximation algorithm remains an open problem. In a series of recent papers, algorithms producing solutions that violate either the number of facilities or the capacities by a multiplicative factor were obtained. However, producing solutions without violations appears to be hard and potentially requires different algorithmic techniques. Notably, if parameterized by the number of facilities $k$, the problem is also W[2]-hard, making the existence of an exact FPT algorithm unlikely. In this work we provide an FPT-time constant factor approximation algorithm preserving both cardinality and capacity of the facilities. The algorithm runs in time $2^{O(k \log k)} \cdot n^{O(1)}$ and achieves an approximation ratio of $7+\epsilon$.
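
    This is not the paper's FPT algorithm; it is only a small evaluation helper, under the assumption that a set of open facilities is already fixed: it computes the cheapest capacity-respecting assignment of clients by expanding each facility into capacity-many copies and solving an assignment problem, so a candidate solution can be scored without any capacity violation. All names and data are mine.

```python
# Evaluate a fixed set of open facilities under hard capacities.
import numpy as np
from scipy.optimize import linear_sum_assignment

def capacitated_assignment_cost(dist, capacities):
    """dist: clients x facilities distance matrix; capacities[j] bounds facility j's load."""
    dist = np.asarray(dist, float)
    cols = np.repeat(np.arange(dist.shape[1]), capacities)  # one column per unit of capacity
    expanded = dist[:, cols]
    assert expanded.shape[1] >= dist.shape[0], "not enough capacity for all clients"
    rows, slots = linear_sum_assignment(expanded)
    assignment = cols[slots]                                 # facility serving each client
    return expanded[rows, slots].sum(), assignment

dist = [[1, 5],
        [2, 4],
        [9, 1]]
print(capacitated_assignment_cost(dist, capacities=[1, 2]))  # cost 1 + 4 + 1, assignment [0, 1, 1]
```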

    The Hardness of Approximation of Euclidean k-means

    The Euclidean $k$-means problem is a classical problem that has been extensively studied in the theoretical computer science, machine learning, and computational geometry communities. In this problem, we are given a set of $n$ points in Euclidean space $R^d$, and the goal is to choose $k$ centers in $R^d$ so that the sum of squared distances of each point to its nearest center is minimized. The best approximation algorithms for this problem include a polynomial-time constant-factor approximation for general $k$ and a $(1+\epsilon)$-approximation which runs in time $\mathrm{poly}(n) \, 2^{O(k/\epsilon)}$. At the other extreme, the only known computational complexity result for this problem is NP-hardness [ADHP'09]. The main difficulty in obtaining hardness results stems from the Euclidean nature of the problem, and the fact that any point in $R^d$ can be a potential center. This gap in understanding left open the intriguing possibility that the problem might admit a PTAS for all $k, d$. In this paper we provide the first hardness of approximation result for the Euclidean $k$-means problem. Concretely, we show that there exists a constant $\epsilon > 0$ such that it is NP-hard to approximate the $k$-means objective to within a factor of $(1+\epsilon)$. We show this via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the fewest vertices that are incident on all the edges. Additionally, we give a proof that the current best hardness results for vertex cover can be carried over to triangle-free graphs. To show this we transform $G$, a known hard vertex cover instance, by taking a graph product with a suitably chosen graph $H$, and show that the size of the (normalized) maximum independent set is almost exactly preserved in the product graph using a spectral analysis, which might be of independent interest.
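
    A minimal sketch of the Euclidean $k$-means objective stated above, on toy data of my own; it also checks the standard fact that, for $k = 1$, the best center in $R^d$ is the centroid of the points, a reminder that centers need not be input points.

```python
# Euclidean k-means objective: sum of squared distances to the nearest center.
import numpy as np

def kmeans_cost(points, centers):
    P, C = np.asarray(points, float), np.asarray(centers, float)
    sq = ((P[:, None, :] - C[None, :, :]) ** 2).sum(-1)   # squared distances
    return sq.min(axis=1).sum()

P = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
centroid = P.mean(axis=0)
print(kmeans_cost(P, [centroid]))       # centroid is optimal for k = 1 (~5.33)
print(kmeans_cost(P, [[0.0, 0.0]]))     # any other single center costs more (8.0)
```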

    Constant Factor Approximation for Capacitated k-Center with Outliers

    The $k$-center problem is a classic facility location problem, where given an edge-weighted graph $G = (V, E)$ one is to find a subset of $k$ vertices $S$, such that each vertex in $V$ is "close" to some vertex in $S$. The approximation status of this basic problem is well understood, as a simple 2-approximation algorithm is known to be tight. Consequently, different extensions were studied. In the capacitated version of the problem, each vertex is assigned a capacity, which is a strict upper bound on the number of clients a facility can serve when located at this vertex. A constant-factor approximation for the capacitated $k$-center was obtained last year by Cygan, Hajiaghayi and Khuller [FOCS'12], which was recently improved to a 9-approximation by An, Bhaskara and Svensson [arXiv'13]. In a different generalization of the problem some clients (denoted as outliers) may be disregarded. Here we are additionally given an integer $p$ and the goal is to serve exactly $p$ clients, which the algorithm is free to choose. In 2001 Charikar et al. [SODA'01] presented a 3-approximation for the $k$-center problem with outliers. In this paper we consider a common generalization of the two extensions previously studied separately, i.e. we work with the capacitated $k$-center with outliers. We present the first constant-factor approximation algorithm with an approximation ratio of 25, even for the case of non-uniform hard capacities. Comment: 15 pages, 3 figures, accepted to STACS 201
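
    For reference, a sketch of the classic farthest-first (Gonzalez) heuristic behind the tight 2-approximation mentioned above; it handles neither capacities nor outliers, which are exactly the extensions the paper addresses. Names and toy data are mine.

```python
# Farthest-first traversal: the classic 2-approximation for plain k-center.
def farthest_first_centers(points, k, dist):
    centers = [points[0]]                      # arbitrary first center
    while len(centers) < k:
        # pick the point farthest from its current nearest center
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(nxt)
    radius = max(min(dist(p, c) for c in centers) for p in points)
    return centers, radius

euclid = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(farthest_first_centers(pts, k=2, dist=euclid))   # radius 1.0
```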

    Tight Analysis of a Multiple-Swap Heuristic for Budgeted Red-Blue Median

    Budgeted Red-Blue Median is a generalization of classic $k$-Median in that there are two sets of facilities, say $\mathcal{R}$ and $\mathcal{B}$, that can be used to serve clients located in some metric space. The goal is to open $k_r$ facilities in $\mathcal{R}$ and $k_b$ facilities in $\mathcal{B}$ for some given bounds $k_r, k_b$ and connect each client to their nearest open facility in a way that minimizes the total connection cost. We extend work by Hajiaghayi, Khandekar, and Kortsarz [2012] and show that a multiple-swap local search heuristic can be used to obtain a $(5+\epsilon)$-approximation for Budgeted Red-Blue Median for any constant $\epsilon > 0$. This is an improvement over their single-swap analysis and beats the previous best approximation guarantee of 8 by Swamy [2014]. We also present a matching lower bound showing that for every $p \geq 1$, there are instances of Budgeted Red-Blue Median with local optimum solutions for the $p$-swap heuristic whose cost is $5 + \Omega\left(\frac{1}{p}\right)$ times the optimum solution cost. Thus, our analysis is tight up to lower order terms. In particular, for any $\epsilon > 0$ we show the single-swap heuristic admits local optima whose cost can be as bad as $7-\epsilon$ times the optimum solution cost.
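
    A sketch, not the paper's algorithm or analysis, of the single-swap local search that the multiple-swap heuristic generalizes: repeatedly swap one open facility for one closed facility of the same color (so the budgets $k_r$ and $k_b$ are preserved) whenever the total connection cost drops. All names and data below are hypothetical.

```python
# Single-swap (p = 1) local search for Budgeted Red-Blue Median on a toy line metric.
def connection_cost(clients, open_facs, dist):
    return sum(min(dist(c, f) for f in open_facs) for c in clients)

def single_swap_local_search(clients, red, blue, k_r, k_b, dist):
    open_r, open_b = list(red[:k_r]), list(blue[:k_b])      # arbitrary starting solution
    improved = True
    while improved:
        improved = False
        for opened, pool in ((open_r, red), (open_b, blue)):
            others = open_b if opened is open_r else open_r
            for i in range(len(opened)):
                for f_in in pool:
                    if f_in in opened:
                        continue
                    cand = opened[:i] + [f_in] + opened[i + 1:]
                    # accept the same-color swap only if it strictly lowers the cost
                    if connection_cost(clients, cand + others, dist) < \
                       connection_cost(clients, opened + others, dist):
                        opened[i] = f_in
                        improved = True
    return open_r, open_b

euclid = lambda p, q: abs(p - q)
clients = [0, 1, 10, 11]
red, blue = [0, 10, 5], [1, 11, 6]
print(single_swap_local_search(clients, red, blue, k_r=1, k_b=1, dist=euclid))
```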