    Approximate Clustering via Metric Partitioning

    In this paper we consider two metric covering/clustering problems: the Minimum Cost Covering Problem (MCC) and $k$-clustering. In the MCC problem, we are given two point sets $X$ (clients) and $Y$ (servers), and a metric on $X \cup Y$. We would like to cover the clients by balls centered at the servers. The objective function to minimize is the sum of the $\alpha$-th powers of the radii of the balls. Here $\alpha \geq 1$ is a parameter of the problem (but not of a problem instance). MCC is closely related to the $k$-clustering problem. The main difference between $k$-clustering and MCC is that in $k$-clustering one needs to select $k$ balls to cover the clients. For any $\epsilon > 0$, we describe quasi-polynomial time $(1 + \epsilon)$-approximation algorithms for both problems. However, in the case of $k$-clustering the algorithm uses $(1 + \epsilon)k$ balls. Prior to our work, a $3^{\alpha}$ and a $c^{\alpha}$ approximation were achieved by polynomial-time algorithms for MCC and $k$-clustering, respectively, where $c > 1$ is an absolute constant. These two problems are thus interesting examples of metric covering/clustering problems that admit a $(1 + \epsilon)$-approximation (using $(1 + \epsilon)k$ balls in the case of $k$-clustering), if one is willing to settle for quasi-polynomial time. In contrast, for the variant of MCC where $\alpha$ is part of the input, we show under standard assumptions that no polynomial-time algorithm can achieve an approximation factor better than $O(\log |X|)$ for $\alpha \geq \log |X|$.
    Comment: 19 pages.
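As a concrete illustration of the MCC objective, the following Python sketch solves tiny instances exactly by brute force. It is not the paper's quasi-polynomial algorithm: it enumerates all $|Y|^{|X|}$ assignments of clients to servers, which is exact because any feasible cover induces an assignment whose radii (the distance from each server to its farthest assigned client) are no larger. The name `mcc_brute_force`, the optional `k` parameter (a hypothetical way to mimic the $k$-clustering constraint by allowing at most $k$ distinct servers), and the line metric in the usage example are assumptions for illustration.

```python
import itertools
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

P = TypeVar("P")


def mcc_brute_force(
    clients: Sequence[P],
    servers: Sequence[P],
    dist: Callable[[P, P], float],
    alpha: float = 1.0,
    k: Optional[int] = None,
) -> Tuple[float, List[float]]:
    """Exact MCC cost by exhaustive search (hypothetical helper, |Y|^|X| time).

    Each client is assigned to one server; a server's radius is the distance to
    its farthest assigned client (0 if it serves nobody). The cost is
    sum(radius ** alpha). If k is given, at most k distinct servers may be used,
    mimicking the k-clustering constraint of selecting k balls.
    """
    best_cost, best_radii = float("inf"), []
    for assign in itertools.product(range(len(servers)), repeat=len(clients)):
        if k is not None and len(set(assign)) > k:
            continue  # this assignment would use more than k balls
        radii = [0.0] * len(servers)
        for client, s in zip(clients, assign):
            radii[s] = max(radii[s], dist(client, servers[s]))
        cost = sum(r ** alpha for r in radii)
        if cost < best_cost:
            best_cost, best_radii = cost, radii
    return best_cost, best_radii


def line(a: float, b: float) -> float:
    """Assumed metric for the example: points on the real line."""
    return abs(a - b)


print(mcc_brute_force([0.0, 1.0, 2.0], [0.0, 2.0], line, alpha=1.0)[0])  # 1.0
print(mcc_brute_force([0.0, 1.0, 2.0], [0.0, 2.0], line, alpha=2.0, k=1)[0])  # 4.0
```

With two servers the middle client can join either ball, so the optimum uses radii 1 and 0; forcing a single ball doubles the radius, and with $\alpha = 2$ the cost grows to 4.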

    The Bane of Low-Dimensionality Clustering

    In this paper, we give a conditional lower bound of $n^{\Omega(k)}$ on the running time for the classic k-median and k-means clustering objectives (where n is the size of the input), even in low-dimensional Euclidean space of dimension four, assuming the Exponential Time Hypothesis (ETH). We also consider k-median (and k-means) with penalties, where each point need not be assigned to a center, in which case it must pay a penalty, and extend our lower bound to Euclidean space of dimension at least three. This stands in stark contrast to many other geometric problems such as the traveling salesman problem or computing an independent set of unit spheres. While these problems benefit from the so-called (limited) blessing of dimensionality, as they can be solved in time $n^{O(k^{1-1/d})}$ or $2^{n^{1-1/d}}$ in d dimensions, our work shows that widely used clustering objectives have a lower bound of $n^{\Omega(k)}$, even in dimension four. We complete the picture by considering the two-dimensional case: we show that there is no algorithm that solves the penalized version in time $n^{o(\sqrt{k})}$, and provide a matching upper bound of $n^{O(\sqrt{k})}$. The main tool we use to establish these lower bounds is the placement of points on the moment curve, which takes its inspiration from constructions of point sets yielding Delaunay complexes of high complexity.
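For reference, the objectives above can be written in a few lines of Python, together with the kind of exhaustive search over center sets, running in time $n^{O(k)}$, that the lower bound says cannot be improved to $n^{o(k)}$ under ETH. This is a sketch under assumptions: centers are restricted to input points (the discrete variant), every point has the same penalty (a hypothetical simplification), and `moment_curve` only shows the curve $t \mapsto (t, t^2, \dots, t^d)$; it does not reproduce the paper's reduction. All function names are hypothetical.

```python
import itertools
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


def clustering_cost(points: Sequence[Point], centers: Sequence[Point],
                    power: int = 1, penalty: Optional[float] = None) -> float:
    """power=1 gives k-median, power=2 gives k-means.

    With a penalty, each point pays min(cost to nearest center, penalty),
    i.e. it may stay unassigned and pay the (uniform, assumed) penalty instead.
    """
    total = 0.0
    for p in points:
        c = min(math.dist(p, q) for q in centers) ** power
        total += c if penalty is None else min(c, penalty)
    return total


def brute_force(points: Sequence[Point], k: int, power: int = 1,
                penalty: Optional[float] = None):
    """Try every k-subset of the input points as centers: C(n, k) candidates."""
    best, best_centers = math.inf, None
    for centers in itertools.combinations(points, k):
        cost = clustering_cost(points, centers, power, penalty)
        if cost < best:
            best, best_centers = cost, centers
    return best, best_centers


def moment_curve(t: float, d: int = 4) -> Point:
    """The point (t, t^2, ..., t^d) on the moment curve in R^d."""
    return tuple(t ** i for i in range(1, d + 1))


pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(brute_force(pts, 2)[0])               # k-median cost: 2.0
print(brute_force(pts, 2, penalty=0.5)[0])  # with penalties: 1.0
```

In the penalized run, the two non-center points each prefer paying the penalty 0.5 over the distance 1 to their nearest center.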

    Average Case Network Lifetime on an Interval with Adjustable Sensing Ranges

    Given n sensors on an interval, each of which is equipped with an adjustable sensing radius and a unit battery charge that drains in inverse linear proportion to its radius, what schedule will maximize the lifetime of a network that covers the entire interval? Trivially, any reasonable algorithm is at least a 2-approximation for this Sensor Strip Cover problem, so we focus on developing an efficient algorithm that maximizes the expected network lifetime under a random uniform model of sensor distribution. We demonstrate one such algorithm that achieves an expected network lifetime within 12% of the theoretical maximum. Most of the algorithms that we consider come from a particular family of RoundRobin coverage, in which sensors take turns covering predefined areas until their batteries run out.
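To make the RoundRobin idea concrete, here is a simplified Python sketch that evaluates one such schedule. It is not one of the paper's algorithms, and it assumes a particular reading of the model: a sensor running at radius r drains its unit battery at rate r, so it lasts 1/r. The interval is cut into m equal segments, each sensor serves the segment that contains it, and the sensors of a segment take turns covering the whole segment. Under the same assumption, a sensor covers a length of 2r for time 1/r, so over its life it contributes at most 2 units of length times time; an interval of length L covered until time T therefore needs T·L ≤ 2n, which gives the simple upper bound computed below. The names `round_robin_lifetime` and `lifetime_upper_bound` are hypothetical.

```python
from typing import Sequence


def round_robin_lifetime(positions: Sequence[float], m: int,
                         lo: float = 0.0, hi: float = 1.0) -> float:
    """Lifetime of a simple RoundRobin-style schedule (illustrative sketch).

    Assumed model: a sensor at radius r lasts 1/r. The interval [lo, hi] is
    split into m equal segments; the sensors assigned to a segment cover the
    whole segment one at a time until their batteries run out.
    """
    seg_len = (hi - lo) / m
    seg_life = [0.0] * m
    for x in positions:
        j = min(int((x - lo) / seg_len), m - 1)  # index of the segment containing x
        a, b = lo + j * seg_len, lo + (j + 1) * seg_len
        r = max(abs(x - a), abs(b - x))  # smallest radius covering [a, b] from x
        seg_life[j] += 1.0 / r
    # The interval is covered only while every segment is covered.
    return min(seg_life)


def lifetime_upper_bound(n: int, lo: float = 0.0, hi: float = 1.0) -> float:
    """T * (hi - lo) <= sum over sensors of (2r) * (1/r) = 2n."""
    return 2 * n / (hi - lo)


sensors = [0.25, 0.75]
print(round_robin_lifetime(sensors, m=2))  # 4.0
print(lifetime_upper_bound(len(sensors)))  # 4.0
```

For two evenly spaced sensors the schedule meets the bound; with a random uniform placement some segments receive few or no sensors, which is why the expected lifetime, rather than the worst case, is the quantity of interest.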

    Fast Fencing

    We consider very natural "fence enclosure" problems studied by Capoyleas, Rote, and Woeginger and by Arkin, Khuller, and Mitchell in the early 1990s. Given a set $S$ of $n$ points in the plane, we aim at finding a set of closed curves such that (1) each point is enclosed by a curve and (2) the total length of the curves is minimized. We consider two main variants. In the first variant, we pay a unit cost per curve in addition to the total length of the curves. An equivalent formulation of this version is that we have to enclose $n$ unit disks, paying only the total length of the enclosing curves. In the other variant, we are allowed to use at most $k$ closed curves and pay no cost per curve. For the variant with at most $k$ closed curves, we present an algorithm that is polynomial in both $n$ and $k$. For the variant with unit cost per curve, or unit disks, we present a near-linear time algorithm. Capoyleas, Rote, and Woeginger solved the problem with at most $k$ curves in $n^{O(k)}$ time. Arkin, Khuller, and Mitchell used this to solve the unit cost per curve version in exponential time. At the time, they conjectured that the problem with $k$ curves is NP-hard for general $k$. Our polynomial-time algorithm refutes this conjecture unless P equals NP.
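For a fixed group of points, the shortest closed curve enclosing them is the boundary of their convex hull, so both variants reduce to choosing a partition of $S$ into groups. The Python sketch below evaluates both objectives by enumerating every set partition (exponentially many), which is only useful for checking tiny instances; it is not the paper's polynomial or near-linear algorithm. Andrew's monotone chain is used for the hull, and all function names are hypothetical.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, float]


def convex_hull(pts: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 1:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def fence_length(group: Sequence[Point]) -> float:
    """Shortest closed curve around the group = perimeter of its convex hull."""
    hull = convex_hull(group)
    if len(hull) < 2:
        return 0.0  # a single point needs a curve of length zero
    return sum(math.dist(hull[i], hull[(i + 1) % len(hull)]) for i in range(len(hull)))


def set_partitions(items: List[Point]) -> Iterator[List[List[Point]]]:
    """Yield every partition of items into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # first point gets its own block
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def fence_unit_cost(points: List[Point]) -> float:
    """Variant 1: pay 1 per curve plus the total curve length."""
    return min(sum(1 + fence_length(b) for b in part) for part in set_partitions(points))


def fence_k_curves(points: List[Point], k: int) -> float:
    """Variant 2: at most k curves, pay only the total length."""
    return min(
        sum(fence_length(b) for b in part)
        for part in set_partitions(points)
        if len(part) <= k
    )


pts = [(0.0, 0.0), (0.0, 0.25), (10.0, 0.0), (10.0, 0.25)]
print(fence_unit_cost(pts))    # 3.0: two thin fences, each of cost 1 + 0.5
print(fence_k_curves(pts, 1))  # 20.5: one fence around the 10 x 0.25 rectangle
```

The unit cost per curve is what makes isolated points favor their own curves; without it (variant 2), more curves never hurt, which is why that variant needs the explicit bound $k$.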

    Connecting a Set of Circles with Minimum Sum of Radii

    We consider the problem of assigning radii to a given set of points in the plane, such that the resulting set of circles is connected, and the sum of radii is minimized. We show that the problem is polynomially solvable if a connectivity tree is given. If the connectivity tree is unknown, the problem is NP-hard if there are upper bounds on the radii and open otherwise. We give approximation guarantees for a variety of polynomial-time algorithms, describe upper and lower bounds (which are matching in some of the cases), provide polynomial-time approximation schemes, and conclude with experimental results and open problems.
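One way to see why a given connectivity tree makes the problem tractable, assuming two circles are connected when they touch or overlap: the task becomes the linear program "minimize the sum of r_v subject to r_u + r_v ≥ |p_u p_v| for every tree edge uv, with r ≥ 0". Its LP dual is a fractional matching on the tree, and since trees are bipartite the fractional matching polytope is integral, so the optimum equals the maximum-weight matching of the tree with edge weights equal to the distances. The Python sketch below computes that value with a standard two-state tree DP. This is one possible route, not necessarily the authors'; `min_sum_radii_on_tree` is a hypothetical name, the edges must form a spanning tree, and the function returns only the optimal value, not the radii themselves.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def min_sum_radii_on_tree(points: Sequence[Point],
                          edges: Sequence[Tuple[int, int]]) -> float:
    """Min sum of radii with r[u] + r[v] >= dist(u, v) on every tree edge.

    Computed as the maximum-weight matching of the tree (LP duality).
    """
    n = len(points)
    adj: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for u, v in edges:
        w = math.dist(points[u], points[v])
        adj[u].append((v, w))
        adj[v].append((u, w))
    # Iterative DFS from vertex 0; every vertex appears after its parent.
    parent = [-1] * n
    seen = [False] * n
    seen[0] = True
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v, _ in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                stack.append(v)
    free = [0.0] * n  # best matching in subtree(v) with v left unmatched
    best = [0.0] * n  # best matching in subtree(v), v matched or not
    for v in reversed(order):  # children are processed before parents
        children = [(c, w) for c, w in adj[v] if parent[c] == v]
        free[v] = sum(best[c] for c, _ in children)
        best[v] = free[v]
        for c, w in children:
            # Match v with child c: c must then be unmatched inside its subtree.
            best[v] = max(best[v], free[v] - best[c] + free[c] + w)
    return best[0]


pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
print(min_sum_radii_on_tree(pts, [(0, 1), (1, 2), (2, 3)]))  # 4.0, e.g. radii 0, 1, 1, 2
```

On this path the matching {01, 23} has weight 1 + 3 = 4, and the radii 0, 1, 1, 2 attain the same value while touching along every edge, confirming optimality.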

    Approximation Algorithms for Clustering and Facility Location Problems

    Facility location problems arise in a wide range of applications such as plant or warehouse location problems, cache placement problems, and network design problems, and have been widely studied in the Computer Science and Operations Research literature. These problems typically involve an underlying set F of facilities that provide service, and an underlying set D of clients that require service, which need to be assigned to facilities in a cost-effective fashion. This abstraction is quite versatile and also captures clustering problems, where one typically seeks to partition a set of data points into k clusters, for some given k, in a suitable way; clustering problems in turn find applications in data mining, machine learning, and bioinformatics. Basic variants of facility location problems are now relatively well understood, but we have much less understanding of more sophisticated models that better capture real-world concerns. In this thesis, we focus on three models inspired by real-world optimization scenarios. In Chapter 2, we consider the mobile facility location (MFL) problem, wherein we seek to relocate a given set of facilities to destinations closer to the clients so as to minimize the sum of facility-movement and client-assignment costs. This abstracts facility-location settings where one has the flexibility of moving facilities from their current locations to other destinations so as to serve clients more efficiently by reducing their assignment costs. We give the first local-search based approximation algorithm for this problem and achieve the best-known approximation guarantee. Our main result is a (3+ε)-approximation for this problem, for any constant ε > 0, using local search; this improves the previous best guarantee, an 8-approximation algorithm due to [34] based on LP-rounding.
Our results extend to the weighted generalization wherein each facility i has a non-negative weight w_i and the movement cost for i is w_i times the distance traveled by i. In Chapter 3, we consider a facility-location problem that we call minimum-load k-facility location (MLkFL), which abstracts settings where the cost of serving the clients assigned to a facility is incurred by the facility. This problem was studied under the name of min-max star cover in [32,10], who (among other results) gave bicriteria approximation algorithms for MLkFL when F=D. MLkFL is rather poorly understood, and only an O(k)-approximation is currently known for MLkFL, even for line metrics. Our main result is the first polynomial-time approximation scheme (PTAS) for MLkFL on line metrics (note that no non-trivial true approximation of any kind was known for this metric). Complementing this, we prove that MLkFL is strongly NP-hard on line metrics. In Chapter 4, we consider clustering problems with non-uniform lower bounds and outliers, and obtain the first approximation guarantees for these problems. We consider objective functions involving the radii of open facilities, where the radius of a facility i is the maximum distance between i and a client assigned to it. We consider two problems: minimizing the sum of the radii of the open facilities, which yields the lower-bounded min-sum-of-radii with outliers (LBkSRO) problem, and minimizing the maximum radius, which yields the lower-bounded k-supplier with outliers (LBkSupO) problem. We obtain an approximation factor of 12.365 for LBkSRO, which improves to 3.83 for the non-outlier version. These also constitute the first approximation bounds for the min-sum-of-radii objective when we consider lower bounds and outliers separately. We obtain approximation factors of 5 and 3, respectively, for LBkSupO and its non-outlier version. These are the first approximation results for k-supplier with non-uniform lower bounds.
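The three models above differ mainly in their objective functions, so a small Python sketch that evaluates each objective for a given solution may help fix the definitions. It only scores solutions and implements none of the thesis's algorithms (local search, PTAS, or the LB approximations). Assumptions for illustration: all points lie on the real line (a line metric), the facilities passed in are the open ones, each open facility i must serve at least lower[i] clients (the non-uniform lower bounds), and at most `max_outliers` clients may stay unserved. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def d(a: float, b: float) -> float:
    """Line metric: points are real numbers (assumption for illustration)."""
    return abs(a - b)


def mfl_cost(start: Sequence[float], dest: Sequence[float],
             weights: Sequence[float], clients: Sequence[float]) -> float:
    """Weighted MFL: facility i moves start[i] -> dest[i] at cost weights[i] * distance;
    each client then pays its distance to the nearest relocated facility."""
    move = sum(w * d(s, t) for s, t, w in zip(start, dest, weights))
    assign = sum(min(d(c, t) for t in dest) for c in clients)
    return move + assign


def mlkfl_cost(facilities: Sequence[float], assignment: Sequence[int],
               clients: Sequence[float]) -> float:
    """MLkFL: a facility's load is the total distance of its clients; return the max load."""
    loads = [0.0] * len(facilities)
    for j, i in enumerate(assignment):
        loads[i] += d(clients[j], facilities[i])
    return max(loads)


def radius_objectives(facilities: Sequence[float], assignment: Sequence[Optional[int]],
                      clients: Sequence[float], lower: Sequence[int],
                      max_outliers: int) -> Tuple[float, float]:
    """Return (sum of radii, max radius), the LBkSRO and LBkSupO objectives.

    assignment[j] is the facility serving client j, or None for an outlier.
    Raises ValueError if a lower bound or the outlier budget is violated.
    """
    served: List[List[float]] = [[] for _ in facilities]
    outliers = 0
    for j, i in enumerate(assignment):
        if i is None:
            outliers += 1
        else:
            served[i].append(clients[j])
    if outliers > max_outliers:
        raise ValueError("too many outliers")
    for i, group in enumerate(served):
        if len(group) < lower[i]:
            raise ValueError(f"facility {i} serves fewer than {lower[i]} clients")
    # Radius of a facility = distance to its farthest assigned client.
    radii = [max((d(c, facilities[i]) for c in g), default=0.0) for i, g in enumerate(served)]
    return sum(radii), max(radii, default=0.0)


print(mfl_cost([0.0, 10.0], [2.0, 8.0], [1.0, 2.0], [1.0, 3.0, 9.0]))  # 9.0
print(radius_objectives([0.0, 10.0], [0, 0, 1, None], [1.0, -2.0, 11.0, 50.0],
                        lower=[2, 1], max_outliers=1))  # (3.0, 2.0)
```

In the MFL example, movement costs 1·2 + 2·2 = 6 and every client ends up at distance 1 from a relocated facility, for a total of 9. In the radius example, the far-away client at 50 is declared an outlier, which is exactly the kind of freedom the outlier versions of the problems allow.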