    Hardness of Approximation for Euclidean k-Median

    The Euclidean k-median problem is defined as follows: given a set 𝒳 of n points in d-dimensional Euclidean space ℝ^d and an integer k, find a set C ⊂ ℝ^d of k points (called centers) such that the cost function Φ(C,𝒳) ≡ ∑_{x ∈ 𝒳} min_{c ∈ C} ‖x-c‖₂ is minimized. The Euclidean k-means problem is defined similarly by replacing the distance with the squared Euclidean distance in the cost function. Various hardness of approximation results are known for the Euclidean k-means problem [Pranjal Awasthi et al., 2015; Euiwoong Lee et al., 2017; Vincent Cohen-Addad and Karthik C. S., 2019]. However, no hardness of approximation result was known for the Euclidean k-median problem. In this work, assuming the unique games conjecture (UGC), we provide a hardness of approximation result for the Euclidean k-median problem in O(log k)-dimensional space. This resolves an open question posed explicitly in the work of Awasthi et al. [Pranjal Awasthi et al., 2015]. Furthermore, we study the hardness of approximation for the Euclidean k-means/k-median problems in the bi-criteria setting, where an algorithm is allowed to choose more than k centers. That is, bi-criteria approximation algorithms are allowed to output βk centers (for a constant β > 1), and the approximation ratio is computed with respect to the optimal k-means/k-median cost with k centers. We show a hardness of bi-criteria approximation result for the Euclidean k-median problem for any β < 1.015, assuming UGC. We also show a similar hardness of bi-criteria approximation result for the Euclidean k-means problem with a stronger bound of β < 1.28, again assuming UGC.
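
    As an illustration of the cost function defined above, the following minimal Python sketch evaluates the k-median cost (sum of distances to the nearest center) and the k-means cost (sum of squared distances) for a fixed set of centers. The function name `clustering_cost` and the sample points are hypothetical and chosen only for illustration; the sketch evaluates a given solution and does not attempt to find optimal centers.

```python
import math


def clustering_cost(points, centers, squared=False):
    """Sum, over all points, of the distance to the nearest center.

    squared=False gives the k-median cost, squared=True the k-means cost.
    """
    total = 0.0
    for x in points:
        # Distance from x to its closest center.
        nearest = min(math.dist(x, c) for c in centers)
        total += nearest ** 2 if squared else nearest
    return total


# Illustrative data (not from the paper): four points, two centers.
points = [(0.0, 0.0), (0.0, 4.0), (6.0, 0.0), (6.0, 4.0)]
centers = [(0.0, 2.0), (6.0, 2.0)]

kmedian_cost = clustering_cost(points, centers)               # 2 + 2 + 2 + 2
kmeans_cost = clustering_cost(points, centers, squared=True)  # 4 + 4 + 4 + 4
print(kmedian_cost, kmeans_cost)
```

    In the bi-criteria setting, the same function would be evaluated on a solution with up to βk centers and compared against the optimal cost achievable with k centers.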

    The one-round Voronoi game replayed

    We consider the one-round Voronoi game, where player one (``White'', called ``Wilma'') places a set of n points in a rectangular area of aspect ratio r <= 1, followed by the second player (``Black'', called ``Barney''), who places the same number of points. Each player wins the fraction of the board closest to one of his points, and the goal is to win more than half of the total area. This problem has been studied by Cheong et al., who showed that for large enough n and r=1, Barney has a strategy that guarantees a fraction of 1/2+a, for some small fixed a. We resolve a number of open problems raised by that paper. In particular, we give a precise characterization of the outcome of the game for optimal play: we show that Barney has a winning strategy for n>2 and r>sqrt{2}/n, and for n=2 and r>sqrt{3}/2. Wilma wins in all remaining cases, i.e., for n>=3 and r<=sqrt{2}/n, for n=2 and r<=sqrt{3}/2, and for n=1. We also discuss complexity aspects of the game on more general boards, by proving that for a polygon with holes, it is NP-hard to maximize the area Barney can win against a given set of points by Wilma. Comment: 14 pages, 6 figures, LaTeX; revised for journal version, to appear in Computational Geometry: Theory and Applications. Extended abstract version appeared in Workshop on Algorithms and Data Structures, Springer Lecture Notes in Computer Science, vol. 2748, 2003, pp. 150-16
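
    The characterization stated above can be written as a small decision function. The following Python sketch simply encodes the thresholds from the abstract; the function name `voronoi_game_winner` is hypothetical, and the sketch does not compute the winning strategies themselves.

```python
import math


def voronoi_game_winner(n, r):
    """Winner under optimal play, per the thresholds stated in the abstract.

    n: number of points placed by each player (n >= 1)
    r: aspect ratio of the rectangular board (0 < r <= 1)
    """
    if n < 1 or not (0 < r <= 1):
        raise ValueError("expected n >= 1 and 0 < r <= 1")
    if n == 1:
        return "Wilma"
    if n == 2:
        return "Barney" if r > math.sqrt(3) / 2 else "Wilma"
    # n >= 3: Barney wins exactly when r > sqrt(2)/n.
    return "Barney" if r > math.sqrt(2) / n else "Wilma"


print(voronoi_game_winner(2, 1.0))   # square board, two points each
print(voronoi_game_winner(10, 0.1))  # long thin board
```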

    Integer Point Sets Minimizing Average Pairwise L1-Distance: What is the Optimal Shape of a Town?

    An n-town, for a natural number n, is a group of n buildings, each occupying a distinct position on a 2-dimensional integer grid. If we measure the distance between two buildings along the axis-parallel street grid, then an n-town has optimal shape if the sum of all pairwise Manhattan distances is minimized. This problem has been studied for cities, i.e., the limiting case of very large n. For cities, it is known that the optimal shape can be described by a differential equation, for which no closed-form solution is known. We show that optimal n-towns can be computed in O(n^7.5) time. This is also practically useful, as it allows us to compute optimal solutions up to n=80. Comment: 26 pages, 6 figures, to appear in Computational Geometry: Theory and Applications.
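
    To make the objective concrete, the following Python sketch computes the sum of all pairwise Manhattan distances for a given n-town and compares two 4-towns. The function name `town_cost` and the example layouts are hypothetical; this is a direct evaluation of the objective, not the O(n^7.5) algorithm from the paper.

```python
from itertools import combinations


def town_cost(buildings):
    """Sum of pairwise Manhattan (L1) distances between distinct grid cells."""
    cells = list(buildings)
    if len(set(cells)) != len(cells):
        raise ValueError("buildings must occupy distinct grid positions")
    return sum(
        abs(x1 - x2) + abs(y1 - y2)
        for (x1, y1), (x2, y2) in combinations(cells, 2)
    )


# Two illustrative 4-towns: a 2x2 block and a straight row.
square = [(0, 0), (0, 1), (1, 0), (1, 1)]
row = [(0, 0), (1, 0), (2, 0), (3, 0)]

print(town_cost(square))  # 8
print(town_cost(row))     # 10
```

    The compact 2x2 block has the lower cost of the two, which matches the intuition that good towns are roughly round rather than elongated.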

    Solution Methods for the p-Median Problem: An Annotated Bibliography

    The p-median problem is a graph theory problem that was originally designed for, and has been extensively applied to, facility location. In this bibliography, we summarize the literature on solution methods for the uncapacitated and capacitated p-median problem on a graph or network.

    A Scalable Algorithm for Metric High-Quality Clustering in Information Retrieval Tasks

    We consider the problem of efficiently finding a high-quality k-clustering of n points in a (possibly discrete) metric space. Many methods are known when the points are vectors in a real vector space and the distance function is a standard geometric distance such as L1, L2 (Euclidean), or L2^2 (squared Euclidean distance). In such cases, efficiency is often sought via sophisticated multidimensional search structures for speeding up nearest neighbor queries (e.g., variants of kd-trees). Such techniques usually work well in spaces of moderately high dimension (say, up to 6 or 8). Our target is a scenario in which either the metric space cannot be mapped into a vector space, or, if this mapping is possible, the dimension of such a space is so high as to rule out the use of the above-mentioned techniques. This setting is rather typical in Information Retrieval applications. We augment the well-known furthest-point-first algorithm for k-center clustering in metric spaces with a filtering step based on the triangle inequality, and we compare this algorithm with some recent fast variants of the classical k-means iterative algorithm augmented with an analogous filtering scheme. We extensively tested the two solutions on synthetic geometric data and real data from Information Retrieval applications. The main conclusion we draw is that our modified furthest-point-first method attains solutions of better or comparable quality within a fraction of the time used by the fast k-means algorithm. Thus our algorithm is valuable when either real-time constraints or the large amount of data highlight the poor scalability of traditional clustering methods.
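
    The abstract does not spell out the filtering step, so the following Python sketch shows one plausible form of it, under stated assumptions. It runs the standard furthest-point-first heuristic and, when a new center is added, skips the distance computation for a point p currently assigned to center c whenever d(c, new) >= 2 d(p, c): by the triangle inequality, d(p, new) >= d(c, new) - d(p, c) >= d(p, c), so p cannot move closer. The name `fpf_with_filter`, the evaluation counter, and the sample data are hypothetical and are not taken from the paper.

```python
import math


def fpf_with_filter(points, k, dist=math.dist):
    """Furthest-point-first k-center heuristic with a triangle-inequality filter.

    Returns (center indices, assigned center index per point,
    number of distance evaluations performed).
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("expected 1 <= k <= number of points")

    centers = [0]                # assumption: start from the first point
    assigned = [0] * n           # index of the current center of each point
    d_to_center = [dist(p, points[0]) for p in points]
    evaluations = n

    while len(centers) < k:
        # The next center is the point furthest from its current center.
        new = max(range(n), key=lambda i: d_to_center[i])
        # Distances from the new center to every existing center.
        gap = {c: dist(points[new], points[c]) for c in centers}
        evaluations += len(centers)
        centers.append(new)

        for i in range(n):
            # Filter: if d(c, new) >= 2 * d(p, c), then d(p, new) >= d(p, c),
            # so the point keeps its center and no distance is computed.
            if gap[assigned[i]] >= 2 * d_to_center[i]:
                continue
            d_new = dist(points[i], points[new])
            evaluations += 1
            if d_new < d_to_center[i]:
                d_to_center[i] = d_new
                assigned[i] = new

    return centers, assigned, evaluations


# Illustrative data: two tight pairs and one far-away point.
pts = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
centers, assigned, evaluations = fpf_with_filter(pts, 3)
print(centers, assigned, evaluations)
```

    Because `dist` is a parameter, the same sketch applies to any metric given as a function of two objects, which fits the setting in which the points cannot be embedded in a low-dimensional vector space.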
