172 research outputs found

    Approximation Algorithms for Continuous Clustering and Facility Location Problems

    We consider the approximability of center-based clustering problems where the points to be clustered lie in a metric space, and no candidate centers are specified. We call such problems "continuous", to distinguish from "discrete" clustering where candidate centers are specified. For many objectives, one can reduce the continuous case to the discrete case, and use an $\alpha$-approximation algorithm for the discrete case to get a $\beta\alpha$-approximation for the continuous case, where $\beta$ depends on the objective: e.g. for $k$-median, $\beta = 2$, and for $k$-means, $\beta = 4$. Our motivating question is whether this gap of $\beta$ is inherent, or are there better algorithms for continuous clustering than simply reducing to the discrete case? In a recent SODA 2021 paper, Cohen-Addad, Karthik, and Lee prove a factor-$2$ and a factor-$4$ hardness, respectively, for continuous $k$-median and $k$-means, even when the number of centers $k$ is a constant. The discrete case for a constant $k$ is exactly solvable in polytime, so the $\beta$ loss seems unavoidable in some regimes. In this paper, we approach continuous clustering via the round-or-cut framework. For four continuous clustering problems, we outperform the reduction to the discrete case. Notably, for the problem $\lambda$-UFL, where $\beta = 2$ and the discrete case has a hardness of $1.27$, we obtain an approximation ratio of $2.32 < 2 \times 1.27$ for the continuous case. Also, for continuous $k$-means, where the best known approximation ratio for the discrete case is $9$, we obtain an approximation ratio of $32 < 4 \times 9$. The key challenge is that most algorithms for discrete clustering, including the state of the art, depend on linear programs that become infinite-sized in the continuous case. To overcome this, we design new linear programs for the continuous case which are amenable to the round-or-cut framework.
    Comment: 24 pages, 0 figures. Full version of ESA 2022 paper https://drops.dagstuhl.de/opus/volltexte/2022/16971 . This version adds a link to the conference version and fixes minor formatting issues.
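    The factors $\beta = 2$ and $\beta = 4$ quoted in the abstract can be recovered from a standard triangle-inequality argument. The derivation below is a sketch of that textbook reduction, not of the paper's round-or-cut method; the symbols $C$, $c^*$, $p$ and $d$ are notation introduced here for illustration. Take a cluster $C$ of an optimal continuous solution with center $c^*$, and let $p$ be the point of $C$ closest to $c^*$.

```latex
% For every q in C, since d(c^*, p) <= d(q, c^*) by the choice of p:
\[
  d(q, p) \;\le\; d(q, c^*) + d(c^*, p) \;\le\; 2\, d(q, c^*).
\]
% k-median: summing over the cluster loses a factor of 2.
\[
  \sum_{q \in C} d(q, p) \;\le\; 2 \sum_{q \in C} d(q, c^*).
\]
% k-means: squaring the pointwise bound loses a factor of 4.
\[
  \sum_{q \in C} d(q, p)^2 \;\le\; 4 \sum_{q \in C} d(q, c^*)^2.
\]
```

    Replacing each optimal center by such an input point therefore gives a discrete solution of cost at most $\beta$ times the continuous optimum, so an $\alpha$-approximation for the discrete problem yields a $\beta\alpha$-approximation for the continuous one.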

    Approximation algorithms for facility location problems and other supply chain problems

    Advisors: Flávio Keidi Miyazawa, Maxim Sviridenko. Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação. Abstract: The abstract is available with the full electronic document. Doctorate. Computer Science. Doctor of Computer Science.

    Constrained Clustering Problems and Parity Games

    Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. We study several clustering objectives. We begin with the Euclidean k-center problem. The k-center problem is a classical combinatorial optimization problem which asks to select k centers and assign each input point in a set P to one of the centers, such that the maximum distance of any input point to its assigned center is minimized. The Euclidean k-center problem assumes that the input set P is a subset of a Euclidean space and that each location in the Euclidean space can be chosen as a center. We focus on the special case with k = 1, the smallest enclosing ball problem: given a set of points in m-dimensional Euclidean space, find the smallest sphere enclosing all the points. We combine known results about convex optimization with structural properties of the smallest enclosing ball to create a new algorithm. We show that on instances with rational coefficients our new algorithm computes the exact center of the optimal solution and has a worst-case run time that is polynomial in the size of the input. We use the new algorithm to show that we can solve the Euclidean k-center problem in polynomial time for constant k and dimension m. The general unconstrained clustering problems are mostly very well studied. The k-center problem, for example, allows for elegant 2-approximation algorithms (Gonzalez 1985; Hochbaum, Shmoys 1986). However, the situation becomes significantly more difficult when constraints are added to the problem. We first look at fair clustering. The fairness constraint is motivated by the fact that the general process of computing a clustering may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters, especially if the data is already biased. At NIPS 2017, Chierichetti et al. proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation algorithm for the fair k-center problem and an O(t)-approximation algorithm for the fair k-median problem, where t is a parameter of the fairness model. For multiple protected classes, the best known result is a 14-approximation algorithm for fair k-center (Rösner, Schmidt 2018). We extend and improve the known results. Firstly, we give a 5-approximation algorithm for the fair k-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximation algorithms for the fair version of all of the classical clustering objectives (k-center, k-supplier, k-median, k-means and facility location). The latter approximation algorithms are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where for example the centers are already fixed. The second clustering constraint we study is privacy: here, we may only open a center when at least l points will be assigned to it. We raise the question of whether a general method can be derived to turn an approximation algorithm for a clustering problem with some constraints into an approximation algorithm that additionally respects privacy. We show how to combine privacy with several other constraints and obtain approximation algorithms for the k-center problem with several combinations of constraints. In this dissertation we also study parity games, which are two-player games played on a directed graph.
We study the case in which one of the two players controls only a small number k of nodes and the other player controls the n-k other nodes of the game. Our main result is a fixed-parameter-tractable algorithm that solves bipartite parity games in time $k^{O(\sqrt{k})} \cdot O(n^3)$, and general parity games in time $(p+k)^{O(\sqrt{k})} \cdot O(pnm)$, where p is the number of distinct priorities and m is the number of edges. For all games with k = o(n) this improves the previously fastest algorithm by Jurdziński, Paterson, and Zwick (2008). We also obtain novel kernelization results and an improved deterministic algorithm for parity games on graphs with small average node-degree.
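    The abstract above cites the classical 2-approximation for unconstrained k-center (Gonzalez 1985). A minimal sketch of that farthest-first traversal is given below for orientation; it is not one of the thesis's own algorithms, it ignores fairness and privacy constraints, and the function names `farthest_first_centers` and `k_center_radius` are hypothetical names chosen here.

```python
import math


def farthest_first_centers(points, k):
    """Greedy k-center: start from the first point, then repeatedly add
    the point that is farthest from all centers chosen so far."""
    if not points or k <= 0:
        return []
    centers = [points[0]]
    # dist_to_center[i] is the distance from points[i] to its nearest center.
    dist_to_center = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # Pick the point currently worst served by the chosen centers.
        idx = max(range(len(points)), key=dist_to_center.__getitem__)
        centers.append(points[idx])
        # Update nearest-center distances with the new center.
        for i, p in enumerate(points):
            dist_to_center[i] = min(dist_to_center[i], math.dist(p, points[idx]))
    return centers


def k_center_radius(points, centers):
    """k-center objective: largest distance from a point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)
```

    On the four collinear points (0, 0), (1, 0), (10, 0), (11, 0) with k = 2, the sketch selects (0, 0) and (11, 0), giving radius 1. Centers are restricted to input points here, which is the discrete setting rather than the Euclidean one studied in the thesis.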

    Approximation algorithms for clustering and facility location problems

    In this thesis we design and analyze algorithms for various facility location and clustering problems. The problems we study are NP-hard and therefore, assuming P is not equal to NP, there do not exist polynomial-time algorithms to solve them optimally. One approach to cope with the intractability of these problems is to design approximation algorithms which run in polynomial time and output a near-optimal solution for all instances of the problem. However, these algorithms do not always work well in practice. Often heuristics with no explicit approximation guarantee perform quite well. To bridge this gap between theory and practice, and to design algorithms that are tuned for instances arising in practice, there is an increasing emphasis on beyond worst-case analysis. In this thesis we consider both these approaches. In the first part we design worst-case approximation algorithms for the Uniform Submodular Facility Location (USFL) and Capacitated k-center (CapKCenter) problems. USFL is a generalization of the well-known Uncapacitated Facility Location problem. In USFL the cost of opening a facility is a submodular function of the clients assigned to it (the function is identical for all facilities). We show that a natural greedy algorithm (which gives a constant-factor approximation for Uncapacitated Facility Location and other facility location problems) has a lower bound of log(n), where n is the number of clients. We present an O(log^2 k)-approximation algorithm, where k is the number of facilities. The algorithm is based on rounding a convex relaxation. We further consider several special cases of the problem and give improved approximation bounds for them. The CapKCenter problem is an extension of the well-known k-center problem: each facility has a maximum capacity on the number of clients that can be assigned to it. We obtain a 9-approximation for this problem via a linear programming (LP) rounding procedure.
Our result, combined with previously known lower bounds, almost settles the integrality gap for a natural LP relaxation. In the second part we consider several well-known clustering problems like k-center, k-median, k-means and their corresponding outlier variants. We use beyond worst-case analysis due to the practical relevance of these problems. In particular we show that when the input instances are 2-perturbation resilient (i.e. the optimal solution does not change when the distances change by a multiplicative factor of 2), the LP integrality gap for k-center (and also asymmetric k-center) is 1. We further introduce a model of perturbation resilience for clustering with outliers. Under this new model, we show that previous results (including our LP integrality result) known for clustering under perturbation resilience also extend to clustering with outliers. This leads to a dynamic-programming-based heuristic for k-means with outliers (k-means-outlier) which gives an optimal solution when the instance is 2-perturbation resilient. We propose two more algorithms for k-means-outlier: a sampling-based algorithm which gives an O(1)-approximation when the optimal clusters are not "too small", and an LP rounding algorithm which gives an O(1)-approximation at the expense of violating the number of clusters and outliers by a small constant. We empirically study our proposed algorithms on several clustering datasets.

    Iterative restricted space search: a solving approach based on hybridization

    Given the complexity that characterizes large-scale optimization problems, complete exploration of the solution space quickly becomes an unattainable goal. Indeed, as problem size grows, increasingly sophisticated solution methods are required to ensure a certain level of efficiency. This has led a large part of the scientific community toward developing specific tools for solving large-scale problems, such as hybrid methods. However, despite the effort invested in developing hybrid approaches, most work has focused on adapting two or more specific methods, compensating for the weaknesses of some with the strengths of others, or adapting them to work together. To the best of our knowledge, no work to date has developed a conceptual framework for efficiently solving large-scale optimization problems that is at once flexible, based on information exchange, and independent of the methods that compose it. The objective of this thesis is to explore this research avenue by proposing a conceptual framework for hybrid methods, called Iterative Restricted Space Search (IRSS), whose main idea is the successive definition and exploration of restricted regions of the solution space. These regions, which contain good solutions and are small enough to be explored completely, are called Restricted Spaces (RS). IRSS is thus a generic solution approach based on the interaction of two algorithmic phases with complementary objectives. The first phase identifies a promising restricted region and the second phase explores it. The hybrid scheme of the solution approach alternates between the two phases for a fixed number of iterations or until a time limit is reached. The key concepts associated with the development of this conceptual framework, and their validation, are introduced gradually in this thesis. They are presented so that the reader can understand the problems we encountered during development and how the solutions were designed and implemented. To this end, the thesis is divided into four parts. The first is devoted to a synthesis of the state of the art in research on hybrid methods. It presents the main hybrid approaches developed and their applications. A brief description of approaches that use the concept of space restriction is also given in this part. The second part presents the key concepts of the framework: the process of identifying restricted regions and the two search phases. These concepts are implemented in a hybrid scheme combining a heuristic and an exact method. The approach was applied to a scheduling problem with two decision levels, set in the pulp and paper context: the Pulp Production Scheduling Problem. The third part deepens the concepts developed and addresses the limitations identified in the second part, by proposing an iterative search for exploring large RS and a binary tree structure for exploring several RS. This structure has the advantage of avoiding the exploration of a previously explored space while giving the method natural diversification. This extension of the method was tested on a location-allocation problem using a heuristic-exact hybridization scheme applied iteratively. The fourth part generalizes the previously developed concepts and designs a general framework that is flexible, independent of the methods used, and based on information exchange between the phases. This framework has the advantage of being general and could be applied to a wide range of problems.