
    Fully dynamic clustering and diversity maximization in doubling metrics

    We present approximation algorithms for some variants of center-based clustering and related problems in the fully dynamic setting, where the pointset evolves through an arbitrary sequence of insertions and deletions. Specifically, we target the following problems: $k$-center (with and without outliers), matroid-center, and diversity maximization. All algorithms employ a coreset-based strategy and rely on the use of the cover tree data structure, which we crucially augment to maintain, at any time, some additional information enabling the efficient extraction of the solution for the specific problem. For all of the aforementioned problems our algorithms yield $(\alpha+\varepsilon)$-approximations, where $\alpha$ is the best known approximation attainable in polynomial time in the standard off-line setting (except for $k$-center with $z$ outliers where $\alpha = 2$ but we get a $(3+\varepsilon)$-approximation) and $\varepsilon>0$ is a user-provided accuracy parameter. The analysis of the algorithms is performed in terms of the doubling dimension of the underlying metric. Remarkably, and unlike previous works, the data structure and the running times of the insertion and deletion procedures do not depend in any way on the accuracy parameter $\varepsilon$ and, for the two $k$-center variants, on the parameter $k$. For spaces of bounded doubling dimension, the running times are dramatically smaller than those that would be required to compute solutions on the entire pointset from scratch. To the best of our knowledge, ours are the first solutions for the matroid-center and diversity maximization problems in the fully dynamic setting.
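For orientation, the $k$-center objective can be illustrated with the classical static farthest-point greedy heuristic, which is known to give a 2-approximation in the off-line setting. This is only a minimal sketch of that baseline, not the dynamic cover-tree algorithm of the abstract; the function name `greedy_k_center` and the toy points are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def greedy_k_center(points: List[Point], k: int) -> Tuple[List[int], float]:
    """Static farthest-point greedy for k-center (hypothetical helper).

    Returns the indices of the chosen centers and the resulting radius,
    i.e. the largest distance from any point to its nearest center.
    """
    if not points or k <= 0:
        raise ValueError("need at least one point and k >= 1")
    centers = [0]  # the first center is arbitrary
    # dist[i] = distance from point i to its nearest chosen center so far
    dist = [math.dist(points[0], p) for p in points]
    while len(centers) < min(k, len(points)):
        # pick the point that is currently worst served
        far = max(range(len(points)), key=dist.__getitem__)
        centers.append(far)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(points[far], p))
    return centers, max(dist)


# Toy data: two well-separated pairs on a line.
pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers, radius = greedy_k_center(pts, 2)
```

Recomputing such a solution from scratch costs time proportional to $k$ times the number of points after every update, which is the cost the fully dynamic setting aims to avoid.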

    Iterated Watersheds, A Connected Variation of K-Means for Clustering GIS Data

    In the digital age, new approaches for effective and efficient governance strategies can be established by exploiting the vast computing and data resources at our disposal. In several cases, the problem of efficient governance translates to finding a solution to an optimization problem. A typical example is the clustering problem: given a set of data objects, partition them into clusters such that elements belonging to the same cluster are similar and elements belonging to different clusters are dissimilar. For example, problems such as zonation, river linking, facility allocation and visualizing spatial data can all be framed as clustering problems. However, all these problems come with an additional constraint that the clusters must be connected. In this article, we propose a suitable solution to the clustering problem with a constraint that the clusters must be connected. This is achieved by suitably modifying the K-Means algorithm to include connectivity constraints. The modified algorithm involves repeated application of the watershed transform, and hence is referred to as iterated watersheds. This algorithm is analyzed in detail using toy examples and the domain of image segmentation, due to the wide availability of labelled datasets. It has been shown that iterated watersheds perform better than methods such as spectral clustering, isoperimetric partitioning, and K-Means on various measures. To illustrate the applicability of iterated watersheds, a simple problem of placing emergency stations with a suitable cost function is considered. Using real-world road networks of various cities, iterated watersheds is compared with K-Means and greedy K-center methods. It has been shown that iterated watersheds result in very good improvements over these methods across various experiments.

    Experimental Evaluation of Fully Dynamic k-Means via Coresets

    For a set of points in $\mathbb{R}^d$, the Euclidean $k$-means problem consists of finding $k$ centers such that the sum of squared distances from each data point to its closest center is minimized. Coresets are one of the main tools developed recently to solve this problem in a big data context. They make it possible to compress the initial dataset while preserving its structure: running any algorithm on the coreset provides a guarantee almost equivalent to running it on the full data. In this work, we study coresets in a fully-dynamic setting: points are added and deleted with the goal to efficiently maintain a coreset with which a $k$-means solution can be computed. Based on an algorithm from Henzinger and Kale [ESA'20], we present an efficient and practical implementation of a fully dynamic coreset algorithm that improves the running time by up to a factor of 20 compared to our non-optimized implementation of the algorithm by Henzinger and Kale, without sacrificing more than 7% on the quality of the $k$-means solution. Comment: Accepted at ALENEX 2
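The objective described above, and the way a weighted coreset stands in for the full data, can be sketched as follows. This is a toy illustration under stated assumptions, not the Henzinger and Kale construction: the helper name `kmeans_cost` and the example data are hypothetical, and the coreset here is exact only because the toy data consists of duplicated points. A real coreset preserves the cost only up to a small relative error.

```python
from typing import Optional, Sequence

Point = Sequence[float]


def kmeans_cost(
    points: Sequence[Point],
    centers: Sequence[Point],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted k-means cost: sum over points of w * (squared distance
    to the closest center). Unweighted data uses w = 1 for every point."""
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for p, w in zip(points, weights):
        # squared Euclidean distance to the nearest center
        nearest = min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        total += w * nearest
    return total


# Full data: two locations, each occurring twice.
data = [(0.0, 0.0), (0.0, 0.0), (4.0, 0.0), (4.0, 0.0)]
# Toy coreset: one representative per location, weighted by multiplicity.
coreset = [(0.0, 0.0), (4.0, 0.0)]
coreset_weights = [2.0, 2.0]

candidate_centers = [(1.0, 0.0), (4.0, 0.0)]
full_cost = kmeans_cost(data, candidate_centers)
coreset_cost = kmeans_cost(coreset, candidate_centers, coreset_weights)
```

Because the cost of any candidate set of centers can be evaluated on the weighted summary alone, any $k$-means solver can be run on the coreset instead of the full dataset.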

    From approximate to exact integer programming

    Approximate integer programming is the following: For a convex body $K \subseteq \mathbb{R}^n$, either determine whether $K \cap \mathbb{Z}^n$ is empty, or find an integer point in the convex body scaled by $2$ from its center of gravity $c$. Approximate integer programming can be solved in time $2^{O(n)}$ while the fastest known methods for exact integer programming run in time $2^{O(n)} \cdot n^n$. So far, there are no efficient methods for integer programming known that are based on approximate integer programming. Our main contributions are two such methods, each yielding novel complexity results. First, we show that an integer point $x^* \in (K \cap \mathbb{Z}^n)$ can be found in time $2^{O(n)}$, provided that the remainders $x_i^* \bmod \ell$ of each component of $x^*$, for some arbitrarily fixed $\ell \geq 5(n+1)$, are given. The algorithm is based on a cutting-plane technique, iteratively halving the volume of the feasible set. The cutting planes are determined via approximate integer programming. Enumeration of the possible remainders gives a $2^{O(n)} n^n$ algorithm for general integer programming. This matches the current best bound of an algorithm by Dadush (2012) that is considerably more involved. Our algorithm also relies on a new asymmetric approximate Carathéodory theorem that might be of interest on its own. Our second method concerns integer programming problems in equation-standard form $Ax = b,\ 0 \leq x \leq u,\ x \in \mathbb{Z}^n$. Such a problem can be reduced to the solution of $\prod_i O(\log u_i + 1)$ approximate integer programming problems. This implies, for example, that knapsack or subset-sum problems with polynomial variable range $0 \leq x_i \leq p(n)$ can be solved in time $(\log n)^{O(n)}$. For these problems, the best running time so far was $n^n \cdot 2^{O(n)}$.
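The step from the remainder-based algorithm to the bound for general integer programming can be spelled out as a short calculation. The sketch below takes $\ell = 5(n+1)$, the smallest value the abstract allows, as a working assumption; there are then $\ell^n$ candidate remainder vectors, and each is handled in time $2^{O(n)}$.

```latex
\[
\underbrace{\ell^{\,n}}_{\text{remainder vectors}} \cdot\, 2^{O(n)}
  = \bigl(5(n+1)\bigr)^{n} \cdot 2^{O(n)}
  = 5^{n} \Bigl(1 + \tfrac{1}{n}\Bigr)^{n} n^{n} \cdot 2^{O(n)}
  \le 5^{n} \, e \, n^{n} \cdot 2^{O(n)}
  = 2^{O(n)} \, n^{n} .
\]
```

The inequality uses $(1 + 1/n)^n \le e$, and the factor $5^n e$ is absorbed into $2^{O(n)}$.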

    Modeling and Algorithmic Development for Selected Real-World Optimization Problems with Hard-to-Model Features

    Mathematical optimization is a common tool for numerous real-world optimization problems. However, in some application domains there is scope for improvement of currently used optimization techniques. For example, this is typically the case for applications that contain features which are difficult to model, and applications of interdisciplinary nature where no strong optimization knowledge is available. The goal of this thesis is to demonstrate how to overcome these challenges by considering five problems from two application domains. The first domain that we address is scheduling in Cloud computing systems, in which we investigate three selected problems. First, we study scheduling problems where jobs are required to start immediately when they are submitted to the system. This requirement is ubiquitous in Cloud computing but has not yet been addressed in mathematical scheduling. Our main contributions are (a) providing the formal model, (b) the development of exact and efficient solution algorithms, and (c) proofs of correctness of the algorithms. Second, we investigate the problem of energy-aware scheduling in Cloud data centers. The objective is to assign computing tasks to machines such that the energy required to operate the data center, i.e., the energy required to operate computing devices plus the energy required to cool computing devices, is minimized. Our main contributions are (a) the mathematical model, and (b) the development of efficient heuristics. Third, we address the problem of evaluating scheduling algorithms in a realistic environment. To this end we develop an approach that supports mathematicians in evaluating scheduling algorithms through simulation with realistic instances. Our main contributions are the development of (a) a formal model, and (b) efficient heuristics. The second application domain considered is powerline routing. We are given two points on a geographic area and respective terrain characteristics. The objective is to find a "good" route (which depends on the terrain) connecting both points, along which a powerline should be built. Within this application domain, we study two selected problems. First, we study a geometric shortest path problem, an abstract and simplified version of the powerline routing problem. We introduce the concept of the k-neighborhood and contribute various analytical results. Second, we investigate the actual powerline routing problem. To this end, we develop algorithms that are built upon the theoretical insights obtained in the previous study. Our main contributions are (a) the development of exact algorithms and efficient heuristics, and (b) a comprehensive evaluation through two real-world case studies. Some parts of the research presented in this thesis have been published in refereed publications [119], [110], [109].

    Small Space Stream Summary for Matroid Center

    In the matroid center problem, which generalizes the $k$-center problem, we need to pick a set of centers that is an independent set of a matroid with rank $r$. We study this problem in streaming, where elements of the ground set arrive in the stream. We first show that any randomized one-pass streaming algorithm that computes a better than $\Delta$-approximation for partition-matroid center must use $\Omega(r^2)$ bits of space, where $\Delta$ is the aspect ratio of the metric and can be arbitrarily large. This shows a quadratic separation between matroid center and $k$-center, for which the Doubling algorithm [Charikar et al., 1997] gives an 8-approximation using $O(k)$ space and one pass. To complement this, we give a one-pass algorithm for matroid center that stores at most $O(r^2 \log(1/\varepsilon)/\varepsilon)$ points (viz., stream summary) among which a $(7+\varepsilon)$-approximate solution exists, which can be found by brute force, or a $(17+\varepsilon)$-approximation can be found with an efficient algorithm. If we are allowed a second pass, we can compute a $(3+\varepsilon)$-approximation efficiently. We also consider the problem of matroid center with $z$ outliers and give a one-pass algorithm that outputs a set of $O((r^2+rz)\log(1/\varepsilon)/\varepsilon)$ points that contains a $(15+\varepsilon)$-approximate solution. Our techniques extend to knapsack center and knapsack center with $z$ outliers in a straightforward way, and we get algorithms that use space linear in the size of a largest feasible set (as opposed to quadratic space for matroid center).
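To make the problem statement concrete, the sketch below checks feasibility and cost for the partition-matroid special case: the ground set is split into groups, and a set of centers is independent when it takes at most a given number of points from each group. This only illustrates the objective, not the streaming algorithm of the abstract; the names `is_independent`, `center_cost`, `group`, and `capacity` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Point = Sequence[float]


def is_independent(chosen: List[int], group: List[str], capacity: Dict[str, int]) -> bool:
    """Partition-matroid independence: at most capacity[g] chosen points per group g."""
    counts = Counter(group[i] for i in chosen)
    return all(counts[g] <= capacity.get(g, 0) for g in counts)


def center_cost(points: List[Point], chosen: List[int]) -> float:
    """Largest distance from any point to its nearest chosen center."""
    return max(
        min(math.dist(points[i], points[c]) for c in chosen)
        for i in range(len(points))
    )


# Toy instance on a line: two groups, one center allowed from each (rank 2).
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
group = ["a", "a", "b", "b"]
capacity = {"a": 1, "b": 1}

feasible = [0, 2]    # one center per group
infeasible = [0, 1]  # two centers from group "a"

feasible_ok = is_independent(feasible, group, capacity)
infeasible_ok = is_independent(infeasible, group, capacity)
cost = center_cost(points, feasible)
```

Setting a single group with capacity $k$ recovers the ordinary $k$-center constraint, which is the sense in which matroid center generalizes it.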