Fully dynamic clustering and diversity maximization in doubling metrics
We present approximation algorithms for some variants of center-based
clustering and related problems in the fully dynamic setting, where the
pointset evolves through an arbitrary sequence of insertions and deletions.
Specifically, we target the following problems: $k$-center (with and without
outliers), matroid-center, and diversity maximization. All algorithms employ a
coreset-based strategy and rely on the use of the cover tree data structure,
which we crucially augment to maintain, at any time, some additional
information enabling the efficient extraction of the solution for the specific
problem. For all of the aforementioned problems our algorithms yield
$(\alpha+\varepsilon)$-approximations, where $\alpha$ is the best known
approximation attainable in polynomial time in the standard off-line setting
(except for $k$-center with $z$ outliers where $\alpha = 2$ but we get a
$(3+\varepsilon)$-approximation) and $\varepsilon > 0$ is a user-provided
accuracy parameter. The analysis of the algorithms is performed in terms of the
doubling dimension of the underlying metric. Remarkably, and unlike previous
works, the data structure and the running times of the insertion and deletion
procedures do not depend in any way on the accuracy parameter $\varepsilon$
and, for the two $k$-center variants, on the parameter $k$. For spaces of
bounded doubling dimension, the running times are dramatically smaller than
those that would be required to compute solutions on the entire pointset from
scratch. To the best of our knowledge, ours are the first solutions for the
matroid-center and diversity maximization problems in the fully dynamic
setting
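For orientation, the static k-center objective that the abstract above refers to can be illustrated with the classical greedy farthest-first traversal, which is a well-known 2-approximation in the off-line setting. This is only a minimal sketch of that static baseline, not the dynamic cover-tree algorithm of the paper; the function name `greedy_k_center` and the use of Euclidean distance are assumptions made for illustration.

```python
import math


def greedy_k_center(points, k):
    """Farthest-first traversal for static k-center (illustrative baseline).

    points: list of coordinate tuples; k: number of centers.
    Returns (centers, radius), where radius is the largest distance
    from any point to its closest chosen center.
    """
    if not points or k <= 0:
        return [], 0.0
    centers = [points[0]]  # arbitrary first center
    # dist[i] = distance from points[i] to its closest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # pick the point that is currently farthest from all centers
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for p, d in zip(points, dist)]
    return centers, max(dist)
```

A fully dynamic algorithm avoids rerunning such a procedure on the whole pointset after every insertion or deletion, which is the cost the abstract compares against.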
Iterated Watersheds, A Connected Variation of K-Means for Clustering GIS Data
In the digital age, new approaches to effective and efficient governance can be established by exploiting the vast computing and data resources at our disposal. In several cases, the problem of efficient governance translates to solving an optimization problem. A typical example is where several cases are framed as a clustering problem: given a set of data objects, partition them into clusters such that elements belonging to the same cluster are similar and elements belonging to different clusters are dissimilar. For example, problems such as zonation, river linking, facility allocation and visualizing spatial data can all be framed as clustering problems. However, all these problems come with the additional constraint that the clusters must be connected. In this article, we propose a solution to the clustering problem under this connectivity constraint. This is achieved by modifying the K-Means algorithm to include connectivity constraints. The modified algorithm involves repeated application of the watershed transform, and hence is referred to as iterated watersheds. The algorithm is analyzed in detail using toy examples and the domain of image segmentation, chosen for the wide availability of labelled datasets. Iterated watersheds are shown to perform better than methods such as spectral clustering, isoperimetric partitioning, and K-Means on various measures. To illustrate the applicability of iterated watersheds, a simple problem of placing emergency stations with a suitable cost function is considered. Using real-world road networks of various cities, iterated watersheds is compared with K-Means and greedy K-center methods. Iterated watersheds are shown to yield marked improvements over these methods across various experiments
Experimental Evaluation of Fully Dynamic k-Means via Coresets
For a set of points in $\mathbb{R}^d$, the Euclidean $k$-means problem
consists of finding $k$ centers such that the sum of distances squared from
each data point to its closest center is minimized. Coresets are one of the main
tools developed recently to solve this problem in a big data context. They
make it possible to compress the initial dataset while preserving its structure: running
any algorithm on the coreset provides a guarantee almost equivalent to running
it on the full data.
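The objective described above can be written down directly. The following is a minimal sketch, assuming points and centers are plain coordinate tuples; the function name `kmeans_cost` and the optional `weights` argument (a coreset is typically a weighted point set) are illustrative choices and are not taken from the paper.

```python
def kmeans_cost(points, centers, weights=None):
    """Sum of (weighted) squared Euclidean distances to the closest center.

    points, centers: lists of coordinate tuples of equal dimension.
    weights: optional per-point weights; defaults to 1 for every point.
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for p, w in zip(points, weights):
        # squared distance from p to its nearest center
        nearest = min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        total += w * nearest
    return total
```

Evaluating the same set of centers on the full data and on a weighted coreset with this function is one simple way to check how closely the coreset preserves the cost.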
In this work, we study coresets in a fully-dynamic setting: points are added
and deleted with the goal to efficiently maintain a coreset with which a
k-means solution can be computed. Based on an algorithm from Henzinger and Kale
[ESA'20], we present an efficient and practical implementation of a fully
dynamic coreset algorithm, that improves the running time by up to a factor of
20 compared to our non-optimized implementation of the algorithm by Henzinger
and Kale, without sacrificing more than 7% on the quality of the k-means
solution. Comment: Accepted at ALENEX 2
From approximate to exact integer programming
Approximate integer programming is the following: For a convex body $K \subseteq \mathbb{R}^n$, either determine whether $K \cap \mathbb{Z}^n$ is
empty, or find an integer point in the convex body scaled by $2$ from its
center of gravity $c$. Approximate integer programming can be solved in time
$2^{O(n)}$ while the fastest known methods for exact integer programming run in
time $2^{O(n)} \cdot n^n$. So far, there are no efficient methods for integer
programming known that are based on approximate integer programming. Our main
contributions are two such methods, each yielding novel complexity results.
First, we show that an integer point $x^* \in (K \cap \mathbb{Z}^n)$ can be
found in time $2^{O(n)}$, provided that the remainders $x_i^* \bmod \ell$ of each component of $x^*$, for some arbitrarily fixed $\ell \geq 5(n+1)$, are given.
The algorithm is based on a cutting-plane technique, iteratively halving the
volume of the feasible set. The cutting planes are determined via approximate
integer programming. Enumeration of the possible remainders gives a
$2^{O(n)} n^n$ algorithm for general integer programming. This matches the
current best bound of an algorithm by Dadush (2012) that is considerably more
involved. Our algorithm also relies on a new asymmetric approximate
Carathéodory theorem that might be of interest on its own.
Our second method concerns integer programming problems in equation-standard
form $Ax = b,\ 0 \leq x \leq u,\ x \in \mathbb{Z}^n$. Such a problem can be
reduced to the solution of $\prod_i O(\log u_i + 1)$ approximate integer
programming problems. This implies, for example, that knapsack or subset-sum
problems with polynomial variable range $0 \leq x_i \leq p(n)$ can be solved in
time $(\log n)^{O(n)}$. For these problems, the best running time so far was
$n^n \cdot 2^{O(n)}$
Modeling and Algorithmic Development for Selected Real-World Optimization Problems with Hard-to-Model Features
Mathematical optimization is a common tool for numerous real-world optimization problems.
However, in some application domains there is scope for improving the optimization techniques currently in use.
For example, this is typically the case for applications that contain features which are difficult to model, and applications of interdisciplinary nature where no strong optimization knowledge is available.
The goal of this thesis is to demonstrate how to overcome these challenges by considering five problems from two application domains.
The first domain that we address is scheduling in Cloud computing systems, in which we investigate three selected problems.
First, we study scheduling problems where jobs are required to start immediately when they are submitted to the system.
This requirement is ubiquitous in Cloud computing but has not yet been addressed in mathematical scheduling.
Our main contributions are (a) providing the formal model, (b) the development of exact and efficient solution algorithms, and (c) proofs of correctness of the algorithms.
Second, we investigate the problem of energy-aware scheduling in Cloud data centers.
The objective is to assign computing tasks to machines such that the energy required to operate the data center, i.e., the energy required to operate computing devices plus the energy required to cool computing devices, is minimized.
Our main contributions are (a) the mathematical model, and (b) the development of efficient heuristics.
Third, we address the problem of evaluating scheduling algorithms in a realistic environment.
To this end, we develop an approach that helps mathematicians evaluate scheduling algorithms through simulation with realistic instances.
Our main contributions are the development of (a) a formal model, and (b) efficient heuristics.
The second application domain considered is powerline routing.
We are given two points on a geographic area and respective terrain characteristics.
The objective is to find a "good" route (which depends on the terrain) connecting both points, along which a powerline should be built.
Within this application domain, we study two selected problems.
First, we study a geometric shortest path problem, an abstract and simplified version of the powerline routing problem.
We introduce the concept of the k-neighborhood and contribute various analytical results.
Second, we investigate the actual powerline routing problem.
To this end, we develop algorithms that are built upon the theoretical insights obtained in the previous study.
Our main contributions are (a) the development of exact algorithms and efficient heuristics, and (b) a comprehensive evaluation through two real-world case studies.
Some parts of the research presented in this thesis have been published in refereed publications [119], [110], [109]
Small Space Stream Summary for Matroid Center
In the matroid center problem, which generalizes the k-center problem, we need to pick a set of centers that is an independent set of a matroid with rank r. We study this problem in streaming, where elements of the ground set arrive in the stream. We first show that any randomized one-pass streaming algorithm that computes a better than Delta-approximation for partition-matroid center must use Omega(r^2) bits of space, where Delta is the aspect ratio of the metric and can be arbitrarily large. This shows a quadratic separation between matroid center and k-center, for which the Doubling algorithm [Charikar et al., 1997] gives an 8-approximation using O(k)-space and one pass. To complement this, we give a one-pass algorithm for matroid center that stores at most O(r^2 log(1/epsilon)/epsilon) points (viz., stream summary) among which a (7+epsilon)-approximate solution exists, which can be found by brute force, or a (17+epsilon)-approximation can be found with an efficient algorithm. If we are allowed a second pass, we can compute a (3+epsilon)-approximation efficiently.
We also consider the problem of matroid center with z outliers and give a one-pass algorithm that outputs a set of O((r^2+rz)log(1/epsilon)/epsilon) points that contains a (15+epsilon)-approximate solution. Our techniques extend to knapsack center and knapsack center with z outliers in a straightforward way, and we get algorithms that use space linear in the size of a largest feasible set (as opposed to quadratic space for matroid center)
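To make the problem statement concrete, the sketch below checks feasibility and cost of a candidate solution for the partition-matroid case mentioned above: each point belongs to a group, and at most a given number of centers may be chosen per group. It only evaluates a given solution and is not the streaming algorithm of the paper; the names `is_independent`, `center_radius`, `category` and `capacity`, as well as the Euclidean metric, are assumptions made for illustration.

```python
import math
from collections import Counter


def is_independent(centers, category, capacity):
    """Partition-matroid independence test (illustrative).

    category: dict mapping each point to its group label.
    capacity: dict mapping each group label to the maximum number
              of centers allowed from that group.
    """
    counts = Counter(category[c] for c in centers)
    return all(counts[g] <= capacity.get(g, 0) for g in counts)


def center_radius(points, centers):
    """Largest distance from any point to its closest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)
```

With unit capacity in every group, a set of centers is feasible only if it uses each group at most once; plain k-center corresponds to a single group with capacity k.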