722,894 research outputs found

Fully dynamic clustering and diversity maximization in doubling metrics

Authors: Pellizzoni Paolo, Pietracaprina Andrea, Pucci Geppino
Publication date: 01/01/2023

We present approximation algorithms for some variants of center-based clustering and related problems in the fully dynamic setting, where the pointset evolves through an arbitrary sequence of insertions and deletions. Specifically, we target the following problems: $k$-center (with and without outliers), matroid-center, and diversity maximization. All algorithms employ a coreset-based strategy and rely on the use of the cover tree data structure, which we crucially augment to maintain, at any time, some additional information enabling the efficient extraction of the solution for the specific problem. For all of the aforementioned problems our algorithms yield $(\alpha+\varepsilon)$-approximations, where $\alpha$ is the best known approximation attainable in polynomial time in the standard off-line setting (except for $k$-center with $z$ outliers where $\alpha = 2$ but we get a $(3+\varepsilon)$-approximation) and $\varepsilon>0$ is a user-provided accuracy parameter. The analysis of the algorithms is performed in terms of the doubling dimension of the underlying metric. Remarkably, and unlike previous works, the data structure and the running times of the insertion and deletion procedures do not depend in any way on the accuracy parameter $\varepsilon$ and, for the two $k$-center variants, on the parameter $k$. For spaces of bounded doubling dimension, the running times are dramatically smaller than those that would be required to compute solutions on the entire pointset from scratch. To the best of our knowledge, ours are the first solutions for the matroid-center and diversity maximization problems in the fully dynamic setting.
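
For orientation, the off-line factor $\alpha = 2$ for plain $k$-center is achieved by the classic greedy farthest-first traversal (Gonzalez). The sketch below illustrates that static baseline only; it is not the dynamic cover-tree algorithm described in the abstract. The function name `greedy_k_center` and the use of Euclidean distance are assumptions made for the example.

```python
import math


def greedy_k_center(points, k):
    """Farthest-first traversal for static k-center (illustrative baseline).

    Returns (centers, radius), where radius is the largest distance from
    any point to its nearest chosen center.
    """
    if not points or k <= 0:
        raise ValueError("need at least one point and k >= 1")
    # Start from an arbitrary point; here, the first one.
    centers = [points[0]]
    # dist[i] = distance from points[i] to its nearest chosen center so far.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # Pick the point that is currently farthest from all centers.
        far = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[far])
        dist = [min(d, math.dist(p, points[far])) for d, p in zip(dist, points)]
    return centers, max(dist)


centers, radius = greedy_k_center([(0, 0), (1, 0), (10, 0), (11, 0)], k=2)
print(centers, radius)
```

Each round costs one pass over the pointset, so recomputing from scratch after every update takes time linear in the number of points per center, which is the cost the dynamic data structure is meant to avoid.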

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

Iterated Watersheds, A Connected Variation of K-Means for Clustering GIS Data

Authors: Challa Aditya, Danda Sravan, Daya Sagar B, Najman Laurent, Soor Sampriti
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

In the digital age, new approaches for effective and efficient governance strategies can be established by exploiting the vast computing and data resources at our disposal. In several cases, the problem of efficient governance translates to finding a solution to an optimization problem. A typical example is the clustering problem: given a set of data objects, partition them into clusters such that elements belonging to the same cluster are similar and elements belonging to different clusters are dissimilar. For example, problems such as zonation, river linking, facility allocation and visualizing spatial data can all be framed as clustering problems. However, all these problems come with an additional constraint that the clusters must be connected. In this article, we propose a solution to the clustering problem under the constraint that the clusters must be connected. This is achieved by modifying the K-Means algorithm to include connectivity constraints. The modified algorithm involves repeated application of the watershed transform, and hence is referred to as iterated watersheds. This algorithm is analyzed in detail using toy examples and the domain of image segmentation, owing to the wide availability of labelled datasets. It is shown that iterated watersheds perform better than methods such as spectral clustering, isoperimetric partitioning, and K-Means on various measures. To illustrate the applicability of iterated watersheds, a simple problem of placing emergency stations with a suitable cost function is considered. Using real-world road networks of various cities, iterated watersheds is compared with K-Means and greedy K-center methods. It is shown that iterated watersheds result in very good improvements over these methods across various experiments.

HAL Descartes

Experimental Evaluation of Fully Dynamic k-Means via Coresets

Authors: Henzinger Monika, Saulpic David, Sidl Leonhard
Publication date: 27/10/2023

For a set of points in $\mathbb{R}^d$, the Euclidean $k$-means problem consists of finding $k$ centers such that the sum of squared distances from each data point to its closest center is minimized. Coresets are one of the main tools developed recently to solve this problem in a big data context. They allow one to compress the initial dataset while preserving its structure: running any algorithm on the coreset provides a guarantee almost equivalent to running it on the full data. In this work, we study coresets in a fully-dynamic setting: points are added and deleted with the goal of efficiently maintaining a coreset with which a $k$-means solution can be computed. Based on an algorithm from Henzinger and Kale [ESA'20], we present an efficient and practical implementation of a fully dynamic coreset algorithm that improves the running time by up to a factor of 20 compared to our non-optimized implementation of the algorithm by Henzinger and Kale, without sacrificing more than 7% on the quality of the $k$-means solution.

Comment: Accepted at ALENEX 2
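
The $k$-means objective stated in the abstract can be written down in a few lines. The sketch below is a plain illustration of that cost function, with an optional per-point weight so that it can also be evaluated on a weighted summary such as a coreset. The name `kmeans_cost` and the `weights` parameter are assumptions for the example and do not come from the paper's implementation.

```python
def kmeans_cost(points, centers, weights=None):
    """Weighted sum of squared Euclidean distances to the nearest center.

    With weights=None every point counts once, which gives the standard
    k-means objective on the full dataset.
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for p, w in zip(points, weights):
        # Squared distance from p to its closest center.
        nearest_sq = min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        total += w * nearest_sq
    return total


full_cost = kmeans_cost([(0, 0), (2, 0), (10, 0)], centers=[(1, 0), (10, 0)])
print(full_cost)
```

A coreset is then a small weighted pointset on which this function returns approximately the same value as on the full data for every candidate set of centers, which is why a solver can be run on the coreset instead.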

arXiv.org e-Print Archive

From approximate to exact integer programming

Authors: Dadush Daniel, Eisenbrand Friedrich, Rothvoss Thomas
Publication date: 07/11/2022

Approximate integer programming is the following: For a convex body $K \subseteq \mathbb{R}^n$, either determine whether $K \cap \mathbb{Z}^n$ is empty, or find an integer point in the convex body scaled by $2$ from its center of gravity $c$. Approximate integer programming can be solved in time $2^{O(n)}$ while the fastest known methods for exact integer programming run in time $2^{O(n)} \cdot n^n$. So far, there are no efficient methods for integer programming known that are based on approximate integer programming. Our main contributions are two such methods, each yielding novel complexity results. First, we show that an integer point $x^* \in (K \cap \mathbb{Z}^n)$ can be found in time $2^{O(n)}$, provided that the remainders $x_i^* \bmod \ell$ of each component of $x^*$ are given, for some arbitrarily fixed $\ell \geq 5(n+1)$. The algorithm is based on a cutting-plane technique, iteratively halving the volume of the feasible set. The cutting planes are determined via approximate integer programming. Enumeration of the possible remainders gives a $2^{O(n)}n^n$ algorithm for general integer programming. This matches the current best bound of an algorithm by Dadush (2012) that is considerably more involved. Our algorithm also relies on a new asymmetric approximate Carathéodory theorem that might be of interest on its own. Our second method concerns integer programming problems in equation-standard form $Ax = b,\ 0 \leq x \leq u,\ x \in \mathbb{Z}^n$. Such a problem can be reduced to the solution of $\prod_i O(\log u_i + 1)$ approximate integer programming problems. This implies, for example, that knapsack or subset-sum problems with polynomial variable range $0 \leq x_i \leq p(n)$ can be solved in time $(\log n)^{O(n)}$. For these problems, the best running time so far was $n^n \cdot 2^{O(n)}$.
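
The $2^{O(n)}n^n$ bound obtained by enumerating remainders can be checked with a short count. This is a sketch under two assumptions that the abstract does not spell out: the smallest admissible modulus $\ell = 5(n+1)$ is used, and each of the $\ell^n$ remainder vectors in $\{0,\dots,\ell-1\}^n$ is tried once with the $2^{O(n)}$-time procedure.

```latex
\[
  \underbrace{\ell^{\,n}}_{\text{remainder vectors}} \cdot\; 2^{O(n)}
  \;=\; \bigl(5(n+1)\bigr)^{n} \cdot 2^{O(n)}
  \;=\; 5^{n} \Bigl(1+\tfrac{1}{n}\Bigr)^{n} n^{n} \cdot 2^{O(n)}
  \;\le\; 5^{n}\, e\, n^{n} \cdot 2^{O(n)}
  \;=\; 2^{O(n)}\, n^{n}.
\]
```

The inequality uses $(1+1/n)^n \le e$, and the factor $5^n e$ is absorbed into $2^{O(n)}$.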

arXiv.org e-Print Archive

CWI's Institutional Repository

Modeling and Algorithmic Development for Selected Real-World Optimization Problems with Hard-to-Model Features

Author: Primas Bernhard Josef
Publication venue: University of Leeds
Publication date: 01/01/2019

Mathematical optimization is a common tool for numerous real-world optimization problems. However, in some application domains there is scope for improvement of currently used optimization techniques. For example, this is typically the case for applications that contain features which are difficult to model, and applications of interdisciplinary nature where no strong optimization knowledge is available. The goal of this thesis is to demonstrate how to overcome these challenges by considering five problems from two application domains. The first domain that we address is scheduling in Cloud computing systems, in which we investigate three selected problems. First, we study scheduling problems where jobs are required to start immediately when they are submitted to the system. This requirement is ubiquitous in Cloud computing but has not yet been addressed in mathematical scheduling. Our main contributions are (a) providing the formal model, (b) the development of exact and efficient solution algorithms, and (c) proofs of correctness of the algorithms. Second, we investigate the problem of energy-aware scheduling in Cloud data centers. The objective is to assign computing tasks to machines such that the energy required to operate the data center, i.e., the energy required to operate computing devices plus the energy required to cool computing devices, is minimized. Our main contributions are (a) the mathematical model, and (b) the development of efficient heuristics. Third, we address the problem of evaluating scheduling algorithms in a realistic environment. To this end we develop an approach that supports mathematicians in evaluating scheduling algorithms through simulation with realistic instances. Our main contributions are the development of (a) a formal model, and (b) efficient heuristics. The second application domain considered is powerline routing. We are given two points in a geographic area and the respective terrain characteristics. The objective is to find a "good" route (which depends on the terrain) connecting both points, along which a powerline should be built. Within this application domain, we study two selected problems. First, we study a geometric shortest path problem, an abstract and simplified version of the powerline routing problem. We introduce the concept of the k-neighborhood and contribute various analytical results. Second, we investigate the actual powerline routing problem. To this end, we develop algorithms that are built upon the theoretical insights obtained in the previous study. Our main contributions are (a) the development of exact algorithms and efficient heuristics, and (b) a comprehensive evaluation through two real-world case studies. Some parts of the research presented in this thesis have been published in refereed publications [119], [110], [109].

White Rose E-theses Online

Small Space Stream Summary for Matroid Center

Author: Kale Sagar
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 01/01/2019

In the matroid center problem, which generalizes the k-center problem, we need to pick a set of centers that is an independent set of a matroid with rank r. We study this problem in streaming, where elements of the ground set arrive in the stream. We first show that any randomized one-pass streaming algorithm that computes a better than Delta-approximation for partition-matroid center must use Omega(r^2) bits of space, where Delta is the aspect ratio of the metric and can be arbitrarily large. This shows a quadratic separation between matroid center and k-center, for which the Doubling algorithm [Charikar et al., 1997] gives an 8-approximation using O(k) space and one pass. To complement this, we give a one-pass algorithm for matroid center that stores at most O(r^2 log(1/epsilon)/epsilon) points (viz., stream summary) among which a (7+epsilon)-approximate solution exists, which can be found by brute force, or a (17+epsilon)-approximation can be found with an efficient algorithm. If we are allowed a second pass, we can compute a (3+epsilon)-approximation efficiently. We also consider the problem of matroid center with z outliers and give a one-pass algorithm that outputs a set of O((r^2+rz)log(1/epsilon)/epsilon) points that contains a (15+epsilon)-approximate solution. Our techniques extend to knapsack center and knapsack center with z outliers in a straightforward way, and we get algorithms that use space linear in the size of a largest feasible set (as opposed to quadratic space for matroid center).

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server