Fully Scalable MPC Algorithms for Clustering in High Dimension
We design new parallel algorithms for clustering in high-dimensional
Euclidean spaces. These algorithms run in the Massively Parallel Computation
(MPC) model, and are fully scalable, meaning that the local memory in each
machine may be n^\sigma for arbitrarily small fixed \sigma > 0.
Importantly, the local memory may be substantially smaller than the number of
clusters k, yet all our algorithms are fast, i.e., run in O(1) rounds.
We first devise a fast MPC algorithm for O(1)-approximation of uniform
facility location. This is the first fully-scalable MPC algorithm that achieves
O(1)-approximation for any clustering problem in a general geometric setting;
previous algorithms only provide poly(log n)-approximation or apply
to restricted inputs, like low dimension or a small number of clusters k; e.g.,
[Bhaskara and Wijewardena, ICML'18; Cohen-Addad et al., NeurIPS'21; Cohen-Addad
et al., ICML'22]. We then build on this facility location result and devise a
fast MPC algorithm that achieves O(1)-bicriteria approximation for k-Median
and for k-Means, namely, it computes (1+\eps)k clusters of cost
within O(1/\eps^2)-factor of the optimum for k clusters.
A primary technical tool that we introduce, which may be of independent
interest, is a new MPC primitive for geometric aggregation, namely, computing
for every data point a statistic of its approximate neighborhood, for
statistics like range counting and nearest-neighbor search. Our implementation
of this primitive works in high dimension and is based on consistent hashing
(aka sparse partition), a technique that was recently used for streaming
algorithms [Czumaj et al., FOCS'22].
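The paper's consistent-hashing construction is involved; as a rough illustration of how a space-partitioning hash enables geometric aggregation, the sketch below uses a randomly shifted uniform grid, a much simpler relative of sparse partitions. All names and parameters are illustrative, not the paper's.

```python
import random
from collections import Counter

def shifted_grid_hash(cell_width, dim, seed=0):
    """A randomly shifted uniform grid over R^dim: each point maps to a
    cell id. Nearby points (distance << cell_width) land in the same
    cell with constant probability, so per-cell aggregation yields an
    approximate neighborhood statistic. This is a toy stand-in for
    consistent hashing / sparse partitions, not the paper's primitive."""
    rng = random.Random(seed)
    shift = [rng.uniform(0, cell_width) for _ in range(dim)]
    return lambda p: tuple(int((x + s) // cell_width)
                           for x, s in zip(p, shift))

# Toy geometric aggregation: approximate range counting per point.
points = [(0.1, 0.2), (0.15, 0.25), (5.0, 5.0)]
h = shifted_grid_hash(cell_width=1.0, dim=2, seed=42)
cell_counts = Counter(h(p) for p in points)
for p in points:
    print(p, "-> points in same cell:", cell_counts[h(p)])
```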
Fair Rank Aggregation
Ranking algorithms find extensive usage in diverse areas such as web search,
employment, college admission, voting, etc. The related rank aggregation
problem deals with combining multiple rankings into a single aggregate ranking.
However, algorithms for both these problems might be biased against some
individuals or groups due to implicit prejudice or marginalization in the
historical data. We study ranking and rank aggregation problems from a fairness
or diversity perspective, where the candidates (to be ranked) may belong to
different groups and each group should have a fair representation in the final
ranking. We allow the designer to set the parameters that define fair
representation. These parameters specify the allowed range of the number of
candidates from a particular group in the top-k positions of the ranking.
Given any ranking, we provide a fast and exact algorithm for finding the
closest fair ranking for the Kendall tau metric under block-fairness. We also
provide an exact algorithm for finding the closest fair ranking for the Ulam
metric under strict-fairness, when there are only O(1) groups. Our
algorithms are simple, fast, and might be extendable to other relevant metrics.
We also give a novel meta-algorithm for the general rank aggregation problem
under the fairness framework. Surprisingly, this meta-algorithm works for any
generalized mean objective (including center and median problems) and any
fairness criterion. As a byproduct, we obtain 3-approximation algorithms for
both center and median problems, under both Kendall tau and Ulam metrics.
Furthermore, using sophisticated techniques, we obtain a
(3-\eps)-approximation algorithm, for a constant \eps > 0, for
the Ulam metric under strong fairness.
Comment: A preliminary version of this paper appeared in NeurIPS 2022
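To make the fairness constraints concrete, here is a minimal checker for the prefix condition described above; the names (group_of, lo, hi) are illustrative, not the paper's notation.

```python
from collections import Counter

def satisfies_fair_representation(ranking, group_of, k, lo, hi):
    """Check whether the top-k prefix of `ranking` contains, for every
    group g, between lo[g] and hi[g] candidates of that group (the
    designer-set range described in the abstract)."""
    counts = Counter(group_of[c] for c in ranking[:k])
    return all(lo[g] <= counts.get(g, 0) <= hi[g] for g in lo)

# Example: between 1 and 2 candidates of each group in the top 3.
ranking = ["a", "b", "c", "d"]
group_of = {"a": "G1", "b": "G1", "c": "G2", "d": "G2"}
print(satisfies_fair_representation(ranking, group_of, 3,
                                    lo={"G1": 1, "G2": 1},
                                    hi={"G1": 2, "G2": 2}))  # True
```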
Lotsize optimization leading to a p-median problem with cardinalities
We consider the problem of approximating the branch- and size-dependent demand
of a fashion discounter with many branches by a distribution process in which
each branch's delivery is restricted to integral multiples of lots from a
small set of available lot-types. We propose a formalized model which arises
from a practical cooperation with an industry partner. Besides an integer
linear programming formulation and a primal heuristic for this problem, we also
consider a more abstract version, which we relate to several other classical
optimization problems like the p-median problem, the facility location problem,
and the matching problem.
Comment: 14 pages
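For reference, the classical p-median problem mentioned above admits the following standard ILP formulation (the textbook version, not the paper's lot-type model), with clients I, potential medians J, and distances d_ij:

```latex
\begin{align*}
\min\; & \sum_{i \in I} \sum_{j \in J} d_{ij}\, x_{ij}
  && \text{(total assignment cost)} \\
\text{s.t.}\; & \sum_{j \in J} x_{ij} = 1 \quad \forall i \in I
  && \text{(every client is assigned)} \\
& x_{ij} \le y_j \quad \forall i \in I,\, j \in J
  && \text{(assign only to open medians)} \\
& \sum_{j \in J} y_j = p
  && \text{(exactly $p$ medians are opened)} \\
& x_{ij},\, y_j \in \{0, 1\}
\end{align*}
```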
Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms
We present a technical survey on the state-of-the-art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching, and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.
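As a taste of the random-sampling technique surveyed here, a minimal uniform-sampling data reduction: sample m of the n points and weight each by n/m, so the weighted cost of any fixed solution is an unbiased estimate of the true cost. This is a toy illustration; coresets with worst-case guarantees use importance (sensitivity) sampling instead.

```python
import random

def uniform_sample_reduction(points, m, seed=0):
    """Return m points sampled uniformly with replacement, each with
    weight n/m. For any fixed set of centers, the weighted clustering
    cost is an unbiased estimator of the cost on the full input."""
    rng = random.Random(seed)
    n = len(points)
    return [(rng.choice(points), n / m) for _ in range(m)]

def weighted_kmeans_cost(weighted_points, centers):
    """Weighted sum of squared distances to the nearest center."""
    return sum(w * min(sum((a - b) ** 2 for a, b in zip(p, c))
                       for c in centers)
               for p, w in weighted_points)
```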
Fast Clustering with Lower Bounds: No Customer too Far, No Shop too Small
We study the Lower-Bounded Center (LBC) problem, which is a clustering
problem that can be viewed as a variant of the k-Center problem. In the LBC
problem, we are given a set of points P in a metric space and a lower bound
\lambda, and the goal is to select a set C \subseteq P of centers and an
assignment that maps each point in P to a center of C such that each center of
C is assigned at least \lambda points. The price of an assignment is the
maximum distance between a point and the center it is assigned to, and the goal
is to find a set of centers and an assignment of minimum price. We give a
constant-factor approximation algorithm for the LBC problem that runs in O(n
\log n) time when the input points lie in the d-dimensional Euclidean space
R^d, where d is a constant. We also prove that this problem cannot be
approximated within a factor of 1.8-\epsilon unless P = NP, even if the input
points are points in the Euclidean plane R^2.
Comment: 14 pages
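The objective is straightforward to express in code; the following checker evaluates a candidate solution's price and lower-bound feasibility (illustrative names; requires Python 3.8+ for math.dist):

```python
import math

def price_and_feasibility(points, centers, assign, lam):
    """Given points, a set of centers (as tuples), an assignment mapping
    each point index to a center, and the lower bound lam, return the
    price (maximum point-to-center distance) and whether every center
    is assigned at least lam points."""
    load = {c: 0 for c in centers}
    price = 0.0
    for i, p in enumerate(points):
        c = assign[i]
        load[c] += 1
        price = max(price, math.dist(p, c))
    return price, all(v >= lam for v in load.values())
```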
Approximating the least hypervolume contributor: NP-hard in general, but fast in practice
The hypervolume indicator is an increasingly popular set measure to compare
the quality of two Pareto sets. The basic ingredient of most hypervolume
indicator based optimization algorithms is the calculation of the hypervolume
contribution of single solutions regarding a Pareto set. We show that exact
calculation of the hypervolume contribution is #P-hard while its approximation
is NP-hard. The same holds for the calculation of the minimal contribution. We
also prove that it is NP-hard to decide whether a solution has the least
hypervolume contribution. Even deciding whether the contribution of a solution
is at most (1+\eps) times the minimal contribution is NP-hard. This implies
that it is neither possible to efficiently find the least contributing solution
(unless P = NP) nor to approximate it (unless NP = BPP).
Nevertheless, in the second part of the paper we present a fast approximation
algorithm for this problem. We prove that for arbitrarily given \eps,\delta>0
it calculates a solution with contribution at most (1+\eps) times the minimal
contribution with probability at least 1-\delta. Though it cannot run in
polynomial time for all instances, it performs extremely fast on various
benchmark datasets. The algorithm solves very large problem instances which are
intractable for exact algorithms (e.g., 10000 solutions in 100 dimensions)
within a few seconds.
Comment: 22 pages, to appear in Theoretical Computer Science
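A generic Monte Carlo estimator conveys the sampling flavor of such an approximation. This is a plain sampling sketch under a minimization convention with a reference point, not the paper's algorithm, which chooses the number of samples adaptively to meet the (1+\eps, \delta) guarantee:

```python
import random

def mc_hypervolume_contribution(point, others, ref, samples=100_000, seed=0):
    """Estimate the hypervolume contribution of `point`: the volume
    (w.r.t. reference point `ref`, minimization convention) dominated
    by `point` but by no other solution in `others`."""
    rng = random.Random(seed)
    d = len(point)
    # Bounding box dominated by `point`: [point[i], ref[i]] per coordinate.
    box_vol = 1.0
    for i in range(d):
        box_vol *= ref[i] - point[i]
    hits = 0
    for _ in range(samples):
        x = [rng.uniform(point[i], ref[i]) for i in range(d)]
        # Count the sample only if no other solution also dominates it.
        if not any(all(o[i] <= x[i] for i in range(d)) for o in others):
            hits += 1
    return box_vol * hits / samples
```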