Fully Scalable MPC Algorithms for Clustering in High Dimension
We design new parallel algorithms for clustering in high-dimensional
Euclidean spaces. These algorithms run in the Massively Parallel Computation
(MPC) model, and are fully scalable, meaning that the local memory in each
machine may be n^\sigma for arbitrarily small fixed \sigma > 0.
Importantly, the local memory may be substantially smaller than the number of
clusters k, yet all our algorithms are fast, i.e., run in O(1) rounds.
We first devise a fast MPC algorithm for O(1)-approximation of uniform
facility location. This is the first fully-scalable MPC algorithm that achieves
O(1)-approximation for any clustering problem in a general geometric setting;
previous algorithms only provide poly(log n)-approximation or apply
to restricted inputs, like low dimension or a small number of clusters k; e.g.,
[Bhaskara and Wijewardena, ICML'18; Cohen-Addad et al., NeurIPS'21; Cohen-Addad
et al., ICML'22]. We then build on this facility location result and devise a
fast MPC algorithm that achieves O(1)-bicriteria approximation for k-Median
and for k-Means, namely, it computes (1+\eps)k clusters of cost
within O(1/\eps^2)-factor of the optimum for k clusters.
A primary technical tool that we introduce, which may be of independent
interest, is a new MPC primitive for geometric aggregation, namely, computing
for every data point a statistic of its approximate neighborhood, for
statistics like range counting and nearest-neighbor search. Our implementation
of this primitive works in high dimension and is based on consistent hashing
(aka sparse partition), a technique that was recently used for streaming
algorithms [Czumaj et al., FOCS'22].
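The paper's consistent-hashing construction is involved; as a rough illustration of how a space-partitioning hash enables geometric aggregation, the sketch below uses a randomly shifted uniform grid, a much simpler relative of sparse partitions. All names and parameters are illustrative, not the paper's.

```python
import random
from collections import Counter

def shifted_grid_hash(cell_width, dim, seed=0):
    """A randomly shifted uniform grid over R^dim: each point maps to a
    cell id. Nearby points (distance << cell_width) land in the same
    cell with constant probability, so per-cell aggregation yields an
    approximate neighborhood statistic. This is a toy stand-in for
    consistent hashing / sparse partitions, not the paper's primitive."""
    rng = random.Random(seed)
    shift = [rng.uniform(0, cell_width) for _ in range(dim)]
    return lambda p: tuple(int((x + s) // cell_width)
                           for x, s in zip(p, shift))

# Toy geometric aggregation: approximate range counting per point.
points = [(0.1, 0.2), (0.15, 0.25), (5.0, 5.0)]
h = shifted_grid_hash(cell_width=1.0, dim=2, seed=42)
cell_counts = Counter(h(p) for p in points)
for p in points:
    print(p, "-> points in same cell:", cell_counts[h(p)])
```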
Fair Rank Aggregation
Ranking algorithms find extensive usage in diverse areas such as web search,
employment, college admission, voting, etc. The related rank aggregation
problem deals with combining multiple rankings into a single aggregate ranking.
However, algorithms for both these problems might be biased against some
individuals or groups due to implicit prejudice or marginalization in the
historical data. We study ranking and rank aggregation problems from a fairness
or diversity perspective, where the candidates (to be ranked) may belong to
different groups and each group should have a fair representation in the final
ranking. We allow the designer to set the parameters that define fair
representation. These parameters specify the allowed range of the number of
candidates from a particular group in the top-k positions of the ranking.
Given any ranking, we provide a fast and exact algorithm for finding the
closest fair ranking for the Kendall tau metric under block-fairness. We also
provide an exact algorithm for finding the closest fair ranking for the Ulam
metric under strict-fairness, when there are only O(1) groups. Our
algorithms are simple, fast, and might be extendable to other relevant metrics.
We also give a novel meta-algorithm for the general rank aggregation problem
under the fairness framework. Surprisingly, this meta-algorithm works for any
generalized mean objective (including center and median problems) and any
fairness criterion. As a byproduct, we obtain 3-approximation algorithms for
both center and median problems, under both Kendall tau and Ulam metrics.
Furthermore, using sophisticated techniques, we obtain a
(3-\eps)-approximation algorithm, for a constant \eps > 0, for
the Ulam metric under strong fairness.
Comment: A preliminary version of this paper appeared in NeurIPS 2022
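To make the fairness constraints concrete, here is a minimal checker for the prefix condition described above; the names (group_of, lo, hi) are illustrative, not the paper's notation.

```python
from collections import Counter

def satisfies_fair_representation(ranking, group_of, k, lo, hi):
    """Check whether the top-k prefix of `ranking` contains, for every
    group g, between lo[g] and hi[g] candidates of that group (the
    designer-set range described in the abstract)."""
    counts = Counter(group_of[c] for c in ranking[:k])
    return all(lo[g] <= counts.get(g, 0) <= hi[g] for g in lo)

# Example: between 1 and 2 candidates of each group in the top 3.
ranking = ["a", "b", "c", "d"]
group_of = {"a": "G1", "b": "G1", "c": "G2", "d": "G2"}
print(satisfies_fair_representation(ranking, group_of, 3,
                                    lo={"G1": 1, "G2": 1},
                                    hi={"G1": 2, "G2": 2}))  # True
```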
Lotsize optimization leading to a p-median problem with cardinalities
We consider the problem of approximating the branch- and size-dependent demand
of a fashion discounter with many branches by a distribution process in which
each branch's delivery is restricted to integral multiples of lots from a
small set of available lot-types. We propose a formalized model which arises
from a practical cooperation with an industry partner. Besides an integer
linear programming formulation and a primal heuristic for this problem, we also
consider a more abstract version, which we relate to several other classical
optimization problems like the p-median problem, the facility location problem,
and the matching problem.
Comment: 14 pages
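For reference, the classical p-median problem mentioned above admits the following standard ILP formulation (the textbook version, not the paper's lot-type model), with clients I, potential medians J, and distances d_ij:

```latex
\begin{align*}
\min\; & \sum_{i \in I} \sum_{j \in J} d_{ij}\, x_{ij}
  && \text{(total assignment cost)} \\
\text{s.t.}\; & \sum_{j \in J} x_{ij} = 1 \quad \forall i \in I
  && \text{(every client is assigned)} \\
& x_{ij} \le y_j \quad \forall i \in I,\, j \in J
  && \text{(assign only to open medians)} \\
& \sum_{j \in J} y_j = p
  && \text{(exactly $p$ medians are opened)} \\
& x_{ij},\, y_j \in \{0, 1\}
\end{align*}
```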
Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms
We present a technical survey on the state-of-the-art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching, and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.
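As a taste of the random-sampling technique surveyed here, a minimal uniform-sampling data reduction: sample m of the n points and weight each by n/m, so the weighted cost of any fixed solution is an unbiased estimate of the true cost. This is a toy illustration; coresets with worst-case guarantees use importance (sensitivity) sampling instead.

```python
import random

def uniform_sample_reduction(points, m, seed=0):
    """Return m points sampled uniformly with replacement, each with
    weight n/m. For any fixed set of centers, the weighted clustering
    cost is an unbiased estimator of the cost on the full input."""
    rng = random.Random(seed)
    n = len(points)
    return [(rng.choice(points), n / m) for _ in range(m)]

def weighted_kmeans_cost(weighted_points, centers):
    """Weighted sum of squared distances to the nearest center."""
    return sum(w * min(sum((a - b) ** 2 for a, b in zip(p, c))
                       for c in centers)
               for p, w in weighted_points)
```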
Fast Clustering with Lower Bounds: No Customer too Far, No Shop too Small
We study the Lower-Bounded Center (LBC) problem, which is a clustering
problem that can be viewed as a variant of the k-Center problem. In the LBC
problem, we are given a set of points P in a metric space and a lower bound
\lambda, and the goal is to select a set C \subseteq P of centers and an
assignment that maps each point in P to a center of C such that each center of
C is assigned at least \lambda points. The price of an assignment is the
maximum distance between a point and the center it is assigned to, and the goal
is to find a set of centers and an assignment of minimum price. We give a
constant-factor approximation algorithm for the LBC problem that runs in O(n
\log n) time when the input points lie in the d-dimensional Euclidean space
R^d, where d is a constant. We also prove that this problem cannot be
approximated within a factor of 1.8-\epsilon unless P = NP, even if the input
points are points in the Euclidean plane R^2.
Comment: 14 pages
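The objective is straightforward to express in code; the following checker evaluates a candidate solution's price and lower-bound feasibility (illustrative names; requires Python 3.8+ for math.dist):

```python
import math

def price_and_feasibility(points, centers, assign, lam):
    """Given points, a set of centers (as tuples), an assignment mapping
    each point index to a center, and the lower bound lam, return the
    price (maximum point-to-center distance) and whether every center
    is assigned at least lam points."""
    load = {c: 0 for c in centers}
    price = 0.0
    for i, p in enumerate(points):
        c = assign[i]
        load[c] += 1
        price = max(price, math.dist(p, c))
    return price, all(v >= lam for v in load.values())
```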
Approximating the least hypervolume contributor: NP-hard in general, but fast in practice
The hypervolume indicator is an increasingly popular set measure to compare
the quality of two Pareto sets. The basic ingredient of most hypervolume
indicator based optimization algorithms is the calculation of the hypervolume
contribution of single solutions regarding a Pareto set. We show that exact
calculation of the hypervolume contribution is #P-hard while its approximation
is NP-hard. The same holds for the calculation of the minimal contribution. We
also prove that it is NP-hard to decide whether a solution has the least
hypervolume contribution. Even deciding whether the contribution of a solution
is at most (1+\eps) times the minimal contribution is NP-hard. This implies
that it is neither possible to efficiently find the least contributing solution
(unless P = NP) nor to approximate it (unless NP = BPP).
Nevertheless, in the second part of the paper we present a fast approximation
algorithm for this problem. We prove that for arbitrarily given \eps,\delta>0
it calculates a solution with contribution at most (1+\eps) times the minimal
contribution with probability at least 1-\delta. Though it cannot run in
polynomial time for all instances, it performs extremely fast on various
benchmark datasets. The algorithm solves very large problem instances which are
intractable for exact algorithms (e.g., 10000 solutions in 100 dimensions)
within a few seconds.
Comment: 22 pages, to appear in Theoretical Computer Science
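A generic Monte Carlo estimator conveys the sampling flavor of such an approximation. This is a plain sampling sketch under a minimization convention with a reference point, not the paper's algorithm, which chooses the number of samples adaptively to meet the (1+\eps, \delta) guarantee:

```python
import random

def mc_hypervolume_contribution(point, others, ref, samples=100_000, seed=0):
    """Estimate the hypervolume contribution of `point`: the volume
    (w.r.t. reference point `ref`, minimization convention) dominated
    by `point` but by no other solution in `others`."""
    rng = random.Random(seed)
    d = len(point)
    # Bounding box dominated by `point`: [point[i], ref[i]] per coordinate.
    box_vol = 1.0
    for i in range(d):
        box_vol *= ref[i] - point[i]
    hits = 0
    for _ in range(samples):
        x = [rng.uniform(point[i], ref[i]) for i in range(d)]
        # Count the sample only if no other solution also dominates it.
        if not any(all(o[i] <= x[i] for i in range(d)) for o in others):
            hits += 1
    return box_vol * hits / samples
```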