Search CORE

2,663 research outputs found

Diversity Maximization in Doubling Metrics

Author: Cevallos Alfonso
Eisenbrand Friedrich
Morell Sarah
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 29th International Symposium on Algorithms and Computation (ISAAC 2018)
Publication date: 01/01/2018
Field of study

Diversity maximization is an important geometric optimization problem with many applications in recommender systems, machine learning or search engines among others. A typical diversification problem is as follows: Given a finite metric space (X,d) and a parameter k in N, find a subset of k elements of X that has maximum diversity. There are many functions that measure diversity. One of the most popular measures, called remote-clique, is the sum of the pairwise distances of the chosen elements. In this paper, we present novel results on three widely used diversity measures: Remote-clique, remote-star and remote-bipartition. Our main result are polynomial time approximation schemes for these three diversification problems under the assumption that the metric space is doubling. This setting has been discussed in the recent literature. The existence of such a PTAS however was left open. Our results also hold in the setting where the distances are raised to a fixed power q >= 1, giving rise to more variants of diversity functions, similar in spirit to the variations of clustering problems depending on the power applied to the pairwise distances. Finally, we provide a proof of NP-hardness for remote-clique with squared distances in doubling metric spaces

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Fully dynamic clustering and diversity maximization in doubling metrics

Author: Pellizzoni Paolo
Pietracaprina Andrea
Pucci Geppino
Publication venue
Publication date: 01/01/2023
Field of study

We present approximation algorithms for some variants of center-based clustering and related problems in the fully dynamic setting, where the pointset evolves through an arbitrary sequence of insertions and deletions. Specifically, we target the following problems:

k

-center (with and without outliers), matroid-center, and diversity maximization. All algorithms employ a coreset-based strategy and rely on the use of the cover tree data structure, which we crucially augment to maintain, at any time, some additional information enabling the efficient extraction of the solution for the specific problem. For all of the aforementioned problems our algorithms yield

(\alpha+\varepsilon)

-approximations, where

\alpha

is the best known approximation attainable in polynomial time in the standard off-line setting (except for

k

-center with

z

outliers where

\alpha = 2

but we get a

(3+\varepsilon)

-approximation) and

\varepsilon>0

is a user-provided accuracy parameter. The analysis of the algorithms is performed in terms of the doubling dimension of the underlying metric. Remarkably, and unlike previous works, the data structure and the running times of the insertion and deletion procedures do not depend in any way on the accuracy parameter

\varepsilon

and, for the two

k

-center variants, on the parameter

k

. For spaces of bounded doubling dimension, the running times are dramatically smaller than those that would be required to compute solutions on the entire pointset from scratch. To the best of our knowledge, ours are the first solutions for the matroid-center and diversity maximization problems in the fully dynamic setting

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension

Author: Ceccarello Matteo
Pietracaprina Andrea
Pucci Geppino
Upfal Eli
Publication venue
Publication date: 01/01/2017
Field of study

Given a dataset of points in a metric space and an integer

k

, a diversity maximization problem requires determining a subset of

k

points maximizing some diversity objective measure, e.g., the minimum or the average distance between two points in the subset. Diversity maximization is computationally hard, hence only approximate solutions can be hoped for. Although its applications are mainly in massive data analysis, most of the past research on diversity maximization focused on the sequential setting. In this work we present space and pass/round-efficient diversity maximization algorithms for the Streaming and MapReduce models and analyze their approximation guarantees for the relevant class of metric spaces of bounded doubling dimension. Like other approaches in the literature, our algorithms rely on the determination of high-quality core-sets, i.e., (much) smaller subsets of the input which contain good approximations to the optimal solution for the whole input. For a variety of diversity objective functions, our algorithms attain an

(\alpha+\epsilon)

-approximation ratio, for any constant

\epsilon>0

, where

\alpha

is the best approximation ratio achieved by a polynomial-time, linear-space sequential algorithm for the same diversity objective. This improves substantially over the approximation ratios attainable in Streaming and MapReduce by state-of-the-art algorithms for general metric spaces. We provide extensive experimental evidence of the effectiveness of our algorithms on both real world and synthetic datasets, scaling up to over a billion points.Comment: Extended version of http://www.vldb.org/pvldb/vol10/p469-ceccarello.pdf, PVLDB Volume 10, No. 5, January 201

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Padova

Improved Approximation and Scalability for Fair Max-Min Diversification

Author: Addanki Raghavendra
McGregor Andrew
Meliou Alexandra
Moumoulidou Zafeiria
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 25th International Conference on Database Theory (ICDT 2022)
Publication date: 01/01/2022
Field of study

Given an

n

-point metric space

(\mathcal{X},d)

where each point belongs to one of

m=O(1)

different categories or groups and a set of integers

k_1, \ldots, k_m

, the fair Max-Min diversification problem is to select

k_i

points belonging to category

i\in [m]

, such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample large data sets in various applications so that the derived sample achieves a balance over diversity, i.e., the minimum distance between a pair of selected points, and fairness, i.e., ensuring enough points of each category are included. We prove the following results: 1. We first consider general metric spaces. We present a randomized polynomial time algorithm that returns a factor

2

-approximation to the diversity but only satisfies the fairness constraints in expectation. Building upon this result, we present a

6

-approximation that is guaranteed to satisfy the fairness constraints up to a factor

1-\epsilon

for any constant

\epsilon

. We also present a linear time algorithm returning an

m+1

approximation with exact fairness. The best previous result was a

3m-1

approximation. 2. We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. For constant dimensions, categories and any constant

\epsilon>0

, we present a

1+\epsilon

approximation algorithm that runs in

O(nk) + 2^{O(k)}

time where

k=k_1+\ldots+k_m

. We can improve the running time to

O(nk)+ poly(k)

at the expense of only picking

(1-\epsilon) k_i

points from category

i\in [m]

. Finally, we present algorithms suitable to processing massive data sets including single-pass data stream algorithms and composable coresets for the distributed processing.Comment: To appear in ICDT 202

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Wireless Scheduling with Power Control

Author: Halldorsson Magnus M.
Publication venue
Publication date: 01/01/2011
Field of study

We consider the scheduling of arbitrary wireless links in the physical model of interference to minimize the time for satisfying all requests. We study here the combined problem of scheduling and power control, where we seek both an assignment of power settings and a partition of the links so that each set satisfies the signal-to-interference-plus-noise (SINR) constraints. We give an algorithm that attains an approximation ratio of

O(\log n \cdot \log\log \Delta)

, where

n

is the number of links and

\Delta

is the ratio between the longest and the shortest link length. Under the natural assumption that lengths are represented in binary, this gives the first approximation ratio that is polylogarithmic in the size of the input. The algorithm has the desirable property of using an oblivious power assignment, where the power assigned to a sender depends only on the length of the link. We give evidence that this dependence on

\Delta

is unavoidable, showing that any reasonably-behaving oblivious power assignment results in a

\Omega(\log\log \Delta)

-approximation. These results hold also for the (weighted) capacity problem of finding a maximum (weighted) subset of links that can be scheduled in a single time slot. In addition, we obtain improved approximation for a bidirectional variant of the scheduling problem, give partial answers to questions about the utility of graphs for modeling physical interference, and generalize the setting from the standard 2-dimensional Euclidean plane to doubling metrics. Finally, we explore the utility of graph models in capturing wireless interference.Comment: Revised full versio

arXiv.org e-Print Archive

CiteSeerX

Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction

Author: Ding Hu
Wang Zixiu
Yu Haikuo
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual European Symposium on Algorithms (ESA 2019)
Publication date: 01/01/2019
Field of study

We study the problem of k-center clustering with outliers in arbitrary metrics and Euclidean space. Though a number of methods have been developed in the past decades, it is still quite challenging to design quality guaranteed algorithm with low complexity for this problem. Our idea is inspired by the greedy method, Gonzalez\u27s algorithm, for solving the problem of ordinary k-center clustering. Based on some novel observations, we show that this greedy strategy actually can handle k-center clustering with outliers efficiently, in terms of clustering quality and time complexity. We further show that the greedy approach yields small coreset for the problem in doubling metrics, so as to reduce the time complexity significantly. Our algorithms are easy to implement in practice. We test our method on both synthetic and real datasets. The experimental results suggest that our algorithms can achieve near optimal solutions and yield lower running times comparing with existing methods

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server