2,663 research outputs found
Diversity Maximization in Doubling Metrics
Diversity maximization is an important geometric optimization problem with many applications in recommender systems, machine learning, and search engines, among others. A typical diversification problem is as follows: given a finite metric space (X, d) and a parameter k in N, find a subset of k elements of X that has maximum diversity. There are many functions that measure diversity. One of the most popular measures, called remote-clique, is the sum of the pairwise distances of the chosen elements. In this paper, we present novel results on three widely used diversity measures: remote-clique, remote-star, and remote-bipartition.
Our main results are polynomial-time approximation schemes (PTASs) for these three diversification problems under the assumption that the metric space is doubling. This setting has been discussed in the recent literature; the existence of such a PTAS, however, was left open.
Our results also hold in the setting where the distances are raised to a fixed power q >= 1, giving rise to more variants of diversity functions, similar in spirit to the variations of clustering problems depending on the power applied to the pairwise distances. Finally, we provide a proof of NP-hardness for remote-clique with squared distances in doubling metric spaces.
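To make the remote-clique objective concrete, the following is a minimal sketch that evaluates it on points in the Euclidean plane and finds the best k-subset by exhaustive search. It is not the PTAS from the paper: the function names, the choice of Euclidean distance, and the sample points are assumptions made for illustration, and the brute-force search is only practical for very small inputs.

```python
from itertools import combinations
from math import dist


def remote_clique(points, q=1.0):
    """Sum of pairwise Euclidean distances, each raised to the power q."""
    return sum(dist(a, b) ** q for a, b in combinations(points, 2))


def best_subset_brute_force(points, k, q=1.0):
    """Return the k-subset with maximum remote-clique value.

    Exhaustive search over all k-subsets; exponential in k, so this is
    only an illustration of the objective, not a practical algorithm.
    """
    return max(combinations(points, k), key=lambda s: remote_clique(s, q))


# Hypothetical sample data.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
best = best_subset_brute_force(points, k=2)
print(best, remote_clique(best))
```

Setting q=2 gives the squared-distance variant mentioned in the abstract.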
Fully dynamic clustering and diversity maximization in doubling metrics
We present approximation algorithms for some variants of center-based
clustering and related problems in the fully dynamic setting, where the
pointset evolves through an arbitrary sequence of insertions and deletions.
Specifically, we target the following problems: k-center (with and without
outliers), matroid-center, and diversity maximization. All algorithms employ a
coreset-based strategy and rely on the use of the cover tree data structure,
which we crucially augment to maintain, at any time, some additional
information enabling the efficient extraction of the solution for the specific
problem. For all of the aforementioned problems our algorithms yield
(α + ε)-approximations, where α is the best known
approximation attainable in polynomial time in the standard off-line setting
(except for k-center with outliers where but we get a
-approximation) and ε is a user-provided
accuracy parameter. The analysis of the algorithms is performed in terms of the
doubling dimension of the underlying metric. Remarkably, and unlike previous
works, the data structure and the running times of the insertion and deletion
procedures do not depend in any way on the accuracy parameter
and, for the two k-center variants, on the parameter k. For spaces of
bounded doubling dimension, the running times are dramatically smaller than
those that would be required to compute solutions on the entire pointset from
scratch. To the best of our knowledge, ours are the first solutions for the
matroid-center and diversity maximization problems in the fully dynamic
setting.
MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension
Given a dataset of points in a metric space and an integer k, a diversity
maximization problem requires determining a subset of k points maximizing
some diversity objective measure, e.g., the minimum or the average distance
between two points in the subset. Diversity maximization is computationally
hard, hence only approximate solutions can be hoped for. Although its
applications are mainly in massive data analysis, most of the past research on
diversity maximization focused on the sequential setting. In this work we
present space and pass/round-efficient diversity maximization algorithms for
the Streaming and MapReduce models and analyze their approximation guarantees
for the relevant class of metric spaces of bounded doubling dimension. Like
other approaches in the literature, our algorithms rely on the determination of
high-quality core-sets, i.e., (much) smaller subsets of the input which contain
good approximations to the optimal solution for the whole input. For a variety
of diversity objective functions, our algorithms attain an
(α + ε)-approximation ratio, for any constant ε > 0, where
α is the best approximation ratio achieved by a polynomial-time,
linear-space sequential algorithm for the same diversity objective. This
improves substantially over the approximation ratios attainable in Streaming
and MapReduce by state-of-the-art algorithms for general metric spaces. We
provide extensive experimental evidence of the effectiveness of our algorithms
on both real-world and synthetic datasets, scaling up to over a billion points. Comment: Extended version of
http://www.vldb.org/pvldb/vol10/p469-ceccarello.pdf, PVLDB Volume 10, No. 5,
January 201
Improved Approximation and Scalability for Fair Max-Min Diversification
Given an n-point metric space where each point belongs to
one of m different categories or groups and a set of integers k_1, ..., k_m, the fair Max-Min diversification problem is to select
k_i points belonging to category i, such that the minimum pairwise
distance between selected points is maximized. The problem was introduced by
Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample
large data sets in various applications so that the derived sample achieves a
balance over diversity, i.e., the minimum distance between a pair of selected
points, and fairness, i.e., ensuring enough points of each category are
included. We prove the following results:
1. We first consider general metric spaces. We present a randomized
polynomial time algorithm that returns a factor -approximation to the
diversity but only satisfies the fairness constraints in expectation. Building
upon this result, we present a -approximation that is guaranteed to satisfy
the fairness constraints up to a factor for any constant
. We also present a linear time algorithm returning an
approximation with exact fairness. The best previous result was a
approximation.
2. We then focus on Euclidean metrics. We first show that the problem can be
solved exactly in one dimension. For constant dimensions, categories and any
constant , we present a approximation algorithm that
runs in time where . We can improve the
running time to at the expense of only picking points from category .
Finally, we present algorithms suitable to processing massive data sets
including single-pass data stream algorithms and composable coresets for the
distributed processing. Comment: To appear in ICDT 202
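The two quantities balanced in the fair Max-Min problem can be sketched as follows: a fairness check against per-category quotas and the diversity score given by the minimum pairwise distance. This is only an evaluation helper, not any of the algorithms from the paper. The names `satisfies_quotas`, `min_pairwise_distance`, and `quotas`, the Euclidean distance, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import dist, inf


def min_pairwise_distance(points):
    """Diversity score: smallest distance between any two selected points."""
    return min((dist(a, b) for a, b in combinations(points, 2)), default=inf)


def satisfies_quotas(selection, quotas):
    """Check that each category contributes exactly its quota of points.

    `selection` is a list of (point, category) pairs and `quotas` maps
    each category to the required number of selected points.
    """
    counts = Counter(label for _, label in selection)
    exact = all(counts[c] == need for c, need in quotas.items())
    no_extra_categories = set(counts) <= set(quotas)
    return exact and no_extra_categories


# Hypothetical selection: one "red" point and two "blue" points.
selection = [((0.0, 0.0), "red"), ((3.0, 4.0), "blue"), ((6.0, 8.0), "blue")]
quotas = {"red": 1, "blue": 2}
print(satisfies_quotas(selection, quotas))
print(min_pairwise_distance([p for p, _ in selection]))
```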
Wireless Scheduling with Power Control
We consider the scheduling of arbitrary wireless links in the physical model
of interference to minimize the time for satisfying all requests. We study here
the combined problem of scheduling and power control, where we seek both an
assignment of power settings and a partition of the links so that each set
satisfies the signal-to-interference-plus-noise (SINR) constraints.
We give an algorithm that attains an approximation ratio of , where is the number of links and is the ratio
between the longest and the shortest link length. Under the natural assumption
that lengths are represented in binary, this gives the first approximation
ratio that is polylogarithmic in the size of the input. The algorithm has the
desirable property of using an oblivious power assignment, where the power
assigned to a sender depends only on the length of the link. We give evidence
that this dependence on is unavoidable, showing that any
reasonably-behaving oblivious power assignment results in a -approximation.
These results hold also for the (weighted) capacity problem of finding a
maximum (weighted) subset of links that can be scheduled in a single time slot.
In addition, we obtain improved approximation for a bidirectional variant of
the scheduling problem, give partial answers to questions about the utility of
graphs for modeling physical interference, and generalize the setting from the
standard 2-dimensional Euclidean plane to doubling metrics. Finally, we explore
the utility of graph models in capturing wireless interference. Comment: Revised full versio
Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction
We study the problem of k-center clustering with outliers in arbitrary metrics and Euclidean space. Though a number of methods have been developed in the past decades, it is still quite challenging to design quality-guaranteed algorithms with low complexity for this problem. Our idea is inspired by the greedy method, Gonzalez's algorithm, for solving the problem of ordinary k-center clustering. Based on some novel observations, we show that this greedy strategy can in fact handle k-center clustering with outliers efficiently, in terms of both clustering quality and time complexity. We further show that the greedy approach yields a small coreset for the problem in doubling metrics, which reduces the time complexity significantly. Our algorithms are easy to implement in practice. We test our method on both synthetic and real datasets. The experimental results suggest that our algorithms can achieve near-optimal solutions and yield lower running times compared with existing methods.
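For reference, the ordinary greedy method that the abstract builds on (Gonzalez's farthest-first traversal for k-center without outliers) can be sketched as below. This is not the outlier-handling variant or the coreset construction from the paper. The function name, the Euclidean distance, the choice of the first point as the initial center, and the sample data are assumptions made for illustration, and the input is assumed to be non-empty.

```python
from math import dist


def gonzalez_k_center(points, k):
    """Farthest-first traversal for ordinary k-center (no outliers).

    Starts from the first point, then repeatedly adds the point that is
    farthest from the centers chosen so far. Returns the centers and the
    resulting clustering radius.
    """
    centers = [points[0]]
    # nearest[j] = distance from points[j] to its closest chosen center
    nearest = [dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[far])
        nearest = [min(nearest[j], dist(points[j], points[far]))
                   for j in range(len(points))]
    return centers, max(nearest)


# Hypothetical sample data: two well-separated pairs on a line.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers, radius = gonzalez_k_center(points, k=2)
print(centers, radius)
```

Each round scans all points once, so the sketch runs in O(nk) distance evaluations.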