Estimating the geometric median in Hilbert spaces with stochastic gradient algorithms: $L^p$ and almost sure rates of convergence
The geometric median, also called the $L^1$-median, is often used in robust
statistics. Moreover, it is increasingly common to deal with large samples
taking values in high dimensional spaces. In this context, a fast recursive
estimator has been introduced by Cardot, Cénac and Zitt. This work aims at
studying more precisely the asymptotic behavior of the estimators of the
geometric median based on such non linear stochastic gradient algorithms. The
$L^p$ rates of convergence as well as almost sure rates of convergence of
these estimators are derived in general separable Hilbert spaces. Moreover, the
optimal rate of convergence in quadratic mean of the averaged algorithm is also
given.
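As a rough illustration of the kind of recursive estimator this abstract refers to, the sketch below runs a normalized stochastic gradient step on each new observation and averages the iterates. It is a minimal sketch written for finite-dimensional tuples, not the authors' implementation: the function name `averaged_sgd_median` and the step-size parameters `c_gamma` and `alpha` (step size `c_gamma / n**alpha`) are assumptions chosen for the example.

```python
import math
import random


def averaged_sgd_median(samples, c_gamma=1.0, alpha=0.75):
    """One pass over `samples` (equal-length tuples); returns the averaged iterate.

    c_gamma and alpha are illustrative step-size choices, not values
    taken from the paper.
    """
    z = list(samples[0])   # stochastic gradient iterate, started at the first point
    z_bar = list(z)        # running average of the iterates
    for n, x in enumerate(samples[1:], start=1):
        diff = [xi - zi for xi, zi in zip(x, z)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:
            gamma = c_gamma / n ** alpha
            # Move a distance gamma along the unit vector pointing at x.
            z = [zi + gamma * d / norm for zi, d in zip(z, diff)]
        # Online update of the mean of all iterates seen so far.
        z_bar = [zb + (zi - zb) / (n + 1) for zb, zi in zip(z_bar, z)]
    return z_bar


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(1.0 + rng.gauss(0, 1), -2.0 + rng.gauss(0, 1)) for _ in range(20000)]
    print(averaged_sgd_median(data))
```

Each observation is used once and then discarded, which is why this family of estimators suits large samples in high dimensional spaces.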
Groups acting on quasi-median graphs. An introduction
Quasi-median graphs have been introduced by Mulder in 1980 as a
generalisation of median graphs, known in geometric group theory to naturally
coincide with the class of CAT(0) cube complexes. In his PhD thesis, the author
showed that quasi-median graphs may be useful to study groups as well. In the
present paper, we propose a gentle introduction to the theory of groups acting
on quasi-median graphs.
Comment: 16 pages. Comments are welcome
Approximation Algorithms for Geometric Median Problems
In this paper we present approximation algorithms for median problems in
metric spaces and fixed-dimensional Euclidean space. Our algorithms use a new
method for transforming an optimal solution of the linear program relaxation
of the s-median problem into a provably good integral solution. This transformation
technique is fundamentally different from the methods of randomized
and deterministic rounding [Rag, RaT] and the methods proposed in [LiV] in
the following way: Previous techniques never set variables with zero values in
the fractional solution to 1. This departure from previous methods is crucial
for the success of our algorithms.
Deterministic Sampling and Range Counting in Geometric Data Streams
We present memory-efficient deterministic algorithms for constructing
epsilon-nets and epsilon-approximations of streams of geometric data. Unlike
probabilistic approaches, these deterministic samples provide guaranteed bounds
on their approximation factors. We show how our deterministic samples can be
used to answer approximate online iceberg geometric queries on data streams. We
use these techniques to approximate several robust statistics of geometric data
streams, including Tukey depth, simplicial depth, regression depth, the
Theil-Sen estimator, and the least median of squares. Our algorithms use only a
polylogarithmic amount of memory, provided the desired approximation factors
are inverse-polylogarithmic. We also include a lower bound for non-iceberg
geometric queries.
Comment: 12 pages, 1 figure
Online estimation of the geometric median in Hilbert spaces : non asymptotic confidence balls
Estimation procedures based on recursive algorithms are interesting and
powerful techniques that are able to deal rapidly with (very) large samples of
high dimensional data. The collected data may be contaminated by noise so that
robust location indicators, such as the geometric median, may be preferred to
the mean. In this context, an estimator of the geometric median based on a fast
and efficient averaged non linear stochastic gradient algorithm has been
developed by Cardot, Cénac and Zitt (2013). This work aims at studying more
precisely the non asymptotic behavior of this algorithm by giving non
asymptotic confidence balls. This new result is based on the derivation of
improved rates of convergence as well as an exponential inequality for
the martingale terms of the recursive non linear Robbins-Monro algorithm.
On the Strategyproofness of the Geometric Median
The geometric median of a tuple of vectors is the vector that minimizes the
sum of Euclidean distances to the vectors of the tuple. Classically called the
Fermat-Weber problem and applied to facility location, it has become a major
component of the robust learning toolbox. It is typically used to aggregate the
(processed) inputs of different data providers, whose motivations may diverge,
especially in applications like content moderation. Interestingly, as a voting
system, the geometric median has well-known desirable properties: it is a
provably good average approximation, it is robust to a minority of malicious
voters, and it satisfies the "one voter, one unit force" fairness principle.
However, what was not known is the extent to which the geometric median is
strategyproof. Namely, can a strategic voter significantly gain by misreporting
their preferred vector?
We prove in this paper that, perhaps surprisingly, the geometric median is
not even $\alpha$-strategyproof, where $\alpha$ bounds what a voter can gain by
deviating from truthfulness. But we also prove that, in the limit of a large
number of voters with i.i.d. preferred vectors, the geometric median is
asymptotically $\alpha$-strategyproof. We show how to compute this bound
$\alpha$. We then generalize our results to voters who care more about some
dimensions. Roughly, we show that, if some dimensions are more polarized and
regarded as more important, then the geometric median becomes less
strategyproof. Interestingly, we also show how the skewed geometric medians can
improve strategyproofness. Nevertheless, if voters care differently about
different dimensions, we prove that no skewed geometric median can achieve
strategyproofness for all. Overall, our results constitute a coherent set of
insights into the extent to which the geometric median is suitable to aggregate
high-dimensional disagreements.
Comment: 55 pages, 7 figures
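The definition at the start of this abstract (the vector minimizing the sum of Euclidean distances to the given vectors) can be made concrete with a short numerical sketch. The code below uses Weiszfeld's iteratively reweighted averaging, which is a standard method for this problem and is not taken from the paper. The function name `geometric_median`, the iteration count `iters`, and the small clamp `eps` (used to avoid division by zero when the iterate lands on a data point) are assumptions made for the example.

```python
import math


def geometric_median(points, iters=200, eps=1e-12):
    """Approximate the geometric median of `points` (equal-length tuples)."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    y = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            # Weight each point by the inverse of its distance to the iterate.
            w = 1.0 / max(math.dist(p, y), eps)
            den += w
            for k in range(dim):
                num[k] += w * p[k]
        y = [v / den for v in num]
    return y


if __name__ == "__main__":
    # Three voters agree on the origin; one reports a far-away vector.
    votes = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (10.0, 10.0)]
    print(geometric_median(votes))
```

In the usage example a single far-away report cannot pull the result away from the point the majority agrees on, which illustrates the robustness to a minority of voters mentioned in the abstract.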