    Estimating the geometric median in Hilbert spaces with stochastic gradient algorithms: $L^{p}$ and almost sure rates of convergence

    The geometric median, also called the $L^{1}$-median, is often used in robust statistics. Moreover, it is increasingly common to deal with large samples taking values in high-dimensional spaces. In this context, a fast recursive estimator has been introduced by Cardot, Cénac and Zitt. This work studies more precisely the asymptotic behavior of the estimators of the geometric median based on such nonlinear stochastic gradient algorithms. The $L^{p}$ rates of convergence as well as almost sure rates of convergence of these estimators are derived in general separable Hilbert spaces. Moreover, the optimal rate of convergence in quadratic mean of the averaged algorithm is also given.
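
    A minimal sketch of an averaged stochastic gradient estimator of the kind this abstract describes, restricted to finite-dimensional vectors. The recursion used here (a unit-length step toward each new sample, followed by a running average of the iterates) is an assumption based on the standard Robbins-Monro form, not a transcription of the paper. The function name `averaged_sgd_median` and the step-size constants `c_gamma` and `alpha` are hypothetical.

```python
import math


def averaged_sgd_median(samples, c_gamma=1.0, alpha=0.66):
    """One-pass averaged stochastic gradient estimate of the geometric median.

    samples: iterable of equal-length sequences of floats.
    c_gamma, alpha: assumed step-size constants; the step at iteration n
    is c_gamma / n**alpha.
    """
    it = iter(samples)
    z = [float(v) for v in next(it)]  # iterate, initialised at the first sample
    z_bar = list(z)                   # running average of the iterates
    for n, x in enumerate(it, start=1):
        diff = [xi - zi for xi, zi in zip(x, z)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:
            # Move a distance `step` along the unit vector toward the sample.
            step = c_gamma / n ** alpha
            z = [zi + step * d / norm for zi, d in zip(z, diff)]
        # Online mean of the iterates z_0, ..., z_n.
        z_bar = [bi + (zi - bi) / (n + 1) for bi, zi in zip(z_bar, z)]
    return z_bar
```

    Each sample is visited once and only two vectors are kept in memory, which is what makes a recursive estimator of this type attractive for large samples.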

    Groups acting on quasi-median graphs. An introduction

    Quasi-median graphs have been introduced by Mulder in 1980 as a generalisation of median graphs, known in geometric group theory to naturally coincide with the class of CAT(0) cube complexes. In his PhD thesis, the author showed that quasi-median graphs may be useful to study groups as well. In the present paper, we propose a gentle introduction to the theory of groups acting on quasi-median graphs. Comment: 16 pages. Comments are welcome.

    Approximation Algorithms for Geometric Median Problems

    In this paper we present approximation algorithms for median problems in metric spaces and fixed-dimensional Euclidean space. Our algorithms use a new method for transforming an optimal solution of the linear program relaxation of the s-median problem into a provably good integral solution. This transformation technique is fundamentally different from the methods of randomized and deterministic rounding [Rag, RaT] and the methods proposed in [LiV] in the following way: previous techniques never set variables with zero values in the fractional solution to 1. This departure from previous methods is crucial for the success of our algorithms.

    Deterministic Sampling and Range Counting in Geometric Data Streams

    We present memory-efficient deterministic algorithms for constructing epsilon-nets and epsilon-approximations of streams of geometric data. Unlike probabilistic approaches, these deterministic samples provide guaranteed bounds on their approximation factors. We show how our deterministic samples can be used to answer approximate online iceberg geometric queries on data streams. We use these techniques to approximate several robust statistics of geometric data streams, including Tukey depth, simplicial depth, regression depth, the Theil-Sen estimator, and the least median of squares. Our algorithms use only a polylogarithmic amount of memory, provided the desired approximation factors are inverse-polylogarithmic. We also include a lower bound for non-iceberg geometric queries. Comment: 12 pages, 1 figure.

    Online estimation of the geometric median in Hilbert spaces: non asymptotic confidence balls

    Estimation procedures based on recursive algorithms are powerful techniques that can deal rapidly with (very) large samples of high-dimensional data. The collected data may be contaminated by noise, so that robust location indicators, such as the geometric median, may be preferred to the mean. In this context, an estimator of the geometric median based on a fast and efficient averaged nonlinear stochastic gradient algorithm has been developed by Cardot, Cénac and Zitt (2013). This work studies more precisely the non-asymptotic behavior of this algorithm by giving non-asymptotic confidence balls. This new result is based on the derivation of improved $L^2$ rates of convergence as well as an exponential inequality for the martingale terms of the recursive nonlinear Robbins-Monro algorithm.

    On the Strategyproofness of the Geometric Median

    The geometric median of a tuple of vectors is the vector that minimizes the sum of Euclidean distances to the vectors of the tuple. Classically called the Fermat-Weber problem and applied to facility location, it has become a major component of the robust learning toolbox. It is typically used to aggregate the (processed) inputs of different data providers, whose motivations may diverge, especially in applications like content moderation. Interestingly, as a voting system, the geometric median has well-known desirable properties: it is a provably good average approximation, it is robust to a minority of malicious voters, and it satisfies the "one voter, one unit force" fairness principle. However, what was not known is the extent to which the geometric median is strategyproof. Namely, can a strategic voter significantly gain by misreporting their preferred vector? We prove in this paper that, perhaps surprisingly, the geometric median is not even $\alpha$-strategyproof, where $\alpha$ bounds what a voter can gain by deviating from truthfulness. But we also prove that, in the limit of a large number of voters with i.i.d. preferred vectors, the geometric median is asymptotically $\alpha$-strategyproof. We show how to compute this bound $\alpha$. We then generalize our results to voters who care more about some dimensions. Roughly, we show that, if some dimensions are more polarized and regarded as more important, then the geometric median becomes less strategyproof. Interestingly, we also show how the skewed geometric medians can improve strategyproofness. Nevertheless, if voters care differently about different dimensions, we prove that no skewed geometric median can achieve strategyproofness for all. Overall, our results constitute a coherent set of insights into the extent to which the geometric median is suitable to aggregate high-dimensional disagreements. Comment: 55 pages, 7 figures.
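
    To make the definition in this abstract concrete (the vector minimizing the sum of Euclidean distances to the inputs), here is a small batch sketch. It uses Weiszfeld's fixed-point iteration, a standard technique that the abstract itself does not mention. The function name `geometric_median`, the iteration count, and the `eps` clamp on distances are illustrative assumptions.

```python
import math


def geometric_median(points, iterations=200, eps=1e-12):
    """Approximate the geometric median of a list of equal-length vectors.

    Weiszfeld's iteration: repeatedly replace the estimate by the average
    of the points weighted by the inverse of their distance to it.
    """
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    y = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iterations):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            dist = math.sqrt(sum((pk - yk) ** 2 for pk, yk in zip(p, y)))
            w = 1.0 / max(dist, eps)  # clamp to avoid division by zero
            den += w
            for k in range(dim):
                num[k] += w * p[k]
        y = [v / den for v in num]
    return y


# Three voters agree on the origin and one reports a distant vector.
votes = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (10.0, 10.0)]
median = geometric_median(votes)
```

    In this toy example the result stays essentially at the origin, whereas the mean would move to (2.5, 2.5). This illustrates the robustness to a minority of voters that the abstract refers to.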