Empirical Bernstein Inequalities for U-Statistics
We present original empirical Bernstein inequalities for U-statistics with bounded symmetric kernels q. They are expressed with respect to empirical estimates of either the variance of q or the conditional variance that appears in the Bernstein-type inequality for U-statistics derived by Arcones. Our result subsumes other existing empirical Bernstein inequalities, as it reduces to them when U-statistics of order 1 are considered. In addition, it is based on a rather direct argument using two applications of the same (non-empirical) Bernstein inequality for U-statistics. We discuss potential applications of our new inequalities, especially in the realm of learning ranking/scoring functions. In the process, we exhibit an efficient procedure to compute the variance estimates for the special case of bipartite ranking that rests on a sorting argument. We also argue that our results may provide test set bounds and particularly interesting empirical racing algorithms for the problem of online learning of scoring functions.
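The sorting argument mentioned above can be made concrete. Below is a minimal Python sketch, not taken from the paper: for bipartite ranking, the order-two U-statistic underlying the AUC can be computed in O(n log n) from the ranks of the pooled scores instead of a loop over all positive/negative pairs; the paper's variance estimates rest on a related sorting idea. The function name and test data are illustrative.

```python
import numpy as np

def auc_u_statistic(scores_pos, scores_neg):
    """AUC as an order-2 U-statistic: the fraction of (positive, negative)
    pairs ranked correctly. Computed in O(n log n) via the ranks of the
    pooled sample (Mann-Whitney identity) rather than the naive
    O(n_pos * n_neg) double loop. Illustrative sketch only."""
    scores = np.concatenate([scores_pos, scores_neg])
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = len(scores_pos), len(scores_neg)
    rank_sum_pos = ranks[:n_pos].sum()  # ranks of the positive examples
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=500)  # hypothetical scores of positives
neg = rng.normal(0.0, 1.0, size=500)  # hypothetical scores of negatives
print(auc_u_statistic(pos, neg))
```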
A Bernstein-Von Mises Theorem for discrete probability distributions
We investigate the asymptotic normality of the posterior distribution in the
discrete setting, when model dimension increases with sample size. We consider
a probability mass function $\theta_0$ on $\mathbbm{N}\setminus \{0\}$ and a sequence of truncation levels $(k_n)_n$ satisfying $k_n^3 \leq n \inf_{i \leq k_n} \theta_0(i)$. Let $\hat{\theta}_n$ denote the maximum likelihood estimate of $(\theta_0(i))_{i \leq k_n}$ and let $\Delta_n(\theta_0)$ denote the $k_n$-dimensional vector whose $i$-th coordinate is defined by $\sqrt{n}(\hat{\theta}_n(i)-\theta_0(i))$ for $1 \leq i \leq k_n$. We check that under mild conditions on $\theta_0$ and on the sequence of prior probabilities on the $k_n$-dimensional simplices, after centering and rescaling, the variation distance between the posterior distribution recentered around $\hat{\theta}_n$ and rescaled by $\sqrt{n}$ and the $k_n$-dimensional Gaussian distribution $\mathcal{N}(\Delta_n(\theta_0), I^{-1}(\theta_0))$ converges in probability to $0$.
This theorem can be used to prove the asymptotic normality of Bayesian estimators of Shannon and R\'{e}nyi entropies. The proofs are based on concentration inequalities for centered and non-centered Chi-square (Pearson) statistics. The latter allow one to establish posterior concentration rates with respect to the Fisher distance rather than the Hellinger distance, as is commonplace in non-parametric Bayesian statistics.
Comment: Published in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/08-EJS262
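A minimal numerical illustration of the theorem's content, under assumptions chosen here (a uniform Dirichlet prior, a fixed truncation level $k$, an arbitrary $\theta_0$) and not taken from the paper: posterior draws of $\sqrt{n}(\theta - \hat{\theta}_n)$ should have coordinate-wise spread close to the Gaussian limit $\theta_0(i)(1-\theta_0(i))$.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 5, 20_000                     # truncation level and sample size (arbitrary)
theta0 = np.array([0.4, 0.25, 0.15, 0.12, 0.08])

counts = rng.multinomial(n, theta0)  # observed counts on {1, ..., k}
theta_hat = counts / n               # maximum likelihood estimate

# Uniform Dirichlet prior on the k-simplex => posterior is Dirichlet(1 + counts)
posterior = rng.dirichlet(1.0 + counts, size=100_000)
centered = np.sqrt(n) * (posterior - theta_hat)  # recentered, rescaled draws

# Bernstein-von Mises: coordinate i should be close to a Gaussian with
# variance theta0(i) * (1 - theta0(i))
print("posterior std :", centered.std(axis=0).round(3))
print("Gaussian limit:", np.sqrt(theta0 * (1 - theta0)).round(3))
```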
General nonexact oracle inequalities for classes with a subexponential envelope
We show that empirical risk minimization procedures and regularized empirical risk minimization procedures satisfy nonexact oracle inequalities in an unbounded framework, under the assumption that the class has a subexponential envelope function. The main novelty, in addition to the boundedness-assumption-free setup, is that those inequalities can yield fast rates even in situations in which exact oracle inequalities only hold with slower rates. We apply these results to show that procedures based on $\ell_1$ and nuclear norm regularization functions satisfy oracle inequalities with a residual term that decreases like $1/n$ for every $L_q$-loss function ($q \geq 2$), while only assuming that the tail behavior of the input and output variables is well behaved. In particular, no RIP-type assumption or "incoherence condition" is needed to obtain fast residual terms in those setups. We also apply these results to the problems of convex aggregation and model selection.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/11-AOS965
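To make the regime concrete, here is a hedged sketch (not the paper's procedure) of $\ell_1$-regularized empirical risk minimization on unbounded data with subexponential tails, the setting the inequalities above cover; the design, noise, and regularization level are ad hoc choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p, s = 200, 500, 5                # samples, dimension, sparsity (arbitrary)

X = rng.laplace(size=(n, p))         # unbounded design, subexponential tails
beta = np.zeros(p)
beta[:s] = 1.0                       # s-sparse target
y = X @ beta + rng.laplace(size=n)   # unbounded, well-behaved noise

# l1-regularized ERM (Lasso); this regularization level is an ad hoc
# choice, not the one dictated by the oracle inequality.
model = Lasso(alpha=0.2 * np.sqrt(np.log(p) / n)).fit(X, y)
print("support recovered:", np.flatnonzero(model.coef_))
```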
Concentration inequalities for sampling without replacement
Concentration inequalities quantify the deviation of a random variable from a
fixed value. In spite of numerous applications, such as opinion surveys or
ecological counting procedures, few concentration results are known for the
setting of sampling without replacement from a finite population. Until now,
the best general concentration inequality has been a Hoeffding inequality due
to Serfling [Ann. Statist. 2 (1974) 39-48]. In this paper, we first improve on this fundamental result of Serfling, and further extend it to obtain a Bernstein concentration bound for sampling without replacement. We then derive an empirical version of our bound that does not require the variance to be known to the user.
Comment: Published in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm); DOI: http://dx.doi.org/10.3150/14-BEJ605
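For concreteness, here is a small Python sketch of the Hoeffding-type bound in this setting as it appears in Serfling's line of work: for the mean of n draws without replacement from N values in a range of width b-a, the (1-delta) confidence width carries a factor 1-(n-1)/N that vanishes as the sample exhausts the population. Treat the exact constants as an assumption; the paper's empirical Bernstein version further replaces the range by an empirical variance term.

```python
import numpy as np

def hoeffding_serfling_width(n, N, width, delta):
    """(1 - delta) confidence width for the mean of n draws WITHOUT
    replacement from a population of N values spanning `width`.
    The factor (1 - (n-1)/N) is Serfling's improvement over the
    i.i.d. Hoeffding bound; constants assumed, for illustration."""
    rho = 1.0 - (n - 1) / N
    return width * np.sqrt(rho * np.log(1.0 / delta) / (2.0 * n))

N, delta = 10_000, 0.05
for n in (100, 1_000, 9_000):
    iid = np.sqrt(np.log(1.0 / delta) / (2.0 * n))    # with replacement
    wor = hoeffding_serfling_width(n, N, 1.0, delta)  # without replacement
    print(f"n={n:>5}  with-repl={iid:.4f}  without-repl={wor:.4f}")
```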
The notion of $\psi$-weak dependence and its applications to bootstrapping time series
We give an introduction to a notion of weak dependence which is more general than mixing and allows one to treat, for example, processes driven by discrete innovations, as they appear with the time series bootstrap. As a typical example, we
analyze autoregressive processes and their bootstrap analogues in detail and
show how weak dependence can be easily derived from a contraction property of
the process. Furthermore, we provide an overview of classes of processes
possessing the property of weak dependence and describe important probabilistic
results under such an assumption.
Comment: Published in Probability Surveys (http://www.i-journals.org/ps/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/06-PS086
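As a concrete instance of the autoregressive example, here is a hedged sketch of a standard AR(1) residual bootstrap: the resampled innovations are drawn from a discrete (empirical) distribution, which is exactly where weak dependence, derived from the contraction $|\hat a| < 1$, applies while mixing arguments may fail. All names and parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
T, a = 1_000, 0.6                 # series length, AR coefficient (|a| < 1)

# Simulate an AR(1): X_t = a * X_{t-1} + e_t
eps = rng.normal(size=T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + eps[t]

# Least-squares fit and centered residuals
a_hat = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1])
resid = x[1:] - a_hat * x[:-1]
resid -= resid.mean()

def bootstrap_estimate():
    """One residual-bootstrap replicate of the AR coefficient. The
    innovations e* are discrete (resampled residuals), so the bootstrap
    process is generally not mixing, but it inherits weak dependence
    from the contraction |a_hat| < 1."""
    e_star = rng.choice(resid, size=T, replace=True)
    x_star = np.zeros(T)
    for t in range(1, T):
        x_star[t] = a_hat * x_star[t - 1] + e_star[t]
    return (x_star[1:] @ x_star[:-1]) / (x_star[:-1] @ x_star[:-1])

boot = np.array([bootstrap_estimate() for _ in range(500)])
print(f"a_hat = {a_hat:.3f}, bootstrap sd = {boot.std():.4f}")
```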
Tail bounds for all eigenvalues of a sum of random matrices
This work introduces the minimax Laplace transform method, a modification of
the cumulant-based matrix Laplace transform method developed in "User-friendly
tail bounds for sums of random matrices" (arXiv:1004.4389v6) that yields both
upper and lower bounds on each eigenvalue of a sum of random self-adjoint
matrices. This machinery is used to derive eigenvalue analogues of the
classical Chernoff, Bennett, and Bernstein bounds.
Two examples demonstrate the efficacy of the minimax Laplace transform. The
first concerns the effects of column sparsification on the spectrum of a matrix
with orthonormal rows. Here, the behavior of the singular values can be
described in terms of coherence-like quantities. The second example addresses
the question of relative accuracy in the estimation of eigenvalues of the
covariance matrix of a random process. Standard results on the convergence of
sample covariance matrices provide bounds on the number of samples needed to
obtain relative accuracy in the spectral norm, but these results only guarantee
relative accuracy in the estimate of the maximum eigenvalue. The minimax
Laplace transform argument establishes that if the lowest eigenvalues decay sufficiently fast, then on the order of $K^2 r \log(p)/\varepsilon^2$ samples, where $K$ is the condition number of an optimal rank-$r$ approximation to $C$, are sufficient to ensure that the dominant $r$ eigenvalues of the covariance matrix of a $N(0, C)$ random vector are estimated to within a factor of $1 \pm \varepsilon$ with high probability.
Comment: 20 pages, 1 figure, see also arXiv:1004.4389
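A hedged numerical companion to the covariance example, with all sizes and spectra chosen arbitrarily here: estimate the dominant $r$ eigenvalues of $C$ from $n$ samples of $N(0, C)$ whose lower eigenvalues decay fast, and check the relative accuracy the bound is about.

```python
import numpy as np

rng = np.random.default_rng(4)
p, r, n = 100, 5, 2_000              # dimension, target rank, samples (arbitrary)

# Covariance with r dominant eigenvalues and a fast-decaying tail
eigs = np.concatenate([np.linspace(10.0, 8.0, r), 0.01 * np.ones(p - r)])
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))  # random orthogonal basis
C = (Q * eigs) @ Q.T                          # C = Q diag(eigs) Q^T

X = rng.multivariate_normal(np.zeros(p), C, size=n)
C_hat = X.T @ X / n                           # sample covariance

top_true = np.sort(np.linalg.eigvalsh(C))[::-1][:r]
top_est = np.sort(np.linalg.eigvalsh(C_hat))[::-1][:r]
print("relative errors:", np.abs(top_est - top_true) / top_true)
```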