Empirical Bernstein Inequalities for U-Statistics
We present original empirical Bernstein inequalities for U-statistics with bounded symmetric kernels q. They are expressed with respect to empirical estimates of either the variance of q or the conditional variance that appears in the Bernstein-type inequality for U-statistics derived by Arcones. Our result subsumes other existing empirical Bernstein inequalities, as it reduces to them when U-statistics of order 1 are considered. In addition, it is based on a rather direct argument using two applications of the same (non-empirical) Bernstein inequality for U-statistics. We discuss potential applications of our new inequalities, especially in the realm of learning ranking/scoring functions. In the process, we exhibit an efficient procedure to compute the variance estimates for the special case of bipartite ranking that rests on a sorting argument. We also argue that our results may provide test set bounds and particularly interesting empirical racing algorithms for the problem of online learning of scoring functions.
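The sorting argument mentioned above can be made concrete. Below is a minimal Python sketch, not taken from the paper: for bipartite ranking, the order-two U-statistic underlying the AUC can be computed in O(n log n) from the ranks of the pooled scores instead of a loop over all positive/negative pairs; the paper's variance estimates rest on a related sorting idea. The function name and test data are illustrative.

```python
import numpy as np

def auc_u_statistic(scores_pos, scores_neg):
    """AUC as an order-2 U-statistic: the fraction of (positive, negative)
    pairs ranked correctly. Computed in O(n log n) via the ranks of the
    pooled sample (Mann-Whitney identity) rather than the naive
    O(n_pos * n_neg) double loop. Illustrative sketch only."""
    scores = np.concatenate([scores_pos, scores_neg])
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = len(scores_pos), len(scores_neg)
    rank_sum_pos = ranks[:n_pos].sum()  # ranks of the positive examples
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=500)  # hypothetical scores of positives
neg = rng.normal(0.0, 1.0, size=500)  # hypothetical scores of negatives
print(auc_u_statistic(pos, neg))
```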
A Bernstein-Von Mises Theorem for discrete probability distributions
We investigate the asymptotic normality of the posterior distribution in the
discrete setting, when model dimension increases with sample size. We consider
a probability mass function $\theta_0$ on $\mathbbm{N}\setminus \{0\}$ and a sequence of truncation levels $(k_n)_n$ satisfying $k_n^3 \leq n \inf_{i \leq k_n} \theta_0(i)$. Let $\hat{\theta}_n$ denote the maximum likelihood estimate of $(\theta_0(i))_{i \leq k_n}$ and let $\Delta_n(\theta_0)$ denote the $k_n$-dimensional vector whose $i$-th coordinate is defined by $\sqrt{n}(\hat{\theta}_n(i)-\theta_0(i))$ for $1 \leq i \leq k_n$. We check that under mild conditions on $\theta_0$ and on the sequence of prior probabilities on the $k_n$-dimensional simplices, after centering and rescaling, the variation distance between the posterior distribution recentered around $\hat{\theta}_n$ and rescaled by $\sqrt{n}$ and the $k_n$-dimensional Gaussian distribution $\mathcal{N}(\Delta_n(\theta_0), I^{-1}(\theta_0))$ converges in probability to $0$.
This theorem can be used to prove the asymptotic normality of Bayesian estimators of Shannon and R\'{e}nyi entropies. The proofs are based on concentration inequalities for centered and non-centered Chi-square (Pearson) statistics. The latter allow one to establish posterior concentration rates with respect to the Fisher distance rather than the Hellinger distance, as is commonplace in non-parametric Bayesian statistics.
Comment: Published in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/08-EJS262
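A minimal numerical illustration of the theorem's content, under assumptions chosen here (a uniform Dirichlet prior, a fixed truncation level $k$, an arbitrary $\theta_0$) and not taken from the paper: posterior draws of $\sqrt{n}(\theta - \hat{\theta}_n)$ should have coordinate-wise spread close to the Gaussian limit $\theta_0(i)(1-\theta_0(i))$.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 5, 20_000                     # truncation level and sample size (arbitrary)
theta0 = np.array([0.4, 0.25, 0.15, 0.12, 0.08])

counts = rng.multinomial(n, theta0)  # observed counts on {1, ..., k}
theta_hat = counts / n               # maximum likelihood estimate

# Uniform Dirichlet prior on the k-simplex => posterior is Dirichlet(1 + counts)
posterior = rng.dirichlet(1.0 + counts, size=100_000)
centered = np.sqrt(n) * (posterior - theta_hat)  # recentered, rescaled draws

# Bernstein-von Mises: coordinate i should be close to a Gaussian with
# variance theta0(i) * (1 - theta0(i))
print("posterior std :", centered.std(axis=0).round(3))
print("Gaussian limit:", np.sqrt(theta0 * (1 - theta0)).round(3))
```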
General nonexact oracle inequalities for classes with a subexponential envelope
We show that empirical risk minimization procedures and regularized empirical risk minimization procedures satisfy nonexact oracle inequalities in an unbounded framework, under the assumption that the class has a subexponential envelope function. The main novelty, in addition to the boundedness-assumption-free setup, is that those inequalities can yield fast rates even in situations in which exact oracle inequalities only hold with slower rates. We apply these results to show that procedures based on $\ell_1$ and nuclear norm regularization functions satisfy oracle inequalities with a residual term that decreases like $1/n$ for every $L_q$-loss function ($q \geq 2$), while only assuming that the tail behavior of the input and output variables is well behaved. In particular, no RIP-type assumption or "incoherence condition" is needed to obtain fast residual terms in those setups. We also apply these results to the problems of convex aggregation and model selection.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/11-AOS965
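To make the regime concrete, here is a hedged sketch (not the paper's procedure) of $\ell_1$-regularized empirical risk minimization on unbounded data with subexponential tails, the setting the inequalities above cover; the design, noise, and regularization level are ad hoc choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p, s = 200, 500, 5                # samples, dimension, sparsity (arbitrary)

X = rng.laplace(size=(n, p))         # unbounded design, subexponential tails
beta = np.zeros(p)
beta[:s] = 1.0                       # s-sparse target
y = X @ beta + rng.laplace(size=n)   # unbounded, well-behaved noise

# l1-regularized ERM (Lasso); this regularization level is an ad hoc
# choice, not the one dictated by the oracle inequality.
model = Lasso(alpha=0.2 * np.sqrt(np.log(p) / n)).fit(X, y)
print("support recovered:", np.flatnonzero(model.coef_))
```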
Concentration inequalities for sampling without replacement
Concentration inequalities quantify the deviation of a random variable from a
fixed value. In spite of numerous applications, such as opinion surveys or
ecological counting procedures, few concentration results are known for the
setting of sampling without replacement from a finite population. Until now,
the best general concentration inequality has been a Hoeffding inequality due
to Serfling [Ann. Statist. 2 (1974) 39-48]. In this paper, we first improve on this fundamental result of Serfling, and further extend it to obtain a Bernstein concentration bound for sampling without replacement. We then derive an empirical version of our bound that does not require the variance to be known to the user.
Comment: Published in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm); DOI: http://dx.doi.org/10.3150/14-BEJ605
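For concreteness, here is a small Python sketch of the Hoeffding-type bound in this setting as it appears in Serfling's line of work: for the mean of n draws without replacement from N values in a range of width b-a, the (1-delta) confidence width carries a factor 1-(n-1)/N that vanishes as the sample exhausts the population. Treat the exact constants as an assumption; the paper's empirical Bernstein version further replaces the range by an empirical variance term.

```python
import numpy as np

def hoeffding_serfling_width(n, N, width, delta):
    """(1 - delta) confidence width for the mean of n draws WITHOUT
    replacement from a population of N values spanning `width`.
    The factor (1 - (n-1)/N) is Serfling's improvement over the
    i.i.d. Hoeffding bound; constants assumed, for illustration."""
    rho = 1.0 - (n - 1) / N
    return width * np.sqrt(rho * np.log(1.0 / delta) / (2.0 * n))

N, delta = 10_000, 0.05
for n in (100, 1_000, 9_000):
    iid = np.sqrt(np.log(1.0 / delta) / (2.0 * n))    # with replacement
    wor = hoeffding_serfling_width(n, N, 1.0, delta)  # without replacement
    print(f"n={n:>5}  with-repl={iid:.4f}  without-repl={wor:.4f}")
```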
The notion of $\psi$-weak dependence and its applications to bootstrapping time series
We give an introduction to a notion of weak dependence which is more general than mixing and allows one to treat, for example, processes driven by discrete innovations, as they appear with the time series bootstrap. As a typical example, we
analyze autoregressive processes and their bootstrap analogues in detail and
show how weak dependence can be easily derived from a contraction property of
the process. Furthermore, we provide an overview of classes of processes
possessing the property of weak dependence and describe important probabilistic
results under such an assumption.
Comment: Published in Probability Surveys (http://www.i-journals.org/ps/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/06-PS086
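As a concrete instance of the autoregressive example, here is a hedged sketch of a standard AR(1) residual bootstrap: the resampled innovations are drawn from a discrete (empirical) distribution, which is exactly where weak dependence, derived from the contraction $|\hat a| < 1$, applies while mixing arguments may fail. All names and parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
T, a = 1_000, 0.6                 # series length, AR coefficient (|a| < 1)

# Simulate an AR(1): X_t = a * X_{t-1} + e_t
eps = rng.normal(size=T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + eps[t]

# Least-squares fit and centered residuals
a_hat = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1])
resid = x[1:] - a_hat * x[:-1]
resid -= resid.mean()

def bootstrap_estimate():
    """One residual-bootstrap replicate of the AR coefficient. The
    innovations e* are discrete (resampled residuals), so the bootstrap
    process is generally not mixing, but it inherits weak dependence
    from the contraction |a_hat| < 1."""
    e_star = rng.choice(resid, size=T, replace=True)
    x_star = np.zeros(T)
    for t in range(1, T):
        x_star[t] = a_hat * x_star[t - 1] + e_star[t]
    return (x_star[1:] @ x_star[:-1]) / (x_star[:-1] @ x_star[:-1])

boot = np.array([bootstrap_estimate() for _ in range(500)])
print(f"a_hat = {a_hat:.3f}, bootstrap sd = {boot.std():.4f}")
```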
Tail bounds for all eigenvalues of a sum of random matrices
This work introduces the minimax Laplace transform method, a modification of
the cumulant-based matrix Laplace transform method developed in "User-friendly
tail bounds for sums of random matrices" (arXiv:1004.4389v6) that yields both
upper and lower bounds on each eigenvalue of a sum of random self-adjoint
matrices. This machinery is used to derive eigenvalue analogues of the
classical Chernoff, Bennett, and Bernstein bounds.
Two examples demonstrate the efficacy of the minimax Laplace transform. The
first concerns the effects of column sparsification on the spectrum of a matrix
with orthonormal rows. Here, the behavior of the singular values can be
described in terms of coherence-like quantities. The second example addresses
the question of relative accuracy in the estimation of eigenvalues of the
covariance matrix of a random process. Standard results on the convergence of
sample covariance matrices provide bounds on the number of samples needed to
obtain relative accuracy in the spectral norm, but these results only guarantee
relative accuracy in the estimate of the maximum eigenvalue. The minimax
Laplace transform argument establishes that if the lowest eigenvalues decay sufficiently fast, then on the order of $K^2 r \log(p)/\varepsilon^2$ samples, where $K$ is the condition number of an optimal rank-$r$ approximation to $C$, are sufficient to ensure that the dominant $r$ eigenvalues of the covariance matrix of a $N(0, C)$ random vector are estimated to within a factor of $1 \pm \varepsilon$ with high probability.
Comment: 20 pages, 1 figure, see also arXiv:1004.4389
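A hedged numerical companion to the covariance example, with all sizes and spectra chosen arbitrarily here: estimate the dominant $r$ eigenvalues of $C$ from $n$ samples of $N(0, C)$ whose lower eigenvalues decay fast, and check the relative accuracy the bound is about.

```python
import numpy as np

rng = np.random.default_rng(4)
p, r, n = 100, 5, 2_000              # dimension, target rank, samples (arbitrary)

# Covariance with r dominant eigenvalues and a fast-decaying tail
eigs = np.concatenate([np.linspace(10.0, 8.0, r), 0.01 * np.ones(p - r)])
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))  # random orthogonal basis
C = (Q * eigs) @ Q.T                          # C = Q diag(eigs) Q^T

X = rng.multivariate_normal(np.zeros(p), C, size=n)
C_hat = X.T @ X / n                           # sample covariance

top_true = np.sort(np.linalg.eigvalsh(C))[::-1][:r]
top_est = np.sort(np.linalg.eigvalsh(C_hat))[::-1][:r]
print("relative errors:", np.abs(top_est - top_true) / top_true)
```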