Kullback-Leibler aggregation and misspecified generalized linear models
In a regression setup with deterministic design, we study the pure
aggregation problem and introduce a natural extension from the Gaussian
distribution to distributions in the exponential family. While this extension
bears strong connections with generalized linear models, it does not require
identifiability of the parameter or even that the model on the systematic
component is true. It is shown that this problem can be solved by constrained
and/or penalized likelihood maximization and we derive sharp oracle
inequalities that hold both in expectation and with high probability. Finally
all the bounds are proved to be optimal in a minimax sense.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by
the Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/11-AOS961.
Linear and convex aggregation of density estimators
We study the problem of linear and convex aggregation of estimators of a
density with respect to the mean squared risk. We provide procedures for linear
and convex aggregation and we prove oracle inequalities for their risks. We
also obtain lower bounds showing that these procedures are rate optimal in a
minimax sense. As an example, we apply general results to aggregation of
multivariate kernel density estimators with different bandwidths. We show that
linear and convex aggregates mimic the kernel oracles in an asymptotically
exact sense for a large class of kernels, including the Gaussian, Silverman
and Pinsker kernels. We prove that, for the Pinsker kernel, the proposed
aggregates are sharp asymptotically minimax simultaneously over a large scale
of Sobolev classes of densities. Finally, we provide simulations demonstrating
the performance of the convex aggregation procedure.
Comment: 22 pages.
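As a hedged illustration of the convex aggregation step, the sketch below fits Gaussian kernel density estimators at several bandwidths on one half of the sample and searches a grid on the simplex for convex weights minimizing held-out negative log-likelihood. The sample-splitting scheme, the likelihood criterion, and the function names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from itertools import product

def gauss_kde(train, h):
    """Gaussian kernel density estimator with bandwidth h."""
    def f(x):
        z = (np.atleast_1d(x)[:, None] - train[None, :]) / h
        return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(train) * h * np.sqrt(2 * np.pi))
    return f

def convex_aggregate(sample, bandwidths, n_grid=21):
    """Split the sample in two, fit one KDE per bandwidth on the first
    half, then pick convex weights minimizing the held-out negative
    log-likelihood by brute force over a grid on the simplex
    (practical only for two or three estimators)."""
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(sample))
    half = len(sample) // 2
    fit, val = sample[idx[:half]], sample[idx[half:]]
    kdes = [gauss_kde(fit, h) for h in bandwidths]
    preds = np.array([f(val) for f in kdes])          # shape (K, n_val)
    ticks = np.linspace(0.0, 1.0, n_grid)
    best_w, best_loss = None, np.inf
    for w in product(ticks, repeat=len(bandwidths)):
        if abs(sum(w) - 1.0) > 1e-9:
            continue                                  # keep only simplex points
        mix = np.dot(w, preds)                        # convex mixture of KDEs
        loss = -np.log(np.clip(mix, 1e-300, None)).sum()
        if loss < best_loss:
            best_w, best_loss = np.array(w), loss
    agg = lambda x: sum(wk * f(x) for wk, f in zip(best_w, kdes))
    return best_w, agg
```

The brute-force simplex search stands in for the paper's aggregation procedure purely for concreteness.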
Entropic optimal transport is maximum-likelihood deconvolution
We give a statistical interpretation of entropic optimal transport by showing
that performing maximum-likelihood estimation for Gaussian deconvolution
corresponds to calculating a projection with respect to the entropic optimal
transport distance. This structural result gives theoretical support for the
wide adoption of these tools in the machine learning community.
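For readers who want to experiment, entropic optimal transport between two discrete measures is typically computed with Sinkhorn iterations; the minimal sketch below is standard numerical practice, not the paper's construction, and the function name is ours.

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iter=500):
    """Entropic optimal transport between discrete measures a and b
    with cost matrix C and regularization eps, via Sinkhorn's
    alternating scaling of the Gibbs kernel K = exp(-C / eps)."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)          # match column marginals
        u = a / (K @ v)            # match row marginals
    P = u[:, None] * K * v[None, :]  # entropic transport plan
    cost = (P * C).sum()             # transport cost of the plan
    return P, cost
```

The regularization eps controls how close the plan is to the unregularized optimal transport coupling.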
Optimal learning with Q-aggregation
We consider a general supervised learning problem with strongly convex and
Lipschitz loss and study the problem of model selection aggregation. In
particular, given a finite dictionary of functions (learners) together with a
prior, we generalize the results obtained by Dai, Rigollet and Zhang [Ann.
Statist. 40 (2012) 1878-1905] for Gaussian regression with squared loss and
fixed design to this learning setup. Specifically, we prove that the
Q-aggregation procedure outputs an estimator that satisfies optimal oracle
inequalities both in expectation and with high probability. Our proof
techniques somewhat depart from traditional proofs by basing most of the
standard arguments on the Laplace transform of the empirical process to be
controlled.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by
the Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/13-AOS1190.
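As a rough illustration of how a Q-aggregation style criterion can be optimized, the sketch below minimizes, over the simplex, a penalized mixture of the individual empirical risks and the risk of the mixture for squared loss, using exponentiated gradient descent. The constants, the KL prior penalty, and the optimizer are illustrative assumptions; the paper's exact procedure and guarantees are not reproduced here.

```python
import numpy as np

def q_aggregate(y, F, prior, nu=0.5, beta=1.0, lr=0.05, n_iter=2000):
    """Minimize an assumed Q-style criterion over the simplex:
      nu * sum_j theta_j R(f_j) + (1 - nu) * R(f_theta)
        + (beta / n) * KL(theta, prior),
    for squared loss on a fixed design. F has shape (n, K): column j
    holds the predictions of dictionary element f_j."""
    n, K = F.shape
    theta = np.full(K, 1.0 / K)
    risks = ((y[:, None] - F) ** 2).mean(axis=0)       # R(f_j)
    for _ in range(n_iter):
        resid = F @ theta - y
        grad = (nu * risks
                + (1.0 - nu) * 2.0 * (F.T @ resid) / n  # grad of R(f_theta)
                + (beta / n) * (np.log(theta / prior) + 1.0))
        theta = theta * np.exp(-lr * grad)             # multiplicative update
        theta /= theta.sum()                           # stay on the simplex
    return theta
```

The exponentiated gradient update keeps the weights positive and normalized at every step, which is why it is a natural fit for simplex-constrained problems.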
Optimal detection of sparse principal components in high dimension
We perform a finite sample analysis of the detection levels for sparse
principal components of a high-dimensional covariance matrix. Our minimax
optimal test is based on a sparse eigenvalue statistic. Alas, computing this
test is known to be NP-complete in general, and we describe a computationally
efficient alternative test using convex relaxations. Our relaxation is also
proved to detect sparse principal components at near optimal detection levels,
and it performs well on simulated datasets. Moreover, using polynomial time
reductions from theoretical computer science, we bring significant evidence
that our results cannot be improved, thus revealing an inherent trade-off
between statistical and computational performance.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by
the Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/13-AOS1127.
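The sparse eigenvalue statistic can be written down directly for small dimensions, where brute-force enumeration of supports is still feasible; this sketch is for intuition only (as the abstract notes, the computation is NP-complete in general).

```python
import numpy as np
from itertools import combinations

def sparse_eigenvalue(S_hat, k):
    """Largest k-sparse eigenvalue of a covariance matrix: the maximum,
    over supports of size k, of the top eigenvalue of the corresponding
    principal submatrix. Brute force over all supports, so only
    feasible for small dimensions."""
    p = S_hat.shape[0]
    best = -np.inf
    for support in combinations(range(p), k):
        sub = S_hat[np.ix_(support, support)]           # principal submatrix
        best = max(best, np.linalg.eigvalsh(sub)[-1])   # its top eigenvalue
    return best
```

A detection test thresholds this statistic: under the null (identity covariance) it stays near 1, while a sparse spike pushes it up.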
Optimal rates for plug-in estimators of density level sets
In the context of density level set estimation, we study the convergence of
general plug-in methods under two main assumptions on the density for a given
level λ. More precisely, it is assumed that the density (i) is smooth
in a neighborhood of λ and (ii) has γ-exponent at level
λ. Condition (i) ensures that the density can be estimated at a
standard nonparametric rate and condition (ii) is similar to Tsybakov's margin
assumption which is stated for the classification framework. Under these
assumptions, we derive optimal rates of convergence for plug-in estimators.
Explicit convergence rates are given for plug-in estimators based on kernel
density estimators when the underlying measure is the Lebesgue measure. Lower
bounds proving optimality of the rates in a minimax sense when the density is
Hölder smooth are also provided.
Comment: Published in Bernoulli (http://isi.cbs.nl/bernoulli/) by the
International Statistical Institute/Bernoulli Society
(http://isi.cbs.nl/BS/bshome.htm); DOI: http://dx.doi.org/10.3150/09-BEJ184.
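A minimal plug-in level set estimator, assuming a Gaussian kernel density estimate and a fixed evaluation grid (both illustrative choices, as are the function names), can be sketched as:

```python
import numpy as np

def kde(sample, h):
    """Gaussian kernel density estimator with bandwidth h."""
    def f(x):
        z = (np.atleast_1d(x)[:, None] - sample[None, :]) / h
        return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))
    return f

def plugin_level_set(sample, lam, h, grid):
    """Plug-in estimator of the level set {x : f(x) >= lam}: evaluate
    the kernel density estimate on the grid and keep the grid points
    where it exceeds the level lam."""
    f_hat = kde(sample, h)
    vals = f_hat(grid)
    return grid[vals >= lam]
```

The estimator simply substitutes the density estimate for the unknown density in the definition of the level set, which is exactly the plug-in principle the abstract analyzes.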
Uncoupled isotonic regression via minimum Wasserstein deconvolution
Isotonic regression is a standard problem in shape-constrained estimation
where the goal is to estimate an unknown nondecreasing regression function f
from independent pairs (x_i, y_i) where y_i = f(x_i) + ε_i, i = 1, ..., n. While
this problem is well understood both statistically and computationally, much
less is known about its uncoupled counterpart, where one is given only the
unordered sets {x_1, ..., x_n} and {y_1, ..., y_n}. In this work, we leverage
tools from optimal transport theory to derive minimax rates under weak moment
conditions on the noise ε_i and to give an efficient algorithm achieving
optimal rates. Both upper and lower bounds employ moment-matching arguments
that are also pertinent to learning mixtures of distributions and
deconvolution.
Comment: To appear in Information and Inference: a Journal of the IMA.
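To see why the uncoupled problem is at all tractable, note that monotonicity forces order statistics to match: in the noiseless case the i-th smallest y must equal f of the i-th smallest x. The baseline below implements this sorting estimator; it is our illustration only, and ignores the Wasserstein deconvolution step the paper uses to handle noise.

```python
import numpy as np

def uncoupled_sort_estimate(xs, ys):
    """Pair the sorted unordered samples: since f is nondecreasing, in
    the noiseless case the i-th smallest y equals f evaluated at the
    i-th smallest x. With noise this only matches empirical quantiles
    and omits the deconvolution step used in the paper."""
    return np.sort(xs), np.sort(ys)
```

With noise, sorting matches the quantiles of y (the law of f(X) convolved with the noise) rather than of f(X) itself, which is precisely the deconvolution problem the paper addresses.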
Exponential Screening and optimal rates of sparse estimation
In high-dimensional linear regression, the goal pursued here is to estimate
an unknown regression function using linear combinations of a suitable set of
covariates. One of the key assumptions for the success of any statistical
procedure in this setup is to assume that the linear combination is sparse in
some sense, for example, that it involves only a few covariates. We consider a
general, not necessarily linear, regression with Gaussian noise and study the
related question of finding a linear combination of approximating
functions which is at the same time sparse and has small mean squared error
(MSE). We introduce a new estimation procedure, called Exponential Screening,
that shows remarkable adaptation properties. It adapts to the linear
combination that optimally balances MSE and sparsity, whether the latter is
measured in terms of the number of non-zero entries in the combination
(ℓ_0 norm) or in terms of the global weight of the combination (ℓ_1
norm). The power of this adaptation result is illustrated by showing that
Exponential Screening solves optimally and simultaneously all the problems of
aggregation in Gaussian regression that have been discussed in the literature.
Moreover, we show that the performance of the Exponential Screening estimator
cannot be improved in a minimax sense, even if the optimal sparsity is known in
advance. The theoretical and numerical superiority of Exponential Screening
compared to state-of-the-art sparse procedures is also discussed.
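A brute-force sketch of an Exponential Screening style estimator for very small p: fit least squares on every support and average the fits with exponential weights trading off residual fit against support size. The temperature and the complexity prior below are illustrative assumptions, not the paper's exact constants.

```python
import numpy as np
from itertools import combinations

def exponential_screening(X, y, sigma2):
    """Average least-squares fits over all supports with exponential
    weights penalizing both residual sum of squares and support size.
    Brute force over 2^p supports, so only feasible for small p."""
    n, p = X.shape
    thetas, logw = [], []
    for k in range(p + 1):
        for S in combinations(range(p), k):
            theta = np.zeros(p)
            if k > 0:
                cols = list(S)
                theta[cols] = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
            rss = float(((y - X @ theta) ** 2).sum())
            # data-fit term plus an (assumed) complexity penalty k*log(2p)
            logw.append(-rss / (4.0 * sigma2) - k * np.log(2.0 * p))
            thetas.append(theta)
    logw = np.array(logw)
    w = np.exp(logw - logw.max())   # stabilize before normalizing
    w /= w.sum()
    return np.array(thetas).T @ w   # weighted average of all fits
```

Because the weights decay exponentially in both residual error and support size, the aggregate concentrates on sparse supports that fit well, which is the adaptation the abstract describes.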