Kullback-Leibler aggregation and misspecified generalized linear models
In a regression setup with deterministic design, we study the pure
aggregation problem and introduce a natural extension from the Gaussian
distribution to distributions in the exponential family. While this extension
bears strong connections with generalized linear models, it does not require
identifiability of the parameter or even that the model on the systematic
component is true. It is shown that this problem can be solved by constrained
and/or penalized likelihood maximization and we derive sharp oracle
inequalities that hold both in expectation and with high probability. Finally
all the bounds are proved to be optimal in a minimax sense.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by
the Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/11-AOS961.
Linear and convex aggregation of density estimators
We study the problem of linear and convex aggregation of estimators of a
density with respect to the mean squared risk. We provide procedures for linear
and convex aggregation and we prove oracle inequalities for their risks. We
also obtain lower bounds showing that these procedures are rate optimal in a
minimax sense. As an example, we apply general results to aggregation of
multivariate kernel density estimators with different bandwidths. We show that
linear and convex aggregates mimic the kernel oracles in an asymptotically
exact sense for a large class of kernels, including the Gaussian, Silverman
and Pinsker kernels. We prove that, for the Pinsker kernel, the proposed
aggregates are sharp asymptotically minimax simultaneously over a large scale
of Sobolev classes of densities. Finally, we provide simulations demonstrating
the performance of the convex aggregation procedure.
Comment: 22 pages.
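As a hedged illustration of the convex aggregation step, the sketch below fits Gaussian kernel density estimators at several bandwidths on one half of the sample and searches a grid on the simplex for convex weights minimizing held-out negative log-likelihood. The sample-splitting scheme, the likelihood criterion, and the function names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from itertools import product

def gauss_kde(train, h):
    """Gaussian kernel density estimator with bandwidth h."""
    def f(x):
        z = (np.atleast_1d(x)[:, None] - train[None, :]) / h
        return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(train) * h * np.sqrt(2 * np.pi))
    return f

def convex_aggregate(sample, bandwidths, n_grid=21):
    """Split the sample in two, fit one KDE per bandwidth on the first
    half, then pick convex weights minimizing the held-out negative
    log-likelihood by brute force over a grid on the simplex
    (practical only for two or three estimators)."""
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(sample))
    half = len(sample) // 2
    fit, val = sample[idx[:half]], sample[idx[half:]]
    kdes = [gauss_kde(fit, h) for h in bandwidths]
    preds = np.array([f(val) for f in kdes])          # shape (K, n_val)
    ticks = np.linspace(0.0, 1.0, n_grid)
    best_w, best_loss = None, np.inf
    for w in product(ticks, repeat=len(bandwidths)):
        if abs(sum(w) - 1.0) > 1e-9:
            continue                                  # keep only simplex points
        mix = np.dot(w, preds)                        # convex mixture of KDEs
        loss = -np.log(np.clip(mix, 1e-300, None)).sum()
        if loss < best_loss:
            best_w, best_loss = np.array(w), loss
    agg = lambda x: sum(wk * f(x) for wk, f in zip(best_w, kdes))
    return best_w, agg
```

The brute-force simplex search stands in for the paper's aggregation procedure purely for concreteness.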
Entropic optimal transport is maximum-likelihood deconvolution
We give a statistical interpretation of entropic optimal transport by showing
that performing maximum-likelihood estimation for Gaussian deconvolution
corresponds to calculating a projection with respect to the entropic optimal
transport distance. This structural result gives theoretical support for the
wide adoption of these tools in the machine learning community.
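For readers who want to experiment, entropic optimal transport between two discrete measures is typically computed with Sinkhorn iterations; the minimal sketch below is standard numerical practice, not the paper's construction, and the function name is ours.

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iter=500):
    """Entropic optimal transport between discrete measures a and b
    with cost matrix C and regularization eps, via Sinkhorn's
    alternating scaling of the Gibbs kernel K = exp(-C / eps)."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)          # match column marginals
        u = a / (K @ v)            # match row marginals
    P = u[:, None] * K * v[None, :]  # entropic transport plan
    cost = (P * C).sum()             # transport cost of the plan
    return P, cost
```

The regularization eps controls how close the plan is to the unregularized optimal transport coupling.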
Optimal learning with Q-aggregation
We consider a general supervised learning problem with strongly convex and
Lipschitz loss and study the problem of model selection aggregation. In
particular, given a finite dictionary of functions (learners) together with a
prior, we generalize the results obtained by Dai, Rigollet and Zhang [Ann.
Statist. 40 (2012) 1878-1905] for Gaussian regression with squared loss and
fixed design to this learning setup. Specifically, we prove that the
Q-aggregation procedure outputs an estimator that satisfies optimal oracle
inequalities both in expectation and with high probability. Our proof
techniques somewhat depart from traditional proofs by basing most of the
standard arguments on the Laplace transform of the empirical process to be
controlled.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by
the Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/13-AOS1190.
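As a rough illustration of how a Q-aggregation style criterion can be optimized, the sketch below minimizes, over the simplex, a penalized mixture of the individual empirical risks and the risk of the mixture for squared loss, using exponentiated gradient descent. The constants, the KL prior penalty, and the optimizer are illustrative assumptions; the paper's exact procedure and guarantees are not reproduced here.

```python
import numpy as np

def q_aggregate(y, F, prior, nu=0.5, beta=1.0, lr=0.05, n_iter=2000):
    """Minimize an assumed Q-style criterion over the simplex:
      nu * sum_j theta_j R(f_j) + (1 - nu) * R(f_theta)
        + (beta / n) * KL(theta, prior),
    for squared loss on a fixed design. F has shape (n, K): column j
    holds the predictions of dictionary element f_j."""
    n, K = F.shape
    theta = np.full(K, 1.0 / K)
    risks = ((y[:, None] - F) ** 2).mean(axis=0)       # R(f_j)
    for _ in range(n_iter):
        resid = F @ theta - y
        grad = (nu * risks
                + (1.0 - nu) * 2.0 * (F.T @ resid) / n  # grad of R(f_theta)
                + (beta / n) * (np.log(theta / prior) + 1.0))
        theta = theta * np.exp(-lr * grad)             # multiplicative update
        theta /= theta.sum()                           # stay on the simplex
    return theta
```

The exponentiated gradient update keeps the weights positive and normalized at every step, which is why it is a natural fit for simplex-constrained problems.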
Optimal detection of sparse principal components in high dimension
We perform a finite sample analysis of the detection levels for sparse
principal components of a high-dimensional covariance matrix. Our minimax
optimal test is based on a sparse eigenvalue statistic. Alas, computing this
test is known to be NP-complete in general, and we describe a computationally
efficient alternative test using convex relaxations. Our relaxation is also
proved to detect sparse principal components at near optimal detection levels,
and it performs well on simulated datasets. Moreover, using polynomial time
reductions from theoretical computer science, we bring significant evidence
that our results cannot be improved, thus revealing an inherent trade-off
between statistical and computational performance.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by
the Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/13-AOS1127.
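The sparse eigenvalue statistic can be written down directly for small dimensions, where brute-force enumeration of supports is still feasible; this sketch is for intuition only (as the abstract notes, the computation is NP-complete in general).

```python
import numpy as np
from itertools import combinations

def sparse_eigenvalue(S_hat, k):
    """Largest k-sparse eigenvalue of a covariance matrix: the maximum,
    over supports of size k, of the top eigenvalue of the corresponding
    principal submatrix. Brute force over all supports, so only
    feasible for small dimensions."""
    p = S_hat.shape[0]
    best = -np.inf
    for support in combinations(range(p), k):
        sub = S_hat[np.ix_(support, support)]           # principal submatrix
        best = max(best, np.linalg.eigvalsh(sub)[-1])   # its top eigenvalue
    return best
```

A detection test thresholds this statistic: under the null (identity covariance) it stays near 1, while a sparse spike pushes it up.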
Optimal rates for plug-in estimators of density level sets
In the context of density level set estimation, we study the convergence of
general plug-in methods under two main assumptions on the density for a given
level λ. More precisely, it is assumed that the density (i) is smooth
in a neighborhood of λ and (ii) has γ-exponent at level
λ. Condition (i) ensures that the density can be estimated at a
standard nonparametric rate and condition (ii) is similar to Tsybakov's margin
assumption which is stated for the classification framework. Under these
assumptions, we derive optimal rates of convergence for plug-in estimators.
Explicit convergence rates are given for plug-in estimators based on kernel
density estimators when the underlying measure is the Lebesgue measure. Lower
bounds proving optimality of the rates in a minimax sense when the density is
Hölder smooth are also provided.
Comment: Published in Bernoulli (http://isi.cbs.nl/bernoulli/) by the
International Statistical Institute/Bernoulli Society
(http://isi.cbs.nl/BS/bshome.htm); DOI: http://dx.doi.org/10.3150/09-BEJ184.
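A minimal plug-in level set estimator, assuming a Gaussian kernel density estimate and a fixed evaluation grid (both illustrative choices, as are the function names), can be sketched as:

```python
import numpy as np

def kde(sample, h):
    """Gaussian kernel density estimator with bandwidth h."""
    def f(x):
        z = (np.atleast_1d(x)[:, None] - sample[None, :]) / h
        return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))
    return f

def plugin_level_set(sample, lam, h, grid):
    """Plug-in estimator of the level set {x : f(x) >= lam}: evaluate
    the kernel density estimate on the grid and keep the grid points
    where it exceeds the level lam."""
    f_hat = kde(sample, h)
    vals = f_hat(grid)
    return grid[vals >= lam]
```

The estimator simply substitutes the density estimate for the unknown density in the definition of the level set, which is exactly the plug-in principle the abstract analyzes.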
Uncoupled isotonic regression via minimum Wasserstein deconvolution
Isotonic regression is a standard problem in shape-constrained estimation
where the goal is to estimate an unknown nondecreasing regression function f
from independent pairs (x_i, y_i) where y_i = f(x_i) + ε_i, i = 1, ..., n. While
this problem is well understood both statistically and computationally, much
less is known about its uncoupled counterpart, where one is given only the
unordered sets {x_1, ..., x_n} and {y_1, ..., y_n}. In this work, we leverage
tools from optimal transport theory to derive minimax rates under weak moment
conditions on the noise ε_i and to give an efficient algorithm achieving
optimal rates. Both upper and lower bounds employ moment-matching arguments
that are also pertinent to learning mixtures of distributions and
deconvolution.
Comment: To appear in Information and Inference: a Journal of the IMA.
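To see why the uncoupled problem is at all tractable, note that monotonicity forces order statistics to match: in the noiseless case the i-th smallest y must equal f of the i-th smallest x. The baseline below implements this sorting estimator; it is our illustration only, and ignores the Wasserstein deconvolution step the paper uses to handle noise.

```python
import numpy as np

def uncoupled_sort_estimate(xs, ys):
    """Pair the sorted unordered samples: since f is nondecreasing, in
    the noiseless case the i-th smallest y equals f evaluated at the
    i-th smallest x. With noise this only matches empirical quantiles
    and omits the deconvolution step used in the paper."""
    return np.sort(xs), np.sort(ys)
```

With noise, sorting matches the quantiles of y (the law of f(X) convolved with the noise) rather than of f(X) itself, which is precisely the deconvolution problem the paper addresses.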
Exponential Screening and optimal rates of sparse estimation
In high-dimensional linear regression, the goal pursued here is to estimate
an unknown regression function using linear combinations of a suitable set of
covariates. One of the key assumptions for the success of any statistical
procedure in this setup is to assume that the linear combination is sparse in
some sense, for example, that it involves only a few covariates. We consider a
general, not necessarily linear, regression with Gaussian noise and study the
related question of finding a linear combination of approximating
functions which is at the same time sparse and has small mean squared error
(MSE). We introduce a new estimation procedure, called Exponential Screening,
that shows remarkable adaptation properties. It adapts to the linear
combination that optimally balances MSE and sparsity, whether the latter is
measured in terms of the number of non-zero entries in the combination
(ℓ_0 norm) or in terms of the global weight of the combination (ℓ_1
norm). The power of this adaptation result is illustrated by showing that
Exponential Screening solves optimally and simultaneously all the problems of
aggregation in Gaussian regression that have been discussed in the literature.
Moreover, we show that the performance of the Exponential Screening estimator
cannot be improved in a minimax sense, even if the optimal sparsity is known in
advance. The theoretical and numerical superiority of Exponential Screening
compared to state-of-the-art sparse procedures is also discussed.
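A brute-force sketch of an Exponential Screening style estimator for very small p: fit least squares on every support and average the fits with exponential weights trading off residual fit against support size. The temperature and the complexity prior below are illustrative assumptions, not the paper's exact constants.

```python
import numpy as np
from itertools import combinations

def exponential_screening(X, y, sigma2):
    """Average least-squares fits over all supports with exponential
    weights penalizing both residual sum of squares and support size.
    Brute force over 2^p supports, so only feasible for small p."""
    n, p = X.shape
    thetas, logw = [], []
    for k in range(p + 1):
        for S in combinations(range(p), k):
            theta = np.zeros(p)
            if k > 0:
                cols = list(S)
                theta[cols] = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
            rss = float(((y - X @ theta) ** 2).sum())
            # data-fit term plus an (assumed) complexity penalty k*log(2p)
            logw.append(-rss / (4.0 * sigma2) - k * np.log(2.0 * p))
            thetas.append(theta)
    logw = np.array(logw)
    w = np.exp(logw - logw.max())   # stabilize before normalizing
    w /= w.sum()
    return np.array(thetas).T @ w   # weighted average of all fits
```

Because the weights decay exponentially in both residual error and support size, the aggregate concentrates on sparse supports that fit well, which is the adaptation the abstract describes.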