Optimal Rates of Statistical Seriation
Given a matrix, the seriation problem consists in permuting its rows in such
a way that all its columns have the same shape, for example, they are monotone
increasing. We propose a statistical approach to this problem where the matrix
of interest is observed with noise and study the corresponding minimax rate of
estimation of the matrices. Specifically, when the columns are either unimodal
or monotone, we show that the least squares estimator is optimal up to
logarithmic factors and adapts to matrices with a certain natural structure.
Finally, we propose a computationally efficient estimator in the monotonic case
and study its performance both theoretically and experimentally. Our work is at
the intersection of shape-constrained estimation and recent work that involves
permutation learning, such as graph denoising and ranking.
Comment: v2 corrects an error in Lemma A.1; v3 corrects Appendix F on unimodal
regression, where the bounds now hold with polynomial rather than exponential
probability.
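To make the monotone setting concrete, here is a minimal Python sketch, assuming a naive pilot ordering of the rows by their means followed by columnwise isotonic regression. This illustrates the shape constraint only; it is not the estimator analyzed in the paper, and the ordering heuristic is an assumption of the sketch.

```python
# Illustrative sketch (not the paper's estimator): denoise a matrix whose
# columns are monotone after an unknown row permutation. Heuristic: order
# rows by their row means, then project each column onto the monotone cone.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def seriate_and_smooth(Y):
    """Y: (n, m) noisy matrix. Returns an estimated row order and the fit."""
    order = np.argsort(Y.mean(axis=1))          # pilot ordering of the rows
    Y_sorted = Y[order]
    positions = np.arange(Y.shape[0])
    iso = IsotonicRegression(increasing=True)
    # Isotonic regression per column = least squares under monotonicity.
    fit = np.column_stack([iso.fit_transform(positions, Y_sorted[:, j])
                           for j in range(Y.shape[1])])
    return order, fit

rng = np.random.default_rng(0)
truth = np.outer(np.linspace(0, 1, 50), np.ones(5)).cumsum(axis=0)
perm = rng.permutation(50)
order, fit = seriate_and_smooth(truth[perm] + 0.1 * rng.standard_normal((50, 5)))
```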
General nonexact oracle inequalities for classes with a subexponential envelope
We show that empirical risk minimization procedures and regularized empirical
risk minimization procedures satisfy nonexact oracle inequalities in an
unbounded framework, under the assumption that the class has a subexponential
envelope function. The main novelty, beyond dispensing with boundedness
assumptions, is that these inequalities can yield fast rates even in situations
in which exact oracle inequalities hold only with slower rates. We apply these
results to show that procedures based on $\ell_1$ and nuclear norm
regularization functions satisfy oracle inequalities with a residual term that
decreases like $\log n/n$ for every $L_q$-loss function ($q \geq 2$), while only
assuming that the tail behavior of the input and output variables are well
behaved. In particular, no RIP-type assumption or "incoherence condition" is
needed to obtain fast residual terms in those setups. We also apply these
results to the problems of convex aggregation and model selection.
Comment: Published at http://dx.doi.org/10.1214/11-AOS965 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
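As a concrete instance of the $\ell_1$-regularized empirical risk minimization covered by these inequalities, here is a minimal sketch using the Lasso; the design, sparsity level, and tuning constant are illustrative assumptions, not choices from the paper.

```python
# Minimal sketch: l1-regularized empirical risk minimization (the Lasso),
# one concrete instance of the regularized procedures discussed above.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, s = 200, 500, 5                        # n samples, p features, s-sparse truth
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:s] = 1.0
y = X @ beta + 0.5 * rng.standard_normal(n)

# alpha ~ sqrt(log(p)/n) is a common theoretical scaling, used here as a
# placeholder rather than the paper's constant.
model = Lasso(alpha=np.sqrt(np.log(p) / n)).fit(X, y)
print("nonzero coefficients recovered:", np.flatnonzero(model.coef_)[:10])
```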
Sparse Regression Learning by Aggregation and Langevin Monte-Carlo
We consider the problem of regression learning for deterministic design and
independent random errors. We start by proving a sharp PAC-Bayesian type bound
for the exponentially weighted aggregate (EWA) under the expected squared
empirical loss. For a broad class of noise distributions the presented bound is
valid whenever the temperature parameter $\beta$ of the EWA is larger than or
equal to $4\sigma^2$, where $\sigma^2$ is the noise variance. A remarkable
feature of this result is that it is valid even for unbounded regression
functions and the choice of the temperature parameter depends exclusively on
the noise level. Next, we apply this general bound to the problem of
aggregating the elements of a finite-dimensional linear space spanned by a
dictionary of functions $\phi_1, \dots, \phi_M$. We allow $M$ to be much larger
than the sample size $n$, but we assume that the true regression function can be
well approximated by a sparse linear combination of the functions $\phi_j$. Under
this sparsity scenario, we propose an EWA with a heavy tailed prior and we show
that it satisfies a sparsity oracle inequality with leading constant one.
Finally, we propose several Langevin Monte-Carlo algorithms to approximately
compute such an EWA when the number $M$ of aggregated functions can be large.
We discuss in some detail the convergence of these algorithms and present
numerical experiments that confirm our theoretical findings.
Comment: A short version was published in COLT 2009.
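Here is a toy sketch of exponentially weighted aggregation over a finite dictionary, with the temperature set at the $4\sigma^2$ threshold from the bound; the dictionary, prior (uniform), and data are made-up assumptions for illustration.

```python
# Toy sketch of exponentially weighted aggregation (EWA) over a finite
# dictionary, with a uniform prior and temperature beta = 4*sigma^2.
import numpy as np

rng = np.random.default_rng(2)
n, M, sigma = 100, 20, 0.3
x = np.linspace(0, 1, n)
dictionary = np.array([np.sin((j + 1) * np.pi * x) for j in range(M)])  # (M, n)
y = dictionary[3] + sigma * rng.standard_normal(n)   # true function is f_4

beta = 4 * sigma**2                                  # temperature threshold
risks = ((dictionary - y) ** 2).mean(axis=1)         # empirical risk of each f_j
logw = -n * risks / beta
w = np.exp(logw - logw.max())
w /= w.sum()                                         # EWA weights
f_hat = w @ dictionary                               # aggregated estimator
```

With a finite dictionary the weights can be enumerated directly, as above; with a continuous (e.g., heavy-tailed sparsity) prior they cannot, which is where the Langevin Monte-Carlo approximation enters.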
Model averaging: A shrinkage perspective
Model averaging (MA), a technique for combining estimators from a set of
candidate models, has attracted increasing attention in machine learning and
statistics. In the existing literature, there is an implicit understanding that
MA can be viewed as a form of shrinkage estimation that draws the response
vector towards the subspaces spanned by the candidate models. This paper
explores this perspective by establishing connections between MA and shrinkage
in a linear regression setting with multiple nested models. We first
demonstrate that the optimal MA estimator is the best linear estimator with
monotone non-increasing weights in a Gaussian sequence model. The Mallows MA,
which estimates the weights by minimizing Mallows' $C_p$, is a variation of the
positive-part Stein estimator. Motivated by these connections, we develop a
novel MA procedure based on blockwise Stein estimation. Our resulting
Stein-type MA estimator is asymptotically optimal across a broad parameter
space when the variance is known. Numerical results support our theoretical
findings. The connections established in this paper may open up new avenues for
investigating MA from different perspectives. A discussion of some topics for
future research concludes the paper.
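The following schematic shows Mallows-style weight selection over nested least-squares fits: minimize the penalized residual criterion over the weight simplex. The setup follows the standard Mallows MA recipe under a known variance; it is a sketch of the general technique, not necessarily the paper's exact formulation.

```python
# Schematic Mallows model averaging over nested linear models: choose simplex
# weights minimizing ||y - sum_k w_k yhat_k||^2 + 2*sigma^2 * sum_k w_k * df_k.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, p, sigma = 100, 8, 1.0
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + sigma * rng.standard_normal(n)

# Fitted values and degrees of freedom of the nested models X[:, :k].
fits, dfs = [], []
for k in range(1, p + 1):
    Xk = X[:, :k]
    fits.append(Xk @ np.linalg.lstsq(Xk, y, rcond=None)[0])
    dfs.append(k)
fits, dfs = np.array(fits), np.array(dfs)

def mallows(w):
    resid = y - w @ fits
    return resid @ resid + 2 * sigma**2 * (w @ dfs)

w0 = np.full(p, 1.0 / p)
res = minimize(mallows, w0, method="SLSQP",
               bounds=[(0, 1)] * p,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
print("MMA weights:", np.round(res.x, 3))
```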