Optimal Calibration for Multiple Testing against Local Inhomogeneity in Higher Dimension
Based on two independent samples X_1,...,X_m and X_{m+1},...,X_n drawn from
multivariate distributions with unknown Lebesgue densities p and q
respectively, we propose an exact multiple test in order to identify
simultaneously regions of significant deviations between p and q. The
construction is built from randomized nearest-neighbor statistics. It does not
require any preliminary information about the multivariate densities such as
compact support, strict positivity or smoothness and shape properties. The
properly adjusted multiple testing procedure is shown to be sharp-optimal for
typical arrangements of the observation values which appear with probability
close to one. The proof relies on a new coupling Bernstein-type exponential
inequality, reflecting the non-subgaussian tail behavior of a combinatorial
process. For the power investigation of the proposed method, a reparametrized
minimax set-up is introduced, reducing the composite hypothesis "p=q" to a
simple one with the multivariate mixed density (m/n)p+(1-m/n)q as infinite
dimensional nuisance parameter. Within this framework, the test is shown to be
spatially and sharply asymptotically adaptive with respect to uniform loss on
isotropic Hölder classes. The exact minimax risk asymptotics are obtained in
terms of solutions of the optimal recovery problem.
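As a rough illustration of the nearest-neighbor idea behind such tests (a simplified sketch, not the paper's exact randomized, calibrated construction), one can pool the two samples and, for each observation, count how many of its k nearest neighbors carry the same sample label: under p=q this count stays near its null expectation, while local deviations between p and q inflate it in the affected region. The Python sketch below uses illustrative names and parameters.

import numpy as np

def knn_same_label_counts(X, Y, k=5):
    """For each point of the pooled sample, count how many of its k nearest
    neighbors come from the same sample (a crude local two-sample statistic)."""
    Z = np.vstack([X, Y])                                # pooled sample
    labels = np.r_[np.zeros(len(X)), np.ones(len(Y))]
    D = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    np.fill_diagonal(D, np.inf)                          # a point is not its own neighbor
    nn = np.argsort(D, axis=1)[:, :k]                    # indices of the k nearest neighbors
    return (labels[nn] == labels[:, None]).sum(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))                  # sample from p
Y = rng.normal(0.5, 1.0, size=(100, 2))                  # sample from q (shifted mean)
print(knn_same_label_counts(X, Y).mean())                # inflated where p and q disagree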
Smoothness assumptions in human and machine vision, and their implications for optimal surface interpolation
In this paper we shall examine what smoothness assumptions are made about object surfaces, object motion, and image intensities. We begin by looking into the physiological limits of vision and how these might influence our perception of smoothness. We then look at a sampling of the computer vision and psychology literature, inferring smoothness constraints from the mathematical assumptions tacitly presumed by researchers. This look at computer vision and the psychology of vision is not meant to be an inclusive study, but rather representative of the assumptions made, and in part representative of the mathematical models used therein. We shall conclude that the prevalent assumptions are that surfaces, motion, and intensity images are functions in C2, C1, and C2, respectively. In the latter portion of this paper we examine one use of explicit smoothness assumptions in the definition of an existing method for obtaining "optimal" surface interpolation. We briefly introduce the nomenclature of information-based complexity, originated by Traub, Wozniakowski, and their colleagues, which is the mathematical machinery used in obtaining these "optimal" surfaces. This theory requires that we know the class of functions from which our desired surface comes, and part of the definition of a class is the degree of smoothness. We then survey many possible classes for the visual interpolation problem of two-dimensional surfaces, and state formulas from which one can obtain the optimal surface interpolating the given depth data.
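As a concrete, hedged example of how a smoothness assumption determines a surface interpolant (a thin-plate spline enforces a C2-type bending energy; this is only a sketch, not the information-based-complexity construction discussed above), the following Python snippet interpolates scattered depth data; all names and data are illustrative.

import numpy as np

def tps_kernel(r):
    # thin-plate-spline radial basis phi(r) = r^2 log r, with phi(0) = 0
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r > 0, r ** 2 * np.log(r), 0.0)

def fit_tps(points, depths):
    """Solve for f(x) = sum_i w_i phi(|x - x_i|) + a0 + a1 x + a2 y interpolating the data."""
    n = len(points)
    r = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    P = np.hstack([np.ones((n, 1)), points])             # affine part
    A = np.block([[tps_kernel(r), P], [P.T, np.zeros((3, 3))]])
    coef = np.linalg.solve(A, np.r_[depths, np.zeros(3)])
    return coef[:n], coef[n:]

def eval_tps(points, w, a, query):
    r = np.linalg.norm(query[:, None, :] - points[None, :, :], axis=-1)
    return tps_kernel(r) @ w + a[0] + query @ a[1:]

rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 1.0, size=(30, 2))                # scattered (x, y) locations
z = np.sin(2 * np.pi * pts[:, 0]) * pts[:, 1]            # synthetic depth measurements
w, a = fit_tps(pts, z)
print(np.abs(eval_tps(pts, w, a, pts) - z).max())        # ~0: the surface passes through the data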
Fast global convergence of gradient methods for high-dimensional statistical recovery
Many statistical M-estimators are based on convex optimization problems
formed by the combination of a data-dependent loss function with a norm-based
regularizer. We analyze the convergence rates of projected gradient and
composite gradient methods for solving such problems, working within a
high-dimensional framework that allows the data dimension d to grow with
(and possibly exceed) the sample size n. This high-dimensional
structure precludes the usual global assumptions---namely, strong convexity and
smoothness conditions---that underlie much of classical optimization analysis.
We define appropriately restricted versions of these conditions, and show that
they are satisfied with high probability for various statistical models. Under
these conditions, our theory guarantees that projected gradient descent has a
globally geometric rate of convergence up to the statistical precision
of the model, meaning the typical distance between the true unknown parameter
$\theta^*$ and an optimal solution $\widehat{\theta}$. This result is substantially
sharper than previous convergence results, which yielded sublinear convergence,
or linear convergence only up to the noise level. Our analysis applies to a
wide range of M-estimators and statistical models, including sparse linear
regression using Lasso ($\ell_1$-regularized regression); group Lasso for block
sparsity; log-linear models with $\ell_1$-regularization; low-rank matrix recovery using
nuclear norm regularization; and matrix decomposition. Overall, our analysis
reveals interesting connections between statistical precision and computational
efficiency in high-dimensional estimation.
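A hedged sketch of the projected gradient iteration analyzed above, specialized to sparse linear regression with an $\ell_1$-ball constraint (the constrained form of the Lasso); the radius, step size, and problem sizes are illustrative choices rather than the paper's tuning, and the projection uses the standard sorting-based algorithm.

import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection onto {x : ||x||_1 <= radius} (sorting-based algorithm)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - radius) / np.arange(1, len(u) + 1) > 0)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def projected_gradient(X, y, radius, step, iters=200):
    """Minimize (1/2n)||y - X beta||^2 over the l1 ball by projected gradient descent."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ beta - y) / n
        beta = project_l1_ball(beta - step * grad, radius)
    return beta

rng = np.random.default_rng(0)
n, d, s = 200, 500, 10                                   # d > n: high-dimensional regime
X = rng.normal(size=(n, d))
beta_star = np.zeros(d); beta_star[:s] = 1.0
y = X @ beta_star + 0.1 * rng.normal(size=n)
beta_hat = projected_gradient(X, y, radius=np.abs(beta_star).sum(), step=0.1)
print(np.linalg.norm(beta_hat - beta_star))              # error of the order of the statistical precision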
Finite-sample Analysis of M-estimators using Self-concordance
We demonstrate how self-concordance of the loss can be exploited to obtain
asymptotically optimal rates for M-estimators in finite-sample regimes. We
consider two classes of losses: (i) canonically self-concordant losses in the
sense of Nesterov and Nemirovski (1994), i.e., with the third derivative
bounded by the $3/2$ power of the second; (ii) pseudo self-concordant losses,
for which the power is removed, as introduced by Bach (2010). These classes
contain some losses arising in generalized linear models, including logistic
regression; in addition, the second class includes some common pseudo-Huber
losses. Our results consist in establishing the critical sample size sufficient
to reach the asymptotically optimal excess risk for both classes of losses.
Denoting $d$ the parameter dimension and $d_{\text{eff}}$ the effective
dimension, which takes into account possible model misspecification, we find the
critical sample size to be $O(d_{\text{eff}} \cdot d)$ for canonically
self-concordant losses, and $O(\rho \cdot d_{\text{eff}} \cdot d)$ for pseudo
self-concordant losses, where $\rho$ is the problem-dependent local curvature
parameter. In contrast to existing results, we only impose local
assumptions on the data distribution, assuming that the calibrated design,
i.e., the design scaled with the square root of the second derivative of the
loss, is subgaussian at the best predictor $\theta_*$. Moreover, we obtain
improved bounds on the critical sample size, scaling near-linearly in
$d_{\text{eff}}$, under the extra assumption that the calibrated design
is subgaussian in the Dikin ellipsoid of $\theta_*$. Motivated by these
findings, we construct canonically self-concordant analogues of the Huber and
logistic losses with improved statistical properties. Finally, we extend some
of these results to $\ell_1$-regularized M-estimators in high dimensions.
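As a hedged numerical illustration of the pseudo self-concordance property invoked above (class (ii), where the power is removed): for the scalar logistic loss phi(t) = log(1 + exp(-t)) one has phi'' = sigma(1 - sigma) and phi''' = sigma(1 - sigma)(1 - 2 sigma), with sigma the sigmoid, hence |phi'''| <= phi''. The snippet below merely checks this on a grid and is not part of the paper's analysis.

import numpy as np

t = np.linspace(-20.0, 20.0, 2001)
s = 1.0 / (1.0 + np.exp(-t))                 # sigmoid sigma(t)
second = s * (1.0 - s)                       # phi''(t) for phi(t) = log(1 + exp(-t))
third = s * (1.0 - s) * (1.0 - 2.0 * s)      # phi'''(t)
print(bool(np.all(np.abs(third) <= second + 1e-12)))   # True: |phi'''| <= phi''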
Optimization with Sparsity-Inducing Penalties
Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection but numerous extensions have now emerged such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present from a
general perspective optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view.
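As a hedged illustration of the proximal-method family covered above, instantiated for the $\ell_1$ penalty: the proximal operator of a multiple of the $\ell_1$ norm is coordinatewise soft-thresholding, and alternating a gradient step on the smooth loss with this operator yields ISTA. The step size and penalty level below are illustrative.

import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (coordinatewise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(X, y, lam, step, iters=300):
    """Proximal gradient (ISTA) for (1/2n)||y - X beta||^2 + lam * ||beta||_1."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

rng = np.random.default_rng(0)
n, d = 100, 50
X = rng.normal(size=(n, d))
beta_star = np.zeros(d); beta_star[:5] = 2.0
y = X @ beta_star + 0.1 * rng.normal(size=n)
beta_hat = ista(X, y, lam=0.05, step=0.1)
print(np.count_nonzero(beta_hat))            # a sparse estimate concentrated on the true support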