Fused sparsity and robust estimation for linear models with unknown variance
In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and an unknown noise level. We propose an algorithm, termed Scaled Fused Dantzig Selector (SFDS), that accomplishes this learning task by means of a second-order cone program. Special emphasis is placed on the particular instance of fused sparsity corresponding to learning in the presence of outliers. We establish finite sample risk bounds and carry out an experimental evaluation on both synthetic and real data.
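The abstract does not reproduce the SFDS program itself, so the following is only a minimal sketch of the general idea: fused-sparsity regression with a scale-free (square-root) loss, solved as a convex program with cvxpy. The function name, the penalty weights lam_fuse and lam_sparse, and the use of a plain fused-lasso-style penalty in place of the paper's Dantzig-type constraints are illustrative assumptions, not the authors' method.

```python
# Illustrative only: a fused-sparsity estimator with a scale-free loss,
# not the paper's Scaled Fused Dantzig Selector. Names are hypothetical.
import cvxpy as cp
import numpy as np

def fused_sparse_fit(X, y, lam_fuse=0.1, lam_sparse=0.1):
    n, p = X.shape
    beta = cp.Variable(p)
    loss = cp.norm2(y - X @ beta)        # square-root loss: insensitive to noise scale
    fuse = cp.norm1(cp.diff(beta))       # fused penalty: favors piecewise-constant beta
    sparse = cp.norm1(beta)              # plain sparsity penalty
    cp.Problem(cp.Minimize(loss + lam_fuse * fuse + lam_sparse * sparse)).solve()
    return beta.value

# Toy usage: recover piecewise-constant coefficients from noisy data.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.repeat([0.0, 2.0, 0.0], [8, 6, 6])
y = X @ beta_true + 0.5 * rng.standard_normal(100)
print(np.round(fused_sparse_fit(X, y, 1.0, 1.0), 2))
```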
Confidence regions and minimax rates in outlier-robust estimation on the probability simplex
We consider the problem of estimating the mean of a distribution supported by the k-dimensional probability simplex in the setting where an ε fraction of observations are subject to adversarial corruption. A simple particular example is the problem of estimating the distribution of a discrete random variable. Assuming that the discrete variable takes k+1 values, the unknown parameter is a (k+1)-dimensional vector belonging to the probability simplex. We first describe various settings of contamination and discuss the relation between these settings. We then establish minimax rates when the quality of estimation is measured by the total-variation distance, the Hellinger distance, or the L2-distance between two probability measures. We also provide confidence regions for the unknown mean that shrink at the minimax rate. Our analysis reveals that the minimax rates associated with these three distances are all different, but they are all attained by the sample average. Furthermore, we show that the latter is adaptive to the possible sparsity of the unknown vector. Some numerical experiments illustrating our theoretical findings are reported.
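To make the headline finding concrete, that the sample average (the vector of empirical frequencies) attains the minimax rate under contamination, here is a small self-contained simulation. The support size, contamination level, and the scheme of corrupting all outliers to a single symbol are assumptions made purely for illustration.

```python
# Minimal simulation: empirical frequencies under adversarial contamination.
# The contamination scheme below is an assumption made for illustration.
import numpy as np

rng = np.random.default_rng(0)
k, n, eps = 10, 10_000, 0.05           # support size, sample size, contamination level
theta = rng.dirichlet(np.ones(k))      # true distribution on {0, ..., k-1}

clean = rng.choice(k, size=n, p=theta)
n_out = int(eps * n)                   # adversary replaces an eps fraction by symbol 0
sample = np.concatenate([clean[: n - n_out], np.zeros(n_out, dtype=np.int64)])

theta_hat = np.bincount(sample, minlength=k) / n   # sample average (empirical frequencies)
tv = 0.5 * np.abs(theta_hat - theta).sum()         # total-variation error
print(f"TV error: {tv:.4f} at contamination level eps = {eps}")
```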
Pivotal estimation via square-root Lasso in nonparametric regression
We propose a self-tuning square-root Lasso method that simultaneously resolves three important practical problems in high-dimensional regression analysis, namely it handles the unknown scale, heteroscedasticity and (drastic) non-Gaussianity of the noise. In addition, our analysis allows for badly behaved designs, for example, perfectly collinear regressors, and generates sharp bounds even in extreme cases, such as the infinite variance case and the noiseless case, in contrast to Lasso. We establish various nonasymptotic bounds for the square-root Lasso, including the prediction norm rate and sparsity. Our analysis is based on new impact factors that are tailored for bounding the prediction norm. In order to cover heteroscedastic non-Gaussian noise, we rely on moderate deviation theory for self-normalized sums to achieve Gaussian-like results under weak conditions. Moreover, we derive bounds on the performance of ordinary least squares (OLS) applied to the model selected by the square-root Lasso, accounting for possible misspecification of the selected model. Under mild conditions, the rate of convergence of OLS post square-root Lasso is as good as the square-root Lasso's rate. As an application, we consider the use of the square-root Lasso and OLS post square-root Lasso as estimators of nuisance parameters in a generic semiparametric problem (nonlinear moment condition or Z-problem), resulting in a construction of √n-consistent and asymptotically normal estimators of the main parameters.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1204 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
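The square-root Lasso objective is simple to state: it replaces the Lasso's squared loss by its square root, which is what makes the tuning parameter pivotal (independent of the noise level). Below is a minimal cvxpy sketch of the estimator together with the OLS refit on the selected model discussed in the abstract; the function names and the support-selection tolerance are illustrative choices, not the authors' code.

```python
# Sketch of the square-root Lasso and the OLS refit on its selected support.
# Function names and the tolerance `tol` are illustrative choices.
import cvxpy as cp
import numpy as np

def sqrt_lasso(X, y, lam):
    """Minimize ||y - X b||_2 / sqrt(n) + lam * ||b||_1.
    The square-root loss makes a good `lam` independent of the noise scale;
    a common theory-driven choice (assumption) is lam ~ sqrt(2 * log(p) / n)."""
    n, p = X.shape
    b = cp.Variable(p)
    obj = cp.norm2(y - X @ b) / np.sqrt(n) + lam * cp.norm1(b)
    cp.Problem(cp.Minimize(obj)).solve()
    return b.value

def ols_post(X, y, b_hat, tol=1e-6):
    """Ordinary least squares refit on the support selected by sqrt_lasso."""
    S = np.flatnonzero(np.abs(b_hat) > tol)
    b = np.zeros(X.shape[1])
    if S.size:
        b[S] = np.linalg.lstsq(X[:, S], y, rcond=None)[0]
    return b
```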
On estimation of the diagonal elements of a sparse precision matrix
In this paper, we present several estimators of the diagonal elements of the
inverse of the covariance matrix, called precision matrix, of a sample of iid
random vectors. The focus is on high dimensional vectors having a sparse
precision matrix. It is now well understood that when the underlying
distribution is Gaussian, the columns of the precision matrix can be estimated
independently from one another by solving linear regression problems under
sparsity constraints. This approach leads to a computationally efficient
strategy for estimating the precision matrix that starts by estimating the
regression vectors, then estimates the diagonal entries of the precision matrix
and, in a final step, combines these estimators to obtain estimators of the
off-diagonal entries. While the step of estimating the regression vector has
been intensively studied over the past decade, the problem of deriving
statistically accurate estimators of the diagonal entries has received much
less attention. The goal of the present paper is to fill this gap by presenting
four estimators---that seem the most natural ones---of the diagonal entries of
the precision matrix and then performing a comprehensive empirical evaluation
of these estimators. The estimators under consideration are the residual
variance, the relaxed maximum likelihood, the symmetry-enforced maximum
likelihood and the penalized maximum likelihood. We show, both theoretically
and empirically, that when the aforementioned regression vectors are estimated
without error, the symmetry-enforced maximum likelihood estimator has the
smallest estimation error. However, in a more realistic setting when the
regression vector is estimated by a sparsity-favoring computationally efficient
method, the qualities of the estimators become relatively comparable with a
slight advantage for the residual variance estimator.
Comment: Companion R package at http://cran.r-project.org/web/packages/DESP/index.htm
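Among the four estimators compared above, the residual variance estimator is the most direct to write down: regress each coordinate on the remaining ones with a sparsity-favoring method, then invert the residual variance. The sketch below is one plausible reading of that recipe, assuming a lasso first step; the function name and the penalty level alpha are illustrative, and this is not the DESP package's implementation.

```python
# Sketch of the residual variance estimator of diag(precision matrix).
# For Gaussian data, Omega_jj = 1 / Var(X_j | X_{-j}); estimate that
# conditional variance by the residual variance of a sparse regression.
# The lasso penalty `alpha` and the function name are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

def precision_diag_residual_variance(X, alpha=0.1):
    n, p = X.shape
    Xc = X - X.mean(axis=0)                # center each column
    diag = np.empty(p)
    for j in range(p):
        others = np.delete(np.arange(p), j)
        fit = Lasso(alpha=alpha).fit(Xc[:, others], Xc[:, j])
        resid = Xc[:, j] - fit.predict(Xc[:, others])
        diag[j] = n / np.sum(resid ** 2)   # 1 / (residual variance of column j)
    return diag
```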