Projected principal component analysis in factor models
This paper introduces Projected Principal Component Analysis
(Projected-PCA), which applies principal component analysis to the data
matrix projected (smoothed) onto a given linear space spanned by covariates.
When applied to high-dimensional factor analysis, the projection removes
noise components. We show that the unobserved latent factors can be
estimated more accurately than by conventional PCA if the projection is
genuine, or more precisely, when the factor loading matrices are related to
the projected linear space. When the dimensionality is large, the factors can be estimated
accurately even when the sample size is finite. We propose a flexible
semiparametric factor model, which decomposes the factor loading matrix into
the component that can be explained by subject-specific covariates and the
orthogonal residual component. The covariates' effects on the factor loadings
are further modeled by an additive model via sieve approximations. Using
the newly proposed Projected-PCA, we obtain rates of convergence for the
estimated smooth factor loading matrices that are much faster than those of
conventional factor analysis. The convergence is achieved even when the
sample size is finite, and is particularly appealing in the
high-dimension-low-sample-size situation. This leads us to develop
nonparametric tests of whether the observed covariates have explanatory
power on the loadings and whether they fully explain the loadings. The proposed method
is illustrated by both simulated data and the returns of the components of the
S&P 500 index.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1364 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
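
The core idea is a two-step procedure: smooth the data onto a sieve space
spanned by the covariates, then extract principal components from the
projected matrix. Below is a minimal sketch in Python, assuming a p x n data
matrix Y, a p x d covariate matrix X, and a polynomial sieve basis; the
function name, the sieve choice, and the normalizations are illustrative
assumptions, not the authors' implementation.

```python
import numpy as np

def projected_pca(Y, X, K, J=3):
    """Sketch of Projected-PCA: project the p x n data matrix Y onto the
    sieve space spanned by subject-specific covariates X (p x d), then run
    PCA on the projected data to extract K factors.

    J is the number of polynomial sieve terms per covariate (illustrative).
    """
    p, n = Y.shape
    # Additive-model sieve basis Phi(X): a column of ones plus powers
    # 1..J of each covariate.
    Phi = np.hstack([np.ones((p, 1))] + [X**j for j in range(1, J + 1)])
    # Projection matrix onto the column space of Phi; this is the
    # "smoothing" step that removes noise components.
    P = Phi @ np.linalg.pinv(Phi)
    Y_proj = P @ Y
    # PCA on the projected data: the top-K right singular vectors
    # estimate the latent factors (up to rotation and scaling).
    _, _, Vt = np.linalg.svd(Y_proj, full_matrices=False)
    F_hat = np.sqrt(n) * Vt[:K].T      # estimated n x K factor matrix
    # Estimated smooth loadings: regress projected data on the factors.
    G_hat = Y_proj @ F_hat / n
    return F_hat, G_hat
```

The projection is what drives the fast rates: because P @ Y averages across
the cross-section within the sieve space, the factors can be estimated
accurately even when n is small, provided p is large.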
Robust Estimation of High-Dimensional Mean Regression
Data subject to heavy-tailed errors are commonly encountered in various
scientific fields, especially in the modern era with the explosion of
massive data. To address this problem, procedures based on quantile
regression and Least Absolute Deviation (LAD) regression have been developed
in recent years. These methods essentially estimate the conditional median
(or quantile) function, which can be very different from the conditional
mean function when distributions are asymmetric and heteroscedastic. How can
we efficiently estimate the mean regression function in the ultra-high
dimensional setting when only the second moment exists? To solve this
problem, we propose a penalized Huber loss with a diverging parameter to reduce the biases created by the
traditional Huber loss. Such a penalized robust approximate quadratic
(RA-quadratic) loss will be called RA-Lasso. In the ultra-high dimensional
setting, where the dimensionality can grow exponentially with the sample size,
our results reveal that the RA-Lasso estimator is consistent and converges
at the same optimal rate as under the light-tail situation. We further
study the computational convergence of RA-Lasso and show that the composite
gradient descent algorithm indeed produces a solution that admits the same
optimal rate after sufficient iterations. As a byproduct, we also establish
a concentration inequality for estimating the population mean when only the
second moment exists. We compare RA-Lasso with other regularized robust
estimators based on quantile regression and LAD regression. Extensive
simulation studies demonstrate the satisfactory finite-sample performance of
RA-Lasso.
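
For intuition, the composite gradient descent the paper analyzes alternates
a gradient step on the Huber (RA-quadratic) loss with soft-thresholding for
the L1 penalty. The sketch below assumes an n x p design matrix X, response
y, penalty level lam, and robustification parameter alpha (the diverging
parameter); the step-size rule and iteration count are illustrative
assumptions, not the paper's specification.

```python
import numpy as np

def huber_grad(r, alpha):
    """Derivative of the Huber loss in the residual: quadratic for
    |r| <= alpha, linear beyond, so large residuals are down-weighted."""
    return np.where(np.abs(r) <= alpha, r, alpha * np.sign(r))

def ra_lasso(X, y, lam, alpha, step=None, n_iter=500):
    """Sketch of RA-Lasso via composite gradient descent: a gradient step
    on the Huber loss, then soft-thresholding for the L1 penalty."""
    n, p = X.shape
    if step is None:
        # Conservative step size from the curvature of the quadratic part
        # (1/L, with L the Lipschitz constant of the loss gradient).
        step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ beta
        grad = -X.T @ huber_grad(r, alpha) / n
        z = beta - step * grad
        # Soft-thresholding: the proximal step for the L1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta
```

In the theory, alpha diverges with the sample size, trading a vanishing
approximation bias (relative to the mean regression function) for
robustness to heavy-tailed errors.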