Projected principal component analysis in factor models
This paper introduces Projected Principal Component Analysis
(Projected-PCA), which applies principal component analysis to the data
matrix projected (smoothed) onto a given linear space spanned by covariates.
When applied to high-dimensional factor analysis, the projection removes
noise components. We show that the unobserved latent factors can be
estimated more accurately than by conventional PCA when the projection is
genuine, or more precisely, when the factor loading matrices are related to
the projected linear space. When the dimensionality is large, the factors can be estimated
accurately even when the sample size is finite. We propose a flexible
semiparametric factor model, which decomposes the factor loading matrix into
the component that can be explained by subject-specific covariates and the
orthogonal residual component. The covariates' effects on the factor loadings
are further modeled by the additive model via sieve approximations. By using
the newly proposed Projected-PCA, the rates of convergence of the smooth factor
loading matrices are obtained, which are much faster than those of the
conventional factor analysis. The convergence is achieved even when the sample
size is finite and is particularly appealing in the
high-dimension-low-sample-size situation. This leads us to develop
nonparametric tests of whether the observed covariates have explanatory power
for the loadings and whether they fully explain the loadings. The proposed method
is illustrated by both simulated data and the returns of the components of the
S&P 500 index.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1364 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
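The core computation described in the abstract can be sketched in a few lines of numpy. This is a minimal illustration on simulated data, not the paper's implementation: the dimensions, noise level, and the assumption that the number of factors K is known are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: p series, n time points, d covariates, K factors.
p, n, d, K = 100, 50, 3, 2

# Simulated semiparametric factor model: loadings driven by covariates X.
X = rng.normal(size=(p, d))              # subject-specific covariates
B = X @ rng.normal(size=(d, K))          # loadings explained by X
F = rng.normal(size=(n, K))              # latent factors
Y = B @ F.T + 0.1 * rng.normal(size=(p, n))

# Projection matrix onto the linear space spanned by the covariates.
P = X @ np.linalg.solve(X.T @ X, X.T)    # p x p, idempotent

# Projected-PCA: PCA on the projected (smoothed) data matrix P @ Y.
PY = P @ Y
eigvals, eigvecs = np.linalg.eigh(PY.T @ PY)
F_hat = np.sqrt(n) * eigvecs[:, -K:]     # estimated factors (up to rotation)
```

The projection step is what distinguishes this from ordinary PCA: eigenvectors are extracted from the smoothed matrix P @ Y rather than from Y itself, which is how the noise components get removed.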
Large Covariance Estimation by Thresholding Principal Orthogonal Complements
This paper deals with the estimation of a high-dimensional covariance with a
conditional sparsity structure and fast-diverging eigenvalues. By assuming
sparse error covariance matrix in an approximate factor model, we allow for the
presence of some cross-sectional correlation even after taking out common but
unobservable factors. We introduce the Principal Orthogonal complEment
Thresholding (POET) method to explore such an approximate factor structure with
sparsity. The POET estimator includes the sample covariance matrix, the
factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding
estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator
(Cai and Liu, 2011) as specific examples. We provide mathematical insights
into when factor analysis is approximately the same as principal component
analysis for high-dimensional data. The rates of convergence of the sparse
residual covariance matrix and the conditional sparse covariance matrix are
studied under various norms. It is shown that the impact of estimating the
unknown factors vanishes as the dimensionality increases. The uniform rates of
convergence for the unobserved factors and their factor loadings are derived.
The asymptotic results are also verified by extensive simulation studies.
Finally, a real data application on portfolio allocation is presented.
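The POET construction described above can be sketched with numpy on simulated data. This is a hedged illustration only: the dimensions, the threshold constant, and the assumption that the number of factors K is known are hypothetical choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, K = 50, 200, 3            # hypothetical dimensions; K assumed known

# Simulated approximate factor model: Y = B F' + U, with U ~ N(0, I).
B = rng.normal(size=(p, K))
F = rng.normal(size=(n, K))
Y = B @ F.T + rng.normal(size=(p, n))

S = np.cov(Y)                                # p x p sample covariance
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
lam, V = eigvals[order], eigvecs[:, order]

# Low-rank part from the top-K principal components ...
low_rank = (V[:, :K] * lam[:K]) @ V[:, :K].T
# ... and the principal orthogonal complement (residual covariance).
R = S - low_rank

# Hard-threshold the off-diagonal residual entries; tau is a hypothetical
# choice of the usual sqrt(log p / n) threshold level.
tau = 0.5 * np.sqrt(np.log(p) / n)
R_thr = np.where(np.abs(R) >= tau, R, 0.0)
np.fill_diagonal(R_thr, np.diag(R))          # keep the diagonal intact

Sigma_poet = low_rank + R_thr                # POET covariance estimator
```

The split into a spiked low-rank part plus a thresholded sparse residual is the "conditional sparsity" structure the abstract refers to: cross-sectional correlation is allowed in R, but only sparsely, after the common factors are taken out.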
Risks of Large Portfolios
Estimating and assessing the risk of a large portfolio is an important topic
in financial econometrics and risk management. The risk is often estimated
by plugging in a good estimator of the volatility matrix. However, the
accuracy of such a risk estimator for large portfolios is largely unknown, and
a simple inequality in the previous literature gives an infeasible upper bound
for the estimation error. In addition, numerical studies illustrate that this
upper bound is very crude. In this paper, we propose factor-based risk
estimators for a large number of assets, and introduce a high-confidence
level upper bound (H-CLUB) to assess the accuracy of the risk estimation. The
H-CLUB is constructed based on three different estimates of the volatility
matrix: sample covariance, approximate factor model with known factors, and
unknown factors (POET, Fan, Liao and Mincheva, 2013). For the first time in the
literature, we derive the limiting distribution of the estimated risks in high
dimensionality. Our numerical results demonstrate that the proposed upper
bounds significantly outperform the traditional crude bounds, and provide
insightful assessment of the estimation of the portfolio risks. In addition,
our simulated results quantify the relative error in the risk estimation, which
is usually negligible using 3-month daily data. Finally, the proposed methods
are applied to an empirical study.
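The plug-in risk estimator that the abstract starts from is simple to state in code. A minimal sketch on simulated returns follows; the asset count, sample length, and equal-weight portfolio are hypothetical choices, and the sample covariance stands in for whichever volatility-matrix estimator (sample, known-factor, or POET) is plugged in.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 30, 252                              # hypothetical: 30 assets, ~1 year daily
returns = rng.normal(0.0, 0.01, size=(n, p))

Sigma_hat = np.cov(returns, rowvar=False)   # plug-in volatility-matrix estimate
w = np.full(p, 1.0 / p)                     # equal-weight portfolio

# Substitution estimator of portfolio risk: sqrt(w' Sigma_hat w).
risk_hat = float(np.sqrt(w @ Sigma_hat @ w))
```

The H-CLUB contribution is precisely about quantifying how far risk_hat can be from the true w' Sigma w in high dimensions, which this one-line substitution by itself does not reveal.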
FarmTest: Factor-Adjusted Robust Multiple Testing with Approximate False Discovery Control
Large-scale multiple testing with correlated and heavy-tailed data arises in
a wide range of research areas, from genomics and medical imaging to finance.
Conventional methods for estimating the false discovery proportion (FDP) often
ignore the effect of heavy-tailedness and the dependence structure among test
statistics, and thus may lead to inefficient or even inconsistent estimation.
Also, the commonly imposed joint normality assumption is arguably too stringent
for many applications. To address these challenges, in this paper we propose a
Factor-Adjusted Robust Multiple Testing (FarmTest) procedure for large-scale
simultaneous inference with control of the false discovery proportion. We
demonstrate that robust factor adjustments are extremely important in both
controlling the FDP and improving the power. We identify general conditions
under which the proposed method produces a consistent estimate of the FDP. As a
byproduct that is of independent interest, we establish an exponential-type
deviation inequality for a robust U-type covariance estimator under the
spectral norm. Extensive numerical experiments demonstrate the advantage of the
proposed method over several state-of-the-art methods especially when the data
are generated from heavy-tailed distributions. The proposed procedures are
implemented in the R package FarmTest.
Comment: 52 pages, 9 figures.
Endogeneity in high dimensions
Most papers on high-dimensional statistics are based on the assumption that
none of the regressors are correlated with the regression error, namely, they
are exogenous. Yet, endogeneity can arise incidentally from a large pool of
regressors in a high-dimensional regression. This causes the inconsistency of
the penalized least-squares method and possible false scientific discoveries. A
necessary condition for model selection consistency of a general class of
penalized regression methods is given, which allows us to prove formally the
inconsistency claim. To cope with the incidental endogeneity, we construct a
novel penalized focused generalized method of moments (FGMM) criterion
function. The FGMM effectively achieves the dimension reduction and applies the
instrumental variable methods. We show that it possesses the oracle property
even in the presence of endogenous predictors, and that the solution is near
the global minimum under the over-identification assumption. Finally, we also
show how the semi-parametric efficiency of estimation can be achieved via a
two-step approach.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1202 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
Endogeneity in ultrahigh dimension
Most papers on high-dimensional statistics are based on the assumption that none of the regressors are correlated with the regression error, namely, they are exogenous. Yet, endogeneity arises easily in high-dimensional regression due to a large pool of regressors, and this causes the inconsistency of the penalized least-squares methods and possible false scientific discoveries. A necessary condition for model selection of a very general class of penalized regression methods is given, which allows us to prove formally the inconsistency claim. To cope with the possible endogeneity, we construct a novel penalized focused generalized method of moments (FGMM) criterion function and offer a new optimization algorithm. The FGMM is not a smooth function. To establish its asymptotic properties, we first study the model selection consistency and an oracle property for a general class of penalized regression methods. These results are then used to show that the FGMM possesses an oracle property even in the presence of endogenous predictors, and that the solution is near the global minimum under the over-identification assumption. Finally, we also show how the semi-parametric efficiency of estimation can be achieved via a two-step approach.