Projected principal component analysis in factor models
This paper introduces Projected Principal Component Analysis
(Projected-PCA), which applies principal component analysis to the data
matrix projected (smoothed) onto a given linear space spanned by covariates.
When applied to high-dimensional factor analysis, the projection removes
noise components. We show that the unobserved latent factors can be
estimated more accurately than by conventional PCA when the projection is
genuine, or more precisely, when the factor loading matrices are related to
the projected linear space. When the dimensionality is large, the factors can be estimated
accurately even when the sample size is finite. We propose a flexible
semiparametric factor model, which decomposes the factor loading matrix into
the component that can be explained by subject-specific covariates and the
orthogonal residual component. The covariates' effects on the factor loadings
are further modeled by the additive model via sieve approximations. By using
the newly proposed Projected-PCA, the rates of convergence of the smooth factor
loading matrices are obtained, which are much faster than those of the
conventional factor analysis. The convergence is achieved even when the sample
size is finite and is particularly appealing in the
high-dimension-low-sample-size situation. This leads us to develop
nonparametric tests of whether the observed covariates have explanatory power
for the loadings and whether they fully explain the loadings. The proposed method
is illustrated by both simulated data and the returns of the components of the
S&P 500 index.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1364 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
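The core computation described in the abstract can be sketched in a few lines of numpy. This is a minimal illustration on simulated data, not the paper's implementation: the dimensions, noise level, and the assumption that the number of factors K is known are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: p series, n time points, d covariates, K factors.
p, n, d, K = 100, 50, 3, 2

# Simulated semiparametric factor model: loadings driven by covariates X.
X = rng.normal(size=(p, d))              # subject-specific covariates
B = X @ rng.normal(size=(d, K))          # loadings explained by X
F = rng.normal(size=(n, K))              # latent factors
Y = B @ F.T + 0.1 * rng.normal(size=(p, n))

# Projection matrix onto the linear space spanned by the covariates.
P = X @ np.linalg.solve(X.T @ X, X.T)    # p x p, idempotent

# Projected-PCA: PCA on the projected (smoothed) data matrix P @ Y.
PY = P @ Y
eigvals, eigvecs = np.linalg.eigh(PY.T @ PY)
F_hat = np.sqrt(n) * eigvecs[:, -K:]     # estimated factors (up to rotation)
```

The projection step is what distinguishes this from ordinary PCA: eigenvectors are extracted from the smoothed matrix P @ Y rather than from Y itself, which is how the noise components get removed.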
Large Covariance Estimation by Thresholding Principal Orthogonal Complements
This paper deals with the estimation of a high-dimensional covariance with a
conditional sparsity structure and fast-diverging eigenvalues. By assuming
sparse error covariance matrix in an approximate factor model, we allow for the
presence of some cross-sectional correlation even after taking out common but
unobservable factors. We introduce the Principal Orthogonal complEment
Thresholding (POET) method to explore such an approximate factor structure with
sparsity. The POET estimator includes the sample covariance matrix, the
factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding
estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator
(Cai and Liu, 2011) as specific examples. We provide mathematical insights
into when factor analysis is approximately the same as principal component
analysis for high-dimensional data. The rates of convergence of the sparse
residual covariance matrix and the conditional sparse covariance matrix are
studied under various norms. It is shown that the impact of estimating the
unknown factors vanishes as the dimensionality increases. The uniform rates of
convergence for the unobserved factors and their factor loadings are derived.
The asymptotic results are also verified by extensive simulation studies.
Finally, a real data application on portfolio allocation is presented.
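The POET construction described above can be sketched with numpy on simulated data. This is a hedged illustration only: the dimensions, the threshold constant, and the assumption that the number of factors K is known are hypothetical choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, K = 50, 200, 3            # hypothetical dimensions; K assumed known

# Simulated approximate factor model: Y = B F' + U, with U ~ N(0, I).
B = rng.normal(size=(p, K))
F = rng.normal(size=(n, K))
Y = B @ F.T + rng.normal(size=(p, n))

S = np.cov(Y)                                # p x p sample covariance
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
lam, V = eigvals[order], eigvecs[:, order]

# Low-rank part from the top-K principal components ...
low_rank = (V[:, :K] * lam[:K]) @ V[:, :K].T
# ... and the principal orthogonal complement (residual covariance).
R = S - low_rank

# Hard-threshold the off-diagonal residual entries; tau is a hypothetical
# choice of the usual sqrt(log p / n) threshold level.
tau = 0.5 * np.sqrt(np.log(p) / n)
R_thr = np.where(np.abs(R) >= tau, R, 0.0)
np.fill_diagonal(R_thr, np.diag(R))          # keep the diagonal intact

Sigma_poet = low_rank + R_thr                # POET covariance estimator
```

The split into a spiked low-rank part plus a thresholded sparse residual is the "conditional sparsity" structure the abstract refers to: cross-sectional correlation is allowed in R, but only sparsely, after the common factors are taken out.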
Risks of Large Portfolios
Estimating and assessing the risk of a large portfolio is an important topic
in financial econometrics and risk management. The risk is often estimated
by plugging in a good estimator of the volatility matrix. However, the
accuracy of such a risk estimator for large portfolios is largely unknown, and
a simple inequality in the previous literature gives an infeasible upper bound
for the estimation error. In addition, numerical studies illustrate that this
upper bound is very crude. In this paper, we propose factor-based risk
estimators for a large number of assets, and introduce a high-confidence
level upper bound (H-CLUB) to assess the accuracy of the risk estimation. The
H-CLUB is constructed based on three different estimates of the volatility
matrix: sample covariance, approximate factor model with known factors, and
unknown factors (POET, Fan, Liao and Mincheva, 2013). For the first time in the
literature, we derive the limiting distribution of the estimated risks in high
dimensionality. Our numerical results demonstrate that the proposed upper
bounds significantly outperform the traditional crude bounds, and provide
insightful assessment of the estimation of the portfolio risks. In addition,
our simulated results quantify the relative error in the risk estimation, which
is usually negligible using 3-month daily data. Finally, the proposed methods
are applied to an empirical study.
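The plug-in risk estimator that the abstract starts from is simple to state in code. A minimal sketch on simulated returns follows; the asset count, sample length, and equal-weight portfolio are hypothetical choices, and the sample covariance stands in for whichever volatility-matrix estimator (sample, known-factor, or POET) is plugged in.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 30, 252                              # hypothetical: 30 assets, ~1 year daily
returns = rng.normal(0.0, 0.01, size=(n, p))

Sigma_hat = np.cov(returns, rowvar=False)   # plug-in volatility-matrix estimate
w = np.full(p, 1.0 / p)                     # equal-weight portfolio

# Substitution estimator of portfolio risk: sqrt(w' Sigma_hat w).
risk_hat = float(np.sqrt(w @ Sigma_hat @ w))
```

The H-CLUB contribution is precisely about quantifying how far risk_hat can be from the true w' Sigma w in high dimensions, which this one-line substitution by itself does not reveal.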
FarmTest: Factor-Adjusted Robust Multiple Testing with Approximate False Discovery Control
Large-scale multiple testing with correlated and heavy-tailed data arises in
a wide range of research areas, from genomics and medical imaging to finance.
Conventional methods for estimating the false discovery proportion (FDP) often
ignore the effect of heavy-tailedness and the dependence structure among test
statistics, and thus may lead to inefficient or even inconsistent estimation.
Also, the commonly imposed joint normality assumption is arguably too stringent
for many applications. To address these challenges, in this paper we propose a
Factor-Adjusted Robust Multiple Testing (FarmTest) procedure for large-scale
simultaneous inference with control of the false discovery proportion. We
demonstrate that robust factor adjustments are extremely important in both
controlling the FDP and improving the power. We identify general conditions
under which the proposed method produces a consistent estimate of the FDP. As a
byproduct that is of independent interest, we establish an exponential-type
deviation inequality for a robust U-type covariance estimator under the
spectral norm. Extensive numerical experiments demonstrate the advantage of the
proposed method over several state-of-the-art methods especially when the data
are generated from heavy-tailed distributions. The proposed procedures are
implemented in the R package FarmTest.
Comment: 52 pages, 9 figures.
Endogeneity in high dimensions
Most papers on high-dimensional statistics are based on the assumption that
none of the regressors are correlated with the regression error, namely, they
are exogenous. Yet, endogeneity can arise incidentally from a large pool of
regressors in a high-dimensional regression. This causes the inconsistency of
the penalized least-squares method and possible false scientific discoveries. A
necessary condition for model selection consistency of a general class of
penalized regression methods is given, which allows us to prove formally the
inconsistency claim. To cope with the incidental endogeneity, we construct a
novel penalized focused generalized method of moments (FGMM) criterion
function. The FGMM effectively achieves the dimension reduction and applies the
instrumental variable methods. We show that it possesses the oracle property
even in the presence of endogenous predictors, and that the solution is near
the global minimum under the over-identification assumption. Finally, we also
show how the semi-parametric efficiency of estimation can be achieved via a
two-step approach.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1202 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
Endogeneity in ultrahigh dimension
Most papers on high-dimensional statistics are based on the assumption that none of the regressors are correlated with the regression error, namely, they are exogenous. Yet, endogeneity arises easily in high-dimensional regression due to a large pool of regressors, and this causes the inconsistency of the penalized least-squares methods and possible false scientific discoveries. A necessary condition for model selection of a very general class of penalized regression methods is given, which allows us to prove formally the inconsistency claim. To cope with the possible endogeneity, we construct a novel penalized focused generalized method of moments (FGMM) criterion function and offer a new optimization algorithm. The FGMM is not a smooth function. To establish its asymptotic properties, we first study the model selection consistency and an oracle property for a general class of penalized regression methods. These results are then used to show that the FGMM possesses an oracle property even in the presence of endogenous predictors, and that the solution is near the global minimum under the over-identification assumption. Finally, we also show how the semi-parametric efficiency of estimation can be achieved via a two-step approach.