
    Efficient Estimation of Approximate Factor Models via Regularized Maximum Likelihood

    We study the estimation of a high-dimensional approximate factor model in the presence of both cross-sectional dependence and heteroskedasticity. The classical method of principal components analysis (PCA) does not estimate the factor loadings or common factors efficiently because it essentially treats the idiosyncratic errors as homoskedastic and cross-sectionally uncorrelated. Efficient estimation requires estimating a large error covariance matrix. We assume the model to be conditionally sparse and propose two approaches to estimating the common factors and factor loadings; both are based on maximizing a Gaussian quasi-likelihood and involve regularizing a large sparse covariance matrix. In the first approach the factor loadings and the error covariance are estimated separately, while in the second they are estimated jointly. We carry out an extensive asymptotic analysis; in particular, we develop the inferential theory for the two-step estimation. Because the proposed approaches take the large error covariance matrix into account, they produce more efficient estimators than classical PCA or methods based on a strict factor model.
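    The abstract contrasts plain PCA with likelihood-based estimation that weights by an estimated error covariance. The numpy sketch below illustrates only that general two-step flavor; the function name, the single hard-thresholding rule, and the GLS-style refit are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def two_step_factor_sketch(Y, r, tau):
    """Illustrative two-step estimation for Y (N x T) = Lam @ F.T + U
    with r factors: (1) initial PCA, (2) threshold the residual
    covariance and refit the factors with a GLS-type weighting."""
    N, T = Y.shape
    # Step 1: PCA via the eigenvectors of the N x N sample covariance.
    vals, vecs = np.linalg.eigh(Y @ Y.T / T)          # ascending order
    Lam = vecs[:, -r:] * np.sqrt(N)                   # loadings, Lam'Lam/N = I
    F = Y.T @ Lam / N                                 # T x r factor estimates
    U = Y - Lam @ F.T                                 # idiosyncratic residuals
    # Step 2: regularize the error covariance (hard thresholding here).
    Su = U @ U.T / T
    Su_t = np.where(np.abs(Su) > tau, Su, 0.0)
    np.fill_diagonal(Su_t, np.diag(Su))               # never threshold the diagonal
    # Refit the factors, weighting by the inverse estimated error covariance.
    W = np.linalg.inv(Su_t + 1e-6 * np.eye(N))        # jitter for invertibility
    F_gls = Y.T @ W @ Lam @ np.linalg.inv(Lam.T @ W @ Lam)
    return Lam, F_gls, Su_t
```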

    Statistical Inferences Using Large Estimated Covariances for Panel Data and Factor Models

    While most convergence results in the literature on high-dimensional covariance matrices concern the accuracy of estimating the covariance matrix (and the precision matrix), relatively little is known about the effect of estimating large covariances on statistical inference. We study two important models, factor analysis and the panel data model with interactive effects, and focus on the statistical inference and estimation efficiency of structural parameters based on large covariance estimators. For efficient estimation, both models call for weighted principal components (WPC), which rely on a high-dimensional weight matrix. This paper derives an efficient and feasible WPC using the covariance matrix estimator of Fan et al. (2013). However, we demonstrate that existing results on large covariance estimation based on absolute convergence are not suitable for statistical inference on the structural parameters. What is needed is a form of weighted consistency and the associated rate of convergence, which are obtained in this paper. Finally, the proposed method is applied to US divorce rate data. We find that the efficient WPC identifies significant effects of divorce-law reforms on the divorce rate, and it provides more accurate estimates and tighter confidence intervals than existing methods.
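    A weighted principal components step of the kind described here can be sketched as follows: given a weight matrix W (for instance the inverse of an estimated error covariance such as the Fan et al. (2013) estimator the abstract mentions), the factors come from the top eigenvectors of the weighted Gram matrix. The paper's feasible construction differs; this is a minimal assumed form.

```python
import numpy as np

def weighted_pc(Y, W, r):
    """Weighted principal components sketch for Y (N x T) with an
    N x N weight matrix W: estimate the r factors from the top
    eigenvectors of the T x T weighted Gram matrix Y' W Y."""
    T = Y.shape[1]
    vals, vecs = np.linalg.eigh(Y.T @ W @ Y)   # ascending eigenvalues
    F = np.sqrt(T) * vecs[:, -r:]              # T x r factors, F'F/T = I
    Lam = Y @ F / T                            # N x r loadings by least squares
    return F, Lam
```

    Setting W to the identity recovers ordinary PCA, which makes the efficiency gain from a well-chosen W easy to check in simulation.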

    Large Covariance Estimation by Thresholding Principal Orthogonal Complements

    This paper deals with the estimation of a high-dimensional covariance matrix with a conditional sparsity structure and fast-diverging eigenvalues. By assuming a sparse error covariance matrix in an approximate factor model, we allow for some cross-sectional correlation even after the common but unobservable factors are taken out. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as special cases. We provide mathematical insight into when factor analysis is approximately the same as principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditionally sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. Uniform rates of convergence for the unobserved factors and their loadings are derived. The asymptotic results are verified by extensive simulation studies. Finally, a real-data application on portfolio allocation is presented.
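    POET has a concrete recipe that the abstract describes: keep the top-K principal components of the sample covariance and threshold what is left over. The sketch below uses a single hard threshold tau for simplicity, whereas the paper uses entry-adaptive thresholding; the function name is ours.

```python
import numpy as np

def poet(Y, K, tau):
    """POET-style covariance estimate for data Y (N x T): low-rank part
    from the top-K principal components of the sample covariance, plus
    a thresholded principal orthogonal complement."""
    S = np.cov(Y)                                # N x N sample covariance
    vals, vecs = np.linalg.eigh(S)               # ascending eigenvalues
    top = np.argsort(vals)[::-1][:K]
    low_rank = (vecs[:, top] * vals[top]) @ vecs[:, top].T
    R = S - low_rank                             # principal orthogonal complement
    R_t = np.where(np.abs(R) > tau, R, 0.0)      # hard-threshold small entries
    np.fill_diagonal(R_t, np.diag(R))            # keep the diagonal unthresholded
    return low_rank + R_t
```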

    Risks of Large Portfolios

    Estimating and assessing the risk of a large portfolio is an important topic in financial econometrics and risk management. The risk is often estimated by plugging in a good estimator of the volatility matrix. However, the accuracy of such a risk estimator for large portfolios is largely unknown, and a simple inequality in the previous literature gives only an infeasible upper bound on the estimation error. In addition, numerical studies show that this upper bound is very crude. In this paper, we propose factor-based risk estimators for a large number of assets, and introduce a high-confidence-level upper bound (H-CLUB) to assess the accuracy of the risk estimate. The H-CLUB is constructed from three different estimates of the volatility matrix: the sample covariance, an approximate factor model with known factors, and one with unknown factors (POET; Fan, Liao and Mincheva, 2013). For the first time in the literature, we derive the limiting distribution of the estimated risks in high dimensions. Our numerical results demonstrate that the proposed upper bounds significantly outperform the traditional crude bounds and provide an insightful assessment of the estimation of portfolio risk. In addition, our simulations quantify the relative error of the risk estimate, which is usually negligible when using three months of daily data. Finally, the proposed methods are applied to an empirical study.
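    The plug-in risk estimate at the heart of this abstract is simple to write down; the H-CLUB itself depends on the limiting distribution derived in the paper, so the sketch below stops at the plug-in step and treats the choice of volatility estimator as interchangeable. The names and simulated data are illustrative.

```python
import numpy as np

def plugin_risk(w, Sigma_hat):
    """Plug-in portfolio risk sqrt(w' Sigma_hat w) for weights w and any
    estimated volatility matrix Sigma_hat (sample covariance,
    factor-based, or POET-type)."""
    return float(np.sqrt(w @ Sigma_hat @ w))

# Illustrative usage on simulated returns R (T x N).
rng = np.random.default_rng(0)
R = rng.standard_normal((63, 50))     # roughly three months of daily data
w = np.full(50, 1 / 50)               # equal-weight portfolio
print(plugin_risk(w, np.cov(R.T)))    # risk under the sample covariance
```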

    Projected principal component analysis in factor models

    This paper introduces Projected Principal Component Analysis (Projected-PCA), which applies principal component analysis to the data matrix after it has been projected (smoothed) onto a given linear space spanned by covariates. When applied to high-dimensional factor analysis, the projection removes noise components. We show that the unobserved latent factors can be estimated more accurately than by conventional PCA if the projection is genuine, that is, when the factor loading matrix is related to the projected linear space. When the dimensionality is large, the factors can be estimated accurately even when the sample size is finite. We propose a flexible semiparametric factor model, which decomposes the factor loading matrix into a component explained by subject-specific covariates and an orthogonal residual component. The covariates' effects on the factor loadings are further modeled by an additive model via sieve approximations. Using the newly proposed Projected-PCA, we obtain rates of convergence for the smooth factor loading matrices that are much faster than those of conventional factor analysis. The convergence holds even when the sample size is finite and is particularly appealing in the high-dimension-low-sample-size setting. This leads us to develop nonparametric tests of whether the observed covariates have explanatory power for the loadings and whether they fully explain the loadings. The proposed method is illustrated by both simulated data and the returns of the components of the S&P 500 index. Published at http://dx.doi.org/10.1214/15-AOS1364 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics.
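    The core mechanic, projecting the cross-section onto the covariate space before running PCA, can be sketched directly. Here X is a raw N x d covariate design matrix; the paper instead builds an additive sieve basis from the covariates, so this is an assumed simplification.

```python
import numpy as np

def projected_pca(Y, X, r):
    """Projected-PCA sketch: project the rows of the N x T data matrix Y
    onto the column space of the N x d covariate matrix X, then run PCA
    on the smoothed data to extract r factors."""
    P = X @ np.linalg.solve(X.T @ X, X.T)       # projection onto col(X)
    Yp = P @ Y                                  # smoothed (projected) data
    T = Y.shape[1]
    vals, vecs = np.linalg.eigh(Yp.T @ Yp / T)  # ascending eigenvalues
    F = np.sqrt(T) * vecs[:, -r:]               # estimated factors, T x r
    G = Yp @ F / T                              # covariate-explained loadings
    return F, G
```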

    A lava attack on the recovery of sums of dense and sparse signals

    Common high-dimensional methods for prediction rely on having either a sparse signal model, in which most parameters are zero and a small number of non-zero parameters are large in magnitude, or a dense signal model, with no large parameters and very many small non-zero parameters. We consider a generalization of these two basic models, termed here a "sparse+dense" model, in which the signal is the sum of a sparse signal and a dense signal. Such a structure poses problems for traditional sparse estimators, such as the lasso, and for traditional dense estimators, such as ridge. We propose a new penalization-based method, called lava, which is computationally efficient. With suitable choices of the penalty parameters, the proposed method strictly dominates both lasso and ridge. We derive analytic expressions for the finite-sample risk function of the lava estimator in the Gaussian sequence model. We also provide a deviation bound for the prediction risk in the Gaussian regression model with fixed design. In both cases, we provide Stein's unbiased estimate of lava's prediction risk. A simulation example compares lava to lasso, ridge, and the elastic net in a regression setting with feasible, data-dependent penalty parameters, illustrating lava's improved performance relative to these benchmarks.
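    In the Gaussian sequence model the abstract mentions, a lava-type fit has a simple closed form once the dense (ridge) part is profiled out: the sparse part reduces to a soft-thresholding step and the dense part to a proportional shrinkage of what remains. The sketch below assumes the objective ||y - d - b||^2 + lam2*||d||^2 + lam1*||b||_1, one standard way to write the sparse+dense penalty; it shows the estimator's structure, not the paper's code.

```python
import numpy as np

def soft_threshold(x, t):
    """Coordinate-wise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lava_sequence(y, lam1, lam2):
    """Lava-type fit in the Gaussian sequence model: minimize over (d, b)
    ||y - d - b||^2 + lam2*||d||^2 + lam1*||b||_1 and return d + b.
    Profiling out d leaves a rescaled lasso problem in b."""
    k = lam2 / (1.0 + lam2)                  # loss scaling after profiling out d
    b = soft_threshold(y, lam1 / (2.0 * k))  # sparse (lasso-type) component
    d = (y - b) / (1.0 + lam2)               # dense (ridge-type) component
    return d + b
```

    As lam2 grows the fit approaches a pure lasso, and as lam1 grows it approaches a pure ridge, consistent with the abstract's framing of lava as spanning both benchmarks.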