    IPAD: Stable Interpretable Forecasting with Knockoffs Inference

    Interpretability and stability are two important features that are desired in many contemporary big data applications arising in economics and finance. While the former is enjoyed to some extent by many existing forecasting approaches, the latter, in the sense of controlling the fraction of wrongly discovered features, which can greatly enhance interpretability, is still largely underdeveloped in econometric settings. To this end, in this paper we exploit the general framework of model-X knockoffs introduced recently in Candès, Fan, Janson and Lv (2018), which is nonconventional for reproducible large-scale inference in that it is completely free of the use of p-values for significance testing, and suggest a new method of intertwined probabilistic factors decoupling (IPAD) for stable interpretable forecasting with knockoffs inference in high-dimensional models. The recipe of the method is to construct the knockoff variables by assuming a latent factor model, which is exploited widely in economics and finance, for the association structure of the covariates. Our method and work are distinct from the existing literature in that we estimate the covariate distribution from data instead of assuming that it is known when constructing the knockoff variables, our procedure does not require any sample splitting, we provide theoretical justification for asymptotic false discovery rate control, and we also establish the theory for the power analysis. Several simulation examples and a real data analysis further demonstrate that the newly suggested method has appealing finite-sample performance, with the desired interpretability and stability, compared to some popularly used forecasting methods.
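    The factor-based knockoff construction described above can be sketched in a few lines. The Python fragment below is a minimal, hypothetical illustration, assuming principal-components factor estimation, Gaussian idiosyncratic errors, and Lasso-based knockoff statistics; the names factor_knockoffs and knockoff_select are illustrative, and this is a sketch of the general idea rather than the authors' exact IPAD procedure.

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

def factor_knockoffs(X, n_factors):
    """Build knockoff copies of X from an estimated latent factor model (illustrative)."""
    n, p = X.shape
    # Estimate factors and loadings by principal components (via SVD).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    common = (U[:, :n_factors] * s[:n_factors]) @ Vt[:n_factors, :]
    resid = X - common
    sigma = resid.std(axis=0)
    # Knockoffs keep the estimated common component but draw fresh
    # idiosyncratic noise (Gaussian here, purely as an assumption).
    return common + rng.standard_normal((n, p)) * sigma

def knockoff_select(X, y, n_factors=3, fdr=0.1):
    """Run the knockoff filter at a target false discovery rate."""
    n, p = X.shape
    XX = np.hstack([X, factor_knockoffs(X, n_factors)])
    beta = LassoCV(cv=5).fit(XX, y).coef_
    # Knockoff statistic: original vs. knockoff coefficient magnitudes.
    W = np.abs(beta[:p]) - np.abs(beta[p:])
    # Knockoff+ threshold search.
    for t in np.sort(np.abs(W[W != 0])):
        if (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1) <= fdr:
            return np.where(W >= t)[0]
    return np.array([], dtype=int)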

    Variable Selection in Linear Regressions with Many Highly Correlated Covariates

    This paper is concerned with variable selection in a high-dimensional linear framework when the covariates under consideration are highly correlated. Existing methods in the literature generally require the degree of correlation among covariates to be weak, yet in applied research covariates are often strongly cross-correlated due to common factors. This paper generalizes the One Covariate at a Time Multiple Testing (OCMT) procedure proposed by Chudik et al. (2018) to allow the set of covariates under consideration to be highly correlated. We exploit ideas from the latent factor and multiple testing literatures to control the probability of selecting the approximating model. We also establish the asymptotic behavior of the post-GOCMT selected model estimated by least squares. Our results show that the estimation error of the coefficients converges to zero in the limit. Moreover, the mean square error and the mean square forecast error of the estimated model approach their corresponding optimal values asymptotically. The proposed method is shown to be valid under general assumptions and is computationally very fast. Monte Carlo experiments indicate that the newly suggested method has appealing finite-sample performance relative to competing methods under many different settings. The benefits of the proposed method are also illustrated by an empirical application to the selection of risk factors in the asset pricing literature.
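    A stylized version of the one-covariate-at-a-time selection step is easy to write down. The Python sketch below first removes an estimated common-factor component from the covariates and then keeps regressors whose marginal t-statistics clear a critical value that grows with the number of candidates; the defactoring step, the threshold rule, and the function names are assumptions made for illustration, not the paper's exact GOCMT procedure.

import numpy as np
from scipy import stats

def defactor(X, n_factors):
    """Strip an estimated common-factor component from the covariates (illustrative)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    common = (U[:, :n_factors] * s[:n_factors]) @ Vt[:n_factors, :]
    return Xc - common

def ocmt_select(X, y, n_factors=2, delta=1.0, alpha=0.05):
    """Keep covariates whose marginal t-statistics exceed a growing critical value."""
    n, p = X.shape
    Xt = defactor(X, n_factors)
    yc = y - y.mean()
    # Critical value increases with the number of candidate covariates p.
    cv = stats.norm.ppf(1 - alpha / (2 * p**delta))
    selected = []
    for j in range(p):
        xj = Xt[:, j] - Xt[:, j].mean()
        b = xj @ yc / (xj @ xj)                     # one-regressor OLS slope
        resid = yc - b * xj
        se = np.sqrt((resid @ resid) / (n - 2) / (xj @ xj))
        if abs(b / se) > cv:
            selected.append(j)
    return selected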

    Optimal Invariant Tests in an Instrumental Variables Regression With Heteroskedastic and Autocorrelated Errors

    This paper uses model symmetries in the instrumental variable (IV) regression to derive an invariant test for the causal structural parameter. Contrary to popular belief, we show that model symmetries exist even when the equation errors are heteroskedastic and autocorrelated (HAC). Our theory is consistent with existing results for the homoskedastic model (Andrews, Moreira, and Stock (2006) and Chamberlain (2007)). We use these symmetries to propose the conditional integrated likelihood (CIL) test for the causality parameter in the over-identified model. Theoretical and numerical findings show that the CIL test performs well compared to other tests in terms of power and implementation. We recommend that practitioners use the Anderson-Rubin (AR) test in the just-identified model and the CIL test in the over-identified model.
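    For the just-identified case that the abstract points practitioners to, a weak-instrument-robust Anderson-Rubin test is straightforward to compute. The Python sketch below implements a standard heteroskedasticity-robust AR statistic for H0: beta = beta0 in y = X*beta + u with instruments Z; it is the textbook robust AR test, not the paper's CIL test, and the function name and the White-type variance estimator are choices made purely for illustration.

import numpy as np
from scipy import stats

def ar_test(y, X, Z, beta0):
    """Robust Anderson-Rubin test of H0: beta = beta0; returns (statistic, p-value)."""
    n, k = Z.shape
    e = y - X @ beta0                        # restricted residuals under the null
    g = Z * e[:, None]                       # instrument-by-residual moment contributions
    gbar = g.mean(axis=0)
    S = g.T @ g / n - np.outer(gbar, gbar)   # White-type variance of the moments
    stat = n * gbar @ np.linalg.solve(S, gbar)
    return stat, stats.chi2.sf(stat, df=k)   # asymptotically chi-squared with k dof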

    Variable Selection and Forecasting in High Dimensional Linear Regressions with Parameter Instability

    This paper is concerned with the problem of variable selection and forecasting in the presence of parameter instability. A number of approaches have been proposed for forecasting with time-varying parameters, including the use of rolling windows and exponential down-weighting. However, these studies start with a given model specification and do not consider the problem of variable selection, which is complicated by time variation in the effects of signals on target variables. In this study we investigate whether or not we should use weighted observations at the variable selection stage in the presence of parameter instability, particularly when the number of potential covariates is large. Amongst the extant variable selection approaches we focus on the recently developed One Covariate at a time Multiple Testing (OCMT) method. This procedure allows a natural distinction between the selection and forecasting stages. We establish three main theorems on selection, estimation post selection, and in-sample fit. These theorems provide justification for using the full (not down-weighted) sample at the selection stage of OCMT and down-weighting observations only at the forecasting stage (if needed). The benefits of the proposed method are illustrated by empirical applications to forecasting monthly stock market returns and quarterly output growth.
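    The practical recipe implied by these theorems can be sketched briefly: run the variable selection step (for example, an OCMT-type procedure) on the full, unweighted sample, and apply exponential down-weighting only when fitting the forecasting regression on the selected covariates. The Python fragment below is a hedged illustration of that second, down-weighted forecasting stage; the function name, the intercept handling, and the decay rate rho are assumptions, not the paper's settings.

import numpy as np

def downweighted_forecast(X, y, selected, x_next, rho=0.98):
    """Forecast the next value of y by exponentially down-weighted least squares
    on the covariates chosen at the (full-sample) selection stage."""
    T = len(y)
    Xs = np.column_stack([np.ones(T), X[:, selected]])
    # Exponential weights: the most recent observation gets weight 1.
    w = rho ** np.arange(T - 1, -1, -1)
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(Xs * sw[:, None], y * sw, rcond=None)[0]
    return np.concatenate(([1.0], x_next[selected])) @ beta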