Spectrum Estimation: A Unified Framework for Covariance Matrix Estimation and PCA in Large Dimensions
Covariance matrix estimation and principal component analysis (PCA) are two
cornerstones of multivariate analysis. Classic textbook solutions perform
poorly when the dimension of the data is of a magnitude similar to the sample
size, or even larger. In such settings, there is a common remedy for both
statistical problems: nonlinear shrinkage of the eigenvalues of the sample
covariance matrix. The optimal nonlinear shrinkage formula depends on unknown
population quantities and is thus not available. It is, however, possible to
consistently estimate an oracle nonlinear shrinkage, which is motivated on
asymptotic grounds. A key tool to this end is consistent estimation of the set
of eigenvalues of the population covariance matrix (also known as the
spectrum), an interesting and challenging problem in its own right. Extensive
Monte Carlo simulations demonstrate that our methods have desirable
finite-sample properties and outperform previous proposals.
Comment: 40 pages, 8 figures, 5 tables, University of Zurich, Department of
Economics, Working Paper No. 105, Revised version, July 201
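As a minimal sketch of why the textbook estimator struggles and what "shrinking the eigenvalues" means, assume i.i.d. Gaussian data whose true covariance is the identity, so every population eigenvalue equals 1. With p/n = 0.5 the sample eigenvalues spread over roughly the Marchenko-Pastur interval [(1 - sqrt(0.5))^2, (1 + sqrt(0.5))^2] ≈ [0.09, 2.91]. The helper names are hypothetical, and `linear_shrink` is only a placeholder (linear shrinkage toward the mean eigenvalue), not the nonlinear shrinkage formula or the spectrum estimator described above; it illustrates the common form of such estimators, which keep the sample eigenvectors and replace the sample eigenvalues.

```python
import numpy as np

def sample_spectrum(X):
    """Ascending eigenvalues and eigenvectors of the sample covariance of an n x p matrix."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    return np.linalg.eigh(S)

def rotation_equivariant(evecs, new_evals):
    """Keep the sample eigenvectors and swap in new eigenvalues: V diag(d) V'."""
    return (evecs * new_evals) @ evecs.T

def linear_shrink(evals, rho):
    """Placeholder only: pull each eigenvalue toward the mean eigenvalue by a factor rho."""
    return (1.0 - rho) * evals + rho * evals.mean()

rng = np.random.default_rng(0)
n, p = 400, 200                          # dimension comparable to sample size
X = rng.standard_normal((n, p))          # population covariance is the identity
evals, evecs = sample_spectrum(X)
print(evals.min(), evals.max())          # far below and far above the true value 1
Sigma_hat = rotation_equivariant(evecs, linear_shrink(evals, rho=0.8))
```

Because the estimate is built from the sample eigenvectors, any eigenvalue map (linear or nonlinear) can be plugged into `rotation_equivariant`; the trace is preserved here because linear shrinkage keeps the mean eigenvalue.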
Inference in Linear Regression Models with Many Covariates and Heteroskedasticity
The linear regression model is widely used in empirical work in Economics,
Statistics, and many other disciplines. Researchers often include many
covariates in their linear model specification in an attempt to control for
confounders. We give inference methods that allow for many covariates and
heteroskedasticity. Our results are obtained using high-dimensional
approximations, where the number of included covariates is allowed to grow as
fast as the sample size. We find that all of the usual versions of Eicker-White
heteroskedasticity-consistent standard error estimators for linear models are
inconsistent under these asymptotics. We then propose a new heteroskedasticity-consistent
standard error formula that is fully automatic and robust to both
(conditional) heteroskedasticity of unknown form and the inclusion of possibly
many covariates. We apply our findings to three settings: parametric linear
models with many covariates, linear panel models with many fixed effects, and
semiparametric semi-linear models with many technical regressors. Simulation
evidence consistent with our theoretical results is also provided, and the
proposed methods are illustrated with an empirical application.
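The claim above concerns the usual Eicker-White estimators. As a hedged illustration (not the authors' proposed formula, which is not reproduced here), the sketch computes the standard HC0 sandwich estimator and the elementwise square of the residual-maker matrix M = I - X(X'X)^{-1}X'. Since the OLS residuals are û = Mu, independent errors with variances σ_i² give E[û_i²] = Σ_j M_ij² σ_j², so squared residuals are biased proxies for σ_i² whenever the leverages h_ii = 1 - M_ii do not vanish, as happens when the number of covariates k grows with n. Function names and the simulated design are assumptions for illustration.

```python
import numpy as np

def ols_hc0(X, y):
    """OLS coefficients with Eicker-White (HC0) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    meat = (X * resid[:, None] ** 2).T @ X        # sum_i resid_i^2 x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv                 # sandwich formula
    return beta, np.sqrt(np.diag(cov))

def residual_maker(X):
    """Return M = I - X (X'X)^{-1} X' and its elementwise square M * M."""
    n = X.shape[0]
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    return M, M * M

rng = np.random.default_rng(1)
n, k = 200, 80                                     # many covariates: k / n = 0.4
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
sigma = 0.5 + rng.uniform(size=n)                  # heteroskedastic error scale
y = sigma * rng.standard_normal(n)                 # all true coefficients are zero
beta, se = ols_hc0(X, y)
M, MM = residual_maker(X)
# Row sums of M * M equal M_ii = 1 - h_ii (M is symmetric and idempotent), and
# the average of M_ii is (n - k) / n = 0.6: leverage does not vanish here.
print(np.diag(M).mean())
```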
Distributed linear regression by averaging
Distributed statistical learning problems arise commonly when dealing with
large datasets. In this setup, datasets are partitioned over machines, which
compute locally, and communicate short messages. Communication is often the
bottleneck. In this paper, we study one-step and iterative weighted parameter
averaging in statistical linear models under data parallelism. We do linear
regression on each machine, send the results to a central server, and take a
weighted average of the parameters. Optionally, we iterate, sending back the
weighted average and doing local ridge regressions centered at it. How does
this compare with doing linear regression on the full data? Here we study
the performance loss in estimation, test error, and confidence interval length
in high dimensions, where the number of parameters is comparable to the
training data size. We characterize the performance loss of one-step weighted
averaging and also give results for iterative averaging. We also find that
different problems are affected differently by the distributed framework.
Estimation error and confidence interval length increase substantially, while
prediction error increases much less. We rely on recent results from random
matrix theory and develop a new calculus of deterministic equivalents as
a tool of broader interest.
Comment: V2 adds a new section on iterative averaging methods, adds
applications of the calculus of deterministic equivalents, and reorganizes
the paper
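A minimal sketch of the one-step and iterative schemes described above, assuming equal weights by default, an arbitrary ridge penalty `lam`, and simulated Gaussian data; the function names are hypothetical, and the paper's weight choice and risk formulas are not reproduced. The iterative step uses the closed form of a ridge regression centered at the current average: argmin_b ||y_i - X_i b||² + lam ||b - b̄||² = (X_i'X_i + lam I)^{-1}(X_i'y_i + lam b̄).

```python
import numpy as np

def local_ols(X, y):
    """Least-squares fit on one machine's shard."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def one_step_average(shards, weights=None):
    """Fit OLS on each (X_i, y_i) shard and return the weighted average of the
    local estimates; equal weights are assumed when none are given."""
    betas = np.array([local_ols(X, y) for X, y in shards])
    if weights is None:
        weights = np.full(len(shards), 1.0 / len(shards))
    return np.asarray(weights) @ betas

def local_ridge_centered(X, y, center, lam):
    """Closed form of argmin_b ||y - X b||^2 + lam * ||b - center||^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y + lam * center)

def iterative_average(shards, lam, n_iter=5, weights=None):
    """Send the current average back, refit centered local ridges, re-average."""
    if weights is None:
        weights = np.full(len(shards), 1.0 / len(shards))
    beta_bar = one_step_average(shards, weights)
    for _ in range(n_iter):
        local = np.array([local_ridge_centered(X, y, beta_bar, lam) for X, y in shards])
        beta_bar = np.asarray(weights) @ local
    return beta_bar

rng = np.random.default_rng(2)
n, p, k = 1200, 100, 4                  # each machine holds 300 rows for 100 parameters
beta_true = rng.standard_normal(p)
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)
shards = list(zip(np.array_split(X, k), np.array_split(y, k)))

full = local_ols(X, y)
avg = one_step_average(shards)
it = iterative_average(shards, lam=10.0)
for name, b in [("full data", full), ("one-step", avg), ("iterative", it)]:
    print(name, "squared estimation error:", np.sum((b - beta_true) ** 2))
```

Only the local estimates (p numbers per machine) travel to the server, which is the communication saving that motivates averaging.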
Advances in forecast evaluation
This paper surveys recent developments in the evaluation of point forecasts. Taking West's (2006) survey as a starting point, we briefly cover the state of the literature as of the time of West's writing. We then focus on recent developments, including advancements in the evaluation of forecasts at the population level (based on true, unknown model coefficients), the evaluation of forecasts in the finite sample (based on estimated model coefficients), and the evaluation of conditional versus unconditional forecasts. We present original results in a few subject areas: the optimization of power in determining the split of a sample into in-sample and out-of-sample portions; whether the accuracy of inference in evaluation of multistep forecasts can be improved with the judicious choice of HAC estimator (it can); and the extension of West's (1996) theory results for population-level, unconditional forecast evaluation to the case of conditional forecast evaluation.
Keywords: Forecasting; Time-series analysis
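To make the role of the HAC estimator concrete, here is a sketch of a Diebold-Mariano-style t-test of equal mean squared error with a Newey-West (Bartlett-kernel) long-run variance. Using h - 1 lags for h-step forecasts is a common convention assumed here, not the paper's recommendation, and the sketch treats the forecast errors as given, ignoring the parameter-estimation effects analyzed in West (1996). Function names and the simulated errors are hypothetical.

```python
import numpy as np

def newey_west_lrv(d, lags):
    """Bartlett-kernel (Newey-West) long-run variance of the series d."""
    d = np.asarray(d, dtype=float)
    n = d.size
    u = d - d.mean()
    lrv = u @ u / n                                   # lag-0 autocovariance
    for j in range(1, lags + 1):
        gamma_j = u[j:] @ u[:-j] / n                  # lag-j autocovariance
        lrv += 2.0 * (1.0 - j / (lags + 1.0)) * gamma_j
    return lrv

def dm_statistic(e1, e2, horizon=1, lags=None):
    """t-statistic for equal MSE of two forecast-error series (positive when
    forecast 1 has larger squared errors). Defaults to horizon - 1 lags."""
    d = np.asarray(e1, dtype=float) ** 2 - np.asarray(e2, dtype=float) ** 2
    if lags is None:
        lags = horizon - 1
    return d.mean() / np.sqrt(newey_west_lrv(d, lags) / d.size)

rng = np.random.default_rng(3)
eps = rng.standard_normal(301)
e1 = eps[1:] + 0.8 * eps[:-1]           # simulated as MA(1), as for optimal 2-step errors
e2 = 0.9 * e1 + 0.3 * rng.standard_normal(300)
print(dm_statistic(e1, e2, horizon=2))
```

Swapping `newey_west_lrv` for another kernel or bandwidth changes only the denominator, which is where the choice of HAC estimator affects the accuracy of inference.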
Surprises in High-Dimensional Ridgeless Least Squares Interpolation
Interpolators (estimators that achieve zero training error) have
attracted growing attention in machine learning, mainly because state-of-the-art
neural networks appear to be models of this type. In this paper, we study
minimum-norm ("ridgeless") interpolation in high-dimensional least
squares regression. We consider two different models for the feature
distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$
are obtained by applying a linear transform to a vector of i.i.d. entries,
$x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model,
where the feature vectors are obtained by passing the input through a random
one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$,
$W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an
activation function acting componentwise on $W z_i$). We recover, in a
precise quantitative way, several phenomena that have been observed in
large-scale neural networks and kernel machines, including the "double descent"
behavior of the prediction risk, and the potential benefits of
overparametrization.
Comment: 68 pages; 16 figures. This revision contains non-asymptotic versions
of earlier results, and results for general coefficient
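A minimal sketch of ridgeless least squares in the isotropic special case $\Sigma = I$ of the linear model: the minimum-norm solution comes from the Moore-Penrose pseudoinverse and coincides with the λ → 0+ limit of ridge regression written in kernel form X'(XX' + λI)^{-1}y. The loop prints a Monte Carlo estimate of the excess prediction risk as γ = p/n varies with n = 100, which can be used to look for the double-descent peak near γ = 1. The noise level, signal scaling, and function names are assumptions, and no result from the paper is reproduced.

```python
import numpy as np

def min_norm_interpolator(X, y):
    """Minimum-l2-norm least-squares solution; interpolates y when p >= n and X has full row rank."""
    return np.linalg.pinv(X) @ y

def ridge_kernel_form(X, y, lam):
    """Ridge estimate X'(XX' + lam I)^{-1} y, convenient when p > n."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

def excess_risk(p, n=100, noise=0.5, n_test=2000, seed=0):
    """Monte Carlo excess prediction risk of the ridgeless fit with isotropic features."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(p) / np.sqrt(p)   # signal with squared norm close to 1
    X = rng.standard_normal((n, p))              # linear model with Sigma = I
    y = X @ beta + noise * rng.standard_normal(n)
    b = min_norm_interpolator(X, y)
    X_new = rng.standard_normal((n_test, p))
    return np.mean((X_new @ (b - beta)) ** 2)

for p in [20, 50, 90, 100, 110, 200, 400]:
    print(f"p/n = {p / 100:.1f}: excess risk {excess_risk(p):.3f}")
```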
Nonparametric estimation of scalar diffusions based on low frequency data
We study the problem of estimating the coefficients of a diffusion $(X_t, t \geq 0)$;
the estimation is based on discrete data $X_{n\Delta}, n = 0, 1, \dots, N$. The
sampling frequency $\Delta^{-1}$ is constant, and asymptotics are taken as the
number $N$ of observations tends to infinity. We prove that the problem of
estimating both the diffusion coefficient (the volatility) and the drift in a
nonparametric setting is ill-posed: the minimax rates of convergence for
Sobolev constraints and squared-error loss coincide with those of a first- and a
second-order linear inverse problem, respectively. To ensure
ergodicity and limit technical difficulties, we restrict ourselves to scalar
diffusions living on a compact interval with reflecting boundary conditions.
Our approach is based on the spectral analysis of the associated Markov
semigroup. A rate-optimal estimation of the coefficients is obtained via the
nonparametric estimation of an eigenvalue-eigenfunction pair of the transition
operator of the discrete-time Markov chain $(X_{n\Delta}, n = 0, 1, \dots, N)$ in a
suitable Sobolev norm, together with an estimate of its invariant density.
Comment: Published at http://dx.doi.org/10.1214/009053604000000797 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
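A hedged sketch of the projection (Galerkin) idea behind estimating an eigenvalue-eigenfunction pair of the transition operator from one discretely observed path: with basis functions φ_k, the sample matrices G_jk = mean of φ_j(X_n)φ_k(X_n) and H_jk = mean of φ_j(X_n)φ_k(X_{n+1}) turn Pu = κu into the generalized eigenproblem Hc = κGc. The cosine basis (natural for reflecting boundaries on [0, 1]), its dimension K, and the simulated reflected random walk are assumptions; the paper's rate-optimal tuning and the step from the eigenpair and invariant density to the drift and volatility are not reproduced.

```python
import numpy as np

def cosine_basis(x, K):
    """Columns phi_0 = 1 and phi_k(x) = sqrt(2) cos(k pi x), k = 1..K-1, on [0, 1]."""
    k = np.arange(K)
    B = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, k))
    B[:, 0] = 1.0
    return B

def transition_eigenpairs(X, K=8):
    """Projection estimate of eigenpairs of the one-step transition operator
    from a single path X_0, ..., X_N, sorted by decreasing eigenvalue."""
    B0 = cosine_basis(X[:-1], K)                         # phi_j(X_n)
    B1 = cosine_basis(X[1:], K)                          # phi_k(X_{n+1})
    G = B0.T @ B0 / len(B0)                              # G_jk = mean phi_j(X_n) phi_k(X_n)
    H = B0.T @ B1 / len(B0)                              # H_jk = mean phi_j(X_n) phi_k(X_{n+1})
    evals, coefs = np.linalg.eig(np.linalg.solve(G, H))  # solves H c = kappa G c
    order = np.argsort(-evals.real)
    return evals.real[order], coefs.real[:, order]

def reflect(x):
    """Fold points back into [0, 1], mimicking reflecting boundaries."""
    y = np.abs(x) % 2.0
    return np.where(y > 1.0, 2.0 - y, y)

rng = np.random.default_rng(4)
N, step = 20000, 0.1
shocks = step * rng.standard_normal(N)
X = np.empty(N + 1)
X[0] = 0.5
for n in range(N):
    X[n + 1] = reflect(X[n] + shocks[n])

kappa, coefs = transition_eigenpairs(X)
# One eigenvalue equals 1 (constant functions); the largest eigenvalue below 1
# estimates the leading nontrivial one, with u(x) = cosine_basis(x, 8) @ c for its column c.
print(kappa[:3])
```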
Estimating Euler equations
In this paper we consider conditions under which the estimation of a log-linearized Euler equation for
consumption yields consistent estimates of preference parameters. When utility is isoelastic and a
sample covering a long time period is available, consistent estimates are obtained from the log-linearized
Euler equation when the innovations to the conditional variance of consumption growth are
uncorrelated with the instruments typically used in estimation.
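To see why the conditional variance of consumption growth enters, here is the textbook log-linearization under assumptions chosen for illustration: isoelastic utility with relative risk aversion γ, discount factor β, a riskless return r_{t+1} known at date t, and consumption growth that is conditionally lognormal (the second line is then exact; otherwise it is a second-order approximation). The notation α, v_t, and ε_{t+1} is ours, not the paper's.

```latex
\begin{align}
  1 &= \mathbb{E}_t\!\left[\beta\,(1+r_{t+1})\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\right] \\
  \mathbb{E}_t[\Delta\ln c_{t+1}] &= \frac{\ln\beta + \ln(1+r_{t+1})}{\gamma}
      + \frac{\gamma}{2}\operatorname{Var}_t(\Delta\ln c_{t+1}) \\
  \Delta\ln c_{t+1} &= \alpha + \frac{1}{\gamma}\ln(1+r_{t+1})
      + \frac{\gamma}{2}\,v_t + \varepsilon_{t+1}
\end{align}
```

Here $v_t$ is the deviation of $\operatorname{Var}_t(\Delta\ln c_{t+1})$ from its mean, $\alpha = \ln\beta/\gamma + (\gamma/2)\,\mathbb{E}[\operatorname{Var}_t(\Delta\ln c_{t+1})]$, and $\varepsilon_{t+1}$ is the expectational error. Instruments $z_t$ dated $t$ are orthogonal to $\varepsilon_{t+1}$, so in a long time series instrumental-variables estimates of $1/\gamma$ are consistent when $z_t$ is also uncorrelated with $v_t$, in line with the condition stated above.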
We perform a Monte Carlo experiment, consisting of solving and simulating a simple life-cycle model
under uncertainty, and show that in most situations, the estimates obtained from the log-linearized
equation are not systematically biased. This is true even when we introduce heteroscedasticity in the
process generating income.
The only exception is when discount rates are very high (e.g., 47% per year). This problem arises
because consumers are nearly always close to the maximum borrowing limit: the estimation bias is
unrelated to the linearization, and estimates using nonlinear GMM are just as poor. Across all the cases we consider,
estimation using a log-linearized Euler equation does better than nonlinear GMM despite the absence
of measurement error.
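Because the log-linearized equation is linear in its parameters, it can be estimated by instrumental variables. The sketch below is a generic two-stage least squares estimator (equivalently, linear GMM with weight matrix (Z'Z)^{-1}); the function name, the instrument, and the simulated data-generating process are hypothetical and are not the life-cycle model used in the Monte Carlo experiment described above. The regressor r is built to be correlated with the error term, so OLS is biased while 2SLS should land close to the true slope 0.5.

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """2SLS estimate of b in y = X b + e with instruments Z (n x m, m >= k columns)."""
    first_stage = np.linalg.lstsq(Z, X, rcond=None)[0]   # regress each column of X on Z
    X_hat = Z @ first_stage                              # fitted values P_Z X
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)     # (X' P_Z X)^{-1} X' P_Z y

rng = np.random.default_rng(5)
n = 5000
z = rng.standard_normal(n)                        # hypothetical instrument dated t
u = rng.standard_normal(n)                        # shock common to r and the error term
r = 0.02 + 0.5 * z + 0.3 * u + 0.1 * rng.standard_normal(n)
dlnc = 0.01 + 0.5 * r + 0.2 * u + 0.1 * rng.standard_normal(n)  # true slope 1/gamma = 0.5
X = np.column_stack([np.ones(n), r])
Z = np.column_stack([np.ones(n), z])
b_iv = two_stage_least_squares(dlnc, X, Z)
b_ols = np.linalg.lstsq(X, dlnc, rcond=None)[0]
print("2SLS slope:", b_iv[1], " OLS slope:", b_ols[1])
```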
Finally, we plot life-cycle profiles for the variance of consumption growth, which, except when the
discount factor is very high, is remarkably flat. This implies that claims that demographic variables in
log-linearized Euler equations capture changes in the variance of consumption growth are unwarranted
…