317 research outputs found
Augmented sparse principal component analysis for high dimensional data
We study the problem of estimating the leading eigenvectors of a
high-dimensional population covariance matrix based on independent Gaussian
observations. We establish lower bounds on the rates of convergence of the
estimators of the leading eigenvectors under $\ell^q$-sparsity constraints when
an $\ell^2$ loss function is used. We also propose an estimator of the leading
eigenvectors based on a coordinate selection scheme combined with PCA and show
that the proposed estimator achieves the optimal rate of convergence under a
sparsity regime. Moreover, we establish that under certain scenarios, the usual
PCA achieves the minimax convergence rate.
Comment: This manuscript was written in 2007, and a version has been available
on the first author's website, but it is posted to arXiv now in its 2007 form.
Revisions incorporating later work will be posted separately.
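The coordinate selection scheme lends itself to a brief illustration. Below is a minimal sketch (not the authors' exact estimator; the diagonal-thresholding rule, the constant alpha, and the unit-noise-variance assumption are all illustrative): select coordinates with large sample variance, run ordinary PCA on the reduced covariance matrix, and embed the resulting eigenvectors back into the full coordinate space.

```python
import numpy as np

def coordinate_selection_pca(X, alpha=1.5, n_components=1):
    """Sketch: select high-variance coordinates, then run PCA on them."""
    n, p = X.shape
    variances = X.var(axis=0)
    # Illustrative diagonal-thresholding rule; assumes unit noise variance.
    threshold = 1.0 + alpha * np.sqrt(np.log(p) / n)
    selected = np.where(variances > threshold)[0]
    # Ordinary PCA restricted to the selected coordinates
    cov = np.cov(X[:, selected], rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    # Embed the estimated eigenvectors back into R^p (zeros elsewhere)
    V = np.zeros((p, n_components))
    V[selected, :] = eigvecs[:, order]
    return V, selected
```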
Nonparametric estimation of dynamics of monotone trajectories
We study a class of nonlinear nonparametric inverse problems. Specifically,
we propose a nonparametric estimator of the dynamics of a monotonically
increasing trajectory defined on a finite time interval. Under suitable
regularity conditions, we prove consistency of the proposed estimator and show
that in terms of $L^2$-loss, the optimal rate of convergence for the proposed
estimator is the same as that for the estimation of the derivative of a
trajectory. This is a new contribution to the area of nonlinear nonparametric
inverse problems. We conduct a simulation study to examine the finite sample
behavior of the proposed estimator and apply it to the Berkeley growth data.
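The general idea can be sketched in a few lines. The version below is illustrative only (spline presmoothing is an assumed choice, not necessarily the estimator analyzed in the paper): smooth the observed trajectory, differentiate the smooth fit, and read off the dynamics f in x'(t) = f(x(t)) by pairing fitted levels with fitted velocities, which is well defined because the trajectory is monotone.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def estimate_dynamics(t, x, smoothing=None):
    """Sketch: recover f in x'(t) = f(x(t)) from one noisy monotone trajectory."""
    spline = UnivariateSpline(t, x, k=4, s=smoothing)  # smooth the trajectory
    levels = spline(t)                                 # fitted x(t)
    velocities = spline.derivative()(t)                # fitted x'(t)
    # For a monotone trajectory, (x(t), x'(t)) traces the graph of f;
    # sorting by level gives f on its natural domain.
    order = np.argsort(levels)
    return levels[order], velocities[order]

# Example with a logistic trajectory, for which f(x) = 10 x (1 - x)
t = np.linspace(0.0, 1.0, 200)
x_true = 1.0 / (1.0 + np.exp(-10.0 * (t - 0.5)))
x_obs = x_true + 0.01 * np.random.default_rng(0).normal(size=t.size)
levels, f_hat = estimate_dynamics(t, x_obs)
```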
Spectral analysis of linear time series in moderately high dimensions
This article is concerned with the spectral behavior of $p$-dimensional
linear processes in the moderately high-dimensional case when both
dimensionality $p$ and sample size $n$ tend to infinity so that $p/n \to 0$. It
is shown that, under an appropriate set of assumptions, the empirical spectral
distributions of the renormalized and symmetrized sample autocovariance
matrices converge almost surely to a nonrandom limit distribution supported on
the real line. The key assumption is that the linear process is driven by a
sequence of $p$-dimensional real or complex random vectors with i.i.d. entries
possessing zero mean, unit variance and finite fourth moments, and that the
linear process coefficient matrices are Hermitian and
simultaneously diagonalizable. Several relaxations of these assumptions are
discussed. The results put forth in this paper can help facilitate inference on
model parameters, model diagnostics and prediction of future values of the
linear process.
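A small simulation can make the setting concrete. In the sketch below, all parameter choices and the $\sqrt{n/p}$ renormalization are illustrative assumptions, not taken from the paper: it builds a first-order linear process with a diagonal (hence Hermitian and simultaneously diagonalizable) coefficient matrix and computes the eigenvalues of the renormalized, symmetrized lag-1 sample autocovariance matrix.

```python
import numpy as np

# Illustrative simulation: ESD of the renormalized, symmetrized lag-1
# sample autocovariance of X_t = Z_t + A Z_{t-1}.
rng = np.random.default_rng(0)
p, n = 200, 4000                        # p/n small: moderately high-dimensional
A = np.diag(rng.uniform(0.2, 0.8, p))   # diagonal coefficient matrix
Z = rng.normal(size=(p, n + 1))         # i.i.d. innovations: mean 0, variance 1
X = Z[:, 1:] + A @ Z[:, :-1]            # linear process of order 1

C1 = X[:, 1:] @ X[:, :-1].T / n         # lag-1 sample autocovariance
C1_sym = (C1 + C1.T) / 2.0              # symmetrize
E_sym = (A + A.T) / 2.0                 # population counterpart (E[C1] ~ A here)
S = np.sqrt(n / p) * (C1_sym - E_sym)   # renormalize (illustrative scaling)
eigs = np.linalg.eigvalsh(S)            # spectrum supported on the real line
```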
On the Marčenko-Pastur law for linear time series
This paper is concerned with extensions of the classical Marčenko-Pastur
law to time series. Specifically, $p$-dimensional linear processes are
considered which are built from innovation vectors with independent,
identically distributed (real- or complex-valued) entries possessing zero mean,
unit variance and finite fourth moments. The coefficient matrices of the linear
process are assumed to be simultaneously diagonalizable. In this setting, the
limiting behavior of the empirical spectral distribution of both sample
covariance and symmetrized sample autocovariance matrices is determined in the
high-dimensional setting for which dimension $p$ and
sample size $n$ diverge to infinity at the same rate. The results extend
existing contributions available in the literature for the covariance case and
are one of the first of their kind for the autocovariance case.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1294 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
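For reference, the classical baseline that this paper extends is easy to reproduce numerically. The sketch below (sizes are illustrative) compares the eigenvalues of a sample covariance matrix of i.i.d. data with the Marčenko-Pastur density for aspect ratio c = p/n in (0, 1).

```python
import numpy as np

# Classical Marčenko-Pastur law: for i.i.d. entries with mean 0 and
# variance 1, and p/n -> c in (0, 1), the ESD of S = X X^T / n converges
# to the MP density on [(1 - sqrt(c))^2, (1 + sqrt(c))^2].
rng = np.random.default_rng(0)
p, n = 500, 2000
c = p / n
X = rng.normal(size=(p, n))
S = X @ X.T / n                         # sample covariance matrix
eigs = np.linalg.eigvalsh(S)

lam_minus = (1.0 - np.sqrt(c)) ** 2
lam_plus = (1.0 + np.sqrt(c)) ** 2
grid = np.linspace(lam_minus + 1e-6, lam_plus - 1e-6, 400)
density = np.sqrt((lam_plus - grid) * (grid - lam_minus)) / (2 * np.pi * c * grid)
# A histogram of `eigs` plotted against `density` shows the agreement.
```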
"Pre-conditioning" for feature selection and regression in high-dimensional problems
We consider regression problems where the number of predictors greatly
exceeds the number of observations. We propose a method for variable selection
that first estimates the regression function, yielding a "pre-conditioned"
response variable. The primary method used for this initial regression is
supervised principal components. Then we apply a standard procedure such as
forward stepwise selection or the LASSO to the pre-conditioned response
variable. In a number of simulated and real data examples, this two-step
procedure outperforms forward stepwise selection or the usual LASSO (applied
directly to the raw outcome). We also show that under a certain Gaussian latent
variable model, application of the LASSO to the pre-conditioned response
variable is consistent as the number of predictors and observations increases.
Moreover, when the observational noise is rather large, the suggested procedure
can give a more accurate estimate than the LASSO. We illustrate our method on
some real problems, including survival analysis with microarray data.
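The two-step structure of the procedure can be sketched compactly. In the version below, supervised principal components appears in its simplest correlation-screening form, and the screening size, number of components, and LASSO penalty are all illustrative assumptions rather than the paper's tuned choices: first form a pre-conditioned response from the leading supervised principal components, then run the LASSO of that response on all predictors.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def preconditioned_lasso(X, y, n_screen=50, n_pcs=2, alpha=0.1):
    """Sketch: (1) pre-condition y via supervised principal components,
    (2) run the LASSO on the pre-conditioned response."""
    # Step 1a: screen predictors by absolute univariate correlation with y
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    top = np.argsort(scores)[::-1][:n_screen]
    # Step 1b: leading principal components of the screened block
    U, s, Vt = np.linalg.svd(Xc[:, top], full_matrices=False)
    pcs = U[:, :n_pcs] * s[:n_pcs]
    # Step 1c: pre-conditioned response = fitted values from the PCs
    y_hat = LinearRegression().fit(pcs, y).predict(pcs)
    # Step 2: LASSO of the pre-conditioned response on all predictors
    return Lasso(alpha=alpha).fit(X, y_hat)
```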