
    Augmented sparse principal component analysis for high dimensional data

    We study the problem of estimating the leading eigenvectors of a high-dimensional population covariance matrix based on independent Gaussian observations. We establish lower bounds on the rates of convergence of the estimators of the leading eigenvectors under $l^q$-sparsity constraints when an $l^2$ loss function is used. We also propose an estimator of the leading eigenvectors based on a coordinate selection scheme combined with PCA and show that the proposed estimator achieves the optimal rate of convergence under a sparsity regime. Moreover, we establish that under certain scenarios, the usual PCA achieves the minimax convergence rate.
    Comment: This manuscript was written in 2007, and a version has been available on the first author's website, but it is posted to arXiv now in its 2007 form. Revisions incorporating later work will be posted separately.
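
    A minimal sketch of the coordinate-selection-plus-PCA idea described above, in numpy; the variance-thresholding rule and the `threshold` parameter are illustrative assumptions, not the paper's exact selection criterion.

```python
import numpy as np

def sparse_pca_leading(X, threshold):
    """Leading-eigenvector estimate via coordinate selection + PCA.

    X: (n, p) array of centered observations.
    threshold: keep coordinates whose sample variance exceeds it
               (illustrative rule, not the paper's criterion).
    """
    n, p = X.shape
    variances = (X ** 2).mean(axis=0)          # diagonal of sample covariance
    selected = np.flatnonzero(variances > threshold)
    S = X[:, selected].T @ X[:, selected] / n  # covariance on selected block
    _, eigvecs = np.linalg.eigh(S)             # ordinary PCA on the block
    v = np.zeros(p)
    v[selected] = eigvecs[:, -1]               # embed leading eigenvector
    return v                                   # sparse by construction
```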

    Nonparametric estimation of dynamics of monotone trajectories

    We study a class of nonlinear nonparametric inverse problems. Specifically, we propose a nonparametric estimator of the dynamics of a monotonically increasing trajectory defined on a finite time interval. Under suitable regularity conditions, we prove consistency of the proposed estimator and show that in terms of $L^2$-loss, the optimal rate of convergence for the proposed estimator is the same as that for the estimation of the derivative of a trajectory. This is a new contribution to the area of nonlinear nonparametric inverse problems. We conduct a simulation study to examine the finite sample behavior of the proposed estimator and apply it to the Berkeley growth data.
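
    One plausible plug-in construction for the dynamics $f$ in $x'(t) = f(x(t))$ of a monotone trajectory: smooth the observations, differentiate the smooth, and read the derivative off as a function of the fitted value. The spline smoother and the inversion step below are illustrative assumptions, not necessarily the paper's estimator.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def estimate_dynamics(t, x_obs, smoothing=None):
    """Return f_hat with f_hat(x(t)) approximating x'(t).

    t: increasing time points; x_obs: noisy trajectory values.
    """
    spline = UnivariateSpline(t, x_obs, s=smoothing)  # smooth the trajectory
    x_fit = spline(t)                   # fitted values (monotone in theory)
    dx_fit = spline.derivative()(t)     # estimated derivative along the path
    order = np.argsort(x_fit)           # monotonicity lets us invert t -> x;
                                        # assumes fitted values are distinct
    return UnivariateSpline(x_fit[order], dx_fit[order], s=0)
```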

    Spectral analysis of linear time series in moderately high dimensions

    This article is concerned with the spectral behavior of $p$-dimensional linear processes in the moderately high-dimensional case when both dimensionality $p$ and sample size $n$ tend to infinity so that $p/n \to 0$. It is shown that, under an appropriate set of assumptions, the empirical spectral distributions of the renormalized and symmetrized sample autocovariance matrices converge almost surely to a nonrandom limit distribution supported on the real line. The key assumption is that the linear process is driven by a sequence of $p$-dimensional real or complex random vectors with i.i.d. entries possessing zero mean, unit variance and finite fourth moments, and that the $p \times p$ linear process coefficient matrices are Hermitian and simultaneously diagonalizable. Several relaxations of these assumptions are discussed. The results put forth in this paper can help facilitate inference on model parameters, model diagnostics and prediction of future values of the linear process.
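
    For concreteness, a sketch of computing the spectrum underlying the empirical spectral distribution (ESD) of a symmetrized lag-$\tau$ sample autocovariance matrix for real-valued data; the paper's precise renormalization (centering and scaling appropriate to the $p/n \to 0$ regime) is omitted here.

```python
import numpy as np

def symmetrized_autocovariance_eigs(X, tau):
    """Eigenvalues of the symmetrized lag-tau sample autocovariance.

    X: (n, p) real array whose rows are observations X_1, ..., X_n;
    tau: positive integer lag.
    """
    n, _ = X.shape
    gamma = X[:-tau].T @ X[tau:] / n         # lag-tau sample autocovariance
    sym = (gamma + gamma.T) / 2              # symmetrize -> real spectrum
    return np.sort(np.linalg.eigvalsh(sym))  # support points of the ESD
```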

    On the Marčenko-Pastur law for linear time series

    This paper is concerned with extensions of the classical Marčenko-Pastur law to time series. Specifically, $p$-dimensional linear processes are considered which are built from innovation vectors with independent, identically distributed (real- or complex-valued) entries possessing zero mean, unit variance and finite fourth moments. The coefficient matrices of the linear process are assumed to be simultaneously diagonalizable. In this setting, the limiting behavior of the empirical spectral distribution of both sample covariance and symmetrized sample autocovariance matrices is determined in the high-dimensional setting $p/n \to c \in (0,\infty)$, for which dimension $p$ and sample size $n$ diverge to infinity at the same rate. The results extend existing contributions available in the literature for the covariance case and are among the first of their kind for the autocovariance case.
    Comment: Published at http://dx.doi.org/10.1214/14-AOS1294 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
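
    For context, the classical Marčenko-Pastur law that these results extend has, for ratio $c = \lim p/n$ and unit-variance entries, the density below; when $c > 1$ the law places an additional point mass of $1 - 1/c$ at zero.

```latex
% Marčenko-Pastur density with aspect ratio c and unit variance,
% supported on [a, b]:
f_c(x) = \frac{1}{2\pi c x}\,\sqrt{(b - x)(x - a)}, \qquad a \le x \le b,
\quad\text{with}\quad a = (1 - \sqrt{c})^2, \; b = (1 + \sqrt{c})^2.
```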

    "Pre-conditioning" for feature selection and regression in high-dimensional problems

    We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a "pre-conditioned" response variable. The primary method used for this initial regression is supervised principal components. Then we apply a standard procedure such as forward stepwise selection or the LASSO to the pre-conditioned response variable. In a number of simulated and real data examples, this two-step procedure outperforms forward stepwise selection or the usual LASSO (applied directly to the raw outcome). We also show that under a certain Gaussian latent variable model, application of the LASSO to the pre-conditioned response variable is consistent as the numbers of predictors and observations increase. Moreover, when the observational noise is rather large, the suggested procedure can give a more accurate estimate than the LASSO. We illustrate our method on some real problems, including survival analysis with microarray data.
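
    A hedged sketch of the two-step procedure: build a pre-conditioned response via supervised principal components (screen features by univariate association with the outcome, take the leading principal component of the screened block), then run the LASSO on that response. The screening size `n_keep` and penalty `alpha` are illustrative parameters, not the authors' settings.

```python
import numpy as np
from sklearn.linear_model import Lasso

def preconditioned_lasso(X, y, n_keep=50, alpha=0.1):
    """Two-step sketch: supervised-PCA pre-conditioning, then LASSO."""
    Xc = X - X.mean(axis=0)                    # center predictors
    yc = y - y.mean()                          # center response
    scores = np.abs(Xc.T @ yc)                 # univariate association scores
    keep = np.argsort(scores)[-n_keep:]        # screen strongest predictors
    U, s, _ = np.linalg.svd(Xc[:, keep], full_matrices=False)
    y_pre = U[:, 0] * s[0]                     # leading supervised PC score:
                                               # the pre-conditioned response
    return Lasso(alpha=alpha).fit(Xc, y_pre).coef_  # step 2: LASSO on y_pre
```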