
    Randomized Dimension Reduction on Massive Data

    Scalability of statistical estimators is of increasing importance in modern applications, and dimension reduction is often used to extract relevant information from data. A variety of popular dimension reduction approaches can be framed as symmetric generalized eigendecomposition problems. In this paper we outline how taking into account the low-rank structure assumption implicit in these dimension reduction approaches provides both computational and statistical advantages. We adapt recent randomized low-rank approximation algorithms to provide efficient solutions to three dimension reduction methods: Principal Component Analysis (PCA), Sliced Inverse Regression (SIR), and Localized Sliced Inverse Regression (LSIR). A key observation in this paper is that randomization serves a dual role, improving both computational and statistical performance. This point is highlighted in our experiments on real and simulated data.
    Comment: 31 pages, 6 figures. Key words: dimension reduction, generalized eigendecomposition, low-rank, supervised, inverse regression, random projections, randomized algorithms, Krylov subspace method
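    Since the abstract above centers on randomized low-rank approximation for PCA-type eigenproblems, a minimal Python sketch of randomized PCA in the spirit of random-projection range finders follows; it illustrates the general technique rather than the authors' exact algorithm, and the parameter names (n_components, n_oversample, n_iter) are illustrative choices.

```python
# Minimal sketch of randomized PCA via a random-projection range finder.
# This is an illustration of the general technique, not the paper's procedure.
import numpy as np

def randomized_pca(X, n_components=2, n_oversample=10, n_iter=2, seed=0):
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                          # center the data
    k = n_components + n_oversample
    Omega = rng.standard_normal((Xc.shape[1], k))    # random test matrix
    Y = Xc @ Omega                                   # sample the range of Xc
    for _ in range(n_iter):                          # power iterations sharpen the range estimate
        Y = Xc @ (Xc.T @ Y)
    Q, _ = np.linalg.qr(Y)                           # orthonormal basis for the approximate range
    B = Q.T @ Xc                                     # small projected matrix
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    components = Vt[:n_components]                   # approximate principal directions (rows)
    variances = (s[:n_components] ** 2) / (X.shape[0] - 1)
    return components, variances

# Example usage on synthetic data:
# directions, variances = randomized_pca(np.random.randn(1000, 200), n_components=5)
```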

    Contour projected dimension reduction

    In regression analysis, we employ contour projection (CP) to develop a new dimension reduction theory. Accordingly, we introduce the notions of the central contour subspace and the generalized contour subspace. We show that both of their structural dimensions are no larger than that of the central subspace (Cook, Regression Graphics (1998b), Wiley). Furthermore, we employ CP-sliced inverse regression, CP-sliced average variance estimation and CP-directional regression to estimate the generalized contour subspace, and we subsequently obtain their theoretical properties. Monte Carlo studies demonstrate that the three CP-based dimension reduction methods outperform their corresponding non-CP approaches when the predictors have heavy-tailed elliptical distributions. An empirical example is also presented to illustrate the usefulness of the CP method.
    Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/08-AOS679
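    CP-SIR, CP-SAVE and CP-DR above all modify classical estimators of the central subspace. As background, here is a minimal sketch of plain sliced inverse regression (SIR), the estimator CP-SIR builds on; the contour-projection step itself is not shown, and the number of slices is an illustrative tuning choice.

```python
# Minimal sketch of classical sliced inverse regression (SIR); background only,
# the contour-projection (CP) modification from the abstract is not implemented.
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=1):
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    # Standardize the predictors: Z = (X - mu) Sigma^{-1/2}
    w, V = np.linalg.eigh(Sigma)
    Sigma_inv_half = V @ np.diag(w ** -0.5) @ V.T
    Z = (X - mu) @ Sigma_inv_half
    # Slice the response and average the standardized predictors within each slice
    order = np.argsort(np.asarray(y))
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        zbar = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(zbar, zbar)   # weighted SIR kernel matrix
    # Leading eigenvectors of the kernel, mapped back to the original predictor scale
    _, vecs = np.linalg.eigh(M)
    return Sigma_inv_half @ vecs[:, ::-1][:, :n_directions]
```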

    Estimating sufficient reductions of the predictors in abundant high-dimensional regressions

    We study the asymptotic behavior of a class of methods for sufficient dimension reduction in high-dimensional regressions, as the sample size and the number of predictors grow in various alignments. It is demonstrated that these methods are consistent in a variety of settings, particularly in abundant regressions where most predictors contribute some information on the response, and that oracle rates are possible. Simulation results are presented to support the theoretical conclusions.
    Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/11-AOS962

    Sparse group sufficient dimension reduction and covariance cumulative slicing estimation

    This dissertation contains two main parts. In Part One, for regression problems with grouped covariates, we adapt the idea of the sparse group lasso (Friedman et al., 2010) to the framework of sufficient dimension reduction. We propose a method called sparse group sufficient dimension reduction (sgSDR) to conduct group and within-group variable selection simultaneously without assuming a specific model structure on the regression function. Simulation studies show that our method is comparable to the sparse group lasso under the regular linear model setting, and outperforms the sparse group lasso, with higher true positive rates and substantially lower false positive rates, when the regression function is nonlinear and/or the error distributions are non-Gaussian. One immediate application of our method is to gene pathway data analysis, where genes naturally fall into groups (pathways). An analysis of a glioblastoma microarray data set is included to illustrate our method.
    In Part Two, for a many-valued or continuous response Y, the standard practice of replacing Y by a discretized version usually results in a loss of power because intra-slice information is ignored. Most existing slicing methods rely heavily on the choice of the number of slices h. Zhu et al. (2010) proposed a method called cumulative slicing estimation (CUME), which avoids the otherwise subjective selection of h. In this dissertation, we revisit CUME from a different perspective to gain more insight, and then refine its performance by incorporating the intra-slice covariances. The resulting new method, which we call covariance cumulative slicing estimation (COCUM), is comparable to CUME when the predictors are normally distributed, and outperforms CUME when the predictors are non-Gaussian, especially in the presence of outliers. The asymptotic properties of COCUM are also established. --Abstract, page iv
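    For the second part above, the following is a minimal sketch of the baseline cumulative slicing estimator (CUME, Zhu et al., 2010) as it is commonly described: the inverse-regression curve is accumulated over all observed cut points instead of over a fixed number of slices. The intra-slice covariance refinement that defines COCUM is not reproduced here, and the implementation details are illustrative.

```python
# Minimal sketch of cumulative slicing estimation (CUME); the COCUM covariance
# refinement from the dissertation is not shown, and details are illustrative.
import numpy as np

def cume_directions(X, y, n_directions=1):
    y = np.asarray(y)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    # m(t) = (1/n) * sum_j (x_j - xbar) * 1{y_j <= t}, accumulated over all cut points t = y_i
    M = np.zeros((p, p))
    for yi in y:
        m = Xc[y <= yi].sum(axis=0) / n
        M += np.outer(m, m) / n
    # Generalized eigenproblem M b = lambda * Sigma b, solved via symmetric whitening
    w, V = np.linalg.eigh(Sigma)
    Sigma_inv_half = V @ np.diag(w ** -0.5) @ V.T
    _, vecs = np.linalg.eigh(Sigma_inv_half @ M @ Sigma_inv_half)
    return Sigma_inv_half @ vecs[:, ::-1][:, :n_directions]
```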

    New developments of dimension reduction

    Variable selection has become more crucial than before, since high-dimensional data are frequently seen in many research areas. Many model-based variable selection methods have been developed; however, their performance may be poor when the model is mis-specified. Sufficient dimension reduction (SDR; Li, 1991; Cook, 1998) provides a general framework for model-free variable selection methods. In this dissertation, we first propose a novel model-free variable selection method to deal with multi-population data by incorporating the grouping information. Theoretical properties of the proposed method are also presented. Simulation studies show that our new method significantly improves the selection performance compared with methods that ignore the grouping information. In the second part of this dissertation, we apply a partial SDR method to conduct conditional model-free variable (feature) screening for ultra-high-dimensional data, when researchers have prior information regarding the importance of certain predictors based on experience or previous investigations. Compared with the state-of-the-art conditional screening method, conditional sure independence screening (CSIS; Barut, Fan and Verhasselt, 2016), our method greatly outperforms CSIS for nonlinear models. The sure screening consistency property of our proposed method is also established. --Abstract, page iv
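    To make the screening setting above concrete, the sketch below ranks candidate predictors after projecting out a conditioning set that is assumed known to be important, using a simple correlation-based utility. It is a generic illustration of the conditional screening idea; it is neither CSIS nor the partial-SDR screener proposed in the dissertation, and the function and parameter names are hypothetical.

```python
# Generic illustration of conditional feature screening: adjust the response and
# each candidate predictor for a known conditioning set, then rank by a simple
# correlation-based utility. Not CSIS and not the dissertation's method.
import numpy as np

def conditional_screen(X, y, cond_idx, n_keep=50):
    n, p = X.shape
    C = np.column_stack([np.ones(n), X[:, cond_idx]])   # conditioning design with intercept
    H = C @ np.linalg.pinv(C)                            # projection onto the conditioning set
    y_res = y - H @ y                                    # response adjusted for known predictors
    scores = np.zeros(p)
    for j in range(p):
        if j in cond_idx:
            continue                                     # known predictors are kept by assumption
        x_res = X[:, j] - H @ X[:, j]                    # candidate adjusted the same way
        denom = np.linalg.norm(x_res) * np.linalg.norm(y_res)
        scores[j] = abs(x_res @ y_res) / denom if denom > 0 else 0.0
    keep = np.argsort(scores)[::-1][:n_keep]             # retain the top-ranked candidates
    return keep, scores
```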