Randomized Dimension Reduction on Massive Data
Scalability of statistical estimators is of increasing importance in modern
applications and dimension reduction is often used to extract relevant
information from data. A variety of popular dimension reduction approaches can
be framed as symmetric generalized eigendecomposition problems. In this paper
we outline how taking into account the low rank structure assumption implicit
in these dimension reduction approaches provides both computational and
statistical advantages. We adapt recent randomized low-rank approximation
algorithms to provide efficient solutions to three dimension reduction methods:
Principal Component Analysis (PCA), Sliced Inverse Regression (SIR), and
Localized Sliced Inverse Regression (LSIR). A key observation in this paper is
that randomization serves a dual role, improving both computational and
statistical performance. This point is highlighted in our experiments on real
and simulated data.
Comment: 31 pages, 6 figures. Key words: dimension reduction, generalized
eigendecomposition, low-rank, supervised, inverse regression, random
projections, randomized algorithms, Krylov subspace method
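The randomized low-rank machinery the abstract refers to can be illustrated with a randomized range finder for PCA, in the style of Halko, Martinsson and Tropp (2011). This is a minimal sketch, not the authors' implementation; the function name, oversampling amount, and power-iteration count are illustrative choices.

```python
import numpy as np

def randomized_pca(X, k, oversample=10, n_iter=2, seed=0):
    """Approximate the top-k principal components of X (n samples x p
    features) via a randomized range finder followed by a small SVD."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                     # center the columns
    Omega = rng.standard_normal((Xc.shape[1], k + oversample))
    Y = Xc @ Omega                              # sample the range of Xc
    for _ in range(n_iter):                     # power iterations sharpen the spectrum
        Y = Xc @ (Xc.T @ Y)
    Q, _ = np.linalg.qr(Y)                      # orthonormal basis for the sampled range
    B = Q.T @ Xc                                # small (k+oversample) x p matrix
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    loadings = Vt[:k].T                         # principal directions in feature space
    variances = (s[:k] ** 2) / (Xc.shape[0] - 1)
    return loadings, variances
```

The cost is dominated by a few passes over the data matrix rather than a full eigendecomposition, which is the computational advantage the abstract highlights; the implicit low-rank projection is also where the claimed statistical regularization enters.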
Contour projected dimension reduction
In regression analysis, we employ contour projection (CP) to develop a new
dimension reduction theory. Accordingly, we introduce the notions of the
central contour subspace and generalized contour subspace. We show that both of
their structural dimensions are no larger than that of the central subspace
Cook [Regression Graphics (1998b) Wiley]. Furthermore, we employ CP-sliced
inverse regression, CP-sliced average variance estimation and CP-directional
regression to estimate the generalized contour subspace, and we subsequently
obtain their theoretical properties. Monte Carlo studies demonstrate that the
three CP-based dimension reduction methods outperform their corresponding
non-CP approaches when the predictors have heavy-tailed elliptical
distributions. An empirical example is also presented to illustrate the
usefulness of the CP method.
Comment: Published at http://dx.doi.org/10.1214/08-AOS679 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
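For reference, the non-CP baseline that the contour-projection methods are compared against, sliced inverse regression (Li, 1991), can be sketched in a few lines. This is a minimal illustration, not the paper's CP variant; the function name and the default slice count are assumptions.

```python
import numpy as np

def sir_directions(X, y, d, n_slices=10):
    """Minimal sliced inverse regression: slice the response, average the
    standardized predictors within each slice, and take the top eigenvectors
    of the between-slice covariance of those means."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ Sigma_inv_sqrt            # standardized predictors
    order = np.argsort(y)                    # slice by quantiles of y
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)              # within-slice mean of Z
        M += (len(idx) / n) * np.outer(m, m)
    _, V = np.linalg.eigh(M)
    eta = V[:, ::-1][:, :d]                  # top-d eigenvectors, Z scale
    beta = Sigma_inv_sqrt @ eta              # map back to the X scale
    return beta / np.linalg.norm(beta, axis=0)
```

The CP-based estimators in the abstract replace the sample moments above with contour-projected analogues, which is what confers robustness to heavy-tailed elliptical predictors.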
Estimating sufficient reductions of the predictors in abundant high-dimensional regressions
We study the asymptotic behavior of a class of methods for sufficient
dimension reduction in high-dimension regressions, as the sample size and
number of predictors grow in various alignments. It is demonstrated that these
methods are consistent in a variety of settings, particularly in abundant
regressions where most predictors contribute some information on the response,
and oracle rates are possible. Simulation results are presented to support the
theoretical conclusions.
Comment: Published at http://dx.doi.org/10.1214/11-AOS962 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Sparse group sufficient dimension reduction and covariance cumulative slicing estimation
This dissertation contains two main parts.
In Part One, for regression problems with grouped covariates, we adapt the idea of the sparse group lasso (Friedman et al., 2010) to the framework of sufficient dimension reduction. We propose a method called sparse group sufficient dimension reduction (sgSDR) that performs group and within-group variable selection simultaneously, without assuming a specific model structure on the regression function. Simulation studies show that our method is comparable to the sparse group lasso under the standard linear model, and outperforms it, with higher true positive rates and substantially lower false positive rates, when the regression function is nonlinear or the error distribution is non-Gaussian. One immediate application of our method is gene pathway data analysis, where genes naturally fall into groups (pathways). An analysis of a glioblastoma microarray data set is included to illustrate the method.
In Part Two, for many-valued or continuous Y, the standard practice of replacing the response Y by a discretized version usually loses power because intra-slice information is discarded. Most existing slicing methods also rely heavily on the choice of the number of slices h. Zhu et al. (2010) proposed cumulative slicing estimation (CUME), which avoids this otherwise subjective choice of h. In this dissertation, we revisit CUME from a different perspective to gain further insight, and then refine its performance by incorporating the intra-slice covariances. The resulting method, which we call covariance cumulative slicing estimation (COCUM), is comparable to CUME when the predictors are normally distributed, and outperforms CUME when the predictors are non-Gaussian, especially in the presence of outliers. Asymptotic results for COCUM are also established. --Abstract, page iv
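The cumulative slicing idea that COCUM builds on can be sketched directly: instead of SIR's fixed slices, CUME uses one cumulative indicator slice 1(y_i <= y_j) per observed response value, so no slice count h has to be chosen. This is a minimal sketch of the CUME kernel of Zhu et al. (2010), without the intra-slice covariance refinement of COCUM; the function name is an assumption.

```python
import numpy as np

def cume_directions(X, y, d):
    """Minimal cumulative slicing estimation: average the standardized
    predictors over the cumulative slices {i : y_i <= y_j}, form the kernel
    matrix from those cumulative means, and take its top eigenvectors."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ Sigma_inv_sqrt
    Zs = Z[np.argsort(y)]                 # sort rows by the response
    Mhat = np.cumsum(Zs, axis=0) / n      # row j: mean of z_i over {y_i <= y_(j)}
    K = (Mhat.T @ Mhat) / n               # CUME kernel matrix
    _, V = np.linalg.eigh(K)
    eta = V[:, ::-1][:, :d]               # top-d eigenvectors, Z scale
    beta = Sigma_inv_sqrt @ eta           # back to the original X scale
    return beta / np.linalg.norm(beta, axis=0)
```

Every observed response value contributes a slice, which is exactly the intra-slice information that coarse discretization of Y throws away.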
New developments of dimension reduction
Variable selection has become more crucial than ever, as high-dimensional data are now common in many research areas. Many model-based variable selection methods have been developed, but their performance can be poor when the model is misspecified. Sufficient dimension reduction (SDR; Li, 1991; Cook, 1998) provides a general framework for model-free variable selection.
In this thesis, we first propose a novel model-free variable selection method for multi-population data that incorporates the grouping information. Theoretical properties of the proposed method are presented. Simulation studies show that the new method significantly improves selection performance over approaches that ignore the grouping information. In the second part of this dissertation, we apply a partial SDR method to conduct conditional model-free variable (feature) screening for ultra-high-dimensional data, for settings where researchers have prior information on the importance of certain predictors from experience or previous investigations. Compared with the state-of-the-art conditional screening method, conditional sure independence screening (CSIS; Barut, Fan and Verhasselt, 2016), our method performs substantially better for nonlinear models. The sure screening consistency of the proposed method is also established. --Abstract, page iv