Sufficient Dimension Reduction

Abstract

In regression analysis, it is difficult to uncover how a response variable depends on a covariate vector when the dimension of the covariate vector is high. Sufficient dimension reduction is one approach to reducing that dimension. It rests on the assumption that the response variable depends on the covariate vector only through a few linear combinations of its components, so replacing the covariate vector with these linear combinations reduces the dimension. The goal of sufficient dimension reduction is to estimate the space spanned by the coefficient vectors of these linear combinations. We denote this space by S.
In this thesis, we give an introductory review of three important sufficient dimension reduction methods: Sliced Inverse Regression (SIR), Sliced Average Variance Estimate (SAVE), and Principal Hessian Directions (pHd). Li proposed SIR in 1991. SIR exploits the simplicity of inverse regression: given a univariate response variable and a high-dimensional covariate, it is much easier to regress the covariate on the response variable than the other way around. Motivated by a theorem that connects forward regression and inverse regression, SIR estimates S using the inverse regression curve. Since SIR uses first moments only, it fails when the dependence of the response variable on the covariate is symmetric. To remedy this defect, Cook proposed SAVE in a comment on SIR in 1991. SAVE follows the general lines of SIR but uses second moments as well as first moments to estimate S. pHd is also a second-moment method. Li developed pHd in 1992 based on the observation that the eigenvectors of the Hessian matrices of the regression function are closely related to the basis vectors of S; pHd therefore estimates S using these eigenvectors.
To compare these methods, we present a simulation study at the end. In the simulation results, SIR is the most efficient method and SAVE is the most time-consuming. Since SIR fails when symmetric dependence exists, we recommend pHd when symmetric dependence is present and SIR in all other cases.
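
As an illustration of the SIR idea summarized above (slice the response, average the standardized covariates within each slice, and take the leading eigenvectors of the covariance of those slice means), a minimal Python sketch follows. It is not the implementation used in the thesis. The function name `sir_directions`, the number of slices, the simulated model, and the sample size are all assumptions chosen for illustration, and the sketch assumes at least two covariates with a nonsingular sample covariance.

```python
import numpy as np


def sir_directions(X, y, n_slices=10, n_directions=1):
    """Illustrative SIR sketch (hypothetical helper, not the thesis code).

    Returns a (p, n_directions) array whose columns estimate a basis of S.
    """
    n, p = X.shape

    # Standardize the covariates: Z = (X - mean) @ Sigma^{-1/2}.
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt

    # Partition the observations into slices of roughly equal size,
    # ordered by the value of the response.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance matrix of the slice means of Z.
    M = np.zeros((p, p))
    for idx in slices:
        slice_mean = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(slice_mean, slice_mean)

    # Leading eigenvectors of M (eigh returns ascending eigenvalues).
    _, vecs = np.linalg.eigh(M)
    top = vecs[:, ::-1][:, :n_directions]

    # Map back to the original covariate scale and normalize each column.
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)


# Assumed toy model: y depends on X only through one linear combination,
# with a monotone (cubic) link, a setting in which SIR is expected to work.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = (X @ beta) ** 3 + 0.1 * rng.standard_normal(2000)

b_hat = sir_directions(X, y)[:, 0]
# Absolute cosine between the estimated and true directions (sign is arbitrary).
alignment = abs(b_hat @ beta)
```

Replacing the cubic link with a symmetric one such as `(X @ beta) ** 2` makes the slice means of the standardized covariates close to zero in every slice, which is the failure mode that motivates second-moment methods such as SAVE and pHd.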
