Contour regression: A general approach to dimension reduction
We propose a novel approach to sufficient dimension reduction in regression,
based on estimating contour directions of small variation in the response.
These directions span the orthogonal complement of the minimal space relevant
for the regression and can be extracted according to two measures of variation
in the response, leading to simple and general contour regression (SCR and GCR)
methodology. In comparison with existing sufficient dimension reduction
techniques, this contour-based methodology guarantees exhaustive estimation of
the central subspace under ellipticity of the predictor distribution and mild
additional assumptions, while maintaining $\sqrt{n}$-consistency and computational
ease. Moreover, it proves robust to departures from ellipticity. We establish
population properties for both SCR and GCR, and asymptotic properties for SCR.
Simulations to compare performance with that of standard techniques such as
ordinary least squares, sliced inverse regression, principal Hessian directions
and sliced average variance estimation confirm the advantages anticipated by
the theoretical analyses. We demonstrate the use of contour-based methods on a
data set concerning soil evaporation.
Comment: Published at http://dx.doi.org/10.1214/009053605000000192 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
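As a rough illustration of the contour idea in the abstract above, the following minimal sketch estimates directions by looking at pairs of observations whose responses are close. It is not the authors' implementation: the function name, the quantile-based choice of the closeness threshold, and the default values are assumptions made for this example, and it assumes at least two predictors with a nonsingular sample covariance.

```python
import numpy as np


def simple_contour_regression(X, y, n_directions=1, contour_quantile=0.1):
    """Illustrative sketch of simple contour regression (SCR).

    The name, arguments, and the quantile rule for the threshold are
    hypothetical choices for this example, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, _ = X.shape

    # Standardize the predictors: Z = (X - mean) Sigma^{-1/2}.
    Xc = X - X.mean(axis=0)
    sigma = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(sigma)
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ sigma_inv_sqrt

    # Form all pairs i < j with their response gaps and predictor differences.
    i, j = np.triu_indices(n, k=1)
    gaps = np.abs(y[i] - y[j])
    diffs = Z[i] - Z[j]

    # Keep pairs with nearly equal responses; these approximate contours.
    threshold = np.quantile(gaps, contour_quantile)
    close = diffs[gaps <= threshold]

    # Second-moment matrix of predictor differences along the contours.
    K = close.T @ close / close.shape[0]

    # Contour directions carry large variation in K, so the eigenvectors
    # with the smallest eigenvalues estimate the relevant directions.
    _, vecs = np.linalg.eigh(K)  # eigenvalues in ascending order
    B = sigma_inv_sqrt @ vecs[:, :n_directions]  # back to the original scale
    return B / np.linalg.norm(B, axis=0)
```

Calling `simple_contour_regression(X, y)` on an `n x p` predictor matrix returns a `p x 1` matrix whose column is a unit-length estimated direction. The sketch builds all pairs explicitly, so it is only practical for moderate sample sizes.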
Sufficient Covariate, Propensity Variable and Doubly Robust Estimation
Statistical causal inference from observational studies often requires
adjustment for a possibly multi-dimensional variable, where dimension reduction
is crucial. The propensity score, first introduced by Rosenbaum and Rubin, is a
popular approach to such reduction. We address causal inference within Dawid's
decision-theoretic framework, where it is essential to pay attention to
sufficient covariates and their properties. We examine the role of a propensity
variable in a normal linear model. We investigate both population-based and
sample-based linear regressions, with adjustments for a multivariate covariate
and for a propensity variable. In addition, we study the augmented inverse
probability weighted estimator, involving a combination of a response model and
a propensity model. In a linear regression with homoscedasticity, a propensity
variable is proved to provide the same estimated causal effect as multivariate
adjustment. An estimated propensity variable may, but need not, yield better
precision than the true propensity variable. The augmented inverse probability
weighted estimator is doubly robust and can improve precision if the propensity
model is correctly specified.
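The augmented inverse probability weighted estimator mentioned in the abstract above can be sketched as follows. This is a generic textbook form under stated assumptions, not the paper's own setup: it assumes a binary treatment coded 0/1, linear response models fitted separately in each arm, and a logistic propensity model. The function names `fit_logistic` and `aipw_effect` are hypothetical.

```python
import numpy as np


def fit_logistic(X, t, n_iter=25):
    """Logistic regression by Newton-Raphson.

    X must already contain an intercept column; t is a 0/1 array.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (t - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def aipw_effect(X, t, y):
    """Augmented inverse probability weighted estimate of the average effect."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    X1 = np.column_stack([np.ones(n), X])  # add an intercept

    # Propensity model (assumed logistic for this sketch).
    e = 1.0 / (1.0 + np.exp(-X1 @ fit_logistic(X1, t)))

    # Response models: ordinary least squares within each treatment arm.
    b1 = np.linalg.lstsq(X1[t == 1], y[t == 1], rcond=None)[0]
    b0 = np.linalg.lstsq(X1[t == 0], y[t == 0], rcond=None)[0]
    m1 = X1 @ b1
    m0 = X1 @ b0

    # Regression contrast plus inverse-probability-weighted residuals.
    psi = m1 - m0 + t * (y - m1) / e - (1.0 - t) * (y - m0) / (1.0 - e)
    return psi.mean()
```

The estimate stays consistent if either the response models or the propensity model is correctly specified, which is the double robustness property the abstract refers to. In practice the estimated propensities should be checked for values near 0 or 1, since those inflate the weighted residual terms.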
Dimension Reduction and Variable Selection
High-dimensional data are becoming increasingly available as data collection technology advances. Over the last decade, significant developments have taken place in high-dimensional data analysis, driven primarily by a wide range of applications in fields such as genomics, signal processing, and environmental studies. Statistical techniques such as dimension reduction and variable selection play important roles in high-dimensional data analysis. Sufficient dimension reduction provides a way to find a reduced subspace of the original space without a parametric model. In recent years this method has been widely applied in many scientific fields, including genetics, brain imaging analysis, econometrics, and environmental sciences.
In this dissertation, we worked on three projects. The first combines local modal regression and Minimum Average Variance Estimation (MAVE) to introduce a robust dimension reduction approach. In addition to being robust to outliers and heavy-tailed distributions, our proposed method has the same convergence rate as the original MAVE. Furthermore, we combine local modal based MAVE with a penalty to select informative covariates in a regression setting. This new approach can exhaustively estimate directions in the regression mean function and select informative covariates simultaneously, while being robust to possible outliers in the dependent variable. The second project develops sparse adaptive MAVE (saMAVE). SaMAVE has advantages over adaptive LASSO because it extends adaptive LASSO to multi-dimensional and nonlinear settings without any model assumption, and it has advantages over sparse inverse dimension reduction methods in that it does not require any particular probability distribution on X. In addition, saMAVE can exhaustively estimate the dimensions in the conditional mean function. The third project extends the envelope method to multivariate spatial data. The envelope technique is a new version of the classical multivariate linear model. The envelope estimator asymptotically has less variation than the maximum likelihood estimator (MLE). The current envelope methodology is for independent observations. While the assumption of independence is convenient, it does not address the additional complication associated with spatial correlation. This work extends the idea of the envelope method to cases where independence is an unreasonable assumption, specifically multivariate data from a spatially correlated process. This novel approach provides estimates for the parameters of interest with smaller variance than the maximum likelihood estimator while still being able to capture the spatial structure in the data.
Kernel Sliced Inverse Regression: Regularization and Consistency
Kernel sliced inverse regression (KSIR) is a natural framework for nonlinear dimension reduction using the mapping induced by kernels. However, there are numeric, algorithmic, and conceptual subtleties in making the method robust and consistent. We apply two types of regularization in this framework to address computational stability and generalization performance. We also provide an interpretation of the algorithm and prove consistency. The utility of this approach is illustrated on simulated and real data.
Robust Head-Pose Estimation Based on Partially-Latent Mixture of Linear Regressions
Head-pose estimation has many applications, such as social event analysis,
human-robot and human-computer interaction, driving assistance, and so forth.
Head-pose estimation is challenging because it must cope with changing
illumination conditions, variabilities in face orientation and in appearance,
partial occlusions of facial landmarks, as well as bounding-box-to-face
alignment errors. We propose to use a mixture of linear regressions with
partially-latent output. This regression method learns to map high-dimensional
feature vectors (extracted from bounding boxes of faces) onto the joint space
of head-pose angles and bounding-box shifts, such that they are robustly
predicted in the presence of unobservable phenomena. We describe in detail the
mapping method that combines the merits of unsupervised manifold learning
techniques and of mixtures of regressions. We validate our method with three
publicly available datasets and we thoroughly benchmark four variants of the
proposed algorithm with several state-of-the-art head-pose estimation methods.
Comment: 12 pages, 5 figures, 3 tables
Multinomial Inverse Regression for Text Analysis
Text data, including speeches, stories, and other document forms, are often
connected to sentiment variables that are of interest for research in
marketing, economics, and elsewhere. Such data are also very high dimensional and
difficult to incorporate into statistical analyses. This article introduces a
straightforward framework of sentiment-preserving dimension reduction for text
data. Multinomial inverse regression is introduced as a general tool for
simplifying predictor sets that can be represented as draws from a multinomial
distribution, and we show that logistic regression of phrase counts onto
document annotations can be used to obtain low dimension document
representations that are rich in sentiment information. To facilitate this
modeling, a novel estimation technique is developed for multinomial logistic
regression with very high-dimension response. In particular, independent
Laplace priors with unknown variance are assigned to each regression
coefficient, and we detail an efficient routine for maximization of the joint
posterior over coefficients and their prior scale. This "gamma-lasso" scheme
yields stable and effective estimation for general high-dimension logistic
regression, and we argue that it will be superior to current methods in many
settings. Guidelines for prior specification are provided, algorithm
convergence is detailed, and estimator properties are outlined from the
perspective of the literature on non-concave likelihood penalization. Related
work on sentiment analysis from statistics, econometrics, and machine learning
is surveyed and connected. Finally, the methods are applied in two detailed
examples and we provide out-of-sample prediction studies to illustrate their
effectiveness.
Comment: Published in the Journal of the American Statistical Association 108,
2013, with discussion (rejoinder is here: http://arxiv.org/abs/1304.4200).
Software is available in the textir package for
- …