A Direct Estimation Approach to Sparse Linear Discriminant Analysis
This paper considers sparse linear discriminant analysis of high-dimensional
data. In contrast to the existing methods which are based on separate
estimation of the precision matrix $\Omega$ and the difference $\delta$ of the mean
vectors, we introduce a simple and effective classifier by estimating the
product $\Omega\delta$ directly through constrained minimization. The
estimator can be implemented efficiently using linear programming and the
resulting classifier is called the linear programming discriminant (LPD) rule.
The LPD rule is shown to have desirable theoretical and numerical properties.
It exploits the approximate sparsity of $\Omega\delta$ and as a consequence
can still perform well even when $\Omega$ and/or $\delta$ cannot be
estimated consistently. Asymptotic properties of the LPD rule are investigated
and consistency and rate of convergence results are given. The LPD classifier
has superior finite sample performance and significant computational advantages
over the existing methods that require separate estimation of $\Omega$ and $\delta$.
The LPD rule is also applied to analyze real datasets from lung cancer and
leukemia studies. The classifier performs favorably in comparison to existing
methods.
Comment: 39 pages. To appear in Journal of the American Statistical Association
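The abstract above does not spell out the optimization problem, so the following is only a minimal sketch. It assumes the constrained minimization takes the $\ell_1$ form: minimize $\|\beta\|_1$ subject to $\|\hat\Sigma\beta - \hat\delta\|_\infty \le \lambda$, where $\hat\Sigma$ is a pooled sample covariance and $\hat\delta$ is the difference of the sample means. The function names `lpd_direction` and `lpd_predict`, the pooled-covariance scaling, and the tuning parameter `lam` are illustrative assumptions, not the authors' code. Writing $\beta = u - v$ with $u, v \ge 0$ turns the problem into a linear program.

```python
import numpy as np
from scipy.optimize import linprog


def lpd_direction(X1, X2, lam):
    """Sketch: estimate beta ~ Omega @ delta by l1 minimization (assumed form).

    Solves  min ||beta||_1  s.t.  ||S @ beta - delta_hat||_inf <= lam
    as a linear program in (u, v) with beta = u - v and u, v >= 0.
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    delta_hat = mu1 - mu2
    n1, n2 = len(X1), len(X2)
    # Pooled sample covariance (the divisor n1 + n2 is an assumption).
    S = ((X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)) / (n1 + n2)
    p = S.shape[0]

    c = np.ones(2 * p)  # objective: sum(u) + sum(v) = ||beta||_1 at the optimum
    # Two-sided constraint:  S(u - v) - delta <= lam  and  -S(u - v) + delta <= lam
    A_ub = np.vstack([np.hstack([S, -S]), np.hstack([-S, S])])
    b_ub = np.concatenate([lam + delta_hat, lam - delta_hat])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    beta = res.x[:p] - res.x[p:]
    return beta, (mu1 + mu2) / 2


def lpd_predict(X, beta, midpoint):
    """Assign class 1 when (x - midpoint) @ beta >= 0, otherwise class 2."""
    return np.where((X - midpoint) @ beta >= 0, 1, 2)
```

In this sketch, a larger `lam` gives a sparser direction; once `lam` is at least the largest absolute entry of the estimated mean difference, the zero vector is feasible and the solution collapses to zero. In practice `lam` would be chosen by something like cross-validation.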
A nonparametric empirical Bayes approach to covariance matrix estimation
We propose an empirical Bayes method to estimate high-dimensional covariance
matrices. Our procedure centers on vectorizing the covariance matrix and
treating matrix estimation as a vector estimation problem. Drawing from the
compound decision theory literature, we introduce a new class of decision rules
that generalizes several existing procedures. We then use a nonparametric
empirical Bayes g-modeling approach to estimate the oracle optimal rule in that
class. This allows us to let the data itself determine how best to shrink the
estimator, rather than shrinking in a pre-determined direction such as toward a
diagonal matrix. Simulation results and a gene expression network analysis
show that our approach can outperform a number of state-of-the-art proposals
in a wide range of settings, sometimes substantially.
Comment: 20 pages, 4 figures
A model-based multithreshold method for subgroup identification
The thresholding variable plays a crucial role in subgroup identification for
personalized medicine. Most existing partitioning methods split the sample based
on one predictor variable. In this paper, we consider setting the splitting rule
from a combination of multivariate predictors, such as the latent factors,
principal components, and weighted sum of predictors. Such a subgrouping method
may lead to more meaningful partitioning of the population than using a single
variable. In addition, our method is based on a change point regression model
and thus yields straightforward model-based prediction results. After choosing
a particular thresholding variable form, we apply a two-stage multiple change
point detection method to determine the subgroups and estimate the regression
parameters. We show that our approach can produce two or more subgroups
from the multiple change points and identify the true grouping with high
probability. In addition, our estimation results enjoy oracle properties. We
design a simulation study to compare performances of our proposed and existing
methods and apply them to analyze data sets from a Scleroderma trial and a
breast cancer study.
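To illustrate the idea of a change point regression driven by a composite thresholding variable, here is a deliberately simplified sketch. It is not the two-stage multiple change point procedure described above: it fits only a single threshold by grid search over least-squares fits. The helper names `first_pc_score` and `fit_single_threshold`, and the `min_frac` trimming parameter, are hypothetical.

```python
import numpy as np


def first_pc_score(X):
    """One candidate thresholding variable: the first principal component score."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]


def fit_single_threshold(X, y, z, min_frac=0.1):
    """Grid-search one threshold tau on z, fitting a separate linear model
    on each side (z <= tau and z > tau) and minimizing the total residual
    sum of squares. Simplified stand-in, not the paper's two-stage method.
    """
    n = len(y)
    D = np.column_stack([np.ones(n), X])  # design matrix with intercept
    z_sorted = np.sort(z)
    lo, hi = int(min_frac * n), int((1 - min_frac) * n)

    best_rss, best_tau = np.inf, None
    for tau in z_sorted[lo:hi]:  # trim so both subgroups keep enough points
        left = z <= tau
        rss = 0.0
        for mask in (left, ~left):
            coef, *_ = np.linalg.lstsq(D[mask], y[mask], rcond=None)
            resid = y[mask] - D[mask] @ coef
            rss += resid @ resid
        if rss < best_rss:
            best_rss, best_tau = rss, tau
    return best_tau
```

A call such as `fit_single_threshold(X, y, first_pc_score(X))` would split the sample on the first principal component; any other combination of predictors (for example a weighted sum) can be passed as `z` instead.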
- …