Supervised classification of biological samples based on genetic information
(e.g., gene expression profiles) is an important problem in biostatistics. To
find classification rules that are both accurate and interpretable, variable
selection is indispensable. This article explores how an assessment of the
individual importance of variables (effect size estimation) can be used to
perform variable selection. I review recent effect size estimation approaches
in the context of linear discriminant analysis (LDA) and propose a new,
conceptually simple effect size estimation method that is also
computationally efficient. I then show how to use effect sizes to perform
variable selection based on the misclassification rate, which is the
data-independent expectation of the prediction error. Simulation studies and
real data analyses illustrate that the proposed effect size estimation and
variable selection methods are competitive. In particular, they lead to
feature sets that are both compact and interpretable.

Comment: 21 pages, 2 figures

Klaus, Bernd

English

arXiv

Program files to be used with the statistical software R implementing the variable selection approaches presented in this article are available from my homepage: http://b-klaus.de

Effect Size Estimation and Misclassification Rate Based Variable Selection in Linear Discriminant Analysis
