Correcting for selection bias via cross-validation in the classification of microarray data
There is increasing interest in the use of diagnostic rules based on
microarray data. These rules are formed by considering the expression levels of
thousands of genes in tissue samples taken from patients of known classification
with respect to a number of classes, representing, say, disease status or
treatment strategy. As the final versions of these rules are usually based on a
small subset of the available genes, there is a selection bias that has to be
corrected for in the estimation of the associated error rates. We consider the
problem using cross-validation. In particular, we present explicit formulae
that are useful in explaining the layers of validation that have to be
performed in order to avoid improperly cross-validated estimates.
Comment: Published at http://dx.doi.org/10.1214/193940307000000284 in the IMS
Collections (http://www.imstat.org/publications/imscollections.htm) by the
Institute of Mathematical Statistics (http://www.imstat.org
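
The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure or their formulae: it assumes pure-noise expression data, a simple mean-difference gene ranking, and a nearest-centroid rule, and every function name (make_noise_data, select_genes, cv_error) is hypothetical. It compares cross-validation where genes are selected once on all samples (improper) with cross-validation where selection is repeated inside each fold (proper).

```python
import random


def make_noise_data(n_samples, n_genes, rng):
    """Pure-noise 'expression' matrix with alternating class labels (assumed setup)."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def class_means(X, y, rows, genes):
    """Per-class mean of each listed gene, computed over the given rows only."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][g] for i in members) / len(members) for g in genes]
    return means


def select_genes(X, y, rows, k):
    """Rank genes by absolute difference in class means; keep the top k."""
    all_genes = list(range(len(X[0])))
    m = class_means(X, y, rows, all_genes)
    scores = [abs(a - b) for a, b in zip(m[0], m[1])]
    return sorted(all_genes, key=lambda g: scores[g], reverse=True)[:k]


def cv_error(X, y, k, n_folds, select_inside_folds):
    """Cross-validated error of a nearest-centroid rule on k selected genes."""
    rows = list(range(len(X)))
    folds = [rows[f::n_folds] for f in range(n_folds)]
    # Improper variant: genes are chosen once, using the held-out samples too.
    fixed_genes = None if select_inside_folds else select_genes(X, y, rows, k)
    errors = 0
    for test_rows in folds:
        held_out = set(test_rows)
        train_rows = [i for i in rows if i not in held_out]
        # Proper variant: selection is redone on the training part of each fold.
        genes = select_genes(X, y, train_rows, k) if select_inside_folds else fixed_genes
        centroids = class_means(X, y, train_rows, genes)
        for i in test_rows:
            dist = {
                label: sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroids[label]))
                for label in (0, 1)
            }
            predicted = 0 if dist[0] <= dist[1] else 1
            errors += int(predicted != y[i])
    return errors / len(X)


rng = random.Random(0)
X, y = make_noise_data(n_samples=40, n_genes=1000, rng=rng)
improper = cv_error(X, y, k=10, n_folds=5, select_inside_folds=False)
proper = cv_error(X, y, k=10, n_folds=5, select_inside_folds=True)
print(f"selection outside CV: {improper:.2f}")
print(f"selection inside CV:  {proper:.2f}")
```

Because the labels carry no information about the data, an honest error estimate should sit near chance. Selecting genes before splitting lets the held-out samples influence which genes are kept, so that estimate tends to come out optimistically low, while repeating the selection inside each fold removes that leak.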