Background
With microarray technology becoming mature and popular, the selection and use
of a small number of relevant genes for accurate classification of samples is a
hot topic in the circles of biostatistics and bioinformatics. However, most of
the developed algorithms lack the ability to handle multiple classes, which
arguably a common application. Here, we propose an extension to an existing
regularization algorithm called Threshold Gradient Descent Regularization
(TGDR) to specifically tackle multi-class classification of microarray data.
When there are several microarray experiments addressing the same/similar
objectives, one option is to use meta-analysis version of TGDR (Meta-TGDR),
which considers the classification task as combination of classifiers with the
same structure/model while allowing the parameters to vary across studies.
However, the original Meta-TGDR extension did not offer a solution to the
prediction on independent samples. Here, we propose an explicit method to
estimate the overall coefficients of the biomarkers selected by Meta-TGDR. This
extension permits broader applicability and allows a comparison between the
predictive performance of Meta-TGDR and TGDR using an independent testing set.
Results
Using real-world applications, we demonstrated the proposed multi-TGDR
framework works well and the number of selected genes is less than the sum of
all individualized binary TGDRs. Additionally, Meta-TGDR and TGDR on the
batch-effect adjusted pooled data approximately provided same results. By
adding Bagging procedure in each application, the stability and good predictive
performance are warranted.
Conclusions
Compared with Meta-TGDR, TGDR is less computing time intensive, and requires
no samples of all classes in each study. On the adjusted data, it has
approximate same predictive performance with Meta-TGDR. Thus, it is highly
recommended