Biomarker selection by transfer learning with linear regularized models

By Thibault Helleputte and Pierre Dupont
Third International Workshop on Machine Learning in Systems Biology (MLSB)


This work presents a novel feature selection method for the classification of high-dimensional data, such as those produced by microarrays. Classification of such data is challenging, as it typically relies on a few tens of samples but several thousand dimensions (genes). The number of microarray chips needed to obtain robust models is generally orders of magnitude higher than most datasets offer. The number of available datasets is, however, continuously rising, for example in databases like the NCBI's Gene Expression Omnibus (GEO). Building a large microarray dataset by simply juxtaposing independent smaller datasets is difficult or irrelevant due to differences in biological topics, technical constraints or experimental protocols. Biomarker selection specifically refers to the identification of a small set of genes, a signature, related to a pathology or an observed treatment outcome. The lack of robustness of biomarker selection has already been pointed out. In the context of biomarker selection from microarray data, high stability, meaning that different subsets of patients lead to very similar signatures, is a desirable property, since the biological process explaining the outcome is assumed to be mostly common among different patients. Our feature selection technique includes a partial supervision (PS) that smoothly favors the selection of some dimensions (genes) on a new target dataset to be classified. The dimensions to be favored are first selected with a simple univariate technique, such as a t-test, on similar source datasets, for example from GEO, hence performing inductive transfer learning at the feature level. We rely here on our recently proposed PS-l2-AROM method, a feature selection approach embedded in a regularized linear model. This algorithm reduces to linear SVM learning with iterative rescaling of the input features. The scaling factors depend here on the dimensions selected on the source domains.
The proposed optimization procedure smoothly favors the pre-selected features, but the finally selected dimensions may depart from them in order to optimize the classification objective under rescaled margin constraints. Practical experiments on several microarray datasets illustrate that the proposed approach increases not only classification performance, as is usual with a sound transfer learning scheme, but also the stability of the selected dimensions with respect to sampling variation. It is also shown that multiple transfer from various source datasets can bring further improvements.
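The abstract describes a two-stage pipeline: a univariate t-test ranks genes on a source dataset, and the top-ranked genes are then smoothly favored while a linear SVM with iteratively rescaled inputs selects features on the target dataset. Rescaling acts like a per-gene regularization weight: multiplying gene j by a factor s lets the SVM reach the same decision function with a weight s times smaller, which lowers its squared-norm penalty, so larger factors favor that gene. The pure-Python sketch that follows illustrates the idea under stated assumptions. The exact PS-l2-AROM update is not given here, so the prior encoding (a fixed factor `beta` > 1 applied to favored genes at every pass), the Pegasos-style bias-free SVM solver, and all names and values (`ttest_top_k`, `linear_svm`, `ps_arom_sketch`, `beta`, `tol`, the synthetic data) are hypothetical, not the authors' implementation.

```python
import math
import random

def welch_t(a, b):
    """Welch t statistic comparing one gene's values in two classes."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b) + 1e-12)

def ttest_top_k(X, y, k):
    """Source step: indices of the k genes with the largest |t| (labels in {-1, +1})."""
    scores = []
    for j in range(len(X[0])):
        pos = [x[j] for x, lab in zip(X, y) if lab == 1]
        neg = [x[j] for x, lab in zip(X, y) if lab == -1]
        scores.append(abs(welch_t(pos, neg)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Bias-free linear SVM trained with Pegasos-style subgradient steps (an assumption)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order, t = list(range(len(X))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the l2 regularizer
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def ps_arom_sketch(X, y, favored, beta=2.0, n_iter=10, tol=1e-3):
    """Hypothetical PS-l2-AROM-style loop: favored genes get a fixed prior factor
    beta > 1, and the scaling vector z is multiplied by |w| after each SVM fit."""
    d = len(X[0])
    favored = set(favored)
    prior = [beta if j in favored else 1.0 for j in range(d)]
    z = [1.0] * d
    for _ in range(n_iter):
        Xs = [[xj * zj * pj for xj, zj, pj in zip(x, z, prior)] for x in X]
        w = linear_svm(Xs, y)
        z = [zj * abs(wj) for zj, wj in zip(z, w)]
        top = max(z) or 1.0
        z = [zj / top for zj in z]  # renormalize so the largest factor is 1
    return [j for j in range(d) if z[j] > tol]  # genes whose scaling did not vanish

# Toy usage (hypothetical data): gene 0 carries the class signal, genes 1-5 are noise.
rng = random.Random(1)
y = [1 if i % 2 == 0 else -1 for i in range(40)]
X = [[2.0 * lab + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(5)] for lab in y]
favored = ttest_top_k(X, y, 1)  # stands in for a t-test run on a separate source dataset
selected = ps_arom_sketch(X, y, favored)
```

Because the prior only rescales inputs rather than forcing genes in or out, a favored gene with no predictive value on the target data can still have its scaling factor driven towards zero, which matches the smooth favoring the abstract describes.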

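Stability with respect to sampling variation is usually quantified by running the selection on several resampled subsets of patients and comparing the resulting signatures pairwise. The abstract does not name the measure used; the sketch that follows assumes Kuncheva's consistency index, one common choice for two signatures A and B of equal size k drawn from n genes, which corrects their overlap r = |A ∩ B| for the overlap k²/n expected by chance: I_C(A, B) = (r·n - k²) / (k·(n - k)). The helper names `kuncheva_index` and `stability` are hypothetical.

```python
from itertools import combinations

def kuncheva_index(a, b, n_features):
    """Kuncheva consistency index for two signatures of equal size k."""
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k or not 0 < k < n_features:
        raise ValueError("signatures must share a size k with 0 < k < n_features")
    r = len(a & b)  # number of genes shared by both signatures
    return (r * n_features - k * k) / (k * (n_features - k))

def stability(signatures, n_features):
    """Average pairwise Kuncheva index over signatures from resampled data."""
    pairs = list(combinations(signatures, 2))
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)

# Toy usage: three signatures of 3 genes out of 10, two of them identical.
signatures = [[1, 2, 3], [1, 2, 3], [1, 2, 4]]
score = stability(signatures, n_features=10)
```

Identical signatures score 1, and values near 0 mean the overlap is no better than chance, which makes scores comparable across signature sizes.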
Topics: 1162, QA75
Year: 2009
Provided by: DIAL UCLouvain