A multi-task ensemble strategy for gene selection and cancer classification

Abstract

Gene expression-based tumor classification aims to distinguish tumor types based on gene expression profiles. This task is difficult due to the high dimensionality of gene expression data and limited sample sizes. Most datasets contain tens of thousands of genes but only a small number of samples. As a result, selecting informative genes is necessary to improve classification performance and model interpretability. Many existing gene selection methods fail to produce stable and consistent results, especially when training data are limited. To address this, we propose a multi-task ensemble strategy that combines repeated sampling with joint feature selection and classification. The method generates multiple training subsets and applies multi-task logistic regression with ℓ2,1 group sparsity regularization to select a subset of genes that appears consistently across tasks. This promotes stability and reduces redundancy. The framework supports integration with standard classifiers such as logistic regression and support vector machines. It performs both gene selection and classification in a single process. We evaluate the method on simulated and real gene expression datasets. The results show that it outperforms several baseline methods in classification accuracy and the consistency of selected genes.
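
The pipeline described in the abstract (resample the training data into several tasks, then fit one logistic model per task under a shared ℓ2,1 penalty so that a gene is kept or dropped for all tasks at once) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes bootstrap resampling as the repeated-sampling scheme, plain proximal gradient descent as the solver, and no intercept term, none of which the abstract specifies. All function names (`prox_l21`, `fit_multitask`, `bootstrap_tasks`, `selected_genes`) and the toy data are hypothetical.

```python
import math
import random


def prox_l21(W, t):
    """Row-wise group soft-thresholding: the proximal operator of
    t * sum_j ||W[j]||_2. A row (gene) shrinks toward zero as a whole."""
    out = []
    for row in W:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * w for w in row])
    return out


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_multitask(tasks, n_genes, lam=0.1, step=0.5, n_iter=200):
    """Proximal gradient descent on
    sum_k mean_logistic_loss(task k) + lam * ||W||_{2,1}.
    tasks: list of (X, y) pairs with y in {0, 1}.
    W[j][k] is the weight of gene j in task k (assumed layout)."""
    K = len(tasks)
    W = [[0.0] * K for _ in range(n_genes)]
    for _ in range(n_iter):
        G = [[0.0] * K for _ in range(n_genes)]
        for k, (X, y) in enumerate(tasks):
            for xi, yi in zip(X, y):
                z = sum(W[j][k] * xi[j] for j in range(n_genes))
                err = (sigmoid(z) - yi) / len(y)
                for j in range(n_genes):
                    G[j][k] += err * xi[j]
        # Gradient step on the smooth loss, then the l2,1 proximal step.
        W = [[W[j][k] - step * G[j][k] for k in range(K)]
             for j in range(n_genes)]
        W = prox_l21(W, step * lam)
    return W


def bootstrap_tasks(X, y, n_tasks, rng):
    """Build n_tasks training subsets by sampling rows with replacement
    (an assumed choice of repeated-sampling scheme)."""
    n = len(y)
    tasks = []
    for _ in range(n_tasks):
        idx = [rng.randrange(n) for _ in range(n)]
        tasks.append(([X[i] for i in idx], [y[i] for i in idx]))
    return tasks


def selected_genes(W, tol=1e-8):
    """Indices of genes whose weight row is nonzero, i.e. kept in every task."""
    return [j for j, row in enumerate(W)
            if math.sqrt(sum(w * w for w in row)) > tol]


# Toy data: 40 samples, 20 genes; only genes 0 and 1 drive the label.
rng = random.Random(0)
n_samples, n_genes = 40, 20
X = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
y = [1 if row[0] + row[1] + 0.3 * rng.gauss(0, 1) > 0 else 0 for row in X]

tasks = bootstrap_tasks(X, y, n_tasks=5, rng=rng)
W = fit_multitask(tasks, n_genes, lam=0.1)
print("genes kept across all tasks:", selected_genes(W))
```

Because the penalty acts on whole rows of `W`, a gene is either retained for every resampled task or removed from all of them, which is the mechanism behind the cross-task consistency the abstract refers to. Larger values of `lam` keep fewer genes.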

ARU Anglia Ruskin Research (ARRO)

Last time updated on 01/12/2025

This paper was published in ARU Anglia Ruskin Research (ARRO).

Licence: CC BY 4.0