MR2: A Two-stage Feature Selection Algorithm in High-throughput Methylation Data for Max-relevance and Min-redundancy

Abstract

Recent advances reveal that DNA methylation plays an important role in regulating different genome functions, and that anomalous methylation levels are associated with various cancer types. Feature selection algorithms are geared towards high-throughput analysis of DNA methylation to help identify idiosyncratic DNA methylation profiles associated with cancer types and subtypes. In high-dimensional and highly correlated DNA methylation data, feature selection algorithms aim at selecting an efficient and comprehensive feature set that better captures the characteristics of phenotypes. In this work, we introduce a two-stage feature selection algorithm (MR2) based on maximum relevance and minimum redundancy criteria. The first stage filters for the features that satisfy the relevance conditions. In the second stage, the final subset of loci is selected to reach minimal redundancy by using a k-medoids clustering algorithm that embeds a succinct uncertainty measure score. The performance of the proposed feature selection algorithm is benchmarked against that of principal component analysis and four other commonly used filtering methods, using lung and breast cancer datasets obtained from Gene Expression Omnibus, in terms of classification error in support vector machine classifiers. Our MR2 algorithm outperforms these filtering-based algorithms while at the same time providing more interpretable results.
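The two-stage idea described above can be illustrated with a minimal sketch. It is not the MR2 implementation: the abstract does not specify the relevance conditions or the uncertainty measure, so the sketch assumes absolute Pearson correlation with a binary label as the relevance score and 1 - |correlation| between features as the dissimilarity used by k-medoids. The function names (relevance_filter, k_medoids, mr2_sketch) and parameters are hypothetical.

```python
import numpy as np

def relevance_filter(X, y, n_keep):
    """Stage 1 (assumed criterion): keep the n_keep features with the
    largest absolute Pearson correlation with the label vector y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    score = np.abs(Xc.T @ yc) / denom
    return np.argsort(score)[::-1][:n_keep]

def k_medoids(D, k, n_iter=100, seed=0):
    """Simple alternating k-medoids on a precomputed distance matrix D.
    Returns the indices of the k medoids."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        # Assign every item to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # The new medoid minimises total distance to its cluster members.
            within = D[np.ix_(members, members)].sum(axis=0)
            new[c] = members[np.argmin(within)]
        if np.array_equal(np.sort(new), np.sort(medoids)):
            break
        medoids = new
    return medoids

def mr2_sketch(X, y, n_relevant, k):
    """Two-stage selection: relevance filter, then one medoid per cluster
    of mutually similar features as the low-redundancy subset."""
    keep = relevance_filter(X, y, n_relevant)
    # Assumed dissimilarity: 1 - |Pearson correlation| between features.
    D = 1.0 - np.abs(np.corrcoef(X[:, keep], rowvar=False))
    np.fill_diagonal(D, 0.0)
    return np.sort(keep[k_medoids(D, k)])
```

Here X is a samples-by-loci matrix of methylation values and y a 0/1 phenotype vector. Each returned index is a cluster medoid, that is, an actual locus rather than a derived component, which is one way to read the interpretability contrast with principal component analysis. The sketch assumes no locus is constant across samples, since the correlation is undefined in that case.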
