Stable Feature Selection for Biomarker Discovery
Feature selection techniques have been used as the workhorse in biomarker
discovery applications for a long time. Surprisingly, the stability of feature
selection with respect to sampling variations has long been under-considered.
Only recently has this issue begun to receive more attention.
In this article, we review existing stable feature selection methods for
biomarker discovery using a generic hierarchical framework. We have two
objectives: (1) providing an overview of this new yet fast-growing topic as a
convenient reference; (2) categorizing existing methods under an expandable
framework for future research and development.
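
To make "stability with respect to sampling variations" concrete, the following is a minimal sketch of one common way to quantify it: select features on several bootstrap resamples and average the pairwise Jaccard similarity of the selected subsets. The correlation-based filter, the Jaccard measure, and the names `top_k_by_abs_correlation` and `selection_stability` are illustrative assumptions, not the methods reviewed in the article.

```python
import numpy as np
from itertools import combinations


def top_k_by_abs_correlation(X, y, k):
    """Return the indices of the k features with the largest |Pearson r| to y.

    A deliberately simple filter used only for illustration (an assumption,
    not a method taken from the article).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Small constant guards against division by zero for constant columns.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    return set(np.argsort(scores)[-k:].tolist())


def selection_stability(X, y, k=10, n_resamples=30, seed=0):
    """Mean pairwise Jaccard similarity of subsets selected on bootstrap resamples.

    Values near 1 mean the same features are chosen regardless of which
    samples are drawn; values near 0 mean the selection is unstable.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    subsets = []
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        subsets.append(top_k_by_abs_correlation(X[idx], y[idx], k))
    sims = [len(a & b) / len(a | b) for a, b in combinations(subsets, 2)]
    return float(np.mean(sims))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(60, 500))  # few samples, many features
    y = X[:, :5].sum(axis=1) + rng.normal(size=60)
    print("stability:", selection_stability(X, y, k=5))
```

With few samples and many features, as is typical in biomarker studies, this score tends to drop well below 1 even when predictive accuracy looks acceptable, which is the concern the abstract raises.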
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
A machine learning pipeline for discriminant pathways identification
Motivation: Identifying the molecular pathways more prone to disruption
during a pathological process is a key task in network medicine and, more
generally, in systems biology.
Results: In this work we propose a pipeline that couples a machine learning
solution for molecular profiling with a recent network comparison method. The
pipeline can identify changes occurring between specific sub-modules of
networks built in a case-control biomarker study, discriminating key groups of
genes whose interactions are modified by an underlying condition. The proposal
is independent of the classification algorithm used. Three applications to
genome-wide data are presented, concerning children's susceptibility to air
pollution and two neurodegenerative diseases: Parkinson's and Alzheimer's.
Availability: Details about the software used for the experiments discussed
in this paper are provided in the Appendix.