3 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

    Prediction Models for Cancer Risk and Prognosis using Clinical and DNA Methylation Biomarkers: Considerations in Study Design and Model Development

    The ability to accurately predict the prognosis for any given disease is of immense value for clinicians and patients. It can dictate and optimize an individual treatment plan for a patient and ultimately improve their quality of life and reduce the financial burden associated with unnecessary treatment. To allow the accurate prediction of disease prognosis, ongoing development of prediction models is crucial. We introduce a novel curated, ad hoc feature selection (CAFS) strategy in the context of the Prostate Cancer DREAM Challenge. By applying CAFS, we demonstrate enhanced prediction performance for overall survival differences in patients with metastatic castration-resistant prostate cancer and identify clinically important risk predictors. With ongoing advancements in the omics field, promising molecular biomarkers are being identified to facilitate disease prognosis beyond the capability of clinical information. The identification of such biomarkers depends on the examination of omic marks in adequately powered studies. To assist researchers in the design and planning of epigenome-wide association studies of DNA methylation, we present a user-friendly tool, pwrEWAS, for comprehensive power estimation. The R package for pwrEWAS is publicly available on GitHub (https://github.com/stefangraw/pwrEWAS) and the web interface is available at https://biostats-shinyr.kumc.edu/pwrEWAS/. The enormous volume of omic marks requires stringent evaluation to discover combinations of complementary marks that assemble predictive biomarkers. We therefore present a heuristic feature selection approach that allows one to handle such high-dimensional data. Selection Probability Optimization for Feature Selection (SPOFS) is designed to identify an optimal subset of omic features from among a vast pool of such features, which collectively improve prediction accuracy and form a biomarker. Such biomarkers can then be integrated into the development and improvement of prediction models.