22 research outputs found

    Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification

    Full text link
    Different aspects of a clinical sample can be revealed by multiple types of omics data. Integrated analysis of multi-omics data provides a comprehensive view of patients, which has the potential to facilitate more accurate clinical decision making. However, omics data are normally high dimensional with large number of molecular features and relatively small number of available samples with clinical labels. The "dimensionality curse" makes it challenging to train a machine learning model using high dimensional omics data like DNA methylation and gene expression profiles. Here we propose an end-to-end deep learning model called OmiVAE to extract low dimensional features and classify samples from multi-omics data. OmiVAE combines the basic structure of variational autoencoders with a classification network to achieve task-oriented feature extraction and multi-class classification. The training procedure of OmiVAE is comprised of an unsupervised phase without the classifier and a supervised phase with the classifier. During the unsupervised phase, a hierarchical cluster structure of samples can be automatically formed without the need for labels. And in the supervised phase, OmiVAE achieved an average classification accuracy of 97.49% after 10-fold cross-validation among 33 tumour types and normal samples, which shows better performance than other existing methods. The OmiVAE model learned from multi-omics data outperformed that using only one type of omics data, which indicates that the complementary information from different omics datatypes provides useful insights for biomedical tasks like cancer classification.Comment: 7 pages, 4 figure

    CancerNet: a unified deep learning network for pan‑cancer diagnostics

    Get PDF
    Article states that despite remarkable advances in cancer research, cancer remains one of the leading causes of death worldwide. The author's proposed framework for cancer diagnostics detects cancers and their tissues of origin using a unified model of cancers encompassing 33 cancers represented in The Cancer Genome Atlas. Their model exploits the learned features of different cancers reflected in the respective dysregulated epigenomes, holding a great promise in early cancer detection

    Evaluation of colorectal cancer subtypes and cell lines using deep learning

    Get PDF
    Colorectal cancer (CRC) is a common cancer with a high mortality rate and rising incidence rate in the developed world. Molecular profiling techniques have been used to study the variability between tumours as well as cancer models such as cell lines, but their translational value is incomplete with current methods. Moreover, first generation computational methods for subtype classification do not make use of multi-omics data in full scale. Drug discovery programs use cell lines as a proxy for human cancers to characterize their molecular makeup and drug response, identify relevant indications and discover biomarkers. In order to maximize the translatability and the clinical relevance of in vitro studies, selection of optimal cancer models is imperative. We present a novel subtype classification method based on deep learning and apply it to classify CRC tumors using multi-omics data, and further to measure the similarity between tumors and disease models such as cancer cell lines. Multi-omics Autoencoder Integration (maui) efficiently leverages data sets containing copy number alterations, gene expression, and point mutations, and learns clinically important patterns (latent factors) across these data types. Using these latent factors, we propose a refinement of the gold-standard CRC subtypes, and propose best-matching cell lines for the different subtypes. These findings are relevant for patient stratification and selection of cell lines for drug discovery pipelines, biomarker discovery, and target identification

    Evaluation of colorectal cancer subtypes and cell lines using deep learning

    Get PDF
    Colorectal cancer (CRC) is a common cancer with a high mortality rate and a rising incidence rate in the developed world. Molecular profiling techniques have been used to better understand the variability between tumors and disease models such as cell lines. To maximize the translatability and clinical relevance of in vitro studies, the selection of optimal cancer models is imperative. We have developed a deep learning-based method to measure the similarity between CRC tumors and disease models such as cancer cell lines. Our method efficiently leverages multiomics data sets containing copy number alterations, gene expression, and point mutations and learns latent factors that describe data in lower dimensions. These latent factors represent the patterns that are clinically relevant and explain the variability of molecular profiles across tumors and cell lines. Using these, we propose refined CRC subtypes and provide best-matching cell lines to different subtypes. These findings are relevant to patient stratification and selection of cell lines for early-stage drug discovery pipelines, biomarker discovery, and target identification

    Recent Advances in Variational Autoencoders With Representation Learning for Biomedical Informatics: A Survey

    Get PDF
    Variational autoencoders (VAEs) are deep latent space generative models that have been immensely successful in multiple exciting applications in biomedical informatics such as molecular design, protein design, medical image classification and segmentation, integrated multi-omics data analyses, and large-scale biological sequence analyses, among others. The fundamental idea in VAEs is to learn the distribution of data in such a way that new meaningful data with more intra-class variations can be generated from the encoded distribution. The ability of VAEs to synthesize new data with more representation variance at state-of-art levels provides hope that the chronic scarcity of labeled data in the biomedical field can be resolved. Furthermore, VAEs have made nonlinear latent variable models tractable for modeling complex distributions. This has allowed for efficient extraction of relevant biomedical information from learned features for biological data sets, referred to as unsupervised feature representation learning. In this article, we review the various recent advancements in the development and application of VAEs for biomedical informatics. We discuss challenges and future opportunities for biomedical research with respect to VAEs.https://doi.org/10.1109/ACCESS.2020.304830

    Multimodal Image and Spectral Feature Learning for Efficient Analysis of Water-Suspended Particles

    Get PDF
    apan Science and Technology Agency SICORP and Natural Environment Research Council (JST-NERC SICORP Marine Sensor Proof of Concept Grant JPMJSC1705, NE/R01227X/1); JSPS KAKENHI Grant (18K13934 and 18H03810); Sumitomo Foundation: Grant for environmental Research Project (203122). Acknowledgments. The authors thank Dr. T. Fukuba for the support for building the experimental setup. The authors also thank Dr. H. Sawada for providing samples for this work.Peer reviewedPublisher PD

    Multimodal image and spectral feature learning for efficient analysis of water-suspended particles

    Get PDF
    We have developed a method to combine morphological and chemical information for the accurate identification of different particle types using optical measurement techniques that require no sample preparation. A combined holographic imaging and Raman spectroscopy setup is used to gather data from six different types of marine particles suspended in a large volume of seawater. Unsupervised feature learning is performed on the images and the spectral data using convolutional and single-layer autoencoders. The learned features are combined, where we demonstrate that non-linear dimensional reduction of the combined multimodal features can achieve a high clustering macro F1 score of 0.88, compared to a maximum of 0.61 when only image or spectral features are used. The method can be applied to long-term monitoring of particles in the ocean without the need for sample collection. In addition, it can be applied to data from different types of sensor measurements without significant modifications

    A Deep Survival EWAS approach estimating risk profile based on pre-diagnostic DNA methylation: An application to breast cancer time to diagnosis

    Get PDF
    Previous studies for cancer biomarker discovery based on pre-diagnostic blood DNA methylation (DNAm) profiles, either ignore the explicit modeling of the Time To Diagnosis (TTD), or provide inconsistent results. This lack of consistency is likely due to the limitations of standard EWAS approaches, that model the effect of DNAm at CpG sites on TTD independently. In this work, we aim to identify blood DNAm profiles associated with TTD, with the aim to improve the reliability of the results, as well as their biological meaningfulness. We argue that a global approach to estimate CpG sites effect profile should capture the complex (potentially non-linear) relationships interplaying between sites. To prove our concept, we develop a new Deep Learning-based approach assessing the relevance of individual CpG Islands (i.e., assigning a weight to each site) in determining TTD while modeling their combined effect in a survival analysis scenario. The algorithm combines a tailored sampling procedure with DNAm sites agglomeration, deep non-linear survival modeling and SHapley Additive exPlanations (SHAP) values estimation to aid robustness of the derived effects profile. The proposed approach deals with the common complexities arising from epidemiological studies, such as small sample size, noise, and low signal-to-noise ratio of blood-derived DNAm. We apply our approach to a prospective case-control study on breast cancer nested in the EPIC Italy cohort and we perform weighted gene-set enrichment analyses to demonstrate the biological meaningfulness of the obtained results. We compared the results of Deep Survival EWAS with those of a traditional EWAS approach, demonstrating that our method performs better than the standard approach in identifying biologically relevant pathways

    MACHINE LEARNING APPROACHES FOR BIOMARKER IDENTIFICATION AND SUBGROUP DISCOVERY FOR POST-TRAUMATIC STRESS DISORDER

    Get PDF
    Post-traumatic stress disorder (PTSD) is a psychiatric disorder caused by environmental and genetic factors resulting from alterations in genetic variation, epigenetic changes and neuroimaging characteristics. There is a pressing need to identify reliable molecular and physiological biomarkers for accurate diagnosis, prognosis, and treatment, as well to deepen the understanding of PTSD pathophysiology. Machine learning methods are widely used to infer patterns from biological data, identify biomarkers, and make predictions. The objective of this research is to apply machine learning methods for the accurate classification of human diseases from genome-scale datasets, focusing primarily on PTSD.The DoD-funded Systems Biology of PTSD Consortium has recruited combat veterans with and without PTSD for measurement of molecular and physiological data from blood or urine samples with the goal of identifying accurate and specific PTSD biomarkers. As a member of the Consortium with access to these PTSD multiple omics datasets, we first completed a project titled Clinical Subgroup-Specific PTSD Classification and Biomarker Discovery. We applied machine learning approaches to these data to build classification models consisting of molecular and clinical features to predict PTSD status. We also identified candidate biomarkers for diagnosis, which improves our understanding of PTSD pathogenesis. In a second project, entitled Multi-Omic PTSD Subgroup Identification and Clinical Characterization, we applied methods for integrating multiple omics datasets to investigate the complex, multivariate nature of the biological systems underlying PTSD. We identified an optimal 2 PTSD subgroups using two different machine learning approaches from 82 PTSD positive samples, and we found that the subgroups exhibited different remitting behavior as inferred from subjects recalled at a later time point. The results from our association, differential expression, and classification analyses demonstrated the distinct clinical and molecular features characterizing these subgroups.Taken together, our work has advanced our understanding of PTSD biomarkers and subgroups through the use of machine learning approaches. Results from our work should strongly contribute to the precise diagnosis and eventual treatment of PTSD, as well as other diseases. Future work will involve continuing to leverage these results to enable precision medicine for PTSD
    corecore