
    Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification

    Different aspects of a clinical sample can be revealed by multiple types of omics data. Integrated analysis of multi-omics data provides a comprehensive view of patients, which has the potential to facilitate more accurate clinical decision making. However, omics data are normally high dimensional, with a large number of molecular features and a relatively small number of available samples with clinical labels. This "curse of dimensionality" makes it challenging to train a machine learning model on high-dimensional omics data such as DNA methylation and gene expression profiles. Here we propose an end-to-end deep learning model called OmiVAE to extract low-dimensional features and classify samples from multi-omics data. OmiVAE combines the basic structure of variational autoencoders with a classification network to achieve task-oriented feature extraction and multi-class classification. The training procedure of OmiVAE consists of an unsupervised phase without the classifier and a supervised phase with the classifier. During the unsupervised phase, a hierarchical cluster structure of samples forms automatically, without the need for labels. In the supervised phase, OmiVAE achieved an average classification accuracy of 97.49% in 10-fold cross-validation across 33 tumour types and normal samples, outperforming other existing methods. The OmiVAE model trained on multi-omics data outperformed models trained on a single omics data type, which indicates that complementary information from different omics data types provides useful insights for biomedical tasks such as cancer classification.
    Comment: 7 pages, 4 figures
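
    The two-phase objective described above can be sketched as follows. This is a minimal NumPy illustration, not the OmiVAE implementation: it assumes a Gaussian latent space sampled with the standard reparameterization trick, a squared-error reconstruction term, and softmax cross-entropy for the classifier. The function names and the weighting parameter `alpha` are hypothetical.

```python
import numpy as np


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_divergence(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dims, averaged over samples."""
    return float(np.mean(-0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1)))


def reconstruction_loss(x, x_recon):
    """Squared error summed over features, averaged over samples (assumed choice)."""
    return float(np.mean(np.sum((x - x_recon) ** 2, axis=1)))


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


def omivae_style_loss(x, x_recon, mu, logvar, logits=None, labels=None, alpha=1.0):
    """Unsupervised phase: reconstruction + KL only (no classifier).
    Supervised phase: additionally add alpha * classification loss."""
    loss = reconstruction_loss(x, x_recon) + kl_divergence(mu, logvar)
    if logits is not None and labels is not None:
        loss += alpha * cross_entropy(logits, labels)
    return loss


# Toy demo: 4 samples x 10 concatenated omics features, 2 latent dimensions.
rng = np.random.default_rng(0)
x = rng.random((4, 10))
mu, logvar = np.zeros((4, 2)), np.zeros((4, 2))  # stand-ins for encoder outputs
z = reparameterize(mu, logvar, rng)  # latent sample that would feed decoder and classifier
unsupervised = omivae_style_loss(x, x, mu, logvar)
supervised = omivae_style_loss(
    x, x, mu, logvar, logits=np.zeros((4, 34)), labels=np.array([0, 5, 12, 33])
)
```

    With perfect reconstruction and a standard-normal posterior, the unsupervised loss is 0, and with uniform logits over 34 classes (33 tumour types plus normal) the supervised loss reduces to log(34). In a real model, `mu`, `logvar`, `x_recon`, and `logits` would come from trained encoder, decoder, and classifier networks.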

    A Deep Learning Approach to Integrate Medical Big Data for Improving Health Services in Indonesia

    This paper proposes a medical informatics approach to support health services in Indonesia. The paper focuses on the analysis of big data for health care, with the aim of improving and developing clinical decision support systems (CDSS) and of assessing medical data for both quality assurance and accessibility of health services. Electronic health records (EHRs) are rich in patient-sourced medical data. These data can be aggregated into information covering medical history details such as diagnostic tests, medicines and treatment plans, immunization records, allergies, radiological images, multivariate sensor devices, and laboratory and test results. Together, this information provides a valuable understanding of disease management systems. Indonesia, with many rural areas and a limited number of doctors, is an important case to investigate. Data mining of large-scale individual and population data through EHRs can be combined with mobile networks and social media to inform health and public policy. Many researchers have applied deep learning (DL) to data-mining problems in health informatics. In practice, however, the use of DL remains debatable: achieving optimal performance requires relatively large amounts of data and resources, while other learning algorithms are relatively fast, achieve comparable performance with fewer resources and less parameterization, and offer better interpretability. This paper describes the advantages of deep learning for designing medical informatics, since such an approach is needed to build a good CDSS for health services.

    Autoencoded DNA methylation data to predict breast cancer recurrence: Machine learning models and gene-weight significance

    Breast cancer is the most frequent cancer in women and the second most frequent overall after lung cancer. Although the 5-year survival rate of breast cancer is relatively high, recurrence is also common and often involves metastasis, with its consequent threat to patients. DNA methylation-derived databases have become an interesting primary source for supervised knowledge extraction regarding breast cancer. Unfortunately, the study of DNA methylation involves processing hundreds of thousands of features for every patient. DNA methylation data are characterized by high dimension and low sample size, a setting with well-known issues for feature selection and generation. Autoencoders (AEs) have emerged as a technique for nonlinear feature fusion. Our main objective in this work is to design a procedure that summarizes DNA methylation by taking advantage of AEs. Our proposal generates new features from the CpG site values of patients with and without recurrence. Then, a limited set of genes relevant to characterizing breast cancer recurrence is proposed by applying survival analysis and a weighted ranking of genes according to the distribution of their CpG sites. To test our proposal, we selected a dataset from The Cancer Genome Atlas data portal and an AE with a single hidden layer. The literature and enrichment analyses (based on genomic context and functional annotation) conducted on the genes obtained in our experiment confirmed that all of these genes were related to breast cancer recurrence.
    Ministerio de Economía y Competitividad TIN2014-55894-C2-R; Ministerio de Economía y Competitividad TIN2017-88209-C2-2-
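
    A single-hidden-layer autoencoder of the kind described can be sketched in NumPy as below. This is an illustrative assumption, not the authors' code: it uses sigmoid activations (so reconstructions stay in the [0, 1] range of methylation beta values), a squared-error loss, and plain full-batch gradient descent. The hidden size, learning rate, function names, and random data are hypothetical, and the gene-weight ranking step is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden=5, lr=1.0, epochs=500, seed=0):
    """Train a single-hidden-layer autoencoder with full-batch gradient descent.

    X: (n_samples, n_cpg) matrix of methylation beta values in [0, 1].
    Returns the learned parameters and the per-epoch loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (d, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.1, (n_hidden, d)), np.zeros(d)
    history = []
    for _ in range(epochs):
        # Forward pass: encode CpG values into hidden features, then reconstruct.
        H = sigmoid(X @ W1 + b1)
        X_hat = sigmoid(H @ W2 + b2)
        history.append(0.5 * np.sum((X_hat - X) ** 2) / n)
        # Backward pass for the loss 0.5 * ||X_hat - X||^2 / n.
        dZ2 = (X_hat - X) / n * X_hat * (1 - X_hat)
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * H * (1 - H)  # uses W2 before it is updated
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
        # Gradient-descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), history


def encode(X, params):
    """Map each patient's CpG profile to the hidden-layer (new) features."""
    W1, b1, _, _ = params
    return sigmoid(X @ W1 + b1)


# Toy demo on synthetic data: 20 patients x 50 CpG sites.
rng = np.random.default_rng(42)
X = rng.random((20, 50))
params, history = train_autoencoder(X)
features = encode(X, params)  # shape (20, 5): one compressed vector per patient
```

    The hidden-layer outputs play the role of the new features; with a small enough learning rate the recorded loss decreases over epochs.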

    AI-Enabled Lung Cancer Prognosis

    Lung cancer is the leading cause of cancer-related mortality, claiming approximately 1.79 million lives globally in 2020, with an estimated 2.21 million new cases diagnosed in the same period. Among these, non-small cell lung cancer (NSCLC) is the predominant subtype, characterized by a notably poor prognosis and a low five-year overall survival rate of approximately 25% across all disease stages. However, survival outcomes vary considerably with the stage at diagnosis and the therapeutic interventions administered. Recent advances in artificial intelligence (AI) have transformed lung cancer prognosis. AI-driven methods, including machine learning and deep learning algorithms, have shown promise in improving survival prediction accuracy by efficiently analyzing complex multi-omics data and integrating diverse clinical variables. By leveraging AI techniques, clinicians can draw on comprehensive prognostic insights to tailor personalized treatment strategies, ultimately improving patient outcomes in NSCLC. An overview of AI-driven data processing can strengthen understanding of such systems and provide better direction for their use.
    Comment: This is the author's version of a book chapter entitled "Cancer Research: An Interdisciplinary Approach", Springer

    OmiEmbed: a unified multi-task deep learning framework for multi-omics data

    High-dimensional omics data contain intrinsic biomedical information that is crucial for personalised medicine. Nevertheless, it is challenging to capture this information from genome-wide data because of the large number of molecular features and the small number of available samples, a problem known as 'the curse of dimensionality' in machine learning. To tackle this problem and pave the way for machine-learning-aided precision medicine, we proposed a unified multi-task deep learning framework named OmiEmbed that captures biomedical information from high-dimensional omics data with a deep embedding module and downstream task modules. The deep embedding module learnt an omics embedding that mapped multiple omics data types into a lower-dimensional latent space. Based on this new representation of multi-omics data, different downstream task modules were trained simultaneously and efficiently with a multi-task strategy to predict the comprehensive phenotype profile of each sample. OmiEmbed supports multiple tasks for omics data, including dimensionality reduction, tumour type classification, multi-omics integration, demographic and clinical feature reconstruction, and survival prediction. The framework outperformed other methods on all three types of downstream tasks and achieved better performance with the multi-task strategy than when the tasks were trained individually. OmiEmbed is a powerful and unified framework that can be widely adapted to various applications of high-dimensional omics data and has great potential to facilitate more accurate and personalised clinical decision making.
    Comment: 14 pages, 8 figures, 7 tables
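
    The idea of one shared embedding feeding several downstream task modules can be sketched as a forward pass plus a combined objective. This NumPy sketch is hypothetical: the layer sizes, the choice of tasks (omics reconstruction, tumour type classification, and an age regressor standing in for demographic feature reconstruction), and the plain weighted sum of task losses are assumptions, and OmiEmbed's actual architecture and loss weighting may differ. Survival prediction and training (which would normally use an autodiff framework) are omitted.

```python
import numpy as np


def linear_init(rng, n_in, n_out):
    """Small random weights and zero bias for a dense layer."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


class MultiTaskOmicsSketch:
    """Shared embedding over several omics types feeding multiple task heads.

    Hypothetical layout: one encoder per omics type, a concatenated latent
    vector, one decoder per omics type, a classifier and an age regressor.
    """

    def __init__(self, omics_dims, latent_per_omics, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        latent = latent_per_omics * len(omics_dims)
        self.encoders = [linear_init(rng, d, latent_per_omics) for d in omics_dims]
        self.decoders = [linear_init(rng, latent, d) for d in omics_dims]
        self.cls_head = linear_init(rng, latent, n_classes)
        self.reg_head = linear_init(rng, latent, 1)

    def embed(self, omics):
        """Encode each omics block and concatenate into one low-dimensional vector."""
        parts = [np.tanh(x @ W + b) for x, (W, b) in zip(omics, self.encoders)]
        return np.concatenate(parts, axis=1)

    def task_losses(self, omics, labels, ages):
        """Per-task losses, all computed from the same shared embedding."""
        z = self.embed(omics)
        recon = sum(float(np.mean((z @ W + b - x) ** 2))
                    for x, (W, b) in zip(omics, self.decoders))
        W, b = self.cls_head
        cls = softmax_cross_entropy(z @ W + b, labels)
        W, b = self.reg_head
        reg = float(np.mean(((z @ W + b).ravel() - ages) ** 2))
        return {"reconstruction": recon, "classification": cls, "regression": reg}


def multi_task_loss(losses, weights):
    """Weighted sum of task losses (assumed weighting scheme)."""
    return sum(weights[name] * value for name, value in losses.items())


# Toy demo: 8 samples, two omics blocks (e.g. expression and methylation).
rng = np.random.default_rng(1)
omics = [rng.random((8, 30)), rng.random((8, 20))]
model = MultiTaskOmicsSketch([30, 20], latent_per_omics=4, n_classes=5)
losses = model.task_losses(omics, labels=rng.integers(0, 5, 8), ages=rng.random(8))
total = multi_task_loss(
    losses, {"reconstruction": 1.0, "classification": 1.0, "regression": 0.5}
)
```

    Because every head reads the same embedding, minimizing the combined loss pushes the shared representation to serve all tasks at once, which is the intuition behind training the tasks jointly rather than individually.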

    Deep Learning in Single-Cell Analysis

    Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data generated by single-cell technologies are high-dimensional, sparse, and heterogeneous, and have complicated dependency structures, which makes analysis with conventional machine learning approaches challenging and impractical. In tackling these challenges, deep learning often outperforms traditional machine learning methods. In this work, we give a comprehensive survey of deep learning in single-cell analysis. We first introduce the background of single-cell technologies and their development, as well as fundamental concepts of deep learning, including the most popular deep architectures. We present an overview of the single-cell analytic pipeline pursued in research applications, noting divergences due to data sources or specific applications. We then review seven popular tasks spanning different stages of the single-cell analysis pipeline: multimodal integration, imputation, clustering, spatial domain identification, cell-type deconvolution, cell segmentation, and cell-type annotation. For each task, we describe the most recent developments in classical and deep learning methods and discuss their advantages and disadvantages. Deep learning tools and benchmark datasets are also summarized for each task. Finally, we discuss future directions and the most recent challenges. This survey will serve as a reference for biologists and computer scientists and encourage collaboration between them.
    Comment: 77 pages, 11 figures, 15 tables, deep learning, single-cell analysis

    CancerNet: a unified deep learning network for pan‑cancer diagnostics

    The article states that, despite remarkable advances in cancer research, cancer remains one of the leading causes of death worldwide. The authors' proposed framework for cancer diagnostics detects cancers and their tissues of origin using a unified model encompassing 33 cancers represented in The Cancer Genome Atlas. Their model exploits the learned features of different cancers reflected in their respective dysregulated epigenomes and holds great promise for early cancer detection.

    A Pan-cancer Somatic Mutation Embedding using Autoencoders

    Background: Next-generation sequencing instruments are providing new opportunities for comprehensive analyses of cancer genomes. The increasing availability of tumor data allows researchers to study the complexity of cancer with machine learning methods. The large available repositories of high-dimensional tumor samples characterised by germline and somatic mutation data require advanced computational modelling for data interpretation. In this work, we propose to analyze these complex data with neural network learning, a methodology that has driven impressive advances in image and natural language processing.
    Results: Here we present a tumor mutation profile analysis pipeline based on an autoencoder model, which is used to discover better, lower-dimensional representations of large somatic mutation data from 40 different tumor types and subtypes. Kernel learning with hierarchical cluster analysis is used to assess the quality of the learned somatic mutation embedding, on which support vector machine models are used to accurately classify tumor subtypes.
    Conclusions: The learned latent space maps the original samples to a much lower dimension while keeping the biological signals from the original tumor samples. This pipeline and the resulting embedding allow easier exploration of the heterogeneity within and across tumor types and accurate classification of tumor samples in the pan-cancer somatic mutation landscape.
    Affiliation: Palazzo, Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck; Argentina. Universidad Tecnológica Nacional; Argentina
    Affiliation: Beauseroy, Pierre. Université de Technologie de Troyes; France
    Affiliation: Yankilevich, Patricio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck; Argentina
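
    Two steps of such a pipeline can be sketched in NumPy: turning mutation calls into a fixed-width autoencoder input, and computing a kernel matrix for assessing an embedding. Both are assumptions for illustration: the binary per-gene encoding, the RBF kernel with parameter `gamma`, and the sample IDs and gene panel in the demo are hypothetical and not taken from the paper. In the described pipeline the kernel would be computed on the learned latent vectors; here it is applied to the binary matrix only to keep the example self-contained.

```python
import numpy as np


def mutation_matrix(calls, samples, genes):
    """Binary sample x gene matrix: 1 if the gene carries >= 1 somatic mutation."""
    s_idx = {s: i for i, s in enumerate(samples)}
    g_idx = {g: j for j, g in enumerate(genes)}
    M = np.zeros((len(samples), len(genes)), dtype=np.float32)
    for sample, gene in calls:
        # Calls for genes or samples outside the chosen panel are ignored.
        if sample in s_idx and gene in g_idx:
            M[s_idx[sample], g_idx[gene]] = 1.0
    return M


def rbf_kernel(Z, gamma=1.0):
    """RBF (Gaussian) kernel matrix K[i, j] = exp(-gamma * ||z_i - z_j||^2)."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negative values from rounding
    return np.exp(-gamma * d2)


# Toy demo with hypothetical samples and a hypothetical three-gene panel.
samples = ["S1", "S2"]
genes = ["TP53", "KRAS", "PIK3CA"]
calls = [("S1", "TP53"), ("S1", "KRAS"), ("S2", "TP53"), ("S1", "TP53"), ("S2", "BRAF")]
M = mutation_matrix(calls, samples, genes)  # rows: S1 -> [1, 1, 0], S2 -> [1, 0, 0]
K = rbf_kernel(M, gamma=0.5)
```

    Repeated calls in the same gene still give a single 1, and genes outside the panel (BRAF here) are dropped, so every tumor maps to a vector of the same width regardless of how many mutations it carries.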