57 research outputs found
Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification
Different aspects of a clinical sample can be revealed by multiple types of
omics data. Integrated analysis of multi-omics data provides a comprehensive
view of patients, which has the potential to facilitate more accurate clinical
decision making. However, omics data are normally high dimensional with a large
number of molecular features and a relatively small number of available samples
with clinical labels. The "dimensionality curse" makes it challenging to train
a machine learning model using high dimensional omics data like DNA methylation
and gene expression profiles. Here we propose an end-to-end deep learning model
called OmiVAE to extract low dimensional features and classify samples from
multi-omics data. OmiVAE combines the basic structure of variational
autoencoders with a classification network to achieve task-oriented feature
extraction and multi-class classification. The training procedure of OmiVAE is
comprised of an unsupervised phase without the classifier and a supervised
phase with the classifier. During the unsupervised phase, a hierarchical
cluster structure of samples can be automatically formed without the need for
labels. And in the supervised phase, OmiVAE achieved an average classification
accuracy of 97.49% after 10-fold cross-validation among 33 tumour types and
normal samples, which shows better performance than other existing methods. The
OmiVAE model learned from multi-omics data outperformed that using only one
type of omics data, which indicates that the complementary information from
different omics datatypes provides useful insights for biomedical tasks like
cancer classification. Comment: 7 pages, 4 figures
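The two training phases described in the abstract above can be illustrated with a per-sample loss. This is a minimal sketch, not the authors' implementation: the squared-error reconstruction term, the weighting factor `alpha`, and all function names are assumptions, since the abstract does not specify them. Without `logits` the function returns the usual variational autoencoder objective (unsupervised phase); with `logits` and a `label` it adds a classification term (supervised phase).

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class (numerically stable)."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_norm - logits[label]


def sample_loss(x, x_recon, mu, logvar, logits=None, label=None, alpha=1.0):
    """Loss for one sample.

    Unsupervised phase: reconstruction + KL (call without logits).
    Supervised phase: the same terms plus alpha * cross-entropy.
    Squared error and alpha are illustrative assumptions.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    loss = recon + kl_to_standard_normal(mu, logvar)
    if logits is not None:
        loss += alpha * cross_entropy(logits, label)
    return loss


# Perfect reconstruction and a standard-normal posterior give zero loss;
# adding uniform logits over four classes contributes log(4).
unsupervised = sample_loss([0.5, 0.2], [0.5, 0.2], [0.0], [0.0])
supervised = sample_loss([0.5, 0.2], [0.5, 0.2], [0.0], [0.0],
                         logits=[0.0, 0.0, 0.0, 0.0], label=2)
```

In a real model the reconstruction, `mu`, `logvar`, and `logits` would come from trained encoder, decoder, and classifier networks; here they are passed in directly so that the arithmetic of the objective is visible.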
A Deep Learning Approach to Integrate Medical Big Data for Improving Health Services in Indonesia
This paper proposes medical informatics to support health services in Indonesia. It focuses on the analysis of Big Data for health care purposes, with the aim of improving and developing clinical decision support systems (CDSS) or assessing medical data for both quality assurance and accessibility of health services. Electronic health records (EHRs) are very rich in medical data sourced from patients. All of these data can be aggregated to produce information, which includes medical history details such as diagnostic tests, medicines and treatment plans, immunization records, allergies, radiological images, multivariate sensor devices, laboratories, and test results. This information provides a valuable understanding of disease management systems. In Indonesia, which has many rural areas with limited access to doctors, this is an important case to investigate. Data mining of large-scale individual and population data through EHRs can be combined with mobile networks and social media to inform health and public policy. To support this research, many researchers have applied the Deep Learning (DL) approach to data-mining problems related to health informatics. However, in practice, the use of DL is still questionable because achieving optimal performance requires relatively large amounts of data and resources, while other learning algorithms are relatively fast, achieve similar performance with fewer resources and less parameterization, and offer better interpretability. This paper describes the advantages of Deep Learning for designing medical informatics, since such an approach is needed to build a good CDSS for health services.
Autoencoded DNA methylation data to predict breast cancer recurrence: Machine learning models and gene-weight significance
Breast cancer is the most frequent cancer in women and the second most frequent overall after lung cancer.
Although the 5-year survival rate of breast cancer is relatively high, recurrence is also common which often
involves metastasis with its consequent threat for patients. DNA methylation-derived databases have become an
interesting primary source for supervised knowledge extraction regarding breast cancer. Unfortunately, the study
of DNA methylation involves the processing of hundreds of thousands of features for every patient. DNA
methylation data are characterized by High Dimension Low Sample Size, which raises well-known issues regarding
feature selection and generation. Autoencoders (AEs) appear as a specific technique for conducting nonlinear
feature fusion. Our main objective in this work is to design a procedure to summarize DNA methylation by taking
advantage of AEs. Our proposal is able to generate new features from the values of CpG sites of patients with and
without recurrence. Then, a limited set of relevant genes to characterize breast cancer recurrence is proposed by
the application of survival analysis and a weighted ranking of genes according to the distribution of their CpG
sites. To test our proposal we have selected a dataset from The Cancer Genome Atlas data portal and an AE with a
single hidden layer. The literature and enrichment analysis (based on genomic context and functional annotation) conducted regarding the genes obtained with our experiment confirmed that all of these genes were related
to breast cancer recurrence. Ministerio de Economía y Competitividad TIN2014-55894-C2-R; Ministerio de Economía y Competitividad TIN2017-88209-C2-2-
AI-Enabled Lung Cancer Prognosis
Lung cancer is the primary cause of cancer-related mortality, claiming
approximately 1.79 million lives globally in 2020, with an estimated 2.21
million new cases diagnosed within the same period. Among these, Non-Small Cell
Lung Cancer (NSCLC) is the predominant subtype, characterized by a notably
bleak prognosis and low overall survival rate of approximately 25% over five
years across all disease stages. However, survival outcomes vary considerably
based on the stage at diagnosis and the therapeutic interventions administered.
Recent advancements in artificial intelligence (AI) have revolutionized the
landscape of lung cancer prognosis. AI-driven methodologies, including machine
learning and deep learning algorithms, have shown promise in enhancing survival
prediction accuracy by efficiently analyzing complex multi-omics data and
integrating diverse clinical variables. By leveraging AI techniques, clinicians
can harness comprehensive prognostic insights to tailor personalized treatment
strategies, ultimately improving patient outcomes in NSCLC. Overviewing
AI-driven data processing can significantly help bolster the understanding and
provide better directions for using such systems. Comment: This is the author's version of a book chapter entitled: "Cancer
Research: An Interdisciplinary Approach", Springer
OmiEmbed: a unified multi-task deep learning framework for multi-omics data
High-dimensional omics data contains intrinsic biomedical information that is
crucial for personalised medicine. Nevertheless, it is challenging to capture
it from the genome-wide data due to the large number of molecular features
and small number of available samples, which is also called 'the curse of
dimensionality' in machine learning. To tackle this problem and pave the way
for machine learning aided precision medicine, we proposed a unified multi-task
deep learning framework named OmiEmbed to capture biomedical information from
high-dimensional omics data with the deep embedding and downstream task
modules. The deep embedding module learnt an omics embedding that mapped
multiple omics data types into a latent space with lower dimensionality. Based
on the new representation of multi-omics data, different downstream task
modules were trained simultaneously and efficiently with the multi-task
strategy to predict the comprehensive phenotype profile of each sample.
OmiEmbed supports multiple tasks for omics data including dimensionality
reduction, tumour type classification, multi-omics integration, demographic and
clinical feature reconstruction, and survival prediction. The framework
outperformed other methods on all three types of downstream tasks and achieved
better performance with the multi-task strategy compared to training them
individually. OmiEmbed is a powerful and unified framework that can be widely
adapted to various applications of high-dimensional omics data and has great
potential to facilitate more accurate and personalised clinical decision
making. Comment: 14 pages, 8 figures, 7 tables
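The structure described in the abstract above, one shared embedding feeding several downstream task modules, can be sketched as a forward pass. This is an illustrative sketch only: the single dense embedding layer, the two example heads (tumour type and survival risk), the parameter names, and the toy weights are all assumptions and are not taken from OmiEmbed itself.

```python
def linear(weights, bias, x):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def forward(x, params):
    """Compute a shared embedding once, then pass it to every task head."""
    z = relu(linear(params["embed_w"], params["embed_b"], x))
    return {
        "embedding": z,
        "tumour_type_logits": linear(params["cls_w"], params["cls_b"], z),
        "survival_risk": linear(params["surv_w"], params["surv_b"], z)[0],
    }


# Hypothetical toy parameters: 2 input features, 2 latent dimensions,
# 3 tumour classes, 1 survival risk score.
params = {
    "embed_w": [[1.0, 0.0], [0.0, -1.0]], "embed_b": [0.0, 0.0],
    "cls_w": [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]], "cls_b": [0.0, 0.0, 0.0],
    "surv_w": [[0.5, 0.5]], "surv_b": [0.1],
}
out = forward([1.0, 2.0], params)
```

Because every head reads the same embedding, training the heads together lets the gradient from each task shape the shared representation, which is the general idea behind a multi-task strategy.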
Detection and Classification of Cancer and Other Noncommunicable Diseases Using Neural Network Models
Here, we show that training with multiple noncommunicable diseases (NCDs) is both feasible and beneficial to modeling this class of diseases. We first use data from The Cancer Genome Atlas (TCGA) to train a pan-cancer model, and then characterize the information the model has learned about the cancers. In doing this we show that the model has learned concepts that are relevant to the task of cancer classification. We also test the model on datasets derived independently of the TCGA cohort and show that the model is robust to data outside of its training distribution, such as precancerous lesions and metastatic samples. We then utilize the cancer model as the basis of a transfer learning study in which we retrain it on other, non-cancer NCDs. In doing so we show that NCDs with very different underlying biology contain extractable information relevant to each other, allowing a broader model of NCDs to be developed with existing datasets. We then test the importance of the sample's source tissue in the model and find that the NCD class and tissue source may not be independent in our model. To address this, we use the tissue encodings to create augmented samples. We test how successfully we can use these augmented samples to remove or diminish the importance of tissue source to NCD class by retraining the model. In doing this we make key observations about the nature of concept importance and its usefulness in future neural network explainability efforts.
Deep Learning in Single-Cell Analysis
Single-cell technologies are revolutionizing the entire field of biology. The
large volumes of data generated by single-cell technologies are
high-dimensional, sparse, heterogeneous, and have complicated dependency
structures, making analyses using conventional machine learning approaches
challenging and impractical. In tackling these challenges, deep learning often
demonstrates superior performance compared to traditional machine learning
methods. In this work, we give a comprehensive survey on deep learning in
single-cell analysis. We first introduce background on single-cell technologies
and their development, as well as fundamental concepts of deep learning
including the most popular deep architectures. We present an overview of the
single-cell analytic pipeline pursued in research applications while noting
divergences due to data sources or specific applications. We then review seven
popular tasks spanning through different stages of the single-cell analysis
pipeline, including multimodal integration, imputation, clustering, spatial
domain identification, cell-type deconvolution, cell segmentation, and
cell-type annotation. Under each task, we describe the most recent developments
in classical and deep learning methods and discuss their advantages and
disadvantages. Deep learning tools and benchmark datasets are also summarized
for each task. Finally, we discuss the future directions and the most recent
challenges. This survey will serve as a reference for biologists and computer
scientists, encouraging collaborations. Comment: 77 pages, 11 figures, 15 tables, deep learning, single-cell analysis
CancerNet: a unified deep learning network for pan‑cancer diagnostics
Article states that despite remarkable advances in cancer research, cancer remains one of the leading causes of death worldwide. The author's proposed framework for cancer diagnostics detects cancers and their tissues of origin using a unified model of cancers encompassing 33 cancers represented in The Cancer Genome Atlas. Their model exploits the learned features of different cancers reflected in the respective dysregulated epigenomes, holding great promise for early cancer detection.
A Pan-cancer Somatic Mutation Embedding using Autoencoders
Background: Next generation sequencing instruments are providing new opportunities for comprehensive analyses of cancer genomes. The increasing availability of tumor data makes it possible to study the complexity of cancer with machine learning methods. The large available repositories of high-dimensional tumor samples characterised with germline and somatic mutation data require advanced computational modelling for data interpretation. In this work, we propose to analyze these complex data with neural network learning, a methodology that has made impressive advances in image and natural language processing. Results: Here we present a tumor mutation profile analysis pipeline based on an autoencoder model, which is used to discover better representations of lower dimensionality from large somatic mutation data of 40 different tumor types and subtypes. Kernel learning with hierarchical cluster analysis is used to assess the quality of the learned somatic mutation embedding, on which support vector machine models are used to accurately classify tumor subtypes. Conclusions: The learned latent space maps the original samples into a much lower dimension while keeping the biological signals from the original tumor samples. This pipeline and the resulting embedding allow an easier exploration of the heterogeneity within and across tumor types and an accurate classification of tumor samples in the pan-cancer somatic mutation landscape.
Affiliation: Palazzo, Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck; Argentina. Universidad Tecnológica Nacional; Argentina
Affiliation: Beauseroy, Pierre. Université de Technologie de Troyes; France
Affiliation: Yankilevich, Patricio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck; Argentina
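The abstract above mentions hierarchical cluster analysis as a way to assess a learned embedding. The sketch below shows one generic form of that idea, agglomerative clustering with average linkage on latent vectors. It is an assumption-laden illustration, not the paper's pipeline: the Euclidean distance, the linkage rule, the function name, and the toy latent vectors are all hypothetical, and the kernel learning step is not reproduced.

```python
import math


def average_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Mean Euclidean distance over all cross-cluster pairs of points.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)          # closest pair of clusters
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]               # j > i, so index i stays valid
    return [sorted(c) for c in clusters]


# Hypothetical 2-D latent vectors forming two well-separated groups.
latent = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
groups = average_linkage(latent, n_clusters=2)
```

If samples of the same tumor type end up in the same cluster, that is one informal sign that the embedding has kept the relevant biological signal. This brute-force version recomputes all pairwise distances at every merge, so it is only suitable for small illustrative inputs.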
- …