11 research outputs found

    An Enhancement to CNN Approach with Synthesized Image Data for Disease Subtype Classification

    The introduction of genetic testing has profoundly enhanced the prospects for early detection of diseases and for techniques that suggest precision medicines. The subtyping of critical diseases has proven to be an essential part of the development of individualized therapies and has led to deeper insights into the heterogeneity of disease. Studies suggest that variants in particular genes have significant effects on certain types of immune system cells and are also involved in the risk of certain critical illnesses such as cancer. By analyzing the genetic sequence of a patient, disease types and subtypes can be predicted. Recent research has shown that a CNN's prediction quality in this context, using gene intensity features, can be improved when the input is structured into 2D images. Constructed from chromosome locations or from transformations involving kPCA, t-SNE, etc., these two-dimensional images express certain types of relationships among the intensity features. While this approach extends the success of convolutional neural networks to non-image data, obtaining a precise mapping of features onto the images that reflects the relationships among the features is hard, if not impossible. To this end, we propose an enhancement to the approach that provides the CNN training procedure not only with samples of the structured image data but also with samples of the unstructured raw gene expression data in its original form. While the former is fed into the convolutional layers of the network, the latter is input only to the fully connected layers. The proposed method is applied to The Cancer Genome Atlas (TCGA) dataset for cancer subtypes, using the median values of the expression level of all expressed genes in an RNA sequence. According to the experiments, the proposed approach improves classification accuracy by 2.7% when applied to the state-of-the-art method with a 2D CNN architecture trained on images constructed from the chromosome locations of the genes. When built on top of the method with a 2D CNN architecture trained on images constructed with a transformation process involving t-SNE, classification accuracy is enhanced by 4.7%. When the proposed approach is implemented on the 1D CNN model using data structured by covariance between the features, classification accuracy improves by 1%, and an increase of 3% is observed when the approach is implemented over the model trained using a 1D CNN with data ordered by chromosome location.
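
The two-input idea in this abstract (structured image into the convolutional layers, raw expression vector only into the fully connected layers) can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions, not the authors' architecture: the single 3x3 kernel, the layer sizes, the ReLU activations, and the names `two_branch_forward` and `init_params` are all hypothetical choices made for illustration.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive single-channel 2D cross-correlation with 'valid' padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def softmax(z):
    """Numerically stable softmax over a 1D vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def init_params(image_shape, n_genes, n_hidden, n_classes, kernel_size=3, seed=0):
    """Random weights for the sketch (hypothetical sizes, not from the paper)."""
    rng = np.random.default_rng(seed)
    conv_out = (image_shape[0] - kernel_size + 1) * (image_shape[1] - kernel_size + 1)
    n_joined = conv_out + n_genes  # conv features + raw expression values
    return {
        "kernel": rng.normal(0, 0.1, (kernel_size, kernel_size)),
        "W1": rng.normal(0, 0.1, (n_hidden, n_joined)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.1, (n_classes, n_hidden)),
        "b2": np.zeros(n_classes),
    }

def two_branch_forward(image, raw_expression, params):
    """Image goes through the conv layer; raw expression joins at the FC layer."""
    # Branch 1: structured image -> convolution -> ReLU -> flatten
    feature_map = np.maximum(conv2d_valid(image, params["kernel"]), 0.0)
    conv_features = feature_map.ravel()
    # Branch 2: the raw vector bypasses the convolution and is concatenated here
    joined = np.concatenate([conv_features, raw_expression])
    hidden = np.maximum(params["W1"] @ joined + params["b1"], 0.0)
    return softmax(params["W2"] @ hidden + params["b2"])
```

Calling `two_branch_forward(image, raw, init_params(image.shape, raw.size, 16, 5))` returns one probability per class. The point of the sketch is only where the concatenation happens: after the convolutional features are flattened and before the first fully connected layer.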

    Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification

    Different aspects of a clinical sample can be revealed by multiple types of omics data. Integrated analysis of multi-omics data provides a comprehensive view of patients, which has the potential to facilitate more accurate clinical decision making. However, omics data are normally high dimensional, with a large number of molecular features and a relatively small number of available samples with clinical labels. The "dimensionality curse" makes it challenging to train a machine learning model using high dimensional omics data like DNA methylation and gene expression profiles. Here we propose an end-to-end deep learning model called OmiVAE to extract low dimensional features and classify samples from multi-omics data. OmiVAE combines the basic structure of variational autoencoders with a classification network to achieve task-oriented feature extraction and multi-class classification. The training procedure of OmiVAE comprises an unsupervised phase without the classifier and a supervised phase with the classifier. During the unsupervised phase, a hierarchical cluster structure of samples can be formed automatically without the need for labels. In the supervised phase, OmiVAE achieved an average classification accuracy of 97.49% after 10-fold cross-validation among 33 tumour types and normal samples, which is better than other existing methods. The OmiVAE model learned from multi-omics data outperformed the model using only one type of omics data, which indicates that the complementary information from different omics data types provides useful insights for biomedical tasks like cancer classification. Comment: 7 pages, 4 figures
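
The two training phases described in this abstract can be read as two versions of one objective: a variational autoencoder loss alone, then the same loss plus a classification term. The sketch below is an illustration under assumptions, not the OmiVAE implementation: the squared-error reconstruction term, the weights `beta` and `alpha`, and the function names are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

def cross_entropy(class_probs, label):
    """Negative log-probability of the true class."""
    return -np.log(class_probs[label] + 1e-12)

def vae_classifier_loss(x, x_recon, mu, log_var,
                        class_probs=None, label=None, beta=1.0, alpha=1.0):
    """Unsupervised phase: omit class_probs. Supervised phase: pass them."""
    # Reconstruction term (squared error is an assumption of this sketch)
    loss = np.sum((x - x_recon) ** 2) + beta * kl_to_standard_normal(mu, log_var)
    if class_probs is not None:
        # Classification term, only present in the supervised phase
        loss = loss + alpha * cross_entropy(class_probs, label)
    return loss
```

Leaving `class_probs` as `None` corresponds to the phase without the classifier; supplying predicted class probabilities and a label adds the supervised term.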

    OmiEmbed: a unified multi-task deep learning framework for multi-omics data

    High-dimensional omics data contains intrinsic biomedical information that is crucial for personalised medicine. Nevertheless, it is challenging to capture this information from genome-wide data because of the large number of molecular features and the small number of available samples, a problem known as 'the curse of dimensionality' in machine learning. To tackle this problem and pave the way for machine learning aided precision medicine, we proposed a unified multi-task deep learning framework named OmiEmbed to capture biomedical information from high-dimensional omics data with deep embedding and downstream task modules. The deep embedding module learnt an omics embedding that mapped multiple omics data types into a latent space with lower dimensionality. Based on the new representation of multi-omics data, different downstream task modules were trained simultaneously and efficiently with the multi-task strategy to predict the comprehensive phenotype profile of each sample. OmiEmbed supports multiple tasks for omics data, including dimensionality reduction, tumour type classification, multi-omics integration, demographic and clinical feature reconstruction, and survival prediction. The framework outperformed other methods on all three types of downstream tasks and achieved better performance with the multi-task strategy compared to training them individually. OmiEmbed is a powerful and unified framework that can be widely adapted to various applications of high-dimensional omics data and has great potential to facilitate more accurate and personalised clinical decision making. Comment: 14 pages, 8 figures, 7 tables

    DeepInsight: a methodology to transform a non-image data to an image for convolution neural network architecture

    It is critical, but difficult, to catch the small variation in genomic or other kinds of data that differentiates phenotypes or categories. A plethora of data is available, but the information from its genes or elements is spread arbitrarily, making it challenging to extract relevant details for identification. However, an arrangement of similar genes into clusters makes these differences more accessible and allows for more robust identification of hidden mechanisms (e.g. pathways) than dealing with elements individually. Here we propose DeepInsight, which converts non-image samples into a well-organized image form. Thereby, the power of convolution neural networks (CNN), including GPU utilization, can be realized for non-image samples. Furthermore, DeepInsight enables feature extraction through the application of CNN to non-image samples to capture essential information, and has shown promising results. To our knowledge, this is the first work to apply CNN simultaneously on different kinds of non-image datasets: RNA-seq, vowels, text, and artificial.
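
One way to picture the conversion of a feature vector into an image is to give each feature a fixed 2D location, so that similar features land near each other, and then write each sample's values into those pixels. The sketch below is an assumption-laden illustration, not the DeepInsight algorithm: it uses plain PCA (via SVD) over the transposed data matrix as a stand-in for whatever embedding the method uses, and averaging features that share a pixel is a choice made here. All function names are hypothetical.

```python
import numpy as np

def feature_coordinates(X):
    """Place each feature in 2D with PCA, treating features as the points.

    X has shape (n_samples, n_features) and needs at least 2 samples.
    """
    F = X.T - X.T.mean(axis=0)  # (n_features, n_samples), columns centered
    U, S, _ = np.linalg.svd(F, full_matrices=False)
    return U[:, :2] * S[:2]     # first two principal component scores

def to_pixels(coords, size):
    """Rescale 2D coordinates to integer pixel indices in [0, size)."""
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    span[span == 0] = 1.0       # guard against a degenerate axis
    scaled = (coords - lo) / span
    return np.minimum((scaled * size).astype(int), size - 1)

def sample_to_image(sample, pixels, size):
    """Write one sample into a size x size image, averaging pixel collisions."""
    total = np.zeros((size, size))
    count = np.zeros((size, size))
    np.add.at(total, (pixels[:, 0], pixels[:, 1]), sample)
    np.add.at(count, (pixels[:, 0], pixels[:, 1]), 1)
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)
```

The pixel layout is computed once from the training matrix with `to_pixels(feature_coordinates(X), size)` and then reused for every sample, so all images share the same feature-to-pixel mapping.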

    Estimates of gene ensemble noise highlight critical pathways and predict disease severity in H1N1, COVID-19 and mortality in sepsis patients

    Finding novel biomarkers for human pathologies and predicting clinical outcomes for patients is challenging. This stems from the heterogeneous response of individuals to disease and is reflected in the inter-individual variability of gene expression responses, which obscures differential gene expression analysis. Here, we developed an alternative approach that can be applied to dissect disease-associated molecular changes. We define gene ensemble noise as a measure that represents the variance for a collection of genes encoding either members of known biological pathways or subunits of annotated protein complexes, calculated within an individual. Gene ensemble noise allows for the holistic identification and interpretation of gene expression imbalance at the level of gene networks and systems. By comparing gene expression data from COVID-19, H1N1, and sepsis patients, we identified common disturbances in a number of pathways and protein complexes relevant to sepsis pathology. Among others, these include the mitochondrial respiratory chain complex I and peroxisomes. This suggests a Warburg effect and oxidative stress as common hallmarks of the immune host-pathogen response. Finally, we showed that gene ensemble noise could successfully be applied to the prediction of clinical outcome, namely the mortality of patients. Thus, we conclude that gene ensemble noise represents a promising approach for the investigation of molecular mechanisms of pathology through the prism of alterations in the coherent expression of gene circuits.
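
The definition given here (a variance over a collection of genes, computed within one individual) can be written in a few lines. This is a minimal sketch assuming an expression matrix with genes in rows and individuals in columns; the use of the unbiased sample variance (`ddof=1`), the absence of any normalisation, and the name `gene_ensemble_noise` are assumptions of the example, since the abstract does not specify them.

```python
import numpy as np

def gene_ensemble_noise(expression, gene_sets):
    """Per-individual variance of expression across the genes of each set.

    expression: array of shape (n_genes, n_individuals)
    gene_sets:  dict mapping a set name (pathway or complex) to row indices
    Returns a dict mapping each set name to an array of length n_individuals.
    """
    return {
        name: expression[rows, :].var(axis=0, ddof=1)  # variance across genes
        for name, rows in gene_sets.items()
    }
```

Each individual therefore receives one number per pathway or complex, and those numbers, rather than single-gene expression levels, become the features compared between patient groups.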

    Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A review

    Cancer remains one of the most challenging diseases to treat in the medical field. Machine learning has enabled in-depth analysis of rich multi-omics profiles and medical imaging for cancer diagnosis and prognosis. Despite these advancements, machine learning models face challenges stemming from limited labeled sample sizes, the intricate interplay of high-dimensionality data types, the inherent heterogeneity observed among patients and within tumors, and concerns about interpretability and consistency with existing biomedical knowledge. One approach to surmount these challenges is to integrate biomedical knowledge into data-driven models, which has proven potential to improve the accuracy, robustness, and interpretability of model results. Here, we review the state-of-the-art machine learning studies that adopted the fusion of biomedical knowledge and data, termed knowledge-informed machine learning, for cancer diagnosis and prognosis. Emphasizing the properties inherent in four primary data types including clinical, imaging, molecular, and treatment data, we highlight modeling considerations relevant to these contexts. We provide an overview of diverse forms of knowledge representation and current strategies of knowledge integration into machine learning pipelines with concrete examples. We conclude the review article by discussing future directions to advance cancer research through knowledge-informed machine learning. Comment: 41 pages, 4 figures, 2 tables

    AI and precision oncology in clinical cancer genomics : from prevention to targeted cancer therapies-an outcomes based patient care

    Precision medicine is the personalization of medicine to suit a specific group of people or even an individual patient, based on genetic or molecular profiling. This can be done using genomic, transcriptomic, epigenomic or proteomic information. Personalized medicine holds great promise, especially in cancer therapy and control, where precision oncology would allow medical practitioners to use this information to optimize the treatment of a patient. Personalized oncology for groups of individuals would also allow for the use of population group specific diagnostic or prognostic biomarkers. Additionally, this information can be used to track the progress of the disease or monitor the response of the patient to treatment. This can be used to establish the molecular basis for drug resistance and allow the targeting of the genes or pathways responsible for drug resistance. Personalized medicine requires the use of large data sets, which must be processed and analysed in order to identify the particular molecular patterns that can inform the decisions required for personalized care. However, the analysis of these large data sets is difficult and time consuming. This is further compounded by the increasing size of these datasets due to technologies such as next generation sequencing (NGS). These difficulties can be met through the use of artificial intelligence (AI) and machine learning (ML). These computational tools use specific neural networks, learning methods, decision making tools and algorithms to construct and improve on models for the analysis of different types of large data sets. These tools can also be used to answer specific questions. Artificial intelligence can also be used to predict the effects of genetic changes on protein structure and therefore function. 
This review will discuss the current state of the application of AI to omics data, specifically genomic data, and how this is applied to the development of personalized or precision medicine for the treatment of cancer. The South African Medical Research Council (SAMRC) and the National Research Foundation (NRF). https://www.elsevier.com/locate/imu

    Convolutional Neural Networks and their Application in Cancer Diagnosis based on RNA-Sequencing

    Gene expression analysis is the study of the way genes are transcribed to synthesize functional gene products, functional RNA species, or protein products. Its study can provide insights into cellular processes, such as cellular differentiation and abnormal pathological processes. Cancer is a genetic disease in which genetic variations cause abnormally functioning genes that appear to alter expression. Proteins, being the final products of gene expression, define the phenotypes and biological processes. Therefore, detecting gene expression levels can be used for cancer diagnosis, prognosis, and even treatment prediction. This thesis will analyze the theory and applications of Deep Learning (DL). It will then apply DL, and in particular a Convolutional Neural Network (CNN), as a means for the diagnosis of multiple cancer types (pan-cancer classification) using gene expression data, specifically RNA-sequencing. The Cancer Genome Atlas (TCGA) data, which consists of RNA-sequencing, will be preprocessed and then embedded into multiple two-dimensional (2D) images. These images will then be fed into a Convolutional Neural Network, which will classify them into 33 types of cancer, in an attempt to achieve the highest possible diagnostic accuracy.