11 research outputs found
An Enhancement to CNN Approach with Synthesized Image Data for Disease Subtype Classification
The introduction of genetic testing has profoundly enhanced the prospects of early disease detection and of techniques for suggesting precision medicines. The subtyping of critical diseases has proven to be an essential part of the development of individualized therapies and has led to deeper insights into disease heterogeneity. Studies suggest that variants in particular genes have significant effects on certain types of immune system cells and are also involved in the risk of certain critical illnesses like cancer. By analyzing the genetic sequence of a patient, disease types and subtypes can be predicted. Recent research has shown that the CNN's prediction quality in this context, using gene intensity features, could be improved when the input is structured into 2D images.
Constructed from chromosome locations or from transformations involving kPCA, t-SNE, etc., these two-dimensional images express certain types of relationships among the intensity features. While this approach extends the success of convolutional neural networks to non-image data, obtaining a precise mapping of features onto the images that reflects the relationships among the features is hard, if not impossible. To this end, we propose an enhancement to the approach by providing the CNN training procedure with not only the samples of the structured image data but also the samples from the unstructured raw gene expression data in their original form. While the former is fed into the convolutional layers of the network, the latter is input only to the fully connected layers. The proposed method is applied to The Cancer Genome Atlas (TCGA) dataset for cancer subtypes, with the median values of the expression level of all expressed genes in an RNA sequence. According to the experiments, our proposed approach can improve classification accuracy by 2.7% when applied to the state-of-the-art method with a 2D CNN architecture trained using images constructed based on the chromosome locations of the genes. When built on top of the method with a 2D CNN architecture trained using images constructed with a transformation process involving t-SNE, classification accuracy is enhanced by 4.7%. For the implementation of the proposed approach on the 1D CNN model using data structured by the covariance between the features, classification accuracy improves by 1%, and an increase of 3% is observed when the approach is implemented over the model trained using a 1D CNN with data ordered based on chromosome locations.
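The dual-path idea described above (structured image into the convolutional layers, raw expression vector joining only at the fully connected layers) can be sketched with a minimal numpy forward pass. The dimensions, kernel, and five-subtype output head below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(img, kernel):
    """Naive single-channel 2D valid convolution."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def dual_input_forward(image, raw_vector, kernel, w_fc):
    """The image goes through the conv path; the raw expression
    vector skips convolution and joins at the dense layer."""
    feat = np.maximum(conv2d_valid(image, kernel), 0.0)  # conv + ReLU
    joint = np.concatenate([feat.ravel(), raw_vector])   # merge both paths
    logits = joint @ w_fc                                # fully connected layer
    e = np.exp(logits - logits.max())
    return e / e.sum()                                   # softmax over subtypes

image = rng.standard_normal((8, 8))       # structured 2D gene-intensity image
raw = rng.standard_normal(20)             # unstructured expression vector
kernel = rng.standard_normal((3, 3))
n_joint = 6 * 6 + 20                      # conv output (8-3+1)^2 plus raw dims
w_fc = rng.standard_normal((n_joint, 5))  # 5 hypothetical subtypes

probs = dual_input_forward(image, raw, kernel, w_fc)
```

In a trained network the conv path would have multiple layers and learned weights; the sketch only shows where the two inputs enter and merge.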
Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification
Different aspects of a clinical sample can be revealed by multiple types of
omics data. Integrated analysis of multi-omics data provides a comprehensive
view of patients, which has the potential to facilitate more accurate clinical
decision making. However, omics data are normally high dimensional, with a large
number of molecular features and a relatively small number of available samples
with clinical labels. The "curse of dimensionality" makes it challenging to train
a machine learning model using high dimensional omics data like DNA methylation
and gene expression profiles. Here we propose an end-to-end deep learning model
called OmiVAE to extract low dimensional features and classify samples from
multi-omics data. OmiVAE combines the basic structure of variational
autoencoders with a classification network to achieve task-oriented feature
extraction and multi-class classification. The training procedure of OmiVAE is
comprised of an unsupervised phase without the classifier and a supervised
phase with the classifier. During the unsupervised phase, a hierarchical
cluster structure of samples can be automatically formed without the need for
labels. In the supervised phase, OmiVAE achieved an average classification
accuracy of 97.49% after 10-fold cross-validation among 33 tumour types and
normal samples, which shows better performance than other existing methods. The
OmiVAE model learned from multi-omics data outperformed that using only one
type of omics data, which indicates that the complementary information from
different omics datatypes provides useful insights for biomedical tasks like
cancer classification.
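OmiVAE's combination of a variational encoder with a classification head on the latent code can be illustrated with a toy numpy forward pass. The single-linear-layer encoder and all layer sizes below are simplifying assumptions, not the published architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(x, w_mu, w_logvar):
    """Encoder maps a high-dimensional omics vector to latent mean/log-variance."""
    return x @ w_mu, x @ w_logvar

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps (the VAE reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def classify(z, w_cls):
    """Classification head attached to the latent code."""
    logits = z @ w_cls
    e = np.exp(logits - logits.max())
    return e / e.sum()

d_in, d_latent, n_classes = 1000, 16, 33       # dims are illustrative
x = rng.standard_normal(d_in)                  # one multi-omics sample
w_mu = rng.standard_normal((d_in, d_latent)) * 0.01
w_logvar = rng.standard_normal((d_in, d_latent)) * 0.01
w_cls = rng.standard_normal((d_latent, n_classes))

mu, logvar = encode(x, w_mu, w_logvar)
z = reparameterize(mu, logvar, rng)
probs = classify(z, w_cls)
# KL term that regularizes the latent space during the unsupervised phase
kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))
```

The two-phase training the abstract describes would first optimize reconstruction plus the KL term without the classifier, then add a cross-entropy loss on `probs` in the supervised phase.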
OmiEmbed: a unified multi-task deep learning framework for multi-omics data
High-dimensional omics data contains intrinsic biomedical information that is
crucial for personalised medicine. Nevertheless, it is challenging to capture
them from the genome-wide data due to the large number of molecular features
and small number of available samples, which is also called 'the curse of
dimensionality' in machine learning. To tackle this problem and pave the way
for machine learning aided precision medicine, we proposed a unified multi-task
deep learning framework named OmiEmbed to capture biomedical information from
high-dimensional omics data with the deep embedding and downstream task
modules. The deep embedding module learnt an omics embedding that mapped
multiple omics data types into a latent space with lower dimensionality. Based
on the new representation of multi-omics data, different downstream task
modules were trained simultaneously and efficiently with the multi-task
strategy to predict the comprehensive phenotype profile of each sample.
OmiEmbed supports multiple tasks for omics data, including dimensionality
reduction, tumour type classification, multi-omics integration, demographic and
clinical feature reconstruction, and survival prediction. The framework
outperformed other methods on all three types of downstream tasks and achieved
better performance with the multi-task strategy compared with training the tasks
individually. OmiEmbed is a powerful and unified framework that can be widely
adapted to various applications of high-dimensional omics data and has great
potential to facilitate more accurate and personalised clinical decision
making.
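The shared-embedding-plus-task-heads design described above can be sketched in a few lines of numpy. The single-layer embedding, the three heads, and all dimensions are illustrative stand-ins for OmiEmbed's actual deep embedding and downstream modules.

```python
import numpy as np

rng = np.random.default_rng(2)

d_omics, d_embed = 500, 32
x = rng.standard_normal(d_omics)                  # one multi-omics profile
w_embed = rng.standard_normal((d_omics, d_embed)) * 0.05

# Shared deep-embedding module (one linear layer + tanh as a stand-in)
h = np.tanh(x @ w_embed)

# Independent downstream heads reading the same embedding
w_tumour = rng.standard_normal((d_embed, 33))     # tumour-type classification
w_age = rng.standard_normal((d_embed, 1))         # demographic reconstruction
w_survival = rng.standard_normal((d_embed, 1))    # survival risk score

logits = h @ w_tumour
e = np.exp(logits - logits.max())
p_tumour = e / e.sum()                            # class probabilities
age_pred = float(h @ w_age)                       # regression output
risk = float(h @ w_survival)                      # risk output

# Multi-task training would minimize a weighted sum of per-task losses:
# L = a * L_classification + b * L_reconstruction + c * L_survival
```

Because all heads share `h`, gradients from every task shape the same embedding, which is the mechanism behind the multi-task gains the abstract reports.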
DeepInsight: a methodology to transform a non-image data to an image for convolution neural network architecture
It is critical, but difficult, to catch the small variation in genomic or other kinds of data that differentiates phenotypes or categories. A plethora of data is available, but the information from its genes or elements is spread arbitrarily, making it challenging to extract relevant details for identification. However, an arrangement of similar genes into clusters makes these differences more accessible and allows for more robust identification of hidden mechanisms (e.g. pathways) than dealing with elements individually. Here we propose DeepInsight, which converts non-image samples into a well-organized image form. Thereby, the power of the convolutional neural network (CNN), including GPU utilization, can be realized for non-image samples. Furthermore, DeepInsight enables feature extraction through the application of a CNN to non-image samples to capture imperative information, and has shown promising results. To our knowledge, this is the first work to apply a CNN simultaneously on different kinds of non-image datasets: RNA-seq, vowels, text, and artificial data.
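A rough numpy sketch of the feature-to-image idea: each feature is projected to a 2D location using its profile across samples, and a sample's image is then rasterized from its feature values. DeepInsight uses t-SNE or kPCA for the projection; PCA via SVD is used here as a lightweight stand-in, and the grid size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

def features_to_image(X, size=16):
    """Build a renderer that places each feature at a fixed 2D pixel.
    X: samples x features; each feature's profile across samples
    is treated as a point and projected to two components."""
    F = X.T - X.T.mean(axis=0)                 # features x samples, centered
    _, _, vt = np.linalg.svd(F, full_matrices=False)
    coords = F @ vt[:2].T                      # features x 2 (PCA stand-in)
    # Normalize coordinates onto a size x size pixel grid
    mins, maxs = coords.min(0), coords.max(0)
    pix = ((coords - mins) / (maxs - mins + 1e-12) * (size - 1)).astype(int)

    def render(sample):
        """Rasterize one sample: pixel = mean of the features mapped there."""
        img = np.zeros((size, size))
        cnt = np.zeros((size, size))
        for (r, c), v in zip(pix, sample):
            img[r, c] += v
            cnt[r, c] += 1
        return img / np.maximum(cnt, 1)

    return render, pix

X = rng.standard_normal((40, 200))   # 40 samples x 200 features (e.g. genes)
render, pix = features_to_image(X)
img = render(X[0])                   # one sample as a 16x16 image
```

The key property is that the pixel layout is computed once from the whole dataset, so similar features land near each other in every sample's image and become visible to convolutional filters.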
Estimates of gene ensemble noise highlight critical pathways and predict disease severity in H1N1, COVID-19 and mortality in sepsis patients
Finding novel biomarkers for human pathologies and predicting clinical outcomes for patients is challenging. This stems from the heterogeneous response of individuals to disease and is reflected in the inter-individual variability of gene expression responses, which obscures differential gene expression analysis. Here, we developed an alternative approach that can be applied to dissect disease-associated molecular changes. We define gene ensemble noise as a measure that represents the variance for a collection of genes encoding either members of known biological pathways or subunits of annotated protein complexes, calculated within an individual. Gene ensemble noise allows for the holistic identification and interpretation of gene expression imbalance at the level of gene networks and systems. By comparing gene expression data from COVID-19, H1N1, and sepsis patients, we identified common disturbances in a number of pathways and protein complexes relevant to the sepsis pathology. Among others, these include the mitochondrial respiratory chain complex I and peroxisomes. This suggests a Warburg effect and oxidative stress as common hallmarks of the immune host-pathogen response. Finally, we showed that gene ensemble noise could successfully be applied to the prediction of clinical outcome, namely patient mortality. Thus, we conclude that gene ensemble noise represents a promising approach for investigating the molecular mechanisms of pathology through a prism of alterations in the coherent expression of gene circuits.
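The core quantity, the per-individual variance across the genes of one pathway or complex, can be computed directly. The toy expression matrix and the four-gene pathway below are hypothetical.

```python
import numpy as np

def gene_ensemble_noise(expr, pathway_genes):
    """Variance of expression across the genes of one pathway,
    computed within each individual sample."""
    sub = expr[:, pathway_genes]          # samples x pathway genes
    return sub.var(axis=1)                # one noise value per individual

# Toy data: 3 patients x 6 genes; a hypothetical 4-gene pathway
expr = np.array([
    [1.0, 1.0, 1.0, 1.0, 5.0, 5.0],   # coherent pathway expression
    [0.0, 2.0, 4.0, 6.0, 5.0, 5.0],   # disturbed pathway (high ensemble noise)
    [2.0, 2.0, 2.0, 2.0, 5.0, 5.0],   # coherent, at a different level
])
pathway = [0, 1, 2, 3]
noise = gene_ensemble_noise(expr, pathway)
```

Note that patients 1 and 3 get zero noise despite different expression levels: the measure captures incoherence within the gene set, not its magnitude, which is what lets it pick up disturbances that averaged differential expression would miss.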
Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A review
Cancer remains one of the most challenging diseases to treat in the medical
field. Machine learning has enabled in-depth analysis of rich multi-omics
profiles and medical imaging for cancer diagnosis and prognosis. Despite these
advancements, machine learning models face challenges stemming from limited
labeled sample sizes, the intricate interplay of high-dimensionality data
types, the inherent heterogeneity observed among patients and within tumors,
and concerns about interpretability and consistency with existing biomedical
knowledge. One approach to surmount these challenges is to integrate biomedical
knowledge into data-driven models, which has proven potential to improve the
accuracy, robustness, and interpretability of model results. Here, we review
the state-of-the-art machine learning studies that adopted the fusion of
biomedical knowledge and data, termed knowledge-informed machine learning, for
cancer diagnosis and prognosis. Emphasizing the properties inherent in four
primary data types including clinical, imaging, molecular, and treatment data,
we highlight modeling considerations relevant to these contexts. We provide an
overview of diverse forms of knowledge representation and current strategies of
knowledge integration into machine learning pipelines with concrete examples.
We conclude the review article by discussing future directions to advance
cancer research through knowledge-informed machine learning.
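One common knowledge-integration strategy surveyed in such reviews is constraining network connectivity with prior biological structure. The sketch below masks a weight matrix so each hidden unit reads only the genes of its own pathway; the gene-pathway assignments and sizes are invented for illustration, not taken from any reviewed study.

```python
import numpy as np

rng = np.random.default_rng(4)

n_genes, n_pathways = 8, 2
# Prior knowledge: which gene belongs to which pathway (assumed membership)
membership = np.zeros((n_genes, n_pathways))
membership[:4, 0] = 1.0      # genes 0-3 -> pathway A
membership[4:, 1] = 1.0      # genes 4-7 -> pathway B

w = rng.standard_normal((n_genes, n_pathways))
w_masked = w * membership    # knowledge-informed sparsity: only allowed links

x = rng.standard_normal(n_genes)       # one molecular profile
pathway_activity = x @ w_masked        # each unit aggregates its own pathway
```

Because forbidden connections are zeroed (and would be re-masked after every gradient step in training), each hidden unit stays interpretable as the activity of a named pathway, addressing the interpretability concern the review raises.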
AI and precision oncology in clinical cancer genomics: from prevention to targeted cancer therapies - an outcomes-based patient care
Precision medicine is the personalization of medicine to suit a specific group of people or even an individual patient, based on genetic or molecular profiling. This can be done using genomic, transcriptomic, epigenomic or proteomic information. Personalized medicine holds great promise, especially in cancer therapy and control, where precision oncology would allow medical practitioners to use this information to optimize the treatment of a patient. Personalized oncology for groups of individuals would also allow for the use of population group specific diagnostic or prognostic biomarkers. Additionally, this information can be used to track the progress of the disease or monitor the response of the patient to treatment. This can be used to establish the molecular basis for drug resistance and allow the targeting of the genes or pathways responsible for drug resistance. Personalized medicine requires the use of large data sets, which must be processed and analysed in order to identify the particular molecular patterns that can inform the decisions required for personalized care. However, the analysis of these large data sets is difficult and time consuming. This is further compounded by the increasing size of these datasets due to technologies such as next generation sequencing (NGS). These difficulties can be met through the use of artificial intelligence (AI) and machine learning (ML). These computational tools use specific neural networks, learning methods, decision making tools and algorithms to construct and improve on models for the analysis of different types of large data sets. These tools can also be used to answer specific questions. Artificial intelligence can also be used to predict the effects of genetic changes on protein structure and therefore function. 
This review will discuss the current state of the application of AI to omics data, specifically genomic data, and how this is applied to the development of personalized or precision medicine for the treatment of cancer. Funded by the South African Medical Research Council (SAMRC) and the National Research Foundation (NRF).
Convolutional Neural Networks and their Application in Cancer Diagnosis based on RNA-Sequencing
Gene expression analysis is the study of the way genes are transcribed to synthesize
functional gene products, functional RNA species, or protein products. Its study can
provide insights of cellular processes, such as cellular differentiation and abnormal
pathological processes.
Cancer is a genetic disease in which genetic variations cause genes to function
abnormally and alter their expression. Proteins, being the final products of gene
expression, define the phenotypes and biological processes. Therefore, detecting gene
expression levels can be used for cancer diagnosis, prognosis, and even treatment
prediction.
This thesis will be analyzing the theory and applications of Deep Learning. It will then
apply Deep Learning (DL) and in particular a Convolutional Neural Network (CNN) as a
means for the diagnosis of multiple cancer types (pan-cancer classification) using gene
expression data and specifically RNA-sequencing.
The Cancer Genome Atlas (TCGA) data, which consists of RNA-sequencing, will be
preprocessed and then embedded into multiple two-dimensional (2D) images. These
images will then be fed into a Convolutional Neural Network, which will classify them
into 33 types of cancer, aiming to achieve the highest possible diagnostic
accuracy.