22 research outputs found

    Deep Learning Models for Predicting Phenotypic Traits and Diseases from Omics Data

    Computational analysis of high-throughput omics data, such as gene expression, copy number alterations and DNA methylation (DNAm), has become popular in disease studies in recent decades because such analyses can help predict whether a patient has a certain disease or one of its subtypes. However, because these data sets are high-dimensional, with hundreds of thousands of variables and very few samples, traditional machine learning approaches, such as support vector machines (SVMs) and random forests, are limited in their ability to analyze these data efficiently. In this chapter, we review the progress in applying deep learning algorithms to solve some biological questions. The focus is on potential software tools and public data sources for the tasks. In particular, we show some case studies using deep neural network (DNN) models for classifying molecular subtypes of breast cancer and DNN-based regression models to account for interindividual variation in triglyceride concentrations measured at different visits of peripheral blood samples using DNAm profiles. We show that integration of multi-omics profiles into DNN-based learning methods could improve the prediction of the molecular subtypes of breast cancer. We also demonstrate the superiority of our proposed DNN models over the SVM model for predicting triglyceride concentrations.

    Deep Learning Models For Biomedical Data Analysis

    The field of biomedical data analysis is a vibrant area of research dedicated to extracting valuable insights from a wide range of biomedical data sources, including biomedical images and genomics data. The emergence of deep learning, an artificial intelligence approach, presents significant prospects for enhancing biomedical data analysis and knowledge discovery. This dissertation focused on exploring innovative deep-learning methods for biomedical image processing and gene data analysis. During the COVID-19 pandemic, biomedical imaging data, including CT scans and chest x-rays, played a pivotal role in identifying COVID-19 cases by categorizing patient chest x-ray outcomes as COVID-19-positive or negative. While supervised deep learning methods have effectively recognized COVID-19 patterns in chest x-ray datasets, the availability of annotated training data remains limited. To address this challenge, the thesis introduced a semi-supervised deep learning model named ssResNet, built upon the Residual Neural Network (ResNet) architecture. The model combines supervised and unsupervised paths, incorporating a weighted supervised loss function to manage data imbalance. The thesis also explores strategies to diminish prediction uncertainty in deep learning models for critical applications like medical image processing. It achieves this through an ensemble deep learning model, integrating bagging deep learning and model calibration techniques. This ensemble model not only boosts biomedical image segmentation accuracy but also reduces prediction uncertainty, as validated on a comprehensive chest x-ray image segmentation dataset. Furthermore, the thesis introduced an ensemble model integrating Proformer and ensemble learning methodologies. This model constructs multiple independent Proformers for predicting gene expression; their predictions are combined through weighted averaging to generate final predictions.
Experimental outcomes underscore the efficacy of this ensemble model in enhancing prediction performance across various metrics. In conclusion, this dissertation advances biomedical data analysis by harnessing the potential of deep learning techniques. It devises innovative approaches for processing biomedical images and gene data. By leveraging deep learning's capabilities, this work paves the way for further progress in biomedical data analytics and its applications within clinical contexts. Index Terms: biomedical data analysis, COVID-19, deep learning, ensemble learning, gene data analytics, medical image segmentation, prediction uncertainty, Proformer, Residual Neural Network (ResNet), semi-supervised learning.
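The weighted-averaging step mentioned in the abstract above can be sketched roughly as follows. This is only an illustration, not the thesis's implementation: the function name, the plain-list data layout, and the example weights are all hypothetical, and the abstract does not say how the weights are chosen.

```python
def weighted_average_ensemble(predictions, weights):
    """Combine per-model predictions by weighted averaging.

    predictions: list of lists, one inner list of numeric predictions per model
                 (hypothetical layout; all inner lists must have equal length).
    weights:     one non-negative weight per model (hypothetical values);
                 they are normalized by their sum here.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    total = sum(weights)
    # For each sample, sum the weighted predictions and divide by the weight total.
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


# Two hypothetical models predicting expression for two samples;
# the first model is trusted three times as much as the second.
combined = weighted_average_ensemble([[1.0, 2.0], [3.0, 4.0]], [3.0, 1.0])
print(combined)
```

With equal weights this reduces to a plain mean of the models' predictions.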

    Performance Evaluation of Intrusion Detection System using Selected Features and Machine Learning Classifiers

    Some of the main challenges in developing an effective network-based intrusion detection system (IDS) include analyzing large network traffic volumes and realizing the decision boundaries between normal and abnormal behaviors. Deploying feature selection together with efficient classifiers in the detection system can overcome these problems. Feature selection finds the most relevant features, thus reducing the dimensionality and complexity of analyzing the network traffic. Moreover, using the most relevant features to build the predictive model reduces the complexity of the developed model, thus reducing the time needed to build the classifier and consequently improving the detection performance. In this study, two different sets of selected features have been adopted to train four machine-learning-based classifiers. The two sets of selected features are based on the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) approaches, respectively. These evolutionary-based algorithms are known to be effective in solving optimization problems. The classifiers used in this study are Naïve Bayes, k-Nearest Neighbor, Decision Tree and Support Vector Machine, which have been trained and tested using the NSL-KDD dataset. The performance of these classifiers using the different feature sets was evaluated. The experimental results indicate that the detection accuracy improves by approximately 1.55% with the PSO-based selected features compared with the GA-based selected features. The Decision Tree classifier that was trained with PSO-based selected features outperformed the other classifiers, with accuracy, precision, recall, and f-score results of 99.38%, 99.36%, 99.32%, and 99.34%, respectively. The results show that coupling optimal features with a good classifier in a detection system can reduce the classifier model building time, reduce the computational burden of analyzing data, and consequently attain a high detection rate.
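As a quick consistency check on the figures reported in this abstract, the f-score can be recomputed from the stated precision and recall. The sketch below assumes the reported f-score is the standard F1 measure (the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name is illustrative.

```python
def f_score(precision, recall):
    """Standard F1 measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the Decision Tree classifier with PSO-based features.
reported_precision = 99.36  # percent
reported_recall = 99.32     # percent

# Rounds to 99.34, which matches the f-score reported in the abstract.
print(round(f_score(reported_precision, reported_recall), 2))
```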

    Machine Learning and Deep Learning Approaches for Brain Disease Diagnosis : Principles and Recent Advances

    This work was supported in part by the National Research Foundation of Korea-Grant funded by the Korean Government (Ministry of Science and ICT) under Grant NRF 2020R1A2B5B02002478, and in part by Sejong University through its Faculty Research Program under Grant 20212023. Peer reviewed.

    Statistical and deep learning methods for cancer genomic data.

    Doctoral Degree. University of KwaZulu-Natal, Pietermaritzburg. Statistical and machine learning methods have been applied in broad domains including the medical field. These methods have a massive impact on healthcare by supporting specialists' decision making in the diagnosis and prognosis of patient disease status and disease progression. Non-communicable diseases (NCDs) remain a major challenge the world over in the 21st century, especially in developing countries where resources are limited. Recent global public health research shows an epidemiological paradigm shift from infectious to non-communicable diseases, which include cancer. Cancer is considered the most devastating among all NCDs and is ranked second to malaria as a leading cause of death in developing countries. Cancer occurs in many different types affecting all community members, where the general mechanism of cancer disease etiology is uncontrolled cell proliferation that leads to a malignant or cancerous tumor, and abnormalities at the molecular level. However, earlier detection and accurate diagnosis of cancer symptoms increase the probability of curing the condition, which has become the best strategy for fighting the disease. In the past few years, a vast amount of cancer data has been generated through new high-throughput technologies. Traditional clinical and experimental approaches lack the capacity to handle such a massive scale of data. Therefore, computational methods have been introduced to biomedical investigations, including gene/biomarker selection for cancer types and stages of the disease. Many computational tools have been developed based on different statistical and machine learning strategies and data science approaches. We used statistical, machine and deep learning methods for cancer types, subtypes, and survival prediction in this work.
First, we developed a hybrid (DNA mutation and RNA expression) signature and assessed its predictive properties for colorectal cancer (CRC) patients’ mutation status and survival. In addition, we proposed a stacking ensemble deep learning approach and compared its predictive performance for cancer types (as a multi-class classification problem) with that of different standard machine and deep learning methods. Finally, we assessed the predictive performance of the Cox proportional hazards and random survival forest (RSF) methods based on a signature obtained using three gene mutations (KRAS, BRAF, and TP53). However, the most significant limitations are the small sample size and the lack of independent data for validation. Also, we did not consider different features such as methylation and mutation data. Moreover, the study did not include detailed simulation studies to compare the traditional statistical and machine learning methods. Overall, the most prominent finding to emerge from this investigation is that combining different data sources leads to more robust statistical significance. Also, the stacking approach is more reliable and promising than a single machine or deep learning model. Furthermore, the RSF is a suitable method for survival analysis since it does not depend on any model assumptions.

    Unveiling the frontiers of deep learning: innovations shaping diverse domains

    Deep learning (DL) enables the development of computer models that are capable of learning, visualizing, optimizing, refining, and predicting data. In recent years, DL has been applied in a range of fields, including audio-visual data processing, agriculture, transportation prediction, natural language, biomedicine, disaster management, bioinformatics, drug design, genomics, face recognition, and ecology. To explore the current state of deep learning, it is necessary to investigate the latest developments and applications of deep learning in these disciplines. However, the literature is lacking in exploring the applications of deep learning in all potential sectors. This paper thus extensively investigates the potential applications of deep learning across all major fields of study as well as the associated benefits and challenges. As evidenced in the literature, DL exhibits accuracy in prediction and analysis, makes it a powerful computational tool, and has the ability to articulate itself and optimize, making it effective in processing data with no prior training. Given its independence from training data, deep learning necessitates massive amounts of data for effective analysis and processing, much like data volume. To handle the challenge of compiling huge amounts of medical, scientific, healthcare, and environmental data for use in deep learning, gated architectures like LSTMs and GRUs can be utilized. For multimodal learning, shared neurons in the neural network for all activities and specialized neurons for particular tasks are necessary. Comment: 64 pages, 3 figures, 3 tables

    Computational Methods for the Analysis of Genomic Data and Biological Processes

    In recent decades, new technologies have made remarkable progress in helping to understand biological systems. Rapid advances in genomic profiling techniques such as microarrays or high-performance sequencing have brought new opportunities and challenges in the fields of computational biology and bioinformatics. Such genetic sequencing techniques allow large amounts of data to be produced, whose analysis and cross-integration could provide a complete view of organisms. As a result, it is necessary to develop new techniques and algorithms that carry out an analysis of these data with reliability and efficiency. This Special Issue collected the latest advances in the field of computational methods for the analysis of gene expression data, and, in particular, the modeling of biological processes. Here we present eleven works selected to be published in this Special Issue due to their interest, quality, and originality

    Bioinformatics and Machine Learning for Cancer Biology

    Cancer is a leading cause of death worldwide, claiming millions of lives each year. Cancer biology is an essential research field to understand how cancer develops, evolves, and responds to therapy. By taking advantage of a series of “omics” technologies (e.g., genomics, transcriptomics, and epigenomics), computational methods in bioinformatics and machine learning can help scientists and researchers to decipher the complexity of cancer heterogeneity, tumorigenesis, and anticancer drug discovery. Particularly, bioinformatics enables the systematic interrogation and analysis of cancer from various perspectives, including genetics, epigenetics, signaling networks, cellular behavior, clinical manifestation, and epidemiology. Moreover, thanks to the influx of next-generation sequencing (NGS) data in the postgenomic era and multiple landmark cancer-focused projects, such as The Cancer Genome Atlas (TCGA) and Clinical Proteomic Tumor Analysis Consortium (CPTAC), machine learning has a uniquely advantageous role in boosting data-driven cancer research and unraveling novel methods for the prognosis, prediction, and treatment of cancer

    Machine Learning Strategies to Analyze Quantitative Ultrasound Multi-Parametric Images for Prediction of Therapy Response in Breast Cancer Patients

    In this thesis project, two novel machine learning strategies were investigated to predict tumor response to neoadjuvant chemotherapy (NAC) at pre-treatment using quantitative ultrasound (QUS) multi-parametric images. The ultrasound data for analytical development and evaluation of the methodologies investigated in this project were acquired from 181 patients diagnosed with locally advanced breast cancer (LABC) and planned for NAC followed by surgery. The QUS multi-parametric images were generated using spectral analyses on the raw ultrasound radiofrequency (RF) data acquired before starting the NAC. In the first machine learning approach investigated in this project, distinct intra-tumor regions were identified within the parametric maps using a hidden Markov random field (HMRF) and its expectation-maximization (EM) algorithm. Several hand-crafted features characterizing the tumor, intra-tumor regions, and the tumor margin were extracted from different parametric images. A multi-step feature selection procedure was applied to construct a QUS biomarker for response prediction. Evaluation results on an independent test set indicated that the developed biomarker using the characteristics of intra-tumor regions and tumor margin in conjunction with a decision tree model with adaptive boosting (AdaBoost) as the classifier could predict the treatment response of patients at pre-treatment with an accuracy of 85.4% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.89. In the second machine learning approach investigated in this project, two deep convolutional neural network (DCNN) architectures including the residual network (ResNet) and residual attention network (RAN) were explored for extracting optimal feature maps from the parametric images, with a fully connected network for response prediction. 
Results demonstrated that the developed model with the RAN architecture to extract feature maps from the expanded parametric images of the tumor core and margin had a superior performance with an accuracy of 0.88 and an AUC of 0.86 on the independent test set. Also, survival analysis demonstrated a statistically significant difference between survival curves of the two response cohorts identified at pre-treatment based on both the conventional machine learning method and the deep learning model. Obtained results in this study demonstrated a great promise of QUS multi-parametric imaging integrated with both unsupervised learning methods in identifying distinct breast cancer intra-tumor regions and traditional classification techniques, as well as deep convolutional neural networks in predicting tumor response to NAC prior to start of treatment
