    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
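As an illustration of one of the challenges listed above, class imbalance is often handled by weighting each class inversely to its frequency. The sketch below is a minimal example of that common heuristic, not a method taken from the review; the function name `balanced_class_weights` and the toy labels are assumptions made for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so their errors count more
    when the weights are passed to a weighted loss function.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Toy, made-up label set: 90 tumor samples and 10 normal samples.
labels = ["tumor"] * 90 + ["normal"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With these toy labels the minority class receives a weight ten times larger than the majority class, which is the intended effect of the heuristic.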

    Predictive genomics: A cancer hallmark network framework for predicting tumor clinical phenotypes using genome sequencing data

    We discuss a cancer hallmark network framework for modelling genome-sequencing data to predict cancer clonal evolution and associated clinical phenotypes. Strategies for using this framework in conjunction with genome sequencing data to predict personalized drug targets, drug resistance, and metastasis for a cancer patient, as well as cancer risks for a healthy individual, are discussed. Accurate prediction of cancer clonal evolution and clinical phenotypes will have substantial impact on timely diagnosis, personalized management and prevention of cancer. Comment: 5 figs, related papers, visit lab homepage: http://www.cancer-systemsbiology.org, Seminar in Cancer Biology, 201

    iSOM-GSN: An Integrative Approach for Transforming Multi-omic Data into Gene Similarity Networks via Self-organizing Maps

    Deep learning models are currently applied in diverse domains, including image recognition, text generation, and event prediction. With the advent of new high-throughput sequencing technologies, a multitude of genomic data has been generated and made available. The representation of such data using deep neural networks, or for that matter the application of differential analysis, has, however, not been able to match the growth of that data. One of the main challenges in applying convolutional neural networks to gene interaction data is the lack of understanding of the vector space domain to which they belong, along with the inherent difficulty of representing those interactions in a significantly lower dimension, viz. Euclidean spaces. These challenges become more prevalent when dealing with various types of omics data with different forms. In this regard, we introduce a systematic and generalized method, called iSOM-GSN, used to transform multi-omic genomic data with higher dimensions into a two-dimensional grid. Afterwards, we apply a convolutional neural network (CNN) to predict disease states of various types. Based on the idea of Kohonen's self-organizing map (SOM), we generate a two-dimensional grid for each sample for a given set of genes that represent a gene similarity network (GSN). The genes that are significantly highly mutated across the whole genome are related to each other based on functional interactions. We then test the model to predict breast and prostate cancer stages using gene expression, DNA methylation, and copy number alteration, yielding accuracies in the 94-98% range for tumor stages of breast cancer and calculated Gleason scores of prostate cancer with just 14 input genes for both cases. To our knowledge, this is the first attempt to use self-organizing maps and convolutional neural networks on integrating high-dimensional multi-omics data. The scheme not only outputs nearly perfect classification accuracy, but also provides an enhanced scheme for visualization, dimensionality reduction, and interpretation of the results
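The general idea of using a self-organizing map to place genes on a two-dimensional grid, and then rendering each sample as an image-like grid, can be sketched as follows. This is a simplified illustration under stated assumptions, not the iSOM-GSN implementation: the per-gene feature vectors, the grid size, the decay schedules, and the function names `train_som`, `best_matching_unit`, and `sample_to_grid` are all hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid cell whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], x)),
    )


def train_som(features, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Fit a small SOM on per-gene feature vectors (assumed inputs)."""
    rng = random.Random(seed)
    dim = len(features[0])
    sigma0 = max(rows, cols) / 2.0
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighbourhood radius
        for x in rng.sample(features, len(features)):
            bmu = best_matching_unit(weights, x)
            for cell, w in weights.items():
                d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def sample_to_grid(weights, gene_features, sample_values, rows, cols):
    """Write each gene's value for one sample into that gene's SOM cell."""
    grid = [[0.0] * cols for _ in range(rows)]
    for feat, value in zip(gene_features, sample_values):
        r, c = best_matching_unit(weights, feat)
        grid[r][c] += value  # genes sharing a cell are summed
    return grid


# Toy, made-up data: 6 genes described by 3 features each, one sample.
gene_features = [
    [0.1, 0.2, 0.1], [0.9, 0.8, 0.9], [0.2, 0.1, 0.2],
    [0.8, 0.9, 0.7], [0.5, 0.5, 0.5], [0.4, 0.6, 0.5],
]
sample_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
som = train_som(gene_features, rows=3, cols=3)
grid = sample_to_grid(som, gene_features, sample_values, rows=3, cols=3)
```

Each resulting grid has a fixed shape regardless of the sample, so a set of such grids could in principle be fed to a CNN as single-channel images; the classifier itself is outside the scope of this sketch.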

    Deep Learning Models for Predicting Phenotypic Traits and Diseases from Omics Data

    Computational analysis of high-throughput omics data, such as gene expressions, copy number alterations and DNA methylation (DNAm), has become popular in disease studies in recent decades because such analyses can be very helpful in predicting whether a patient has a certain disease or one of its subtypes. However, due to the high-dimensional nature of the data sets, with hundreds of thousands of variables and a very small number of samples, traditional machine learning approaches, such as support vector machines (SVMs) and random forests, are limited in their ability to analyze these data efficiently. In this chapter, we review the progress in applying deep learning algorithms to solve some biological questions. The focus is on potential software tools and public data sources for the tasks. In particular, we show some case studies using deep neural network (DNN) models for classifying molecular subtypes of breast cancer and DNN-based regression models to account for interindividual variation in triglyceride concentrations measured at different visits of peripheral blood samples using DNAm profiles. We show that integration of multi-omics profiles into DNN-based learning methods could improve the prediction of the molecular subtypes of breast cancer. We also demonstrate the superiority of our proposed DNN models over the SVM model for predicting triglyceride concentrations

    A Deep Learning Approach to Integrate Medical Big Data for Improving Health Services in Indonesia

    This paper proposes medical informatics to support health services in Indonesia. It focuses on the analysis of Big Data for health care purposes, with the aim of improving and developing clinical decision support systems (CDSS) or assessing medical data for both quality assurance and accessibility of health services. Electronic health records (EHR) are very rich in medical data sourced from patients. All the data can be aggregated to produce information, which includes medical history details such as diagnostic tests, medicines and treatment plans, immunization records, allergies, radiological images, multivariate sensor devices, laboratories, and test results. All this information provides a valuable understanding of the disease management system. In Indonesia, which has many rural areas with a limited number of doctors, this is an important case to investigate. Mining large-scale data about individuals and populations from EHRs can be combined with mobile networks and social media to inform health and public policy. To support this research, many researchers have applied the Deep Learning (DL) approach to data-mining problems related to health informatics. However, in practice, the use of DL is still questionable because achieving optimal performance requires relatively large amounts of data and resources, while other learning algorithms are relatively fast, produce close performance with fewer resources and less parameterization, and have better interpretability. In this paper, the advantage of Deep Learning for designing medical informatics is described, since such an approach is needed to build a good CDSS for health services

    Identifying Cancer Subtypes Using Unsupervised Deep Learning

    Glioblastoma multiforme (GBM) is the most fatal malignant type of brain tumor, with a very poor prognosis and a median survival of around one year. Numerous studies have reported tumor subtypes that consider different characteristics of individual patients, which may play important roles in determining the survival rates in GBM. In this study, we present a pathway-based clustering method using a Restricted Boltzmann Machine (RBM), called R-PathCluster, for identifying unknown subtypes with pathway markers of gene expressions. To assess the performance of R-PathCluster, we conducted experiments with several clustering methods such as k-means, hierarchical clustering, and RBM models with different input data. R-PathCluster showed the best performance in clustering long-term and short-term survivors, although its clustering score was not the highest among them in the experiments. R-PathCluster provides a way to interpret the model in a biological sense, since it takes pathway markers that represent the biological processes of pathways. We discuss how our findings from R-PathCluster are supported by the biological literature. Keywords. Glioblastoma multiforme, tumor subtypes, clustering, Restricted Boltzmann Machin

    The Effective Quantitative Analysis for Brain Tumor Diagnosis Using an Efficient Deep Learning Algorithm

    In the medical field, imaging analysis is a hot topic that has attracted many researchers seeking to accurately analyze disease severity and predict outcomes. However, when the training images are more complex, noise pruning results degrade, which tends to lower the prediction accuracy score. Therefore, a novel Chimp-based Boosting Multilayer Perceptron (CbBMP) prediction framework is built in the present study. The objective of this study is brain tumor prediction and severity analysis from MRI brain images. The boosting function is employed to obtain the most acceptable error pruning outcome. The feature analysis and the tumor prediction process are then executed accurately with the help of the chimp solution function. The proposed framework is tested in the MATLAB environment, and the prediction improvement score is analyzed by performing a comparative analysis. The novel CbBMP model recorded the best tumor prediction rate

    Integrated Analysis of Gene Expression, CpG Island Methylation, and Gene Copy Number in Breast Cancer Cells by Deep Sequencing

    We used deep sequencing technology to profile the transcriptome, gene copy number, and CpG island methylation status simultaneously in eight commonly used breast cell lines to develop a model for how these genomic features are integrated in estrogen receptor positive (ER+) and negative breast cancer. Total mRNA sequencing, gene copy number analysis, and genomic CpG island methylation profiling were carried out using the Illumina Genome Analyzer. Sequences were mapped to the human genome to obtain digitized gene expression data, DNA copy number in reference to the non-tumor cell line (MCF10A), and methylation status of 21,570 CpG islands to identify differentially expressed genes that were correlated with methylation or copy number changes. These were evaluated in a dataset from 129 primary breast tumors. Gene expression in cell lines was dominated by ER-associated genes. ER+ and ER− cell lines formed two distinct, stable clusters, and 1,873 genes were differentially expressed in the two groups. Part of chromosome 8 was deleted in all ER− cells and part of chromosome 17 amplified in all ER+ cells. These loci encoded 30 genes that were overexpressed in ER+ cells; 9 of these genes were overexpressed in ER+ tumors. We identified 149 differentially expressed genes that exhibited differential methylation of one or more CpG islands within 5 kb of the 5′ end of the gene and for which mRNA abundance was inversely correlated with CpG island methylation status. In primary tumors we identified 84 genes that appear to be robust components of the methylation signature that we identified in ER+ cell lines. Our analyses reveal a global pattern of differential CpG island methylation that contributes to the transcriptome landscape of ER+ and ER− breast cancer cells and tumors. The role of gene amplification/deletion appears to be more modest, although several potentially significant genes appear to be regulated by copy number aberrations
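The notion of mRNA abundance being inversely correlated with CpG island methylation can be illustrated with a per-gene Pearson correlation across samples. The sketch below is only an illustration, assuming a simple correlation cutoff; the abstract does not state which statistic or threshold the authors used, and the gene names, values, and function names here are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(vx * vy)


def inversely_correlated(expression, methylation, threshold=-0.5):
    """Return genes whose expression/methylation correlation is <= threshold."""
    hits = {}
    for gene in expression.keys() & methylation.keys():
        r = pearson(expression[gene], methylation[gene])
        if r <= threshold:
            hits[gene] = r
    return hits


# Toy, made-up profiles across four samples (hypothetical gene names).
expression = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [1.0, 2.0, 3.0, 4.0]}
methylation = {"GENE_A": [0.9, 0.7, 0.4, 0.1], "GENE_B": [0.1, 0.3, 0.6, 0.8]}
hits = inversely_correlated(expression, methylation)
print(sorted(hits))
```

In the toy data only `GENE_A` is reported, because its expression rises as its methylation falls, while `GENE_B` shows the opposite relationship.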