88 research outputs found

    OmiEmbed: a unified multi-task deep learning framework for multi-omics data

    High-dimensional omics data contains intrinsic biomedical information that is crucial for personalised medicine. Nevertheless, it is challenging to capture this information from genome-wide data due to the large number of molecular features and the small number of available samples, a problem known in machine learning as 'the curse of dimensionality'. To tackle this problem and pave the way for machine learning aided precision medicine, we proposed a unified multi-task deep learning framework named OmiEmbed to capture biomedical information from high-dimensional omics data with the deep embedding and downstream task modules. The deep embedding module learnt an omics embedding that mapped multiple omics data types into a latent space with lower dimensionality. Based on the new representation of multi-omics data, different downstream task modules were trained simultaneously and efficiently with the multi-task strategy to predict the comprehensive phenotype profile of each sample. OmiEmbed supports multiple tasks for omics data including dimensionality reduction, tumour type classification, multi-omics integration, demographic and clinical feature reconstruction, and survival prediction. The framework outperformed other methods on all three types of downstream tasks and achieved better performance with the multi-task strategy compared with training them individually. OmiEmbed is a powerful and unified framework that can be widely adapted to various applications of high-dimensional omics data and has great potential to facilitate more accurate and personalised clinical decision making. Comment: 14 pages, 8 figures, 7 tables

    Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification

    Different aspects of a clinical sample can be revealed by multiple types of omics data. Integrated analysis of multi-omics data provides a comprehensive view of patients, which has the potential to facilitate more accurate clinical decision making. However, omics data are normally high dimensional, with a large number of molecular features and a relatively small number of available samples with clinical labels. The "dimensionality curse" makes it challenging to train a machine learning model using high dimensional omics data like DNA methylation and gene expression profiles. Here we propose an end-to-end deep learning model called OmiVAE to extract low dimensional features and classify samples from multi-omics data. OmiVAE combines the basic structure of variational autoencoders with a classification network to achieve task-oriented feature extraction and multi-class classification. The training procedure of OmiVAE comprises an unsupervised phase without the classifier and a supervised phase with the classifier. During the unsupervised phase, a hierarchical cluster structure of samples can be automatically formed without the need for labels. In the supervised phase, OmiVAE achieved an average classification accuracy of 97.49% after 10-fold cross-validation among 33 tumour types and normal samples, which is better than other existing methods. The OmiVAE model learned from multi-omics data outperformed the model using only one type of omics data, which indicates that the complementary information from different omics data types provides useful insights for biomedical tasks like cancer classification. Comment: 7 pages, 4 figures
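
    The two-phase training described in this abstract can be illustrated with a minimal sketch of a per-sample objective. This is not the authors' implementation: the mean-squared-error reconstruction term, the weights `beta` and `alpha`, and every function name below are assumptions chosen for illustration. The sketch only shows how a variational autoencoder loss (reconstruction plus KL divergence) might be used alone in an unsupervised phase and extended with a classification term in a supervised phase.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I) for one sample."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def mse(x, x_hat):
    """Mean squared reconstruction error (an assumed reconstruction term)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]


def two_phase_loss(x, x_hat, mu, log_var, logits=None, label=None,
                   beta=1.0, alpha=1.0):
    """Hypothetical combined objective.

    Unsupervised phase: call without logits/label (reconstruction + KL only).
    Supervised phase: pass logits and label to add the classification term.
    beta and alpha are illustrative weights, not values from the paper.
    """
    loss = mse(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)
    if logits is not None and label is not None:
        loss += alpha * cross_entropy(logits, label)
    return loss
```

    In this sketch the same function serves both phases: the classifier term is simply absent until labels are supplied.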

    Unsupervised Annotation of Phenotypic Abnormalities via Semantic Latent Representations on Electronic Health Records

    The extraction of phenotype information that is naturally contained in electronic health records (EHRs) has been found to be useful in various clinical informatics applications such as disease diagnosis. However, due to imprecise descriptions, a lack of gold standards and the demand for efficiency, annotating phenotypic abnormalities on millions of EHR narratives is still challenging. In this work, we propose a novel unsupervised deep learning framework to annotate phenotypic abnormalities from EHRs via semantic latent representations. The proposed framework takes advantage of the Human Phenotype Ontology (HPO), a knowledge base of phenotypic abnormalities, to standardize the annotation results. Experiments have been conducted on 52,722 EHRs from the MIMIC-III dataset. Quantitative and qualitative analyses have shown that the proposed framework achieves state-of-the-art annotation performance and computational efficiency compared with other methods. Comment: Accepted by BIBM 2019 (Regular

    Correspondence between the Korean and Mandarin Chinese pronunciations of Chinese characters: A comparison at the sub-syllabic level

    This study explores the correspondence between the pronunciations of Chinese characters in modern Mandarin Chinese and modern Korean at the subsyllabic level and investigates the applicability of such correspondence in learning and reading Korean as a second language (L2) by native (L1) Mandarin Chinese speakers. Correspondence between Korean and Mandarin Chinese initial consonants and that between Korean -V(C) structures and Chinese finals were calculated based on the 1,800 Chinese characters for educational purposes in South Korea. Our results demonstrated that Korean initial consonants had either consistent or inconsistent correspondence with their Mandarin Chinese counterparts. In addition, this study showed that pure comparisons of vowels between the two languages are not reliable. Instead, the comparison between Korean -V(C) structures and Chinese finals could be more practical. Ninety percent of the high frequency Chinese characters in Korean can be mapped to the corresponding Chinese pronunciations based on the data provided in this study.

    Theoretical foundations of studying criticality in the brain

    Criticality is hypothesized as a physical mechanism underlying efficient transitions between cortical states and remarkable information processing capacities in the brain. While considerable evidence generally supports this hypothesis, non-negligible controversies persist regarding the ubiquity of criticality in neural dynamics and its role in information processing. Validity issues frequently arise when identifying potential brain criticality from empirical data. Moreover, the functional benefits implied by brain criticality are frequently misconceived or unduly generalized. These problems stem from the non-triviality and immaturity of the physical theories that analytically derive brain criticality and the statistical techniques that estimate brain criticality from empirical data. To help solve these problems, we present a systematic review and reformulate the foundations of studying brain criticality, i.e., ordinary criticality (OC), quasi-criticality (qC), self-organized criticality (SOC), and self-organized quasi-criticality (SOqC), using the terminology of neuroscience. We offer accessible explanations of the physical theories and statistical techniques of brain criticality, providing step-by-step derivations to characterize neural dynamics as a physical system with avalanches. We summarize error-prone details and existing limitations in brain criticality analysis and suggest possible solutions. Moreover, we present a forward-looking perspective on how optimizing the foundations of studying brain criticality can deepen our understanding of various neuroscience questions.

    Toward Learning Model-Agnostic Explanations for Deep Learning-Based Signal Modulation Classifiers

    Recent advances in deep learning (DL) have brought tremendous gains in signal modulation classification. However, DL-based classifiers lack transparency and interpretability, which raises concerns about the model's reliability and hinders wide deployment in real-world applications. While explainable methods have recently emerged, little has been done to explain DL-based signal modulation classifiers. In this work, we propose a novel model-agnostic explainer, Model-Agnostic Signal modulation classification Explainer (MASE), which provides explanations for the predictions of black-box modulation classifiers. With the subsequence-based signal interpretable representation and in-distribution local signal sampling, MASE learns a local linear surrogate model to derive a class activation vector, which assigns importance values to the timesteps of a signal instance. In addition, a constellation-based explanation visualization is adopted to spotlight the important signal features relevant to the model prediction. We furthermore propose the first generic quantitative explanation evaluation framework for signal modulation classification to automatically measure the faithfulness, sensitivity, robustness, and efficiency of explanations. Extensive experiments are conducted on two real-world datasets with four black-box signal modulation classifiers. The quantitative results indicate that MASE outperforms two state-of-the-art methods with a 44.7% improvement in faithfulness, a 30.6% improvement in robustness, and a 44.1% decrease in sensitivity. Through qualitative visualizations, we further demonstrate that the explanations of MASE are more human interpretable and provide better insight into the reliability of black-box model decisions.
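
    The idea of a local linear surrogate over signal subsequences can be sketched as follows. This is a generic, simplified illustration and not the MASE algorithm: it masks segments with zeros instead of using the in-distribution sampling the abstract describes, it fits an unweighted ridge regression, and the names `segment_importance` and `solve_linear_system` are hypothetical. The black box is any function that maps a list of samples to a score.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def segment_importance(signal, black_box, n_segments=4, n_samples=200,
                       ridge=1e-6, seed=0):
    """Fit a linear surrogate on binary segment masks; return one weight per segment."""
    seg_len = len(signal) // n_segments
    if seg_len == 0:
        raise ValueError("signal is shorter than the number of segments")
    rng = random.Random(seed)
    rows, targets = [], []
    for _ in range(n_samples):
        mask = [rng.randint(0, 1) for _ in range(n_segments)]
        # Assumption: a masked segment is zeroed out (not in-distribution sampling).
        perturbed = [v if mask[min(i // seg_len, n_segments - 1)] else 0.0
                     for i, v in enumerate(signal)]
        rows.append(mask + [1])  # trailing 1 is the intercept column
        targets.append(black_box(perturbed))
    d = n_segments + 1
    # Normal equations of ridge regression: (X^T X + ridge * I) w = X^T y
    gram = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve_linear_system(gram, rhs)[:n_segments]
```

    For a toy black box that only looks at the second segment, such as `lambda s: sum(s[2:4])` on an eight-sample signal, the surrogate should assign nearly all importance to that segment.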

    Nitrogen removal in freshwater sediments of riparian zone: N-loss pathways and environmental controls

    The riparian zone is an important location of nitrogen removal in terrestrial and aquatic ecosystems. Many studies have focused on the nitrogen removal efficiency and on one or two nitrogen removal processes in the riparian zone, while less attention has been paid to the interaction of different nitrogen transformation processes and the impact of in situ environmental conditions. Molecular biotechnology, microcosm culture experiments and 15N stable isotope tracing techniques were used in this research in the riparian zone of the Weinan section of the Wei River to reveal the nitrogen removal mechanism of a riparian zone with a multi-layer lithologic structure. The results showed that the nitrogen removal rate in the riparian zone was 4.14–35.19 μmol·N·kg−1·h−1. Denitrification, dissimilatory nitrate reduction to ammonium (DNRA) and anaerobic ammonium oxidation (anammox) jointly achieved the natural attenuation of nitrogen in the riparian zone, and denitrification was the dominant process (accounting for 59.6%). A high ratio of dissolved organic carbon to nitrate (DOC:NO3−) would promote denitrification, but when the NO3− content was less than 0.06 mg/kg, DNRA would occur in preference to denitrification. Furthermore, the abundances of functional genes (norB, nirS, nrfA) and the anammox bacterial 16S rRNA gene showed distribution patterns similar to those of the corresponding nitrogen transformation rates. Sedimentary NOX−, Fe(II), dissolved organic carbon (DOC) and the abundance of nitrogen transformation functional microbes were the main factors affecting nitrogen removal in the riparian zone. Fe(II) promoted NO3− attenuation through the microbially mediated nitrate-dependent ferrous oxidation process, and DOC promoted NO3− attenuation by enhancing DNRA. The results of this study can be used for the management of the riparian zone and the prevention and control of global nitrogen pollution.