OmiEmbed: a unified multi-task deep learning framework for multi-omics data
High-dimensional omics data contain intrinsic biomedical information that is
crucial for personalised medicine. Nevertheless, it is challenging to capture
this information from genome-wide data due to the large number of molecular
features and the small number of available samples, a problem known as 'the
curse of dimensionality' in machine learning. To tackle this problem and pave the way
for machine learning aided precision medicine, we proposed a unified multi-task
deep learning framework named OmiEmbed to capture biomedical information from
high-dimensional omics data with the deep embedding and downstream task
modules. The deep embedding module learnt an omics embedding that mapped
multiple omics data types into a latent space with lower dimensionality. Based
on the new representation of multi-omics data, different downstream task
modules were trained simultaneously and efficiently with the multi-task
strategy to predict the comprehensive phenotype profile of each sample.
OmiEmbed supports multiple tasks for omics data including dimensionality
reduction, tumour type classification, multi-omics integration, demographic and
clinical feature reconstruction, and survival prediction. The framework
outperformed other methods on all three types of downstream tasks and achieved
better performance with the multi-task strategy than with training each task
individually. OmiEmbed is a powerful and unified framework that can be widely
adapted to various applications of high-dimensional omics data and has great
potential to facilitate more accurate and personalised clinical decision
making.
Comment: 14 pages, 8 figures, 7 tables
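The multi-task strategy described above can be illustrated with a minimal sketch: a shared embedding feeds several downstream heads, whose losses are combined into one joint objective. All shapes, weights, and loss values here are toy assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of OmiEmbed-style multi-task training (hypothetical
# values, not the paper's implementation): one shared embedding feeds
# several downstream task heads, trained jointly.

def embed(x, W):
    # Linear projection of a high-dimensional sample into a
    # lower-dimensional latent space (one row of W per latent dim).
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

def multi_task_loss(task_losses, weights):
    # Joint objective: weighted sum of the downstream task losses
    # (e.g. classification, reconstruction, survival prediction).
    return sum(w * l for w, l in zip(weights, task_losses))

x = [0.5, -1.0, 2.0]                      # toy omics sample (3 features)
W = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]]   # 3 -> 2 embedding matrix
z = embed(x, W)                           # latent representation
loss = multi_task_loss([0.7, 1.2, 0.4], [1.0, 0.5, 0.5])
```

In practice each head would be a neural network and the weights would balance the task gradients; the sketch only shows how a single joint loss lets all tasks train simultaneously on the shared embedding.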
Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification
Different aspects of a clinical sample can be revealed by multiple types of
omics data. Integrated analysis of multi-omics data provides a comprehensive
view of patients, which has the potential to facilitate more accurate clinical
decision making. However, omics data are typically high dimensional, with a
large number of molecular features and a relatively small number of available
samples with clinical labels. The "dimensionality curse" makes it challenging to train
a machine learning model using high dimensional omics data like DNA methylation
and gene expression profiles. Here we propose an end-to-end deep learning model
called OmiVAE to extract low dimensional features and classify samples from
multi-omics data. OmiVAE combines the basic structure of variational
autoencoders with a classification network to achieve task-oriented feature
extraction and multi-class classification. The training procedure of OmiVAE
comprises an unsupervised phase without the classifier and a supervised phase
with the classifier. During the unsupervised phase, a hierarchical cluster
structure of samples can form automatically without the need for labels. In
the supervised phase, OmiVAE achieved an average classification
accuracy of 97.49% after 10-fold cross-validation among 33 tumour types and
normal samples, outperforming other existing methods. The OmiVAE model learned
from multi-omics data outperformed models trained on a single omics data type,
indicating that the complementary information from different omics data types
provides useful insights for biomedical tasks such as cancer classification.
Comment: 7 pages, 4 figures
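The two-phase objective described above can be sketched as follows: the unsupervised phase optimises a standard VAE loss (reconstruction error plus a KL regulariser), and the supervised phase adds a weighted classification term. This is an illustrative sketch of the loss composition only, with made-up values; the actual OmiVAE networks are not shown.

```python
import math

# Illustrative sketch of OmiVAE's two training phases (toy values,
# not the authors' code).

def kl_gaussian(mu, logvar):
    # KL divergence between N(mu, exp(logvar)) and the standard
    # normal, summed over latent dimensions.
    return -0.5 * sum(1 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))

def vae_loss(recon_err, mu, logvar):
    # Unsupervised phase: reconstruction error + KL regulariser
    # (the classifier is not involved).
    return recon_err + kl_gaussian(mu, logvar)

def supervised_loss(recon_err, mu, logvar, cls_err, alpha=1.0):
    # Supervised phase: the VAE loss plus a weighted classification
    # error from the attached classification network.
    return vae_loss(recon_err, mu, logvar) + alpha * cls_err
```

At the standard-normal posterior (zero mean, unit variance) the KL term vanishes, which is why the unsupervised phase alone can organise samples in the latent space before any labels are introduced.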
Unsupervised Annotation of Phenotypic Abnormalities via Semantic Latent Representations on Electronic Health Records
The extraction of phenotype information which is naturally contained in
electronic health records (EHRs) has been found to be useful in various
clinical informatics applications such as disease diagnosis. However, due to
imprecise descriptions, lack of gold standards and the demand for efficiency,
annotating phenotypic abnormalities on millions of EHR narratives is still
challenging. In this work, we propose a novel unsupervised deep learning
framework to annotate the phenotypic abnormalities from EHRs via semantic
latent representations. The proposed framework takes advantage of the Human
Phenotype Ontology (HPO), a knowledge base of phenotypic abnormalities, to
standardize the annotation results. Experiments were conducted on 52,722 EHRs
from the MIMIC-III dataset. Quantitative and qualitative analyses show that the
proposed framework achieves state-of-the-art annotation performance and
computational efficiency compared with other methods.
Comment: Accepted by BIBM 2019
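One way to picture annotation via semantic latent representations: embed an EHR note and the HPO terms into a shared latent space, then assign the terms whose vectors lie close to the note's. The vectors, term IDs, and threshold below are fabricated for illustration; the paper's actual model is more involved.

```python
import math

# Hypothetical sketch of HPO annotation by latent-space similarity
# (made-up vectors and threshold, not the paper's model).

def cosine(u, v):
    # Cosine similarity between two latent vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def annotate(ehr_vec, hpo_vecs, threshold=0.8):
    # Return the HPO term ids whose latent vector is sufficiently
    # close to the EHR note's latent vector.
    return [term for term, vec in hpo_vecs.items()
            if cosine(ehr_vec, vec) >= threshold]
```

Because the matching is purely geometric, no labelled training pairs are needed, which is the sense in which such annotation can run unsupervised at scale.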
Correspondence between the Korean and Mandarin Chinese pronunciations of Chinese characters: A comparison at the sub-syllabic level
This study explores the corresponding relationship of Chinese characters’ pronunciations between modern Mandarin Chinese and modern Korean at the sub-syllabic level and investigates the applicability of such correspondence in learning and reading Korean as a second language (L2) by native (L1) Mandarin Chinese speakers. Correspondence between Korean and Mandarin Chinese initial consonants, and between Korean -V(C) structures and Chinese finals, was calculated based on the 1,800 Chinese characters designated for educational purposes in South Korea. Our results demonstrated that Korean initial consonants had either consistent or inconsistent correspondence with their Mandarin Chinese counterparts. In addition, this study showed that pure comparisons of vowels between the two languages are not reliable; comparison between Korean -V(C) structures and Chinese finals is more practical. Based on the data provided in this study, the corresponding Mandarin Chinese pronunciations of ninety percent of the high-frequency Chinese characters in Korean can be inferred.
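The notion of "consistent correspondence" used above can be made concrete with a small computation: for each Korean initial consonant, the share of characters whose Mandarin initial matches the most frequent correspondence. The character mappings below are fabricated toy data, not the study's 1,800-character dataset.

```python
from collections import Counter

# Toy sketch of measuring correspondence consistency between Korean
# and Mandarin initial consonants over a character list (the pairs
# below are invented for illustration).

def consistency(pairs):
    # pairs: (korean_initial, mandarin_initial) per Chinese character.
    # Returns, for each Korean initial, the proportion of characters
    # that follow its single most frequent Mandarin correspondence.
    by_korean = {}
    for ko, zh in pairs:
        by_korean.setdefault(ko, []).append(zh)
    return {ko: Counter(zhs).most_common(1)[0][1] / len(zhs)
            for ko, zhs in by_korean.items()}
```

A Korean initial with a ratio near 1.0 would count as a consistent correspondence; lower ratios would mark the inconsistent cases the study identifies.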
Theoretical foundations of studying criticality in the brain
Criticality is hypothesized as a physical mechanism underlying efficient
transitions between cortical states and remarkable information processing
capacities in the brain. While considerable evidence generally supports this
hypothesis, non-negligible controversies persist regarding the ubiquity of
criticality in neural dynamics and its role in information processing. Validity
issues frequently arise during identifying potential brain criticality from
empirical data. Moreover, the functional benefits implied by brain criticality
are frequently misconceived or unduly generalized. These problems stem from the
non-triviality and immaturity of the physical theories that analytically derive
brain criticality and the statistic techniques that estimate brain criticality
from empirical data. To help solve these problems, we present a systematic
review and reformulate the foundations of studying brain criticality, i.e.,
ordinary criticality (OC), quasi-criticality (qC), self-organized criticality
(SOC), and self-organized quasi-criticality (SOqC), using the terminology of
neuroscience. We offer accessible explanations of the physical theories and
statistical techniques of brain criticality, providing step-by-step derivations
to characterize neural dynamics as a physical system with avalanches. We
summarize error-prone details and existing limitations in brain criticality
analysis and suggest possible solutions. Moreover, we present a forward-looking
perspective on how optimizing the foundations of studying brain criticality can
deepen our understanding of various neuroscience questions.
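A basic step in the avalanche analysis the review discusses is extracting avalanches from binned activity: an avalanche is a maximal run of non-empty time bins, and its size is the total activity within the run. This is the standard definition; the spike counts below are toy data.

```python
# Sketch of extracting neuronal avalanches from a binned spike-count
# series (standard definition; the input series here is toy data).

def avalanche_sizes(counts):
    # An avalanche is a maximal run of consecutive non-empty bins;
    # its size is the summed activity over the run.
    sizes, current = [], 0
    for c in counts:
        if c > 0:
            current += c
        elif current > 0:
            sizes.append(current)   # empty bin closes the avalanche
            current = 0
    if current > 0:
        sizes.append(current)       # close a trailing avalanche
    return sizes
```

Criticality analyses then test whether such sizes follow a power-law distribution, which is exactly where the estimation pitfalls the review warns about arise.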
Toward Learning Model-Agnostic Explanations for Deep Learning-Based Signal Modulation Classifiers
Recent advances in deep learning (DL) have brought tremendous gains in signal modulation classification. However, DL-based classifiers lack transparency and interpretability, which raises concerns about model reliability and hinders wide deployment in real-world applications. While explainable methods have recently emerged, little has been done to explain DL-based signal modulation classifiers. In this work, we propose a novel model-agnostic explainer, the Model-Agnostic Signal modulation classification Explainer (MASE), which provides explanations for the predictions of black-box modulation classifiers. With a subsequence-based interpretable signal representation and in-distribution local signal sampling, MASE learns a local linear surrogate model to derive a class activation vector, which assigns importance values to the timesteps of the signal instance. In addition, constellation-based explanation visualization is adopted to spotlight the important signal features relevant to the model's prediction. We furthermore propose the first generic quantitative explanation evaluation framework for signal modulation classification, which automatically measures the faithfulness, sensitivity, robustness, and efficiency of explanations. Extensive experiments are conducted on two real-world datasets with four black-box signal modulation classifiers. The quantitative results indicate that MASE outperforms two state-of-the-art methods with a 44.7% improvement in faithfulness, a 30.6% improvement in robustness, and a 44.1% decrease in sensitivity. Through qualitative visualizations, we further demonstrate that MASE's explanations are more human-interpretable and provide better insight into the reliability of black-box model decisions.
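The general idea of attributing importance to signal segments can be illustrated with a much simpler occlusion-style scheme than MASE's surrogate model: zero out one subsequence at a time and record the drop in the black-box score. This is a hypothetical simplification in the spirit of the method, not MASE itself; the `predict` function and segment length are assumptions.

```python
# Simplified, occlusion-style sketch of per-segment importance for a
# black-box signal classifier (not the MASE algorithm; `predict` is a
# stand-in for any black-box scoring function).

def occlusion_importance(signal, predict, seg_len):
    # For each subsequence of length seg_len, mask it with zeros and
    # measure how much the black-box score drops.
    base = predict(signal)
    importances = []
    for start in range(0, len(signal), seg_len):
        masked = list(signal)
        for i in range(start, min(start + seg_len, len(signal))):
            masked[i] = 0.0          # occlude one segment
        importances.append(base - predict(masked))
    return importances
```

MASE instead fits a local linear surrogate on perturbed in-distribution samples, which yields smoother importance values than hard occlusion, but the interpretation of the output (importance per timestep segment) is analogous.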
Nitrogen removal in freshwater sediments of riparian zone: N-loss pathways and environmental controls
The riparian zone is an important location of nitrogen removal in terrestrial and aquatic ecosystems. Many studies have focused on nitrogen removal efficiency and on one or two nitrogen removal processes in the riparian zone, while less attention has been paid to the interaction of different nitrogen transformation processes and the impact of in situ environmental conditions. Molecular biotechnology, microcosm culture experiments, and 15N stable-isotope tracing techniques were used in this research at the riparian zone of the Weinan section of the Wei River, to reveal the nitrogen removal mechanism of a riparian zone with multi-layer lithologic structure. The results showed that the nitrogen removal rate in the riparian zone was 4.14–35.19 μmol·N·kg−1·h−1. Denitrification, dissimilatory nitrate reduction to ammonium (DNRA) and anaerobic ammonium oxidation (anammox) jointly achieved the natural attenuation of nitrogen in the riparian zone, with denitrification as the dominant process (accounting for 59.6%). A high dissolved organic carbon to nitrate ratio (DOC:NO3−) promoted denitrification, but when the NO3− content was less than 0.06 mg/kg, DNRA occurred in preference to denitrification. Furthermore, the abundances of functional genes (norB, nirS, nrfA) and the anammox bacterial 16S rRNA gene showed distribution patterns similar to the corresponding nitrogen transformation rates. Sedimentary NOX−, Fe(II), dissolved organic carbon (DOC) and the abundance of nitrogen-transforming functional microbes were the main factors affecting nitrogen removal in the riparian zone. Fe(II) promoted NO3− attenuation through microbially mediated nitrate-dependent ferrous iron oxidation, and DOC promoted NO3− attenuation by enhancing DNRA. The results of this study can be used for the management of riparian zones and the prevention and control of global nitrogen pollution.
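The dominance figure quoted above (denitrification accounting for 59.6% of N loss) is a simple partitioning of total removal among the three measured pathways, which can be sketched as follows. The rates below are made-up numbers chosen to reproduce a 59.6% share, not the study's measurements.

```python
# Toy illustration of partitioning total nitrogen loss among the three
# removal pathways (rates are fabricated, not the study's data).

def process_shares(rates):
    # Each pathway's share is its rate divided by the summed rate
    # of all measured pathways.
    total = sum(rates.values())
    return {process: r / total for process, r in rates.items()}

shares = process_shares({
    "denitrification": 5.96,   # μmol·N·kg−1·h−1 (invented)
    "anammox": 2.00,
    "DNRA": 2.04,
})
```

Such shares only partition the measured pathways; any unmeasured N sink would shift the percentages, which is one reason studies like this measure all three processes jointly.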