Integrating Semantic Knowledge to Tackle Zero-shot Text Classification
Insufficient or even unavailable training data for emerging classes is a major
challenge in many classification tasks, including text classification.
Recognising text documents of classes that have never been seen in the learning
stage, so-called zero-shot text classification, is therefore difficult, and only
a few previous works have tackled this problem. In this paper, we propose a
two-phase framework together with data augmentation and feature augmentation to
solve this problem. Four kinds of semantic knowledge (word embeddings, class
descriptions, class hierarchy, and a general knowledge graph) are incorporated
into the proposed framework to deal with instances of unseen classes
effectively. Experimental results show that each of the two phases, as well as
their combination, achieves the best overall accuracy compared with baselines and
recent approaches in classifying real-world texts under the zero-shot scenario.
Comment: Accepted NAACL-HLT 201
Language modelling for clinical natural language understanding and generation
One of the long-standing objectives of Artificial Intelligence (AI) is to design and develop algorithms for social good, including tackling public health challenges. In the era of digitisation, with an unprecedented amount of healthcare data being captured in digital form, analysing healthcare data at scale can lead to better research into diseases, better monitoring of patient conditions and, more importantly, improved patient outcomes. However, many AI-based analytic algorithms rely solely on structured healthcare data, such as bedside measurements and test results, which account for only 20% of all healthcare data. The remaining 80% is unstructured, including textual data such as clinical notes and discharge summaries, and is still underexplored.
Conventional Natural Language Processing (NLP) algorithms designed for clinical applications rely on shallow matching, templates and non-contextualised word embeddings, which leads to a limited understanding of contextual semantics. Although recent advances in NLP have demonstrated promising performance on a variety of general-domain tasks with contextualised language models, most of these generic algorithms struggle on specific clinical NLP tasks that require biomedical knowledge and reasoning. In addition, there has been limited research on generative NLP algorithms that automatically produce clinical reports and summaries by considering salient clinical information.
This thesis aims to design and develop novel NLP algorithms, especially clinically driven contextualised language models, to understand textual healthcare data and generate clinical narratives that can potentially support clinicians, medical scientists and patients. The first contribution of this thesis focuses on capturing phenotypic information about patients from clinical notes, which is important for profiling a patient's situation and improving patient outcomes. The thesis proposes a novel self-supervised language model, named Phenotypic Intelligence Extraction (PIE), to annotate phenotypes from clinical notes, with detection of contextual synonyms and enhanced reasoning over numerical values. The second contribution is to demonstrate the utility and benefits of using phenotypic features of patients in clinical use cases, by predicting patient outcomes in Intensive Care Units (ICU) and identifying patients at risk of specific diseases with better accuracy and model interpretability. The third contribution is to propose generative models that produce clinical narratives, in order to automate and accelerate report writing and summarisation by clinicians. This thesis first proposes a novel summarisation language model named PEGASUS, which surpasses or is on par with the state-of-the-art performance on 12 downstream datasets, including biomedical literature from PubMed. PEGASUS is further extended to generate medical scientific documents from input tabular data.
Open Acces
Recent results on J/ψ radiative decays from BESIII
The BESIII detector has accumulated 1.31×10^9 J/ψ data samples. Based on this sample, light hadron spectroscopy has been extensively studied and much important progress has been achieved in recent years. In this proceeding, the recent results on J/ψ radiative decays are reviewed, which include the spin-parity determination of the X(1835) in J/ψ → γ K_S^0 K_S^0 η, the observation of the X(1840) in J/ψ → γ 3(π+π−), the partial wave analysis of J/ψ → γωφ and the model independent partial wave analysis of J/ψ → γπ0π0.
Recent Advances and Perspective of Studies on Phlegm Syndrome in Chinese Medicine
This review paper summarizes the current state of studies on the essence of phlegm syndrome and on the relation between phlegm syndrome, diseases, and therapeutics, based on published English articles. In studies on the essence of phlegm syndrome, omic technologies were used to explore its molecular basis. In studies on the relation between phlegm syndrome and diseases, the discovery of markers of phlegm syndrome in diseases has become a hotspot, and the distribution of phlegm syndromes in some common chronic diseases was found. In the therapy of phlegm syndrome, two therapeutic models were used most frequently: treatment with a CM formula, and treatment with a combination of a CM formula and Western medicine. It is certain that a single omic technology cannot deal with the complexity of phlegm syndrome, and that combining multiple omic methods will be a trend in future studies. Meanwhile, to rapidly improve the quality of clinical research on phlegm syndrome, a series of agreed criteria, such as syndrome diagnostic criteria and efficacy criteria for clinical studies of phlegm syndrome, need to be established urgently, and there is an urgent need to standardize syndrome names in English.
Unsupervised Annotation of Phenotypic Abnormalities via Semantic Latent Representations on Electronic Health Records
The extraction of phenotype information which is naturally contained in
electronic health records (EHRs) has been found to be useful in various
clinical informatics applications such as disease diagnosis. However, due to
imprecise descriptions, lack of gold standards and the demand for efficiency,
annotating phenotypic abnormalities on millions of EHR narratives is still
challenging. In this work, we propose a novel unsupervised deep learning
framework to annotate the phenotypic abnormalities from EHRs via semantic
latent representations. The proposed framework takes advantage of the Human
Phenotype Ontology (HPO), which is a knowledge base of phenotypic
abnormalities, to standardize the annotation results. Experiments have been
conducted on 52,722 EHRs from the MIMIC-III dataset. Quantitative and qualitative
analyses have shown that the proposed framework achieves state-of-the-art annotation
performance and computational efficiency compared with other methods.
Comment: Accepted by BIBM 2019 (Regular
Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification
Different aspects of a clinical sample can be revealed by multiple types of
omics data. Integrated analysis of multi-omics data provides a comprehensive
view of patients, which has the potential to facilitate more accurate clinical
decision making. However, omics data are normally high dimensional, with a large
number of molecular features and a relatively small number of available samples
with clinical labels. The "dimensionality curse" makes it challenging to train
a machine learning model using high dimensional omics data like DNA methylation
and gene expression profiles. Here we propose an end-to-end deep learning model
called OmiVAE to extract low dimensional features and classify samples from
multi-omics data. OmiVAE combines the basic structure of variational
autoencoders with a classification network to achieve task-oriented feature
extraction and multi-class classification. The training procedure of OmiVAE is
comprised of an unsupervised phase without the classifier and a supervised
phase with the classifier. During the unsupervised phase, a hierarchical
cluster structure of samples can be automatically formed without the need for
labels. In the supervised phase, OmiVAE achieved an average classification
accuracy of 97.49% after 10-fold cross-validation among 33 tumour types and
normal samples, which shows better performance than other existing methods. The
OmiVAE model learned from multi-omics data outperformed that using only one
type of omics data, which indicates that the complementary information from
different omics datatypes provides useful insights for biomedical tasks like
cancer classification.
Comment: 7 pages, 4 figure
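The two training phases described in the OmiVAE abstract above can be illustrated with a minimal per-sample loss sketch. This is not the authors' implementation: the squared-error reconstruction term, the function names, and the `class_weight` parameter are all assumptions made for illustration. It only shows how a standard variational autoencoder objective (reconstruction plus KL divergence) can be used alone in an unsupervised phase and combined with a classification cross-entropy in a supervised phase.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one latent vector."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def squared_error(x, x_hat):
    """Reconstruction term (assumed here to be a sum of squared errors)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_norm - logits[label]


def sample_loss(x, x_hat, mu, logvar, logits=None, label=None, class_weight=1.0):
    """Unsupervised phase: VAE loss only. Supervised phase: add the classifier loss.

    `class_weight` is a hypothetical knob, not a parameter taken from the paper.
    """
    loss = squared_error(x, x_hat) + kl_to_standard_normal(mu, logvar)
    if logits is not None and label is not None:
        loss += class_weight * cross_entropy(logits, label)
    return loss
```

Calling `sample_loss` without `logits` and `label` corresponds to the phase without the classifier; passing them adds the supervised term on top of the same encoder outputs.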
Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs
Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising
paradigm to address the exploration-exploitation dilemma in reinforcement
learning. It decomposes the source task into subgoal conditional subtasks and
conducts exploration and exploitation in the subgoal space. The effectiveness
of GCHRL heavily relies on subgoal representation functions and subgoal
selection strategy. However, existing works often overlook the temporal
coherence in GCHRL when learning latent subgoal representations and lack an
efficient subgoal selection strategy that balances exploration and
exploitation. This paper proposes HIerarchical reinforcement learning via
dynamically building Latent Landmark graphs (HILL) to overcome these
limitations. HILL learns latent subgoal representations that satisfy temporal
coherence using a contrastive representation learning objective. Based on these
representations, HILL dynamically builds latent landmark graphs and employs a
novelty measure on nodes and a utility measure on edges. Finally, HILL develops
a subgoal selection strategy that balances exploration and exploitation by
jointly considering both measures. Experimental results demonstrate that HILL
outperforms state-of-the-art baselines on continuous control tasks with sparse
rewards in sample efficiency and asymptotic performance. Our code is available
at https://github.com/papercode2022/HILL.
Comment: Accepted by the conference of International Joint Conference on
Neural Networks (IJCNN) 202
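The idea of choosing a subgoal by jointly weighing a node novelty measure and an edge utility measure, as stated in the HILL abstract above, can be sketched as follows. The abstract does not say how the two measures are defined or combined, so everything here is an assumption: the count-based novelty, the additive score, the `beta` trade-off parameter, and all function names are hypothetical and do not come from the HILL code base.

```python
import math


def novelty(visit_count):
    """Assumed count-based novelty: rarely visited landmarks score higher."""
    return 1.0 / math.sqrt(1.0 + visit_count)


def select_subgoal(current, edges, visits, beta=1.0):
    """Pick the neighbouring landmark with the best combined score.

    edges:  {node: {neighbour: utility}} where utility is a value on the edge
    visits: {node: visit count}, used for the novelty term
    beta:   hypothetical weight; larger values favour exploration
    """
    candidates = edges.get(current, {})
    if not candidates:
        raise ValueError(f"landmark {current!r} has no outgoing edges")

    def score(node):
        return candidates[node] + beta * novelty(visits.get(node, 0))

    # Sorting first makes tie-breaking deterministic.
    return max(sorted(candidates), key=score)
```

With `beta=0` the choice is purely utility-driven (exploitation); raising `beta` shifts the choice towards landmarks that have been visited less often (exploration).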
Chilling Stress: The Key Predisposing Factor for Causing Alternaria alternata Infection and Leading to Cotton (Gossypium hirsutum L.) Leaf Senescence
Leaf senescence plays a vital role in nutrient recycling and overall capacity to assimilate carbon dioxide. Cotton premature leaf senescence, often accompanied by unexpected short-term low temperature, has been occurring with increasing frequency in many cotton-growing areas and causes serious reductions in the yield and quality of cotton. The key factors causing and promoting cotton premature leaf senescence are still unclear. In this study, the relationship between pre-chilling stress and Alternaria alternata infection in causing cotton leaf senescence was investigated under precisely controlled laboratory conditions with cotton plants at the four- to five-leaf stage. The results showed that short-term chilling stress could cause a certain degree of physiological impairment to cotton leaves, which recovered to normal levels in 2–4 days once the chilling stress was removed. When these chilling-injured leaves were further inoculated with A. alternata, pronounced appearance and development of leaf spot disease, and eventually pronounced symptoms of leaf senescence, occurred on these cotton leaves. The onset of cotton leaf senescence under this condition was also reflected in various physiological indexes, such as an irreversible increase in malondialdehyde (MDA) content and electrolyte leakage, an irreversible decrease in soluble protein content and chlorophyll content, and irreversible damage to the leaves' photosynthetic ability. The presented results demonstrate that chilling stress acted as the key predisposing factor for A. alternata infection, leading to cotton leaf senescence. An understanding of the key factors causing and promoting cotton leaf senescence is expected to help in taking appropriate management steps to prevent cotton premature leaf senescence.