294 research outputs found

Integrating Semantic Knowledge to Tackle Zero-shot Text Classification

Authors: Guo Yike; Lertvittayakumjorn Piyawat; Zhang Jingqing
Publication date: 01/01/2019

Insufficient or even unavailable training data for emerging classes is a major challenge in many classification tasks, including text classification. Recognising text documents of classes that have never been seen in the learning stage, so-called zero-shot text classification, is therefore difficult, and only a few previous works have tackled this problem. In this paper, we propose a two-phase framework together with data augmentation and feature augmentation to solve this problem. Four kinds of semantic knowledge (word embeddings, class descriptions, class hierarchy, and a general knowledge graph) are incorporated into the proposed framework to deal with instances of unseen classes effectively. Experimental results show that each of the two phases, as well as their combination, achieves the best overall accuracy compared with baselines and recent approaches in classifying real-world texts under the zero-shot scenario. Comment: Accepted NAACL-HLT 201

arXiv.org e-Print Archive

Crossref

Spiral - Imperial College Digital Repository

Language modelling for clinical natural language understanding and generation

Author: Zhang Jingqing
Publication venue: Computing, Imperial College London
Publication date: 01/12/2022

One of the long-standing objectives of Artificial Intelligence (AI) is to design and develop algorithms for social good, including tackling public health challenges. In the era of digitisation, with an unprecedented amount of healthcare data being captured in digital form, the analysis of healthcare data at scale can lead to better research into diseases, better monitoring of patient conditions and, more importantly, improved patient outcomes. However, many AI-based analytic algorithms rely solely on structured healthcare data, such as bedside measurements and test results, which account for only 20% of all healthcare data. The remaining 80% is unstructured, including textual data such as clinical notes and discharge summaries, and is still underexplored. Conventional Natural Language Processing (NLP) algorithms designed for clinical applications rely on shallow matching, templates and non-contextualised word embeddings, which lead to a limited understanding of contextual semantics. Although recent NLP algorithms based on contextualised language models have demonstrated promising performance on a variety of tasks in the general domain, most of these generic algorithms struggle with specific clinical NLP tasks that require biomedical knowledge and reasoning. In addition, there is limited research on generative NLP algorithms that automatically produce clinical reports and summaries by considering salient clinical information. This thesis aims to design and develop novel NLP algorithms, especially clinically driven contextualised language models, to understand textual healthcare data and generate clinical narratives that can potentially support clinicians, medical scientists and patients. The first contribution of this thesis focuses on capturing phenotypic information of patients from clinical notes, which is important for profiling a patient's situation and improving patient outcomes.
The thesis proposes a novel self-supervised language model, named Phenotypic Intelligence Extraction (PIE), to annotate phenotypes from clinical notes, with the ability to detect contextual synonyms and an enhanced capacity to reason with numerical values. The second contribution is to demonstrate the utility and benefits of using phenotypic features of patients in clinical use cases by predicting patient outcomes in Intensive Care Units (ICU) and identifying patients at risk of specific diseases with better accuracy and model interpretability. The third contribution is to propose generative models that produce clinical narratives, automating and accelerating report writing and summarisation by clinicians. This thesis first proposes a novel summarisation language model named PEGASUS, which surpasses or is on par with the state-of-the-art performance on 12 downstream datasets, including biomedical literature from PubMed. PEGASUS is further extended to generate medical scientific documents from input tabular data. Open Access

Spiral - Imperial College Digital Repository

Recent results on J/ψ radiative decays from BESIII

Author: Zhang Jingqing
Publication venue: Societa italiana di fisica
Publication date: 01/01/2016

The BESIII detector has accumulated a sample of $1.31\times10^{9}$ J/ψ events. Based on this sample, light hadron spectroscopy has been studied extensively and much important progress has been made in recent years. In these proceedings, recent results on J/ψ radiative decays are reviewed, including the spin-parity determination of the X(1835) in $J/\psi \to \gamma K^0_S K^0_S \eta$, the observation of the X(1840) in $J/\psi \to \gamma 3(\pi^+\pi^-)$, the partial wave analysis of $J/\psi \to \gamma\phi\phi$, and the model-independent partial wave analysis of $J/\psi \to \gamma\pi^0\pi^0$.

Scientific Open-access Literature Archive and Repository

Recent Advances and Perspective of Studies on Phlegm Syndrome in Chinese Medicine

Authors: Jingqing Hu; Zhiguo Zhang
Publication venue: Hindawi Limited
Publication date: 01/01/2016

This review paper summarizes the current state of studies on the essence of phlegm syndrome and on the relation between phlegm syndrome, diseases, and therapeutics, based on published English-language articles. In studies on the essence of phlegm syndrome, omic technologies have been used to explore its molecular basis. In studies on the relation between phlegm syndrome and diseases, the discovery of markers of phlegm syndrome in diseases has become a hotspot, and the distribution of phlegm syndromes in some common chronic diseases has been established. In the therapy of phlegm syndrome, two therapeutic models have been used most frequently: treatment with a CM formula, and treatment with a combination of a CM formula and Western medicine. It is certain that a single omic technology cannot deal with the complexity of phlegm syndrome and that combining multiple omic methods will be a trend in future studies. Meanwhile, to rapidly improve the quality of clinical research on phlegm syndrome, a series of agreed criteria, such as syndrome diagnostic criteria and efficacy criteria for clinical studies of phlegm syndrome, needs to be established urgently, and there is an urgent need to standardize syndrome names in English.

Crossref

Directory of Open Access Journals

Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification

Authors: Dai Chengliang; Guo Yike; Sun Kai; Yang Xian; Zhang Jingqing; Zhang Xiaoyu
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

Different aspects of a clinical sample can be revealed by multiple types of omics data. Integrated analysis of multi-omics data provides a comprehensive view of patients, which has the potential to facilitate more accurate clinical decision making. However, omics data are normally high dimensional, with a large number of molecular features and a relatively small number of available samples with clinical labels. The "dimensionality curse" makes it challenging to train a machine learning model using high dimensional omics data like DNA methylation and gene expression profiles. Here we propose an end-to-end deep learning model called OmiVAE to extract low dimensional features and classify samples from multi-omics data. OmiVAE combines the basic structure of variational autoencoders with a classification network to achieve task-oriented feature extraction and multi-class classification. The training procedure of OmiVAE comprises an unsupervised phase without the classifier and a supervised phase with the classifier. During the unsupervised phase, a hierarchical cluster structure of samples can be automatically formed without the need for labels. In the supervised phase, OmiVAE achieved an average classification accuracy of 97.49% after 10-fold cross-validation among 33 tumour types and normal samples, which is better than other existing methods. The OmiVAE model learned from multi-omics data outperformed the model using only one type of omics data, which indicates that the complementary information from different omics data types provides useful insights for biomedical tasks like cancer classification. Comment: 7 pages, 4 figures

arXiv.org e-Print Archive

Crossref

The University of Manchester - Institutional Repository

Unsupervised Annotation of Phenotypic Abnormalities via Semantic Latent Representations on Electronic Health Records

Authors: Dai Chengliang; Guo Yike; Sun Kai; Yang Xian; Zhang Jingqing; Zhang Xiaoyu
Publication date: 01/01/2019

The extraction of phenotype information, which is naturally contained in electronic health records (EHRs), has been found to be useful in various clinical informatics applications such as disease diagnosis. However, due to imprecise descriptions, a lack of gold standards and the demand for efficiency, annotating phenotypic abnormalities on millions of EHR narratives is still challenging. In this work, we propose a novel unsupervised deep learning framework to annotate phenotypic abnormalities from EHRs via semantic latent representations. The proposed framework takes advantage of the Human Phenotype Ontology (HPO), a knowledge base of phenotypic abnormalities, to standardize the annotation results. Experiments have been conducted on 52,722 EHRs from the MIMIC-III dataset. Quantitative and qualitative analyses have shown that the proposed framework achieves state-of-the-art annotation performance and computational efficiency compared with other methods. Comment: Accepted by BIBM 2019 (Regular

arXiv.org e-Print Archive

Crossref

The University of Manchester - Institutional Repository

Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs

Authors: Ruan Jingqing; Xing Dengpeng; Xiong Xuantang; Xu Bo; Yang Yiming; Zhang Qingyang
Publication date: 22/07/2023

Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising paradigm to address the exploration-exploitation dilemma in reinforcement learning. It decomposes the source task into subgoal-conditioned subtasks and conducts exploration and exploitation in the subgoal space. The effectiveness of GCHRL relies heavily on the subgoal representation function and the subgoal selection strategy. However, existing works often overlook temporal coherence in GCHRL when learning latent subgoal representations and lack an efficient subgoal selection strategy that balances exploration and exploitation. This paper proposes HIerarchical reinforcement learning via dynamically building Latent Landmark graphs (HILL) to overcome these limitations. HILL learns latent subgoal representations that satisfy temporal coherence using a contrastive representation learning objective. Based on these representations, HILL dynamically builds latent landmark graphs and employs a novelty measure on nodes and a utility measure on edges. Finally, HILL develops a subgoal selection strategy that balances exploration and exploitation by jointly considering both measures. Experimental results demonstrate that HILL outperforms state-of-the-art baselines on continuous control tasks with sparse rewards in sample efficiency and asymptotic performance. Our code is available at https://github.com/papercode2022/HILL. Comment: Accepted by the conference of International Joint Conference on Neural Networks (IJCNN) 202

arXiv.org e-Print Archive

Chilling Stress—The Key Predisposing Factor for Causing Alternaria alternata Infection and Leading to Cotton (Gossypium hirsutum L.) Leaf Senescence

Authors: Jian Guiliang; Jiang Tengfei; Li Sha; Liu Zhi; Qi Fangjun; Zhang Wenwei; Zhao Jingqing
Publication venue: Public Library of Science
Publication date: 27/04/2012

Leaf senescence plays a vital role in nutrient recycling and in a plant's overall capacity to assimilate carbon dioxide. Premature leaf senescence in cotton, often accompanied by unexpected short-term low temperatures, has been occurring with increasing frequency in many cotton-growing areas and causes serious reductions in cotton yield and quality. The key factors that cause and promote premature leaf senescence in cotton are still unclear. In this study, the relationship between pre-chilling stress and Alternaria alternata infection in causing cotton leaf senescence was investigated under precisely controlled laboratory conditions using cotton plants at the four- to five-leaf stage. The results showed that short-term chilling stress could cause a certain degree of physiological impairment to cotton leaves, which recovered to normal levels within 2–4 days once the chilling stress was removed. When these chilling-injured leaves were further inoculated with A. alternata, leaf spot disease appeared and developed markedly, and pronounced symptoms of leaf senescence eventually occurred on these leaves. The onset of cotton leaf senescence under this condition was also reflected in various physiological indexes, such as an irreversible increase in malondialdehyde (MDA) content and electrolyte leakage, an irreversible decrease in soluble protein content and chlorophyll content, and irreversible damage to the leaves' photosynthetic ability. These results demonstrate that chilling stress acted as the key predisposing factor causing A. alternata infection and leading to cotton leaf senescence. Understanding the key factors that cause and promote cotton leaf senescence is expected to help in taking appropriate management steps to prevent premature leaf senescence in cotton.

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central