61 research outputs found
Learning to Extract Coherent Summary via Deep Reinforcement Learning
Coherence plays a critical role in producing a high-quality summary from a
document. In recent years, neural extractive summarization has become
increasingly attractive. However, most existing models ignore the coherence of
summaries when extracting sentences. As an effort towards extracting coherent
summaries, we propose a neural coherence model to capture the cross-sentence
semantic and syntactic coherence patterns. The proposed neural coherence model
obviates the need for feature engineering and can be trained in an end-to-end
fashion using unlabeled data. Empirical results show that the proposed neural
coherence model can efficiently capture the cross-sentence coherence patterns.
Using the combined output of the neural coherence model and the ROUGE package as
the reward, we design a reinforcement learning method to train the proposed
neural extractive summarizer, named the Reinforced Neural Extractive
Summarization (RNES) model. The RNES model learns to optimize coherence and
informative importance of the summary simultaneously. Experimental results show
that the proposed RNES outperforms existing baselines and achieves
state-of-the-art performance in terms of ROUGE on the CNN/Daily Mail dataset. The
qualitative evaluation indicates that summaries produced by RNES are more
coherent and readable.
Comment: 8 pages, 1 figure, presented at AAAI-201
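The combined reward described in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `rouge1_recall` is a simple unigram-recall proxy rather than the ROUGE package, `toy_coherence` is a hypothetical word-overlap stand-in for the neural coherence model, and the additive form with `coherence_weight` is one plausible way to combine the two signals, not necessarily the one used in the paper.

```python
from collections import Counter
from typing import Sequence


def rouge1_recall(candidate: str, reference: str) -> float:
    """Unigram recall: share of reference tokens that the candidate covers."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / sum(ref.values())


def toy_coherence(sentences: Sequence[str]) -> float:
    """Hypothetical coherence proxy: mean Jaccard overlap of adjacent sentences."""
    if len(sentences) < 2:
        return 0.0
    scores = []
    for left, right in zip(sentences, sentences[1:]):
        a, b = set(left.lower().split()), set(right.lower().split())
        union = a | b
        scores.append(len(a & b) / len(union) if union else 0.0)
    return sum(scores) / len(scores)


def combined_reward(sentences: Sequence[str], reference: str,
                    coherence_weight: float = 0.5) -> float:
    """Assumed reward shape: informativeness plus weighted coherence."""
    summary = " ".join(sentences)
    return rouge1_recall(summary, reference) + coherence_weight * toy_coherence(sentences)


if __name__ == "__main__":
    extracted = ["the cat sat", "the cat slept"]
    print(combined_reward(extracted, "the cat sat down"))
```

In a reinforcement learning setup, a scalar of this kind would be computed once per sampled summary and used to weight the policy gradient.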
LCSTS: A Large Scale Chinese Short Text Summarization Dataset
Automatic text summarization is widely regarded as a highly difficult
problem, partly because of the lack of large text summarization datasets.
Because constructing large-scale summaries for full text is very challenging,
in this paper we introduce a large-scale Chinese short text
summarization dataset constructed from the Chinese microblogging website Sina
Weibo, which is released to the public
(http://icrc.hitsz.edu.cn/Article/show/139.html). This corpus consists of over
2 million real Chinese short texts with short summaries given by the author of
each text. We also manually tagged the relevance of 10,666 short summaries to
their corresponding short texts. Based on the corpus, we apply a recurrent
neural network to summary generation and achieve promising results, which
not only show the usefulness of the proposed corpus for short text
summarization research, but also provide a baseline for further research on
this topic.
Comment: Recently, we received feedback from Yuya Taguchi from NAIST in Japan
and Qian Chen from USTC in China that the results in the EMNLP2015 version
seem to be underrated. We carefully checked our results and found that
we had made a mistake while using the standard ROUGE. We then re-evaluated all
methods in the paper and obtained the corrected results listed in Table 2 of this
versio
Answer Sequence Learning with Neural Networks for Answer Selection in Community Question Answering
In this paper, the answer selection problem in community question answering
(CQA) is regarded as an answer sequence labeling task, and a novel approach is
proposed based on the recurrent architecture for this problem. Our approach
first applies convolutional neural networks (CNNs) to learn the joint representation
of each question-answer pair, and then uses the joint representation as
input to a long short-term memory (LSTM) network to learn the answer sequence of a
question for labeling the matching quality of each answer. Experiments
conducted on the SemEval 2015 CQA dataset show the effectiveness of our
approach.
Comment: 6 page
Prompt-based Text Entailment for Low-Resource Named Entity Recognition
Pre-trained Language Models (PLMs) have been applied to NLP tasks and have achieved
promising results. Nevertheless, the fine-tuning procedure needs labeled data
from the target domain, making it difficult to learn in low-resource scenarios
and in scenarios where labeling is non-trivial. To address these challenges, we propose
Prompt-based Text Entailment (PTE) for low-resource named entity recognition,
which better leverages knowledge in the PLMs. We first reformulate named entity
recognition as a text entailment task. The original sentence with entity
type-specific prompts is fed into PLMs to get entailment scores for each
candidate. The entity type with the top score is then selected as the final label.
Then, we inject tagging labels into prompts and treat words as basic units
instead of n-gram spans to reduce the time complexity of generating candidates by
n-gram enumeration. Experimental results demonstrate that the proposed method
PTE achieves competitive performance on the CoNLL03 dataset, and outperforms
fine-tuned counterparts on the MIT Movie and Few-NERD datasets in low-resource
settings.
Comment: COLING 202
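The label-selection step in the abstract above (score one type-specific prompt per entity type, then keep the top-scoring type) can be sketched as follows. This is an illustrative sketch only: the prompt template, the `CUES` table, and `toy_entail_score` are hypothetical stand-ins, and a real system would obtain the entailment score from a pre-trained language model instead.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical cue words per entity type, used only by the toy scorer below.
CUES: Dict[str, set] = {
    "person": {"mr", "said", "born"},
    "location": {"in", "city", "visited"},
    "organization": {"inc", "company", "shares"},
}


def build_hypotheses(candidate: str, entity_types: Sequence[str]) -> Dict[str, str]:
    """One type-specific prompt per entity type (the template is an assumption)."""
    return {t: f"{candidate} is a {t} entity." for t in entity_types}


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Stand-in for a PLM entailment score: share of the type's cue words in the premise."""
    words = set(premise.lower().replace(".", "").split())
    for entity_type, cues in CUES.items():
        if f" {entity_type} " in hypothesis:
            return len(words & cues) / len(cues)
    return 0.0


def select_entity_type(
    sentence: str,
    candidate: str,
    entity_types: Sequence[str],
    entail_score: Callable[[str, str], float],
) -> Tuple[str, Dict[str, float]]:
    """Score every type-specific hypothesis and return the top-scoring type."""
    hypotheses = build_hypotheses(candidate, entity_types)
    scores = {t: entail_score(sentence, h) for t, h in hypotheses.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    label, scores = select_entity_type(
        "She visited Paris in May.", "Paris",
        ["person", "location", "organization"], toy_entail_score,
    )
    print(label, scores)
```

Passing the scorer in as a callable keeps the selection logic independent of whichever entailment model is plugged in.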
Calibration Meets Explanation: A Simple and Effective Approach for Model Confidence Estimates
Calibration strengthens the trustworthiness of black-box models by producing
more accurate confidence estimates on given examples. However, little is
known about whether model explanations can help confidence calibration. Intuitively,
humans look at important feature attributions and decide whether the model is
trustworthy. Similarly, the explanations can tell us when the model may or may
not know. Inspired by this, we propose a method named CME that leverages model
explanations to make the model less confident with non-inductive attributions.
The idea is that when the model is not highly confident, it is difficult to
identify strong indications of any class, and the tokens accordingly do not
have high attribution scores for any class and vice versa. We conduct extensive
experiments on six datasets with two popular pre-trained language models in the
in-domain and out-of-domain settings. The results show that CME improves
calibration performance in all settings. The expected calibration errors are
further reduced when combined with temperature scaling. Our findings highlight
that model explanations can help calibrate posterior estimates.
Comment: EMNLP 202
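For readers unfamiliar with the two quantities named in the abstract above, here is a minimal sketch of expected calibration error (ECE) with equal-width confidence bins, and of temperature scaling applied to logits. It does not implement CME itself; the bin count, function names, and example values are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; temperature > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Weighted mean gap between accuracy and confidence over equal-width bins."""
    n = len(confidences)
    if n == 0:
        return 0.0
    bins: List[list] = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    print(expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, False, True, True]))
    print(softmax([2.0, 0.0]), softmax([2.0, 0.0], temperature=2.0))
```

In practice the temperature is a single scalar fitted on held-out data, which is why it can be combined with other calibration methods.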
Generating Medical Assessments Using a Neural Network Model: Algorithm Development and Validation
BACKGROUND: Since its inception, artificial intelligence has aimed to use computers to help make clinical diagnoses. Evidence-based medical reasoning is important for patient care. Inferring clinical diagnoses is a crucial step during the patient encounter. Previous works mainly used expert systems or machine learning-based methods to predict the International Classification of Diseases - Clinical Modification codes based on electronic health records. We report an alternative approach: inference of clinical diagnoses from patients' reported symptoms and physicians' clinical observations.
OBJECTIVE: We aimed to report a natural language processing system for generating medical assessments based on patient information described in the electronic health record (EHR) notes.
METHODS: We processed EHR notes into the Subjective, Objective, Assessment, and Plan sections. We trained a neural network model for medical assessment generation (N2MAG). Our N2MAG is an innovative deep neural model that uses the Subjective and Objective sections of an EHR note to automatically generate an expert-like assessment of the patient. N2MAG can be trained in an end-to-end fashion and does not require feature engineering and external knowledge resources.
RESULTS: We evaluated N2MAG and the baseline models both quantitatively and qualitatively. Evaluated by both the Recall-Oriented Understudy for Gisting Evaluation metrics and domain experts, our results show that N2MAG outperformed the existing state-of-the-art baseline models.
CONCLUSIONS: N2MAG could generate a medical assessment from the Subjective and Objective section descriptions in EHR notes. Future work will assess its potential for providing clinical decision support.
Generative Multimodal Entity Linking
Multimodal Entity Linking (MEL) is the task of mapping mentions with
multimodal contexts to the referent entities from a knowledge base (e.g.,
Wikipedia). Prior MEL methods mainly focus on designing complex multimodal
interaction mechanisms and require fine-tuning all model parameters, which can
be prohibitively costly and difficult to scale in the era of Large Language
Models (LLMs). In this work, we propose GEMEL, a simple yet effective
Generative Multimodal Entity Linking method, which leverages the capabilities
of LLMs from large-scale pre-training to directly generate target entity names.
We keep the vision and language model frozen and only train a linear layer to
enable cross-modality interactions. To adapt LLMs to the MEL task, we take
advantage of the emerging in-context learning (ICL) capability of LLMs by
retrieving multimodal instances as demonstrations. Extensive experiments show
that with only ~0.3% of the model parameters fine-tuned, GEMEL achieves
state-of-the-art results on two well-established MEL datasets (4.1% accuracy
gains on WikiDiverse and 15.4% accuracy gains on WikiMEL). Our approach is
compatible with any off-the-shelf language model, paving the way towards an
efficient and general solution for utilizing LLMs in the MEL task.
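The idea of retrieving instances as in-context demonstrations, mentioned in the abstract above, can be sketched as a nearest-neighbor lookup followed by prompt assembly. The abstract does not specify the retrieval method, so cosine similarity over precomputed vectors, the `pool` record layout, and the prompt template are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_demonstrations(query_vec: Sequence[float], pool: List[Dict], k: int = 2) -> List[Dict]:
    """Return the k pool items whose (assumed precomputed) vectors are closest to the query."""
    ranked = sorted(pool, key=lambda item: cosine(query_vec, item["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(demos: List[Dict], query_mention: str) -> str:
    """Assemble demonstrations and the query into one prompt (hypothetical template)."""
    lines = [f"Mention: {d['mention']} -> Entity: {d['entity']}" for d in demos]
    lines.append(f"Mention: {query_mention} -> Entity:")
    return "\n".join(lines)


if __name__ == "__main__":
    pool = [
        {"mention": "Jordan", "entity": "Michael Jordan", "vec": [1.0, 0.0]},
        {"mention": "Jordan", "entity": "Jordan (country)", "vec": [0.0, 1.0]},
        {"mention": "MJ", "entity": "Michael Jordan", "vec": [0.9, 0.1]},
    ]
    demos = retrieve_demonstrations([1.0, 0.0], pool, k=2)
    print(build_prompt(demos, "Air Jordan"))
```

A frozen language model would then complete the final line of the prompt with the target entity name.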
A Read-and-Select Framework for Zero-shot Entity Linking
Zero-shot entity linking (EL) aims to align entity mentions with unseen
entities, which challenges a model's generalization ability. Previous methods largely
focus on the candidate retrieval stage and ignore the essential candidate
ranking stage, which disambiguates among entities and makes the final linking
prediction. In this paper, we propose a read-and-select (ReS) framework by
modeling the main components of entity disambiguation, i.e., mention-entity
matching and cross-entity comparison. First, for each candidate, the reading
module leverages mention context to output mention-aware entity
representations, enabling mention-entity matching. Then, in the selecting
module, we frame the choice of candidates as a sequence labeling problem, and
all candidate representations are fused together to enable cross-entity
comparison. Our method achieves state-of-the-art performance on the
established zero-shot EL dataset ZESHEL with a 2.55% micro-average accuracy
gain, without the laborious multi-phase pre-training used in most
previous work, showing the effectiveness of both mention-entity and
cross-entity interaction.
Comment: EMNLP 2023 Finding
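Since the abstract above reports a micro-average accuracy gain, a short sketch may help distinguish micro from macro averaging across evaluation domains. The domain names and counts below are made-up illustrative values, not results from the paper.

```python
from typing import Dict, Tuple


def micro_accuracy(per_domain: Dict[str, Tuple[int, int]]) -> float:
    """Pool all mentions: total correct divided by total mentions."""
    correct = sum(c for c, _ in per_domain.values())
    total = sum(n for _, n in per_domain.values())
    return correct / total if total else 0.0


def macro_accuracy(per_domain: Dict[str, Tuple[int, int]]) -> float:
    """Average the per-domain accuracies, giving each domain equal weight."""
    accuracies = [c / n for c, n in per_domain.values() if n]
    return sum(accuracies) / len(accuracies) if accuracies else 0.0


if __name__ == "__main__":
    # Hypothetical (correct, total) counts per domain.
    counts = {"domain_a": (8, 10), "domain_b": (1, 2)}
    print(micro_accuracy(counts), macro_accuracy(counts))
```

Micro averaging lets large domains dominate the score, while macro averaging treats every domain equally.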