CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition
Named entity recognition (NER) in Chinese is essential but difficult because
of the lack of natural delimiters. Therefore, Chinese Word Segmentation (CWS)
is usually considered as the first step for Chinese NER. However, models based
on word-level embeddings and lexicon features often suffer from segmentation
errors and out-of-vocabulary (OOV) words. In this paper, we investigate a
Convolutional Attention Network called CAN for Chinese NER, which consists of a
character-based convolutional neural network (CNN) with a local-attention layer
and a gated recurrent unit (GRU) with a global self-attention layer to capture
the information from adjacent characters and sentence contexts. Also, unlike
other models, ours does not depend on any external resources such as lexicons
and uses small character embeddings, which makes it more practical.
Extensive experimental results show that our approach outperforms
state-of-the-art methods without word embedding and external lexicon resources
on different domain datasets including Weibo, MSRA and Chinese Resume NER
dataset.
Comment: This paper is accepted by NAACL-HLT 2019. The code is available at
https://github.com/microsoft/vert-papers/tree/master/papers/CAN-NE
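As a minimal illustration of why character-level labeling sidesteps word segmentation, the sketch below converts entity spans into one tag per character. The BIO scheme, the helper name `char_bio_tags`, and the example sentence are assumptions made for illustration; the abstract does not say which tagging scheme CAN uses.

```python
def char_bio_tags(sentence, entities):
    """Assign one BIO tag per character.

    entities: list of (start, end, label) with end exclusive, indexing
    characters of `sentence`. Spans are assumed not to overlap.
    """
    tags = ["O"] * len(sentence)
    for start, end, label in entities:
        if not (0 <= start < end <= len(sentence)):
            raise ValueError(f"span ({start}, {end}) is out of range")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical example: "微软在北京" ("Microsoft is in Beijing").
sentence = "微软在北京"
entities = [(0, 2, "ORG"), (3, 5, "LOC")]
for ch, tag in zip(sentence, char_bio_tags(sentence, entities)):
    print(ch, tag)
```

Because every character receives its own tag, no segmenter has to decide beforehand where words begin and end.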
Entity Candidate Network for Whole-Aware Named Entity Recognition
Named Entity Recognition (NER) is a crucial upstream task in Natural Language
Processing (NLP). Traditional tag scheme approaches offer a single recognition
that does not meet the needs of many downstream tasks such as coreference
resolution. Meanwhile, tag scheme approaches ignore the continuity of entities.
Inspired by one-stage object detection models in computer vision (CV), this
paper proposes a new no-tag scheme, the Whole-Aware Detection, which makes NER
an object detection task. Meanwhile, this paper presents a novel model, Entity
Candidate Network (ECNet), and a specific convolution network, Adaptive Context
Convolution Network (ACCN), to fuse multi-scale contexts and encode entity
information at each position. ECNet identifies the full span of a named entity
and its type at each position based on Entity Loss. Furthermore, ECNet can be
tuned between the highest precision and the highest recall, whereas tag
scheme approaches cannot. Experimental results on the CoNLL 2003 English
dataset and the WNUT 2017 dataset show that ECNet outperforms previous
state-of-the-art methods.
Comment: 10 pages, 4 figures
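The precision/recall trade-off of a span-scoring approach can be illustrated with a toy span selector. This is a sketch under stated assumptions, not ECNet itself: the candidate spans, their scores, the `select_spans` helper, and the greedy overlap removal are all hypothetical and only show how a single score threshold shifts the balance.

```python
def select_spans(candidates, threshold):
    """Keep non-overlapping candidate spans whose score meets the threshold.

    candidates: list of (start, end, label, score), end exclusive.
    Greedy: highest score first, dropping any span that overlaps a kept one.
    """
    kept = []
    for start, end, label, score in sorted(candidates, key=lambda c: -c[3]):
        if score < threshold:
            break  # all remaining candidates score even lower
        if all(end <= s or start >= e for s, e, _, _ in kept):
            kept.append((start, end, label, score))
    return sorted(kept)


# Made-up candidates over a six-token sentence.
candidates = [(0, 2, "PER", 0.95), (1, 3, "ORG", 0.60), (4, 5, "LOC", 0.40)]
print(select_spans(candidates, threshold=0.9))  # strict: favors precision
print(select_spans(candidates, threshold=0.3))  # lenient: favors recall
```

A per-token tag sequence offers no comparable knob, since each token receives exactly one label.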
Deep Neural Networks Ensemble for Detecting Medication Mentions in Tweets
Objective: After years of research, Twitter posts are now recognized as an
important source of patient-generated data, providing unique insights into
population health. A fundamental step to incorporating Twitter data in
pharmacoepidemiological research is to automatically recognize medication
mentions in tweets. Given that lexical searches for medication names may fail
due to misspellings or ambiguity with common words, we propose a more advanced
method to recognize them. Methods: We present Kusuri, an Ensemble Learning
classifier, able to identify tweets mentioning drug products and dietary
supplements. Kusuri ("medication" in Japanese) is composed of two modules.
First, four different classifiers (lexicon-based, spelling-variant-based,
pattern-based and one based on a weakly-trained neural network) are applied in
parallel to discover tweets potentially containing medication names. Second, an
ensemble of deep neural networks encoding morphological, semantic and
long-range dependencies of important words in the tweets discovered is used to
make the final decision. Results: On a balanced (50-50) corpus of 15,005
tweets, Kusuri demonstrated performances close to human annotators with 93.7%
F1-score, the best score achieved thus far on this corpus. On a corpus made of
all tweets posted by 113 Twitter users (98,959 tweets, with only 0.26%
mentioning medications), Kusuri obtained 76.3% F1-score. No prior drug
extraction system offers a comparison on such an extremely unbalanced
dataset. Conclusion: The system identifies tweets mentioning drug names with
performance high enough to ensure its usefulness and is ready to be integrated
into larger natural language processing systems.
Comment: This is a pre-copy-editing, author-produced PDF of an article
accepted for publication in JAMIA following peer review. The definitive
publisher-authenticated version is "D. Weissenbacher, A. Sarker, A. Klein, K.
O'Connor, A. Magge, G. Gonzalez-Hernandez, Deep neural networks ensemble for
detecting medication mentions in tweets, Journal of the American Medical
Informatics Association, ocz156, 2019
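The two corpora differ mainly in class balance, which the following sketch makes concrete. The corpus size and the 0.26% rate come from the abstract; the detection and false-positive rates in the second half are invented purely to show how F1 reacts to false positives on unbalanced data and are not results from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# From the abstract: 98,959 tweets, 0.26% of which mention a medication.
total_tweets = 98_959
positives = round(total_tweets * 0.0026)
print(positives)  # 257

# Hypothetical classifier: finds 90% of the positives and wrongly flags
# 1% of the negatives. These rates are illustrative assumptions only.
tp = round(0.90 * positives)
fn = positives - tp
fp = round(0.01 * (total_tweets - positives))
print(round(f1_score(tp, fp, fn), 3))
```

With so few positive tweets, even a small false-positive rate on the negatives produces more false alarms than true hits, which is why a score on this corpus is much harder to earn than on a balanced one.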
MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension
Machine comprehension (MC) style question answering is a representative
problem in natural language processing. Previous methods rarely spend time on
improving the encoding layer, especially the embedding of syntactic
information and named entities of the words, which are crucial to the quality
of encoding. Moreover, existing attention methods represent each query word as
a vector or use a single vector to represent the whole query sentence; neither
approach can properly weight the key words in the query sentence. In
this paper, we introduce a novel neural network architecture called Multi-layer
Embedding with Memory Network (MEMEN) for the machine reading task. In the
encoding layer, we apply the classic skip-gram model to the syntactic and
semantic information of the words to train a new kind of embedding layer. We also
propose a memory network of full-orientation matching of the query and passage
to catch more pivotal information. Experiments show that our model achieves
competitive results in both precision and efficiency on the
Stanford Question Answering Dataset (SQuAD) among all published results and
achieves state-of-the-art results on the TriviaQA dataset.
Efficient Sequence Labeling with Actor-Critic Training
Neural approaches to sequence labeling often use a Conditional Random Field
(CRF) to model their output dependencies, while Recurrent Neural Networks (RNN)
are used for the same purpose in other tasks. We set out to establish RNNs as
an attractive alternative to CRFs for sequence labeling. To do so, we address
one of the RNN's most prominent shortcomings, the fact that it is not exposed
to its own errors under maximum-likelihood training. We frame the prediction
of the output sequence as a sequential decision-making process, where we train
the network with an adjusted actor-critic algorithm (AC-RNN). We
comprehensively compare this strategy with maximum-likelihood training for both
RNNs and CRFs on three structured-output tasks. The proposed AC-RNN efficiently
matches the performance of the CRF on NER and CCG tagging, and outperforms it
on Machine Transliteration. We also show that our training strategy is
significantly better than other techniques for addressing the RNN's exposure
bias, such as Scheduled Sampling and Self-Critical policy training.
An Attentive Sequence Model for Adverse Drug Event Extraction from Biomedical Text
Adverse reactions caused by drugs are a potentially dangerous problem which may
lead to mortality and morbidity in patients. Adverse Drug Event (ADE)
extraction is a significant problem in biomedical research. We model ADE
extraction as a Question-Answering problem and take inspiration from the Machine
Reading Comprehension (MRC) literature to design our model. Our objective in
designing such a model is to exploit the local linguistic context in clinical
text and enable intra-sequence interaction, in order to jointly learn to
classify drug and disease entities, and to extract adverse reactions caused by
a given drug. Our model makes use of a self-attention mechanism to facilitate
intra-sequence interaction in a text sequence. This enables us to visualize and
understand how the network makes use of the local and wider context for
classification.
Comment: 7 pages, 5 figures, 4 tables
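To make the idea of inspectable intra-sequence interaction concrete, here is a minimal self-attention sketch in pure Python. It assumes scaled dot-product attention with no learned projections, which is a simplification chosen for illustration; the abstract does not specify the form of attention the model uses, and the toy vectors are made up.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections.

    vectors: list of equal-length token vectors.
    Returns (outputs, weights); weights[i][j] is how much token i attends
    to token j, which is the matrix one would plot to inspect the model.
    """
    dim = len(vectors[0])
    weights = []
    for q in vectors:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in vectors
        ]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[d] for w, v in zip(row, vectors)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Toy 2-dimensional "token" vectors (made up for illustration).
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 2) for w in row])
```

Each row of `weights` sums to one, so it can be read as a distribution over the tokens that a given position draws on.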
Causality Extraction based on Self-Attentive BiLSTM-CRF with Transferred Embeddings
Causality extraction from natural language texts is a challenging open
problem in artificial intelligence. Existing methods utilize patterns,
constraints, and machine learning techniques to extract causality, heavily
depending on domain knowledge and requiring considerable human effort and time
for feature engineering. In this paper, we formulate causality extraction as a
sequence labeling problem based on a novel causality tagging scheme. On this
basis, we propose a neural causality extractor with the BiLSTM-CRF model as the
backbone, named SCITE (Self-attentive BiLSTM-CRF wIth Transferred Embeddings),
which can directly extract cause and effect without extracting candidate causal
pairs and identifying their relations separately. To address the problem of
data insufficiency, we transfer contextual string embeddings, also known as
Flair embeddings, which are trained on a large corpus, to our task. In addition,
to improve the performance of causality extraction, we introduce a multihead
self-attention mechanism into SCITE to learn the dependencies between causal
words. We evaluate our method on a public dataset, and experimental results
demonstrate that our method achieves significant and consistent improvement
compared to baselines.
Comment: 39 pages, 11 figures, 6 tables
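Framing causality extraction as sequence labeling means that cause and effect spans can be read directly off the predicted tags, with no separate pairing step. The sketch below assumes a BIO-style scheme with hypothetical labels `C` (cause) and `E` (effect); the actual SCITE tagging scheme is not given in the abstract, and the example sentence is made up.

```python
def decode_spans(tokens, tags):
    """Group BIO tags into (label, text) spans.

    A span starts at "B-X" and extends over following "I-X" tags with the
    same label X. A stray "I-X" with no open span of label X starts a new one.
    """
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, tag_label = tag.partition("-")
        if prefix == "I" and tag_label == label:
            words.append(token)  # continue the open span
            continue
        if label is not None:
            spans.append((label, " ".join(words)))  # close the open span
        if prefix in ("B", "I"):
            label, words = tag_label, [token]
        else:
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


tokens = "The storm caused severe flooding".split()
tags = ["B-C", "I-C", "O", "B-E", "I-E"]
print(decode_spans(tokens, tags))
```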
Good News, Everyone! Context driven entity-aware captioning for news images
Current image captioning systems perform at a merely descriptive level,
essentially enumerating the objects in the scene and their relations. Humans,
on the contrary, interpret images by integrating several sources of prior
knowledge of the world. In this work, we aim to take a step closer to producing
captions that offer a plausible interpretation of the scene, by integrating
such contextual information into the captioning pipeline. For this we focus on
the captioning of images used to illustrate news articles. We propose a novel
captioning method that is able to leverage contextual information provided by
the text of news articles associated with an image. Our model is able to
selectively draw information from the article guided by visual cues, and to
dynamically extend the output dictionary to out-of-vocabulary named entities
that appear in the context source. Furthermore, we introduce 'GoodNews', the
largest news image captioning dataset in the literature, and demonstrate
state-of-the-art results.
Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR
2019)
Named Entity Disambiguation for Noisy Text
We address the task of Named Entity Disambiguation (NED) for noisy text. We
present WikilinksNED, a large-scale NED dataset of text fragments from the web,
which is significantly noisier and more challenging than existing news-based
datasets. To capture the limited and noisy local context surrounding each
mention, we design a neural model and train it with a novel method for sampling
informative negative examples. We also describe a new way of initializing word
and entity embeddings that significantly improves performance. Our model
significantly outperforms existing state-of-the-art methods on WikilinksNED
while achieving comparable performance on a smaller newswire dataset.
Comment: Accepted to CoNLL 201
Augmenting Neural Machine Translation with Knowledge Graphs
While neural networks have been used extensively to make substantial progress
in the machine translation task, they are known for being heavily dependent on
the availability of large amounts of training data. Recent efforts have tried
to alleviate the data sparsity problem by augmenting the training data using
different strategies, such as back-translation. Along with the data scarcity,
the out-of-vocabulary words, mostly entities and terminological expressions,
pose a difficult challenge to Neural Machine Translation systems. In this
paper, we hypothesize that knowledge graphs enhance the semantic feature
extraction of neural models, thus optimizing the translation of entities and
terminological expressions in texts and consequently leading to a better
translation quality. We hence investigate two different strategies for
incorporating knowledge graphs into neural models without modifying the neural
network architectures. We also examine the effectiveness of our augmentation
method on recurrent and non-recurrent (self-attentional) neural architectures.
Our knowledge graph augmented neural translation model, dubbed KG-NMT, achieves
significant and consistent improvements of +3 BLEU, METEOR and chrF3 on average
on the newstest datasets between 2014 and 2018 for the WMT English-German
translation task.