Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation
Most recent approaches use the sequence-to-sequence model for paraphrase
generation. The existing sequence-to-sequence model tends to memorize the words
and the patterns in the training dataset instead of learning the meaning of the
words. Therefore, the generated sentences are often grammatically correct but
semantically improper. In this work, we introduce a novel model based on the
encoder-decoder framework, called Word Embedding Attention Network (WEAN). Our
proposed model generates words by querying distributed word representations
(i.e., neural word embeddings), aiming to capture the meaning of the corresponding
words. Following previous work, we evaluate our model on two
paraphrase-oriented tasks, namely text simplification and short text
abstractive summarization. Experimental results show that our model outperforms
the sequence-to-sequence baseline by BLEU scores of 6.3 and 5.5 on two
English text simplification datasets, and by a ROUGE-2 F1 score of 5.7 on a
Chinese summarization dataset. Moreover, our model achieves state-of-the-art
performance on all three benchmark datasets.
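The core idea, generating each word by scoring the decoder state against the embedding matrix instead of a separate output projection over the vocabulary, can be illustrated in a few lines. The PyTorch sketch below is ours, not the authors' released code; the class name, the projection layer, and all dimensions are illustrative assumptions.

```python
# Hedged sketch: generate words by querying the word-embedding matrix.
# All names and sizes are hypothetical, not from the paper's code.
import torch
import torch.nn as nn

class EmbeddingQueryGenerator(nn.Module):
    """Scores the decoder state against every word embedding and emits the
    resulting distribution, rather than using a separate softmax layer."""
    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Project the decoder hidden state into the embedding space so the
        # query and the word vectors are directly comparable.
        self.query_proj = nn.Linear(hidden_dim, embed_dim)

    def forward(self, decoder_state: torch.Tensor) -> torch.Tensor:
        # decoder_state: (batch, hidden_dim)
        query = self.query_proj(decoder_state)       # (batch, embed_dim)
        scores = query @ self.embedding.weight.t()   # (batch, vocab_size)
        return torch.log_softmax(scores, dim=-1)

gen = EmbeddingQueryGenerator(vocab_size=10000, embed_dim=300, hidden_dim=512)
state = torch.randn(2, 512)   # stand-in decoder states
print(gen(state).shape)       # torch.Size([2, 10000])
```

Because the output scores come from the embedding table itself, words with similar embeddings receive similar scores, which is what lets the generator reflect word meaning rather than memorized surface patterns.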
Label Enhanced Event Detection with Heterogeneous Graph Attention Networks
Event Detection (ED) aims to recognize instances of specified types of event
triggers in text. Different from English ED, Chinese ED suffers from the
problem of word-trigger mismatch due to the uncertain word boundaries. Existing
approaches injecting word information into character-level models have achieved
promising progress to alleviate this problem, but they are limited by two
issues. First, the interaction between characters and lexicon words is not
fully exploited. Second, they ignore the semantic information provided by event
labels. We thus propose a novel architecture named Label enhanced Heterogeneous
Graph Attention Networks (L-HGAT). Specifically, we transform each sentence
into a graph, where character nodes and word nodes are connected with different
types of edges, so that the interaction between words and characters is fully
preserved. A heterogeneous graph attention network is then introduced to
propagate relational messages and enrich the information interaction. Furthermore,
we convert each label into a trigger-prototype-based embedding, and design a
margin loss to guide the model in distinguishing confusing event labels. Experiments
on two benchmark datasets show that our model achieves significant improvements
over a range of competitive baseline methods.
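To make the label-embedding idea concrete, here is a hedged PyTorch sketch of a margin loss that pushes the gold label's similarity above every competing label's by a fixed margin. The function name, the margin value, and the tensor shapes are illustrative assumptions, not the paper's released code.

```python
# Hedged sketch of a margin loss over label embeddings; names and the
# margin value are illustrative, not taken from the paper's code.
import torch
import torch.nn.functional as F

def label_margin_loss(trigger_repr, label_emb, gold, margin=0.5):
    """Penalize wrong labels whose similarity comes within `margin`
    of the gold label's similarity.

    trigger_repr: (batch, dim)       candidate-trigger representations
    label_emb:    (num_labels, dim)  trigger-prototype label embeddings
    gold:         (batch,)           gold label indices
    """
    sims = trigger_repr @ label_emb.t()              # (batch, num_labels)
    gold_sim = sims.gather(1, gold.unsqueeze(1))     # (batch, 1)
    hinge = F.relu(margin - gold_sim + sims)         # hinge per wrong label
    # Zero out each example's gold slot before averaging.
    mask = torch.ones_like(hinge)
    mask.scatter_(1, gold.unsqueeze(1), 0.0)
    return (hinge * mask).mean()

loss = label_margin_loss(torch.randn(4, 128), torch.randn(33, 128),
                         torch.tensor([0, 5, 12, 7]))
print(loss.item())
```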
A Deep Learning Entity Extraction Model for Chinese Government Documents
In this paper, we propose an entity recognition model for Chinese government documents that combines a Whole-Word-Masking-based Robustly Optimized BERT pretraining approach (RoBERTa-wwm) with dictionary embeddings. By using multiple feature vectors, generated by RoBERTa and domain dictionaries, as embedding layers, the contextual semantic information of the text is fully considered. Meanwhile, a Bi-directional Long Short-Term Memory (BiLSTM) network and a multi-head attention mechanism are used to learn long-distance dependencies in the text. We use a conditional random field (CRF) to obtain the globally optimal annotation sequence, which is expected to improve the performance of the model. We conduct comparison experiments against five baseline methods on an official document dataset from the government affairs domain. The model achieves a precision of 91.8%, a recall of 90.5%, and an F1 score of 91.1%, outperforming all baseline models and indicating that the proposed model is more accurate at recognizing named entities in government documents.
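As a rough outline of the described stack (pretrained features plus dictionary embeddings, then BiLSTM, multi-head attention, and CRF decoding), the following PyTorch sketch is an assumption-laden illustration rather than the paper's implementation. It assumes the third-party pytorch-crf package for the CRF layer, and every class name and dimension is hypothetical.

```python
# Hedged sketch of the embeddings -> BiLSTM -> attention -> CRF pipeline.
# Assumes the third-party `pytorch-crf` package (pip install pytorch-crf);
# all names and sizes are illustrative.
import torch
import torch.nn as nn
from torchcrf import CRF

class DictEnhancedNER(nn.Module):
    def __init__(self, roberta_dim=768, dict_dim=50, hidden=256,
                 num_heads=8, num_tags=9):
        super().__init__()
        self.bilstm = nn.LSTM(roberta_dim + dict_dim, hidden,
                              batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads,
                                          batch_first=True)
        self.proj = nn.Linear(2 * hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, roberta_feats, dict_feats, tags=None):
        # Concatenate pretrained and dictionary features per token.
        x = torch.cat([roberta_feats, dict_feats], dim=-1)
        x, _ = self.bilstm(x)               # bidirectional context
        x, _ = self.attn(x, x, x)           # long-distance dependencies
        emissions = self.proj(x)
        if tags is not None:                # training: negative log-likelihood
            return -self.crf(emissions, tags)
        return self.crf.decode(emissions)   # inference: best tag sequences

model = DictEnhancedNER()
feats = torch.randn(2, 16, 768)             # stand-in RoBERTa outputs
dicts = torch.randn(2, 16, 50)              # stand-in dictionary embeddings
print(model(feats, dicts)[0][:5])           # decoded tag ids, first 5 tokens
```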
Multi-channel Encoder for Neural Machine Translation
The attention-based encoder-decoder is an effective architecture for neural
machine translation (NMT); it typically relies on recurrent neural networks
(RNNs) to build the blocks that the attentive reader later consults during
the decoding process. This design of encoder yields a relatively uniform
composition of the source sentence, despite the gating mechanism employed in
the encoding RNN. On the other hand, we often want the decoder to take pieces of the
source sentence at varying levels of composition, depending on their linguistic
structure: for example, we may want to take an entity name in its raw form while
composing an idiom into a single unit. Motivated by this demand, we propose
Multi-channel Encoder (MCE), which enhances encoding components with different
levels of composition. More specifically, in addition to the hidden states of the
encoding RNN, MCE takes 1) the original word embeddings, for raw encoding with no
composition, and 2) a particular design of external memory from the Neural Turing
Machine (NTM), for more complex composition; all three encoding strategies
are properly blended during decoding. An empirical study on Chinese-English
translation shows that our model improves by 6.52 BLEU points over a strong
open-source NMT system, DL4MT. On the WMT14 English-French task, our single
shallow system achieves BLEU=38.8, comparable with state-of-the-art deep
models.
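A minimal sketch of the channel-blending step follows. The gating scheme below is our own illustrative assumption (the paper's exact blending and its NTM-style memory are not reproduced); it only shows how three per-token encodings of equal width could be mixed with learned weights.

```python
# Hedged sketch of blending several encoding channels per token.
# The gating scheme and all names are assumptions, not the paper's design.
import torch
import torch.nn as nn

class ChannelBlender(nn.Module):
    """Mixes per-token encodings from several channels with learned gates."""
    def __init__(self, dim: int, num_channels: int = 3):
        super().__init__()
        self.gate = nn.Linear(num_channels * dim, num_channels)

    def forward(self, channels):
        # channels: list of (batch, seq, dim) tensors, e.g. raw embeddings,
        # encoder RNN states, and an external-memory read.
        stacked = torch.stack(channels, dim=-2)          # (B, S, C, dim)
        flat = stacked.flatten(-2)                       # (B, S, C*dim)
        weights = torch.softmax(self.gate(flat), dim=-1) # (B, S, C)
        return (weights.unsqueeze(-1) * stacked).sum(-2) # (B, S, dim)

blend = ChannelBlender(dim=512)
raw = torch.randn(2, 10, 512)   # word embeddings (no composition)
rnn = torch.randn(2, 10, 512)   # encoder RNN hidden states
mem = torch.randn(2, 10, 512)   # external-memory reads (stand-in)
print(blend([raw, rnn, mem]).shape)  # torch.Size([2, 10, 512])
```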