Learning to Remember Translation History with a Continuous Cache
Existing neural machine translation (NMT) models generally translate
sentences in isolation, missing the opportunity to take advantage of
document-level information. In this work, we propose to augment NMT models with
a very lightweight, cache-like memory network, which stores recent hidden
representations as translation history. The probability distribution over
generated words is updated online depending on the translation history
retrieved from the memory, endowing NMT models with the capability to
dynamically adapt over time. Experiments on multiple domains with different
topics and styles show the effectiveness of the proposed approach with
negligible impact on the computational cost. Comment: Accepted by TACL 2018.
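As a rough illustration of the cache idea sketched in this abstract, the following minimal Python/numpy snippet stores recent decoder hidden states together with the words they emitted, attends over them with the current state, and interpolates the resulting cache distribution with the model distribution. The class name, capacity, temperature, and interpolation weight are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

class ContinuousCache:
    """Minimal continuous-cache sketch: keep recent decoder hidden states as
    keys and the words they emitted as values, then bias the next-word
    distribution towards words whose past contexts resemble the current one."""

    def __init__(self, capacity=200, temperature=1.0):
        self.capacity = capacity
        self.temperature = temperature
        self.keys, self.values = [], []        # hidden states / emitted word ids

    def update(self, hidden, word_id):
        self.keys.append(hidden)
        self.values.append(word_id)
        if len(self.keys) > self.capacity:     # drop the oldest entry
            self.keys.pop(0)
            self.values.pop(0)

    def distribution(self, hidden, vocab_size):
        p_cache = np.zeros(vocab_size)
        if not self.keys:
            return p_cache
        scores = np.array([k @ hidden for k in self.keys]) / self.temperature
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()               # attention over cache slots
        for w, word_id in zip(weights, self.values):
            p_cache[word_id] += w              # scatter attention mass onto words
        return p_cache


def interpolate(p_model, p_cache, lam=0.2):
    """Online update of the output distribution with the cache distribution."""
    return (1.0 - lam) * p_model + lam * p_cache


# toy usage with 4-dimensional states and a 10-word vocabulary
rng = np.random.default_rng(0)
cache = ContinuousCache()
for t in range(5):
    cache.update(rng.normal(size=4), word_id=t)
p_next = interpolate(np.full(10, 0.1), cache.distribution(rng.normal(size=4), 10))
```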
Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction
We investigate the problem of Chinese Grammatical Error Correction (CGEC) and
present a new framework named Tail-to-Tail (TtT) non-autoregressive sequence
prediction to address the deep issues hidden in CGEC. Since most tokens are
correct and can be conveyed directly from source to target, and error positions
can be estimated and corrected from the bidirectional context, we employ a
BERT-initialized Transformer encoder as the backbone model for information
modeling and conveying. Because same-position substitution alone cannot handle
variable-length corrections, operations such as substitution, deletion,
insertion, and local paraphrasing need to be applied jointly. Therefore, a
Conditional Random Field (CRF) layer is stacked on the upper tail to conduct
non-autoregressive sequence prediction by modeling token dependencies. Since
most tokens are correct and easy to predict or convey to the target, the model
may suffer from a severe class-imbalance issue. To
alleviate this problem, focal loss penalty strategies are integrated into the
loss functions. Moreover, besides the typical fixed-length error-correction
datasets, we also construct a variable-length corpus for our experiments.
Experimental results on standard datasets, especially on the variable-length
datasets, demonstrate the effectiveness of TtT in terms of sentence-level
accuracy, precision, recall, and F1-measure on the error detection and
correction tasks. Comment: ACL 2021. Code: https://github.com/lipiji/TtT. The
SpellGCN results were fixed on Oct. 26, 2021.
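The class-imbalance remedy mentioned above is the standard focal loss. A minimal numpy sketch for token-level prediction is given below; the gamma and alpha values are the usual defaults and are assumptions here, not the paper's tuned settings.

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=1.0, eps=1e-9):
    """Focal-loss sketch for token-level prediction: down-weight the many easy
    tokens (those simply conveyed from source to target) so that training
    focuses on the rare positions that actually need correction.

    probs:   (n_tokens, vocab_size) predicted probabilities
    targets: (n_tokens,) gold token ids
    """
    p_t = probs[np.arange(len(targets)), targets]        # prob of the gold token
    return float((-alpha * (1.0 - p_t) ** gamma * np.log(p_t + eps)).mean())


# toy usage: three token positions over a 4-symbol vocabulary
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.25, 0.25, 0.25, 0.25],
                  [0.05, 0.05, 0.85, 0.05]])
print(focal_loss(probs, np.array([0, 3, 2])))
```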
hyperdoc2vec: Distributed Representations of Hypertext Documents
Hypertext documents, such as web pages and academic papers, are of great
importance for delivering information in our daily life. Although effective on
plain documents, conventional text embedding methods suffer from information
loss when directly adapted to hyper-documents. In this paper, we
propose a general embedding approach for hyper-documents, namely, hyperdoc2vec,
along with four criteria characterizing necessary information that
hyper-document embedding models should preserve. Systematic comparisons are
conducted between hyperdoc2vec and several competitors on two tasks, i.e.,
paper classification and citation recommendation, in the academic paper domain.
Analyses and experiments both validate the superiority of hyperdoc2vec over
other models with respect to the four criteria. Comment: Accepted to ACL 2018.
Microblog Hashtag Generation via Encoding Conversation Contexts
Automatic hashtag annotation plays an important role in content understanding
for microblog posts. To date, progress made in this field has been restricted
to phrase selection from limited candidates, or word-level hashtag discovery
using topic models. Unlike previous work, which treats hashtags as inseparable
units, our work is the first to annotate hashtags with a novel sequence
generation framework that views a hashtag as a short sequence of
words. Moreover, to address the data sparsity issue in processing short
microblog posts, we propose to jointly model the target posts and the
conversation contexts initiated by them with bidirectional attention. Extensive
experimental results on two large-scale datasets, newly collected from English
Twitter and Chinese Weibo, show that our model significantly outperforms
state-of-the-art models based on classification. Further studies demonstrate
our model's ability to effectively generate rare and even unseen hashtags,
which is not possible for most existing methods. Comment: NAACL 2019 (10 pages).
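As a sketch of the bidirectional attention used to jointly model a post and its conversation context, the numpy snippet below computes a token-level similarity matrix and attends in both directions; the shapes and the absence of any learned projections are simplifying assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_attention(post, context):
    """Attend from the post to the conversation context and back.

    post:    (m, d) token representations of the target post
    context: (n, d) token representations of the conversation context
    Returns a context summary per post token and a post summary per context token.
    """
    sim = post @ context.T                       # (m, n) token-pair similarities
    post2ctx = softmax(sim, axis=1) @ context    # (m, d)
    ctx2post = softmax(sim.T, axis=1) @ post     # (n, d)
    return post2ctx, ctx2post


# toy usage with random 8-dimensional token states
rng = np.random.default_rng(0)
post2ctx, ctx2post = bidirectional_attention(rng.normal(size=(5, 8)),
                                             rng.normal(size=(12, 8)))
```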
Evaluating Explanation Methods for Neural Machine Translation
Recently many efforts have been devoted to interpreting the black-box NMT
models, but little progress has been made on metrics to evaluate explanation
methods. Word Alignment Error Rate can be used as such a metric that matches
human understanding; however, it cannot evaluate explanation methods on target
words that are not aligned to any source word. This paper therefore makes
an initial attempt to evaluate explanation methods from an alternative
viewpoint. To this end, it proposes a principled metric based on fidelity in
regard to the predictive behavior of the NMT model. As the exact computation
for this metric is intractable, we employ an efficient approach as its
approximation. On six standard translation tasks, we quantitatively evaluate
several explanation methods in terms of the proposed metric, revealing some
valuable findings about these explanation methods in our experiments. Comment: Accepted to ACL 2020, 9 pages.
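The snippet below is an ablation-style proxy for fidelity, not the paper's exact metric or its approximation: it checks whether removing the source words selected by an explanation hurts the model's confidence in its original prediction more than removing random words. The predict_prob callable and all parameter names are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence

def fidelity_gap(predict_prob: Callable[[List[str]], float],
                 source: Sequence[str],
                 explained_idx: Sequence[int],
                 n_random: int = 20,
                 seed: int = 0) -> float:
    """Ablation-style proxy for fidelity: a faithful explanation should hurt the
    model's confidence in its original prediction more than removing the same
    number of random source words.

    predict_prob(tokens) -> probability the model assigns to its original
    prediction given the (possibly ablated) source tokens.
    """
    rng = random.Random(seed)
    full = predict_prob(list(source))

    def drop_after_removing(idx):
        kept = [tok for i, tok in enumerate(source) if i not in set(idx)]
        return full - predict_prob(kept)

    drop_explained = drop_after_removing(explained_idx)
    random_drops = [drop_after_removing(rng.sample(range(len(source)),
                                                   k=len(explained_idx)))
                    for _ in range(n_random)]
    return drop_explained - sum(random_drops) / n_random


# toy usage with a stub "model" whose prediction hinges on the word "not"
toy_model = lambda tokens: 0.9 if "not" in tokens else 0.2
print(fidelity_gap(toy_model, ["I", "do", "not", "like", "it"], explained_idx=[2]))
```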
Segmenting Natural Language Sentences via Lexical Unit Analysis
In this work, we present Lexical Unit Analysis (LUA), a framework for general
sequence segmentation tasks. Given a natural language sentence, LUA scores all
the valid segmentation candidates and utilizes dynamic programming (DP) to
extract the maximum scoring one. LUA enjoys a number of appealing properties
such as inherently guaranteeing the predicted segmentation to be valid and
facilitating globally optimal training and inference. Moreover, the practical
time complexity of LUA can be reduced to linear, which is very efficient.
We have conducted extensive experiments on 5 tasks, including syntactic
chunking, named entity recognition (NER), slot filling, Chinese word
segmentation, and Chinese part-of-speech (POS) tagging, across 15 datasets. Our
models achieve state-of-the-art performance on 13 of them. The results also
show that the F1 score for identifying long segments is notably improved.
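A minimal sketch of the dynamic-programming decoding behind LUA is shown below: given a segment scorer, it returns the maximum-scoring valid segmentation, and capping the segment length (max_len) gives the effectively linear-time behaviour mentioned in the abstract. The function and parameter names are illustrative, not the paper's code.

```python
def best_segmentation(n, score, max_len=None):
    """Return the maximum-scoring valid segmentation of a length-n sentence.

    score(i, j) scores the segment covering tokens i..j-1 (0-based, exclusive
    end).  Capping the segment length with max_len makes decoding run in
    effectively linear time.
    """
    max_len = max_len or n
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            cand = best[i] + score(i, j)
            if cand > best[j]:
                best[j], back[j] = cand, i
    spans, j = [], n                 # walk the backpointers to recover spans
    while j > 0:
        spans.append((back[j], j))
        j = back[j]
    return best[n], spans[::-1]


# toy usage: a scorer that prefers two-token segments
total, spans = best_segmentation(5, lambda i, j: 1.0 if j - i == 2 else 0.2)
print(total, spans)   # total = 2.2
```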
Skeleton-to-Response: Dialogue Generation Guided by Retrieval Memory
For dialogue response generation, traditional generative models generate
responses solely from input queries. Such models rely on insufficient
information for generating a specific response since a certain query could be
answered in multiple ways. Consequently, those models tend to output generic
and dull responses, impeding the generation of informative utterances.
Recently, researchers have attempted to fill the information gap by exploiting
information retrieval techniques. When generating a response for a current
query, similar dialogues retrieved from the entire training data are considered
as an additional knowledge source. While this can harvest abundant information,
the generative models may be overwhelmed by it, leading to degraded performance.
In this paper, we propose a new framework which exploits retrieval results via
a skeleton-then-response paradigm. First, a skeleton is generated by
revising the retrieved responses. Then, a novel generative model uses both the
generated skeleton and the original query for response generation. Experimental
results show that our approaches significantly improve the diversity and
informativeness of the generated responses. Comment: Accepted to NAACL 2019.
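The paper learns the skeleton generator; purely as an illustration of the skeleton-then-response idea, the toy heuristic below blanks out words of a retrieved response that look specific to the retrieved query rather than the current one. Everything here, including the <blank> token, is an assumption for the sketch.

```python
def make_skeleton(current_query, retrieved_query, retrieved_response,
                  blank="<blank>"):
    """Blank out words of the retrieved response that look specific to the
    retrieved query (they occur there but not in the current query), leaving a
    reusable skeleton for the response generator to fill in."""
    cur = set(current_query.lower().split())
    old = set(retrieved_query.lower().split())
    stale = old - cur                         # words tied to the old query only
    skeleton = [blank if tok.strip(".,!?").lower() in stale else tok
                for tok in retrieved_response.split()]
    return " ".join(skeleton)


# toy usage
print(make_skeleton(
    "any good sushi places in boston",
    "any good pizza places in chicago",
    "Try the deep dish pizza in Chicago, it is amazing."))
# -> Try the deep dish <blank> in <blank> it is amazing.
```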
Fine-Grained Sentence Functions for Short-Text Conversation
Sentence function is an important linguistic feature referring to a user's
purpose in uttering a specific sentence. The use of sentence functions has
shown promising results in improving the performance of conversation models. However,
there is no large conversation dataset annotated with sentence functions. In
this work, we collect a new Short-Text Conversation dataset with manually
annotated SEntence FUNctions (STC-Sefun). Classification models are trained on
this dataset to (i) recognize the sentence function of new data in a large
corpus of short-text conversations; (ii) estimate a proper sentence function of
the response given a test query. We later train conversation models conditioned
on the sentence functions, including information retrieval-based and neural
generative models. Experimental results demonstrate that the use of sentence
functions can help improve the quality of the returned responses. Comment: A revised version of our paper accepted by ACL 2019.
Exploiting Sentential Context for Neural Machine Translation
In this work, we present novel approaches to exploit sentential context for
neural machine translation (NMT). Specifically, we first show that a shallow
sentential context, extracted only from the top encoder layer, can improve
translation performance by contextualizing the encoded representations of
individual words. Next, we introduce a deep sentential context, which
aggregates the sentential context representations from all the internal layers
of the encoder to form a more comprehensive context representation.
Experimental results on the WMT14 English-to-German and English-to-French
benchmarks show that our model consistently improves performance over the
strong TRANSFORMER model (Vaswani et al., 2017), demonstrating the necessity
and effectiveness of exploiting sentential context for NMT. Comment: Accepted by ACL 2019.
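A minimal numpy sketch of the deep sentential context idea follows: pool a sentence vector from every encoder layer, combine the layers with (here fixed, normally learned) weights, and fuse the result back into the word representations. The uniform layer weights and the fixed gate are simplifying assumptions, not the paper's learned components.

```python
import numpy as np

def deep_sentential_context(layer_states, layer_logits=None):
    """Pool a sentence vector from every encoder layer and combine them with
    (normally learned) layer weights into one comprehensive context vector.

    layer_states: list of (seq_len, d_model) arrays, one per encoder layer.
    """
    per_layer = np.stack([h.mean(axis=0) for h in layer_states])   # (L, d)
    if layer_logits is None:
        layer_logits = np.zeros(len(layer_states))                 # uniform weights
    w = np.exp(layer_logits - layer_logits.max())
    w /= w.sum()
    return (w[:, None] * per_layer).sum(axis=0)                    # (d,)

def contextualize(word_states, context, gate=0.5):
    """Fuse the sentential context back into every word representation
    (a fixed gate here; the fusion would normally be learned)."""
    return (1.0 - gate) * word_states + gate * context[None, :]


# toy usage: a 6-layer encoder over a 10-token sentence with d_model = 16
rng = np.random.default_rng(0)
layers = [rng.normal(size=(10, 16)) for _ in range(6)]
fused = contextualize(layers[-1], deep_sentential_context(layers))
```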
Rigid Formats Controlled Text Generation
Neural text generation has made tremendous progress in various tasks. One
common characteristic of most of these tasks is that the generated texts are
not restricted to rigid formats. However, we may encounter some special text
paradigms such as lyrics (assuming the music score is given), sonnets, or
SongCi (classical Chinese poetry of the Song dynasty). These texts share three
typical characteristics: (1) they must fully comply with rigid predefined
formats; (2) they must obey certain rhyming schemes; (3) even though they are
restricted to a format, sentence integrity must be guaranteed. To the best of
our knowledge, text generation based on the
predefined rigid formats has not been well investigated. Therefore, we propose
a simple and elegant framework named SongNet to tackle this problem. The
backbone of the framework is a Transformer-based auto-regressive language
model. Sets of symbols are tailor-designed to improve the modeling performance
especially on format, rhyme, and sentence integrity. We improve the attention
mechanism to encourage the model to capture future information about the format.
A pre-training and fine-tuning framework is designed to further improve the
generation quality. Extensive experiments conducted on two collected corpora
demonstrate that our proposed framework generates significantly better results
in terms of both automatic metrics and human evaluation. Comment: ACL 2020, 10 pages.
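To make the "tailor-designed symbol sets" concrete, the sketch below derives per-slot control symbols from a rigid format given as sentence lengths: a rhyme marker on sentence-final slots, the distance to the sentence end, and a sentence index. The symbol names and scheme are illustrative assumptions; SongNet's actual symbol sets differ in detail.

```python
def format_symbols(sentence_lengths, rhyme_symbol="<rhyme>", token_symbol="<tok>"):
    """For a rigid format given as a list of sentence lengths, emit one control
    symbol per slot: a rhyme marker on sentence-final slots, the distance to
    the sentence end (so the model can see how much room is left), and a
    sentence index."""
    fmt, pos, sent = [], [], []
    for s_idx, length in enumerate(sentence_lengths):
        for i in range(length):
            fmt.append(rhyme_symbol if i == length - 1 else token_symbol)
            pos.append(length - 1 - i)      # remaining slots before the sentence ends
            sent.append(s_idx)
    return fmt, pos, sent


# toy usage: a format with three sentences of 7, 7 and 5 characters
fmt, pos, sent = format_symbols([7, 7, 5])
```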