9 research outputs found
A Reinforced Generation of Adversarial Examples for Neural Machine Translation
Neural machine translation systems tend to fail on less well-formed inputs despite their significant efficacy, which may seriously harm the credibility of these systems; fathoming how and when neural-based systems fail in such cases is critical for industrial maintenance. Instead of collecting and analyzing bad cases with limited handcrafted error features, here we investigate this issue by generating adversarial examples via a new paradigm based on reinforcement learning. Our paradigm can expose pitfalls for a given performance metric, e.g., BLEU, and can target any given neural machine translation architecture. We conduct adversarial-attack experiments on two mainstream neural machine translation architectures, RNN-Search and Transformer. The results show that our method efficiently produces stable attacks with meaning-preserving adversarial examples. We also present a qualitative and quantitative analysis of the attack's preference pattern, demonstrating its capability for pitfall exposure.
Comment: 12 pages, ACL 2020
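As an illustration of the attack loop such a paradigm implies, here is a minimal tabular REINFORCE sketch in Python. It assumes black-box access to the victim system and a sentence-level metric; victim_translate, sentence_bleu, and the synonym table are hypothetical stand-ins, not the authors' implementation.

    import random

    def victim_translate(tokens):
        # Hypothetical stand-in for the black-box NMT system under attack.
        return tokens

    def sentence_bleu(hyp, ref):
        # Crude token-overlap proxy for sentence-level BLEU (assumption).
        return sum(h == r for h, r in zip(hyp, ref)) / max(len(ref), 1)

    def reinforce_attack_step(src, ref, synonyms, policy, lr=0.1):
        """One REINFORCE step: sample meaning-preserving substitutions,
        then reward the policy by the BLEU degradation they cause."""
        perturbed, choices = [], []
        for tok in src:
            cands = synonyms.get(tok, [tok])
            probs = policy.setdefault(tok, {c: 1.0 / len(cands) for c in cands})
            pick = random.choices(list(probs), weights=list(probs.values()))[0]
            perturbed.append(pick)
            choices.append((tok, pick))
        reward = (sentence_bleu(victim_translate(src), ref)
                  - sentence_bleu(victim_translate(perturbed), ref))
        for tok, pick in choices:  # tabular policy-gradient update
            probs = policy[tok]
            probs[pick] = max(probs[pick] + lr * reward, 1e-6)
            total = sum(probs.values())
            for c in probs:
                probs[c] /= total
        return perturbed, reward

    # Toy usage: one attack step on a three-word sentence.
    policy = {}
    print(reinforce_attack_step(['the', 'cat', 'sat'], ['the', 'cat', 'sat'],
                                {'cat': ['cat', 'feline']}, policy))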
Regularized Context Gates on Transformer for Machine Translation
Context gates are effective at controlling the contributions from the source and target contexts in recurrent neural network (RNN) based neural machine translation (NMT). However, it is challenging to extend them to the more advanced Transformer architecture, which is more complicated than an RNN. This paper first provides a method to identify source and target contexts and then introduces a gate mechanism to control the source and target contributions in the Transformer.
In addition, to further reduce the bias problem in the gate mechanism, this
paper proposes a regularization method to guide the learning of the gates with
supervision automatically generated using pointwise mutual information.
Extensive experiments on four translation datasets demonstrate that the proposed model obtains an average gain of 1.0 BLEU over a strong Transformer baseline.
Comment: Published in ACL 2020
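A schematic of what such a gate could look like, in a minimal NumPy sketch; the weight shapes, the elementwise gate form, and the mean-squared penalty toward a PMI-derived target are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def context_gate(source_ctx, target_ctx, W, b):
        """Schematic context gate: a sigmoid gate computed from both
        contexts decides how much each contributes to the output."""
        g = sigmoid(np.concatenate([source_ctx, target_ctx], axis=-1) @ W + b)
        return g * source_ctx + (1.0 - g) * target_ctx, g

    def gate_regularizer(g, pmi_target):
        """Penalty pulling the gate toward a supervision signal assumed
        to be precomputed from pointwise mutual information statistics."""
        return np.mean((g - pmi_target) ** 2)

    # Toy usage: batch of 2, model dimension 4.
    d = 4
    rng = np.random.default_rng(0)
    src, tgt = rng.standard_normal((2, d)), rng.standard_normal((2, d))
    W, b = rng.standard_normal((2 * d, d)), np.zeros(d)
    out, g = context_gate(src, tgt, W, b)
    reg_loss = gate_regularizer(g, pmi_target=0.5 * np.ones_like(g))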
Detecting and Understanding Generalization Barriers for Neural Machine Translation
Generalization to unseen instances is the perennial goal of all data-driven models. However, for a realistic task like machine translation, the traditional approach of measuring generalization in an average sense offers little insight into fine-grained generalization ability. As a remedy, this paper attempts to identify and understand the generalization barrier words within an unseen input sentence that cause the degradation of fine-grained generalization. We propose a principled definition of generalization barrier words and a modified version that is computationally tractable. Based on the modified definition, we propose three simple methods for barrier detection via search-aware risk estimation through counterfactual generation. We then conduct extensive analyses of the detected generalization barrier words on both Zh⇔En NIST benchmarks from various perspectives. Potential usage of the detected barrier words is also discussed.
Comment: Preprint
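One simple way to picture counterfactual barrier detection is a leave-one-out probe, sketched below; translate, risk, and the deletion-based counterfactual are illustrative stand-ins rather than the paper's three detection methods.

    def translate(tokens):
        # Hypothetical stand-in for an NMT decode.
        return tokens

    def risk(hyp, ref):
        # Stand-in risk, e.g. 1 - sentence-level BLEU (token overlap here).
        return 1.0 - sum(h == r for h, r in zip(hyp, ref)) / max(len(ref), 1)

    def detect_barrier_words(src, ref, threshold=0.1):
        """Flag a word as a generalization barrier when its counterfactual
        removal lowers the estimated translation risk by more than the
        threshold, i.e. the word itself drives the degradation."""
        base = risk(translate(src), ref)
        barriers = []
        for i, tok in enumerate(src):
            counterfactual = src[:i] + src[i + 1:]
            if base - risk(translate(counterfactual), ref) > threshold:
                barriers.append((i, tok))
        return barriers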
Domain Adaptation of Neural Machine Translation by Lexicon Induction
It has been previously noted that neural machine translation (NMT) is very
sensitive to domain shift. In this paper, we argue that this is a dual effect of the highly lexicalized nature of NMT, resulting in failure on sentences with large numbers of unknown words and in a lack of supervision for
domain-specific words. To remedy this problem, we propose an unsupervised
adaptation method which fine-tunes a pre-trained out-of-domain NMT model using
a pseudo-in-domain corpus. Specifically, we perform lexicon induction to
extract an in-domain lexicon, and construct a pseudo-parallel in-domain corpus
by performing word-for-word back-translation of monolingual in-domain target
sentences. Across five domains (twenty pairwise adaptation settings) and two model architectures, our method achieves consistent improvements without using any in-domain parallel sentences, improving by up to 14 BLEU over unadapted models and by up to 2 BLEU over strong back-translation baselines.
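The corpus construction itself is easy to sketch in Python; the lexicon entries and the copy-through fallback for unmatched words below are illustrative assumptions.

    def build_pseudo_parallel(mono_target_sents, induced_lexicon):
        """Word-for-word back-translation sketch: each in-domain target
        sentence is mapped token-by-token through the induced lexicon
        (target word -> source word) to yield a pseudo-source sentence;
        unmatched words are copied through (an assumed fallback)."""
        corpus = []
        for tgt in mono_target_sents:
            pseudo_src = [induced_lexicon.get(w, w) for w in tgt]
            corpus.append((pseudo_src, tgt))  # fine-tuning pair
        return corpus

    # Toy usage with a two-entry induced lexicon (illustrative only).
    lexicon = {'fievre': 'fever', 'toux': 'cough'}
    print(build_pseudo_parallel([['fievre', 'et', 'toux']], lexicon))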
Neural Combinatory Constituency Parsing
We propose two fast neural combinatory models for constituency parsing:
binary and multi-branching. Our models decompose the bottom-up parsing process
into 1) classification of tags, labels, and binary orientations or chunks and
2) vector composition based on the computed orientations or chunks. These
models have theoretical sub-quadratic complexity and empirical linear
complexity. The binary model achieves an F1 score of 92.54 on the Penn Treebank at a speed of 1,327.2 sentences/sec. Both models, when combined with XLNet, provide near state-of-the-art accuracy for English. Syntactic branching tendency and headedness of a language are observed during the training and inference processes for the Penn Treebank, Chinese Treebank, and Keyaki Treebank (Japanese).
Comment: Findings of ACL 2021; 15 pages
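The binary bottom-up step can be pictured as follows; the 'L'/'R' orientation labels and the mean-vector composition are simplifications of the models' learned classifiers and composition functions.

    import numpy as np

    def binary_compose_step(vectors, orientations):
        """One bottom-up layer of the binary model, schematically: adjacent
        nodes whose predicted orientations point toward each other ('R'
        then 'L') are merged by vector composition; others pass through."""
        out, i = [], 0
        while i < len(vectors):
            if (i + 1 < len(vectors)
                    and orientations[i] == 'R' and orientations[i + 1] == 'L'):
                out.append((vectors[i] + vectors[i + 1]) / 2.0)
                i += 2
            else:
                out.append(vectors[i])
                i += 1
        return out

    # Toy usage: four nodes reduce to three after one step.
    vecs = [np.full(3, float(k)) for k in range(4)]
    print(len(binary_compose_step(vecs, ['L', 'R', 'L', 'R'])))  # 3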
Improving Pre-Trained Multilingual Models with Vocabulary Expansion
Recently, pre-trained language models have achieved remarkable success in a
broad range of natural language processing tasks. However, in a multilingual setting, it is extremely resource-consuming to pre-train a deep language model
over large-scale corpora for each language. Instead of exhaustively
pre-training monolingual language models independently, an alternative solution
is to pre-train a powerful multilingual deep language model over large-scale
corpora in hundreds of languages. However, the vocabulary size for each
language in such a model is relatively small, especially for low-resource
languages. This limitation inevitably hinders the performance of these
multilingual models on tasks such as sequence labeling, wherein in-depth
token-level or sentence-level understanding is essential.
In this paper, inspired by previous methods designed for monolingual
settings, we investigate two approaches (i.e., joint mapping and mixture
mapping) based on a pre-trained multilingual model BERT for addressing the
out-of-vocabulary (OOV) problem on a variety of tasks, including part-of-speech
tagging, named entity recognition, machine translation quality estimation, and
machine reading comprehension. Experimental results show that mixture mapping is more promising. To the best of our knowledge, this is the first work that attempts to address and discuss the OOV issue in multilingual settings.
Comment: CoNLL 2019, final version
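The mixture-mapping idea can be sketched in a few lines; cosine similarity and softmax weighting over the top-k nearest subword embeddings are an assumed instantiation, not necessarily the paper's exact recipe.

    import numpy as np

    def mixture_map(oov_vec, subword_table, top_k=5):
        """Represent an OOV token as a similarity-weighted mixture of the
        multilingual model's existing subword embeddings."""
        sims = subword_table @ oov_vec / (
            np.linalg.norm(subword_table, axis=1)
            * np.linalg.norm(oov_vec) + 1e-9)
        top = np.argsort(-sims)[:top_k]
        w = np.exp(sims[top])
        w /= w.sum()                      # softmax over the top-k neighbours
        return w @ subword_table[top]     # mixed embedding for the OOV token

    # Toy usage: 10 subword embeddings of dimension 8.
    rng = np.random.default_rng(0)
    table = rng.standard_normal((10, 8))
    print(mixture_map(rng.standard_normal(8), table).shape)  # (8,)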
Knowledge Efficient Deep Learning for Natural Language Processing
Deep learning has become the workhorse for a wide range of natural language
processing applications. But much of the success of deep learning relies on
annotated examples. Annotation is time-consuming and expensive to produce at
scale. Here we are interested in methods for reducing the required quantity of
annotated data -- by making the learning methods more knowledge efficient so as
to make them more applicable in low annotation (low resource) settings. There
are various classical approaches to making models more knowledge efficient, such as multi-task learning, transfer learning, and weakly supervised or unsupervised learning. This thesis focuses on adapting such classical
methods to modern deep learning models and algorithms.
This thesis describes four works aimed at making machine learning models more
knowledge efficient. First, we propose a knowledge rich deep learning model
(KRDL) as a unifying learning framework for incorporating prior knowledge into
deep models. In particular, we apply KRDL built on Markov logic networks to
denoise weak supervision. Second, we apply a KRDL model to help machine reading models find the correct evidence sentences that support their decisions. Third, we investigate knowledge transfer techniques in the multilingual setting, proposing a method that improves pre-trained multilingual BERT using a bilingual dictionary. Fourth, we present an episodic memory network for language modelling, in which we encode large-scale external knowledge for the pre-trained GPT.
Comment: Ph.D. thesis
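As a loose illustration of the KRDL idea of denoising weak supervision with prior knowledge (the thesis uses Markov logic networks; the simple rule-vote blend below is only an assumed toy surrogate):

    def denoise_weak_labels(examples, rules, model_prob, alpha=0.5, keep=0.5):
        """Keep a weakly labelled example only when knowledge-based rule
        votes and the model's own confidence jointly exceed a threshold."""
        kept = []
        for x, weak_label in examples:
            rule_score = (sum(r(x, weak_label) for r in rules)
                          / max(len(rules), 1))
            score = alpha * rule_score + (1 - alpha) * model_prob(x, weak_label)
            if score > keep:
                kept.append((x, weak_label))
        return kept

    # Toy usage: one rule and a constant-confidence model (both illustrative).
    rule = lambda x, y: 1.0 if y in x else 0.0
    prob = lambda x, y: 0.6
    print(denoise_weak_labels([('fever and cough', 'fever')], [rule], prob))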
A Survey of Deep Learning Techniques for Neural Machine Translation
In recent years, natural language processing (NLP) has advanced greatly thanks to deep learning techniques. In the sub-field of machine translation, a new approach named Neural Machine Translation (NMT) has emerged and attracted massive attention from both academia and industry. However, despite the significant number of studies proposed in the past several years, little work has investigated the development process of this new technology trend. This literature survey traces the origin and principal development timeline of NMT, investigates its important branches, categorizes different research orientations, and discusses some future research trends in the field.
Neural Machine Translation: Challenges, Progress and Future
Machine translation (MT) is a technique that leverages computers to translate human languages automatically. Nowadays, neural machine translation (NMT), which models a direct mapping between source and target languages with deep neural networks, has achieved a major breakthrough in translation performance and become the de facto paradigm of MT. This article reviews the NMT framework, discusses the challenges in NMT, introduces some exciting recent progress, and finally looks forward to some potential future research trends. In addition, we maintain the state-of-the-art methods for various NMT tasks at the website https://github.com/ZNLP/SOTA-MT.
Comment: Invited Review of Science China Technological Sciences