146 research outputs found
Learning to Stop in Structured Prediction for Neural Machine Translation
Beam search optimization resolves many issues in neural machine translation.
However, this method lacks a principled stopping criterion and does not learn
when to stop during training; in practice, the model naturally prefers longer
hypotheses at test time because it ranks them by raw score rather than by a
probability-based score. We propose a novel ranking method which enables an
optimal beam-search stopping criterion. We further introduce a structured
prediction loss function which penalizes suboptimal finished candidates
produced by beam search during training. Experiments of neural machine
translation on both synthetic data and real languages (German-to-English and
Chinese-to-English) demonstrate that our proposed methods lead to better lengths
and BLEU scores.
Comment: 5 pages
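The stopping idea can be sketched as follows: because log-probabilities only decrease as a hypothesis grows, beam search can safely stop once the best finished hypothesis scores at least as well as the best unfinished one. The toy next-token distribution and beam size below are illustrative stand-ins, not the paper's NMT system:

```python
import math

# Toy next-token distribution: a hypothetical stand-in for an NMT decoder.
# Each prefix maps to {token: probability}; "</s>" ends a hypothesis.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"</s>": 0.7, "a": 0.3},
    ("b",): {"b": 0.9, "</s>": 0.1},
    ("a", "a"): {"</s>": 1.0},
    ("b", "b"): {"</s>": 1.0},
}

def beam_search(beam_size=2, max_len=4):
    """Beam search with an optimal stopping criterion: stop as soon as the
    best finished hypothesis scores at least as well as the best unfinished
    one, since extending a hypothesis can only lower its log-probability."""
    beam = [((), 0.0)]          # (prefix, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beam:
            for tok, p in TOY_MODEL[prefix].items():
                cand = (prefix + (tok,), score + math.log(p))
                if tok == "</s>":
                    finished.append(cand)
                else:
                    candidates.append(cand)
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_size]
        # Optimal stopping: no unfinished hypothesis can ever overtake the
        # current best finished one.
        if finished and beam and max(f[1] for f in finished) >= beam[0][1]:
            break
        if not beam:
            break
    return max(finished, key=lambda f: f[1])

best = beam_search()
```

Here the search stops after two steps: the finished hypothesis "a </s>" already outscores every surviving unfinished prefix.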
Simultaneous Translation Policies: From Fixed to Adaptive
Adaptive policies are better than fixed policies for simultaneous
translation, since they can flexibly balance the tradeoff between translation
quality and latency based on the current context. But previous methods for
obtaining adaptive policies either rely on a complicated training process or
underperform simple fixed policies. We design an algorithm to
achieve adaptive policies via a simple heuristic composition of a set of fixed
policies. Experiments on Chinese-to-English and German-to-English show that our
adaptive policies can outperform fixed ones by up to 4 BLEU points at the same
latency; more surprisingly, they even surpass the BLEU score of full-sentence
translation in greedy mode (and come very close to beam mode), but with much
lower latency.
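A simplified sketch of the composition idea: at each step, consult the fixed wait-k policy that matches the current source-target lag, and WRITE only when that model is confident enough, otherwise READ. The `confidences` callable and all thresholds are hypothetical stand-ins for the paper's wait-k model probabilities:

```python
def adaptive_policy(confidences, rho=0.6, k_min=1, k_max=5, src_len=6, tgt_len=6):
    """Heuristic composition of fixed wait-k policies (a simplified sketch):
    WRITE the next target word when the wait-k model matching the current
    lag is confident, otherwise READ more source. `confidences(k, t)` is a
    hypothetical stand-in for the wait-k model's top-token probability when
    producing target word t."""
    read, written, actions = 0, 0, []
    while written < tgt_len:
        lag = read - written
        if read < src_len and (lag < k_min or
                               (lag < k_max and confidences(lag, written) < rho)):
            read += 1          # not confident enough yet: consume more source
            actions.append("READ")
        else:
            written += 1       # confident (or forced): emit a target word
            actions.append("WRITE")
    return actions
```

With a uniformly confident model this degenerates to a wait-1 schedule; with a uniformly unsure model it reads ahead until the maximum lag, trading latency for quality.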
Multi-Reference Training with Pseudo-References for Neural Translation and Text Generation
Neural text generation, including neural machine translation, image
captioning, and summarization, has been quite successful recently. However,
during training time, typically only one reference is considered for each
example, even though there are often multiple references available, e.g., 4
references in NIST MT evaluations, and 5 references in image captioning data.
We first investigate several different ways of utilizing multiple human
references during training. But more importantly, we then propose an algorithm
to generate exponentially many pseudo-references by first compressing existing
human references into lattices and then traversing them to generate new
pseudo-references. These approaches lead to substantial improvements over
strong baselines in both machine translation (+1.5 BLEU) and image captioning
(+3.1 BLEU / +11.7 CIDEr).
Comment: 10 pages
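The lattice idea can be illustrated with a toy example: once several references are compressed into a shared word lattice (a DAG), every path through it is a pseudo-reference, and the number of paths grows multiplicatively. The lattice below is hand-built for illustration; the paper constructs it automatically from human references:

```python
# A tiny word lattice (DAG): node -> list of (next_node, word) edges.
# Hand-built here; in the paper's approach it comes from compressing
# multiple human references.
LATTICE = {
    0: [(1, "a"), (1, "the")],
    1: [(2, "man"), (2, "person")],
    2: [(3, "smiles")],
    3: [],  # final node
}

def all_paths(node=0, prefix=()):
    """Enumerate every path through the lattice; each path is one
    pseudo-reference. Independent choices multiply, so merging n references
    can yield exponentially many pseudo-references."""
    if not LATTICE[node]:
        yield " ".join(prefix)
        return
    for nxt, word in LATTICE[node]:
        yield from all_paths(nxt, prefix + (word,))

pseudo_refs = sorted(all_paths())
```

Two binary choices already give four pseudo-references from what were at most two original sentences.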
LaneRCNN: Distributed Representations for Graph-Centric Motion Forecasting
Forecasting the future behaviors of dynamic actors is an important task in
many robotics applications such as self-driving. It is extremely challenging as
actors have latent intentions and their trajectories are governed by complex
interactions among themselves, other actors, and the map. In this paper,
we propose LaneRCNN, a graph-centric motion forecasting model. Importantly,
relying on a specially designed graph encoder, we learn a local lane graph
representation per actor (LaneRoI) to encode its past motions and the local map
topology. We further develop an interaction module which permits efficient
message passing among local graph representations within a shared global lane
graph. Moreover, we parameterize the output trajectories based on lane graphs,
a more amenable prediction parameterization. Our LaneRCNN captures the
actor-to-actor and the actor-to-map relations in a distributed and map-aware
manner. We demonstrate the effectiveness of our approach on the large-scale
Argoverse Motion Forecasting Benchmark. We achieve the 1st place on the
leaderboard and significantly outperform the previous best results.
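The interaction module's core mechanic, message passing among per-actor graph representations, can be sketched in miniature. LaneRCNN's propagation is learned; the unweighted neighbor averaging below only illustrates how information flows along a shared graph:

```python
def message_pass(features, edges, rounds=2):
    """Toy message passing on a graph: each round, every node folds its
    neighbors' features into its own by averaging. `features` maps node ids
    to scalar features; `edges` maps node ids to neighbor lists. A stand-in
    for LaneRCNN's learned interaction module, not its actual operator."""
    feats = dict(features)
    for _ in range(rounds):
        new = {}
        for node, val in feats.items():
            nbrs = [feats[m] for m in edges.get(node, [])]
            new[node] = (val + sum(nbrs)) / (1 + len(nbrs)) if nbrs else val
        feats = new
    return feats

# After one round on a two-node graph, information from each node has
# reached the other.
smoothed = message_pass({0: 1.0, 1: 0.0}, {0: [1], 1: [0]}, rounds=1)
```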
Ensemble Sequence Level Training for Multimodal MT: OSU-Baidu WMT18 Multimodal Machine Translation System Report
This paper describes multimodal machine translation systems developed jointly
by Oregon State University and Baidu Research for WMT 2018 Shared Task on
multimodal translation. In this paper, we introduce a simple approach to
incorporate image information by feeding image features to the decoder side. We
also explore different sequence level training methods including scheduled
sampling and reinforcement learning which lead to substantial improvements. Our
systems ensemble several models using different architectures and training
methods, and achieve the best performance on three subtasks: En-De and En-Cs in
task 1, and (En+De+Fr)-Cs in task 1B.
Comment: 5 pages
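Scheduled sampling, one of the sequence-level training methods mentioned above, can be sketched as follows: when building decoder inputs for step t, feed the model's own previous prediction with some probability instead of the gold token, so training conditions resemble inference. The `predict` callable is a hypothetical stand-in for one decoder step:

```python
import random

def scheduled_sampling_inputs(gold, predict, p_model, rng):
    """Scheduled sampling sketch: the decoder input at step t is the
    model's own previous prediction with probability p_model, and the gold
    previous token otherwise. `predict(prev, t)` is a hypothetical stand-in
    for running the decoder one step."""
    inputs, prev = [], "<s>"
    for t, gold_tok in enumerate(gold):
        inputs.append(prev)
        model_tok = predict(prev, t)
        # Coin flip decides what the NEXT step conditions on.
        prev = model_tok if rng.random() < p_model else gold_tok
    return inputs
```

At p_model = 0 this reduces to ordinary teacher forcing; annealing p_model upward during training exposes the model to its own mistakes.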
MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation
End-to-end Speech-to-text Translation (E2E-ST), which directly translates
source language speech to target language text, is widely useful in practice,
but traditional cascaded approaches (ASR+MT) often suffer from error
propagation in the pipeline. On the other hand, existing end-to-end solutions
heavily depend on the source language transcriptions for pre-training or
multi-task training with Automatic Speech Recognition (ASR). We instead propose
a simple technique to learn a robust speech encoder in a self-supervised
fashion only on the speech side, which can utilize speech data without
transcription. This technique, termed Masked Acoustic Modeling (MAM), not only
provides an alternative solution for improving E2E-ST, but can also perform
pre-training on any acoustic signals (including non-speech ones) without
annotation. We conduct our experiments over 8 different translation directions.
In the setting without using any transcriptions, our technique achieves an
average improvement of +1.1 BLEU, and +2.3 BLEU with MAM pre-training.
Pre-training MAM on arbitrary acoustic signals also yields an average
improvement of +1.6 BLEU for those languages. Compared with the ASR multi-task
learning solution, which relies on transcription during training, our
pre-trained MAM model, which does not use transcription, achieves similar
accuracy.
Comment: 12 pages
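The masking step at the heart of MAM can be sketched as follows: random spans of spectrogram frames are replaced by a mask value, and the model is trained to reconstruct the originals at those positions, requiring no transcription. Span length and masking rate below are illustrative, not the paper's hyperparameters:

```python
import random

def mask_spans(frames, span=2, mask_prob=0.3, rng=None):
    """Masked Acoustic Modeling sketch: replace random spans of frames with
    a MASK placeholder and record their positions; a model would then be
    trained to reconstruct the original frames at those positions
    (self-supervised, speech-side only)."""
    rng = rng or random.Random(0)
    masked = list(frames)
    positions = []
    i = 0
    while i < len(frames):
        if rng.random() < mask_prob:
            # Mask a contiguous span starting here.
            for j in range(i, min(i + span, len(frames))):
                masked[j] = "MASK"
                positions.append(j)
            i += span
        else:
            i += 1
    return masked, positions
```

The recorded positions are exactly where a reconstruction loss (e.g. L2 against the clean frames) would be applied.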
Simpler and Faster Learning of Adaptive Policies for Simultaneous Translation
Simultaneous translation is widely useful but remains challenging. Previous
work falls into two main categories: (a) fixed-latency policies such as Ma et
al. (2019) and (b) adaptive policies such as Gu et al. (2017). The former are
simple and effective, but have to aggressively predict future content due to
diverging source-target word order; the latter do not anticipate, but suffer
from unstable and inefficient training. To combine the merits of both
approaches, we propose a simple supervised-learning framework to learn an
adaptive policy from oracle READ/WRITE sequences generated from parallel text.
At each step, such an oracle sequence chooses to WRITE the next target word if
the available source sentence context provides enough information to do so,
otherwise READ the next source word. Experiments on German-English show that
our method, without retraining the underlying NMT model, can learn flexible
policies with better BLEU scores and similar latencies compared to previous
work.
Comment: EMNLP 201
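Oracle generation can be sketched with a simplification: WRITE the next target word as soon as every source word it depends on has been read, otherwise READ. The paper derives "enough information" from the underlying NMT model's prefix probabilities; using word alignments, as below, is an assumption made here for illustration:

```python
def oracle_actions(alignments, src_len):
    """Hedged sketch of generating an oracle READ/WRITE sequence from
    parallel text: WRITE target word t once all source words it aligns to
    have been read (a proxy for "enough information"), otherwise READ.
    `alignments[t]` lists the 0-based source indices aligned to target
    word t."""
    read, actions = 0, []
    for tgt_srcs in alignments:
        need = max(tgt_srcs) + 1 if tgt_srcs else 0
        while read < min(need, src_len):
            read += 1
            actions.append("READ")
        actions.append("WRITE")
    return actions
```

The resulting READ/WRITE sequences are exactly the supervision a simple classifier can be trained on, with no reinforcement learning involved.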
Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR
Simultaneous speech-to-text translation is widely useful in many scenarios.
The conventional cascaded approach uses a pipeline of streaming ASR followed by
simultaneous MT, but suffers from error propagation and extra latency. To
alleviate these issues, recent efforts attempt to directly translate the source
speech into target text simultaneously, but this is much harder due to the
combination of two separate tasks. We instead propose a new paradigm with the
advantages of both cascaded and end-to-end approaches. The key idea is to use
two separate, but synchronized, decoders on streaming ASR and direct
speech-to-text translation (ST), respectively, so that the intermediate results
of ASR guide the decoding policy of (but are not fed as input to) ST. During
training time, we use multitask learning to jointly learn these two tasks with
a shared encoder. En-to-De and En-to-Es experiments on the MuST-C dataset
demonstrate that our proposed technique achieves substantially better
translation quality at similar levels of latency.
Comment: accepted by Findings of ACL 202
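The guidance mechanism can be sketched with a simple policy: after each incoming speech chunk, the ST decoder is allowed to trail the synchronized ASR decoder's output by a fixed number of tokens. The trailing margin `delta` is an illustrative assumption, not the paper's actual policy:

```python
def guided_st_steps(asr_lengths, delta=2):
    """Sketch of ASR-guided simultaneous ST: `asr_lengths[i]` is how many
    words the streaming ASR has emitted after chunk i; the ST decoder may
    emit up to max(0, asr_len - delta) tokens so far. ASR output steers the
    ST decoding policy but is never fed to ST as input. Returns the number
    of new ST tokens emitted at each chunk."""
    emitted, out = 0, []
    for asr_len in asr_lengths:
        allowed = max(0, asr_len - delta)
        out.append(allowed - emitted)
        emitted = allowed
    return out
```

This keeps ST translation latency coupled to ASR progress while leaving ST free to translate directly from speech.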
Simultaneous Translation with Flexible Policy via Restricted Imitation Learning
Simultaneous translation is widely useful but remains one of the most
difficult tasks in NLP. Previous work either uses fixed-latency policies or
trains a complicated two-stage model using reinforcement learning. We propose a
much simpler single model that adds a "delay" token to the target vocabulary,
and design a restricted dynamic oracle to greatly simplify training.
Experiments on Chinese-English simultaneous translation show that our work
leads to flexible policies that achieve better BLEU scores and lower latencies
than both fixed and RL-learned policies.
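The delay-token mechanism can be sketched as a decoding loop: whenever the model emits the special token, the decoder reads one more source word instead of writing output, so a single model expresses the READ/WRITE policy. The `toy_step` model below is a hypothetical stand-in for one greedy decoding step:

```python
def simultaneous_decode(step, src, max_tgt=10):
    """Sketch of the delay-token policy: the target vocabulary contains a
    special "<delay>" token; emitting it triggers a READ of the next source
    word, while any other token is written. `step(src_prefix, tgt)` is a
    hypothetical stand-in for one greedy decoding step."""
    read, tgt, actions = 1, [], []        # start with one source word read
    while len(tgt) < max_tgt:
        tok = step(src[:read], tgt)
        if tok == "<delay>":
            if read < len(src):
                read += 1
                actions.append("READ")
            else:
                break                     # nothing left to read
        elif tok == "</s>":
            break
        else:
            tgt.append(tok)
            actions.append("WRITE")
    return tgt, actions

def toy_step(src_prefix, tgt):
    # Toy "model": copy each visible source word in uppercase; ask for more
    # source (emit "<delay>") once the visible prefix is exhausted.
    if len(tgt) < len(src_prefix):
        return src_prefix[len(tgt)].upper()
    return "<delay>"

tgt, actions = simultaneous_decode(toy_step, ["x", "y", "z"])
```

On this toy input the learned-looking policy alternates WRITE and READ, i.e. the delay token induces a flexible schedule rather than a fixed-latency one.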
Opportunistic Decoding with Timely Correction for Simultaneous Translation
Simultaneous translation has many important application scenarios and has
recently attracted much attention from both academia and industry. Most existing
frameworks, however, have difficulties in balancing between the translation
quality and latency, i.e., the decoding policy is usually either too aggressive
or too conservative. We propose an opportunistic decoding technique with timely
correction ability, which always (over-)generates a certain amount of extra
words at each step to keep the audience on track with the latest information.
At the same time, it also corrects, in a timely fashion, the mistakes in the
former overgenerated words when observing more source context to ensure high
translation quality. Experiments show our technique achieves substantial
reduction in latency and up to +3.1 increase in BLEU, with revision rate under
8% in Chinese-to-English and English-to-Chinese translation.
Comment: accepted by ACL 202
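The correction loop can be sketched as follows: after each new source chunk, the system re-decodes and shows its latest hypothesis, so earlier over-generated words may be rewritten; counting rewrites gives the revision rate. The `decode` callable is a hypothetical stand-in for full greedy translation of a source prefix:

```python
def opportunistic_decode(decode, chunks):
    """Sketch of opportunistic decoding with timely correction: after each
    source chunk arrives, re-decode the whole prefix and display the new
    hypothesis; words already shown may be revised. Returns the final
    output and how many displayed words were corrected."""
    shown, revised = [], 0
    src = []
    for chunk in chunks:
        src.append(chunk)
        hyp = decode(list(src))
        # Timely correction: count positions where the new hypothesis
        # rewrites a word the audience has already seen.
        revised += sum(1 for a, b in zip(shown, hyp) if a != b)
        shown = hyp
    return shown, revised
```

In a real system the eagerly shown extra words lower latency, while the revision count (kept under 8% in the paper's experiments) measures how often the audience sees a correction.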