Fluent Translations from Disfluent Speech in End-to-End Speech Translation
Spoken language translation applications suffer from conversational speech
phenomena, particularly the presence of disfluencies.
With the rise of end-to-end speech translation models, processing steps such as
disfluency removal that were previously an intermediate step between speech
recognition and machine translation need to be incorporated into model
architectures. We use a sequence-to-sequence model to translate from noisy,
disfluent speech to fluent text with disfluencies removed using the recently
collected 'copy-edited' references for the Fisher Spanish-English dataset. We
are able to directly generate fluent translations and introduce considerations
about how to evaluate success on this task. This work provides a baseline for a
new task, the translation of conversational speech with joint removal of
disfluencies.

Comment: Accepted at NAACL 2019
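As a toy illustration of the target side of this task (the paper's model is an end-to-end sequence-to-sequence system, not a rule-based pipeline), the sketch below shows what "fluent text with disfluencies removed" means: filled pauses are dropped and immediate repetitions collapsed. The filler inventory and both rules are illustrative assumptions, not the paper's method.

```python
# Illustrative only: shows the *target* transformation, not the model.
FILLERS = {"uh", "um", "eh", "mm"}  # hypothetical filler inventory


def remove_disfluencies(tokens):
    """Drop filler words and collapse immediate word repetitions."""
    out = []
    for tok in tokens:
        if tok.lower() in FILLERS:
            continue  # filled pause ("uh", "um", ...)
        if out and tok.lower() == out[-1].lower():
            continue  # simple repetition ("I I want" -> "I want")
        out.append(tok)
    return out


print(" ".join(remove_disfluencies("I uh I want want to go".split())))
# -> "I want to go"
```

Real conversational disfluencies (restarts, self-corrections) are far harder than this, which is part of why the paper folds the removal into the translation model itself.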
Meta-Learning Neural Machine Translation Curricula
Curriculum learning hypothesizes that presenting training samples in a meaningful order to machine learners during training helps improve model quality and convergence rate. In this dissertation, we explore this framework for learning in the context of Neural Machine Translation (NMT). NMT systems are typically trained on a large amount of heterogeneous data and have the potential to benefit greatly from curriculum learning in terms of both speed and quality. We concern ourselves with three primary questions in our investigation: (i) how do we design a task- and/or dataset-specific curriculum for NMT training? (ii) can we leverage human intuition about learning in this design, or can we learn the curriculum itself? (iii) how do we featurize training samples (e.g., easy versus hard) so that they can be effectively slotted into a curriculum?
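A hand-designed curriculum of the kind explored here can be sketched as ranking samples by a difficulty feature and presenting them in easy-to-hard phases. Using sentence length as the difficulty proxy is an illustrative assumption; the dissertation considers several featurizations.

```python
# A minimal sketch of a hand-designed easy-to-hard curriculum.
def curriculum_order(samples, n_phases=3):
    """Rank samples by a difficulty proxy and split into ordered phases."""
    ranked = sorted(samples, key=len)         # shorter = easier (assumption)
    phase_size = -(-len(ranked) // n_phases)  # ceiling division
    return [ranked[i:i + phase_size] for i in range(0, len(ranked), phase_size)]


data = ["a b", "a b c d e", "a", "a b c", "a b c d"]
for i, phase in enumerate(curriculum_order(data)):
    print(f"phase {i}: {phase}")
```

In practice the phase schedule (when to move on, whether to mix in earlier phases) is itself a hyperparameter, which is the sensitivity issue raised below.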
We begin by empirically exploring various hand-designed curricula and their effect on translation performance and speed of training NMT systems. We show that these curricula, most of which are based on human intuition, can improve NMT training speed but are highly sensitive to hyperparameter settings. Next, instead of using a hand-designed curriculum, we meta-learn a curriculum for the task of learning from noisy translation samples using reinforcement learning. We demonstrate that this learned curriculum significantly outperforms a random-curriculum baseline and matches the strongest hand-designed curriculum. We then extend this approach to the task of multilingual NMT, with an emphasis on accumulating knowledge and learning from multiple training runs. Again, we show that this technique can match the strongest baseline obtained via expensive fine-grained grid search over the (learned) hyperparameters. We conclude with an extension which requires no prior knowledge of sample relevance to the task and uses sample features instead, hence learning both the relevance of each training sample to the task and the appropriate curriculum jointly. We show that this technique outperforms state-of-the-art results on a noisy filtering task.
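The learned-curriculum idea can be sketched as a multi-armed bandit over data buckets, in the spirit of the reinforcement-learning approach described above. The EXP3 update, the bucket granularity, and the toy reward below are all illustrative assumptions, not the dissertation's exact method; the reward stands in for observed held-out improvement after training on a batch from the chosen bucket.

```python
import math
import random


def exp3_curriculum(n_buckets, reward_fn, steps, gamma=0.2, seed=0):
    """EXP3 bandit over data buckets; rewards assumed in [0, 1]."""
    rng = random.Random(seed)
    weights = [1.0] * n_buckets
    for _ in range(steps):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / n_buckets for w in weights]
        arm = rng.choices(range(n_buckets), weights=probs)[0]
        r = reward_fn(arm)  # stand-in for dev-set gain from this bucket
        weights[arm] *= math.exp(gamma * r / (probs[arm] * n_buckets))
    return weights


# Toy reward: bucket 0 (say, the cleanest data) helps most,
# so the learner should come to favor it.
w = exp3_curriculum(3, lambda a: 1.0 if a == 0 else 0.1, steps=200)
print(max(range(3), key=lambda i: w[i]))
```

The appeal over the hand-designed setting is that the sampling distribution adapts online instead of following a fixed phase schedule, at the cost of needing a usable reward signal during training.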