Fluent Translations from Disfluent Speech in End-to-End Speech Translation
Spoken language translation applications suffer from
conversational speech phenomena, particularly the presence of disfluencies.
With the rise of end-to-end speech translation models, processing steps such as
disfluency removal that were previously an intermediate step between speech
recognition and machine translation need to be incorporated into model
architectures. We use a sequence-to-sequence model to translate from noisy,
disfluent speech to fluent text with disfluencies removed using the recently
collected 'copy-edited' references for the Fisher Spanish-English dataset. We
are able to directly generate fluent translations and introduce considerations
about how to evaluate success on this task. This work provides a baseline for a
new task, the translation of conversational speech with joint removal of
disfluencies.
Comment: Accepted at NAACL 2019
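As a concrete illustration of the task setup, the sketch below maps speech features directly to fluent target tokens with a single sequence-to-sequence model, so disfluency removal is learned implicitly from the fluent references. The GRU-based architecture, dimensions, and names are illustrative assumptions, not the paper's exact model.

    # Minimal sketch: encode speech features, decode fluent target tokens.
    # Disfluency removal is implicit in the fluent training references.
    import torch
    import torch.nn as nn

    class DisfluentSpeechTranslator(nn.Module):
        def __init__(self, feat_dim=80, hidden=256, vocab=8000):
            super().__init__()
            self.encoder = nn.GRU(feat_dim, hidden, batch_first=True,
                                  bidirectional=True)
            self.embed = nn.Embedding(vocab, hidden)
            self.decoder = nn.GRU(hidden, 2 * hidden, batch_first=True)
            self.out = nn.Linear(2 * hidden, vocab)

        def forward(self, speech_feats, target_tokens):
            # speech_feats: (batch, frames, feat_dim); target_tokens: (batch, len)
            _, enc_state = self.encoder(speech_feats)
            # Merge the two encoder directions into one initial decoder state.
            dec_init = torch.cat([enc_state[0], enc_state[1]], dim=-1).unsqueeze(0)
            dec_out, _ = self.decoder(self.embed(target_tokens), dec_init)
            return self.out(dec_out)  # logits over the fluent-target vocabulary

    model = DisfluentSpeechTranslator()
    logits = model(torch.randn(2, 100, 80), torch.randint(0, 8000, (2, 12)))
    print(logits.shape)  # torch.Size([2, 12, 8000])

In practice an attention mechanism over the encoder states would replace the single initial state, but the point here is only that one model performs recognition, translation, and disfluency removal jointly.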
Designing the Business Conversation Corpus
While machine translation of written text has come a long way in recent
years, thanks to the increasing availability of parallel corpora and
corpus-based training methods, automatic translation of spoken text
and dialogues remains challenging even for modern systems. In this paper, we
aim to boost the machine translation quality of conversational texts by
introducing a newly constructed Japanese-English business conversation parallel
corpus. A detailed analysis of the corpus is provided along with challenging
examples for automatic translation. We also experiment with adding the corpus
in a machine translation training scenario and show how the resulting system
benefits from its use.
Consecutive Decoding for Speech-to-text Translation
Speech-to-text translation (ST), which directly translates the source
language speech to the target language text, has recently attracted intensive
attention. However, combining speech recognition and machine
translation in a single model places a heavy burden on learning the direct
cross-modal, cross-lingual mapping. To reduce the learning difficulty, we propose
COnSecutive Transcription and Translation (COSTT), an integral approach for
speech-to-text translation. The key idea is to generate the source transcript
and the target translation with a single decoder. This benefits model training,
as large additional parallel text corpora can be fully exploited to enhance
speech translation training. Our method is verified on three mainstream
datasets, including Augmented LibriSpeech English-French dataset, TED
English-German dataset, and TED English-Chinese dataset. Experiments show that
our proposed COSTT outperforms the previous state-of-the-art methods. The code
is available at https://github.com/dqqcasia/st.
Comment: Accepted by AAAI 2021. arXiv admin note: text overlap with arXiv:2009.0970
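The consecutive-decoding idea can be summarized in a few lines: one decoder is trained to emit the source transcript followed by the target translation as a single sequence, which is why text-only (transcript, translation) pairs can pre-train the same decoder. The separator convention and token IDs below are assumptions for illustration, not the released implementation.

    # Sketch of consecutive decoding: the single decoder's target is the
    # source transcript, a separator, then the target translation.
    import torch
    import torch.nn.functional as F

    SEP = 1  # assumed separator token between transcript and translation

    def consecutive_target(transcript_ids, translation_ids):
        return torch.cat([transcript_ids, torch.tensor([SEP]), translation_ids])

    def consecutive_loss(decoder_logits, transcript_ids, translation_ids):
        # One cross-entropy over the concatenated sequence supervises both the
        # transcription phase and the translation phase of the same decoder.
        target = consecutive_target(transcript_ids, translation_ids)
        return F.cross_entropy(decoder_logits[: len(target)], target)

    # Fake logits for a 9-token concatenated target over a 100-word vocabulary.
    tr, tl = torch.tensor([5, 6, 7, 8]), torch.tensor([9, 10, 11, 12])
    print(consecutive_loss(torch.randn(9, 100), tr, tl).item())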
"Listen, Understand and Translate": Triple Supervision Decouples End-to-end Speech-to-text Translation
An end-to-end speech-to-text translation (ST) system takes audio in a source
language and outputs text in a target language. Existing methods are
limited by the amount of parallel data. Can we build a system to fully
utilize the signals in a parallel ST corpus? We are inspired by the human
understanding system, which is composed of auditory perception and cognitive
processing. In this paper, we propose Listen-Understand-Translate (LUT), a unified framework
with triple supervision signals to decouple the end-to-end speech-to-text
translation task. LUT is able to guide the acoustic encoder to extract as much
information as possible from the auditory input. In addition, LUT utilizes a pre-trained
BERT model to enforce the upper encoder to produce as much semantic information
as possible, without extra data. We perform experiments on a diverse set of
speech translation benchmarks, including Librispeech English-French, IWSLT
English-German and TED English-Chinese. Our results demonstrate that LUT
achieves state-of-the-art performance, outperforming previous methods. The code
is available at https://github.com/dqqcasia/st.
Comment: Accepted by AAAI 2021
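A rough way to picture the triple supervision is as a weighted sum of three losses: an acoustic term aligning encoder frames with the transcript, a semantic term matching the upper encoder's states to embeddings from a frozen pre-trained BERT, and the usual translation cross-entropy. The CTC and MSE choices and the weights below are illustrative assumptions, not the paper's exact objective.

    # Sketch of a triple-supervision objective in the LUT spirit.
    import torch
    import torch.nn.functional as F

    def triple_supervision_loss(ctc_logits, transcript, in_lens, tr_lens,
                                upper_states, bert_embeddings,
                                dec_logits, translation,
                                w_ac=1.0, w_sem=1.0, w_tr=1.0):
        # (1) Acoustic: CTC aligns encoder frames with the source transcript.
        acoustic = F.ctc_loss(ctc_logits.log_softmax(-1), transcript,
                              in_lens, tr_lens)
        # (2) Semantic: push upper-encoder states toward frozen BERT embeddings.
        semantic = F.mse_loss(upper_states, bert_embeddings)
        # (3) Translation: cross-entropy on target-language tokens.
        translate = F.cross_entropy(dec_logits.transpose(1, 2), translation)
        return w_ac * acoustic + w_sem * semantic + w_tr * translate

    # Shapes: (frames, batch, chars), (batch, len, dim), (batch, len, vocab).
    T, N, C, L, V, H = 50, 2, 30, 7, 100, 16
    loss = triple_supervision_loss(
        torch.randn(T, N, C), torch.randint(1, C, (N, 5)),
        torch.full((N,), T, dtype=torch.long),
        torch.full((N,), 5, dtype=torch.long),
        torch.randn(N, L, H), torch.randn(N, L, H),
        torch.randn(N, L, V), torch.randint(0, V, (N, L)))
    print(loss.item())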