Neural System Combination for Machine Translation
Neural machine translation (NMT) has emerged as a new approach to machine
translation and generates much more fluent results than statistical machine
translation (SMT).
However, SMT is usually better than NMT in translation adequacy. It is
therefore a promising direction to combine the advantages of both NMT and SMT.
In this paper, we propose a neural system combination framework leveraging
multi-source NMT, which takes as input the outputs of NMT and SMT systems and
produces the final translation.
Extensive experiments on the Chinese-to-English translation task show that
our model achieves a significant improvement of 5.3 BLEU points over the best
single system output and 3.4 BLEU points over the state-of-the-art traditional
system combination methods.
Comment: Accepted as a short paper by ACL 2017
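The multi-source setup described above can be pictured with a small sketch. Below is a minimal PyTorch illustration, assuming the NMT and SMT hypotheses are each encoded separately and a single decoder attends to both encodings; all module names, sizes, and the fusion-by-concatenation choice are hypothetical, and the paper's exact architecture may differ.

    import torch
    import torch.nn as nn

    class MultiSourceCombiner(nn.Module):
        """Hypothetical multi-source combiner: one encoder per system
        output, one decoder attending to both (a sketch, not the
        paper's exact architecture)."""
        def __init__(self, vocab_size, d_model=256, n_heads=4):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.enc_nmt = nn.GRU(d_model, d_model, batch_first=True)
            self.enc_smt = nn.GRU(d_model, d_model, batch_first=True)
            self.dec = nn.GRU(d_model, d_model, batch_first=True)
            self.attn_nmt = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.attn_smt = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.out = nn.Linear(2 * d_model, vocab_size)

        def forward(self, nmt_hyp, smt_hyp, prev_tokens):
            h_nmt, _ = self.enc_nmt(self.embed(nmt_hyp))  # encode NMT output
            h_smt, _ = self.enc_smt(self.embed(smt_hyp))  # encode SMT output
            d, _ = self.dec(self.embed(prev_tokens))      # decoder states
            # Attend to each system's encoding, then fuse by concatenation.
            c_nmt, _ = self.attn_nmt(d, h_nmt, h_nmt)
            c_smt, _ = self.attn_smt(d, h_smt, h_smt)
            return self.out(torch.cat([c_nmt, c_smt], dim=-1))  # token logits

Given batched token-id tensors, logits = MultiSourceCombiner(32000)(nmt_ids, smt_ids, prev_ids) would yield per-step vocabulary scores for the combined translation.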
Consecutive Decoding for Speech-to-text Translation
Speech-to-text translation (ST), which directly translates the source
language speech to the target language text, has attracted intensive attention
recently. However, combining speech recognition and machine translation in a
single model places a heavy burden on the direct cross-modal, cross-lingual
mapping. To reduce the learning difficulty, we propose
COnSecutive Transcription and Translation (COSTT), an integral approach for
speech-to-text translation. The key idea is to generate the source transcript
and the target translation with a single decoder. This benefits model
training, as large additional parallel text corpora can be fully exploited to
enhance speech translation training. Our method is verified on three mainstream
datasets, including Augmented LibriSpeech English-French dataset, TED
English-German dataset, and TED English-Chinese dataset. Experiments show that
our proposed COSTT outperforms the previous state-of-the-art methods. The code
is available at https://github.com/dqqcasia/st.
Comment: Accepted by AAAI 2021. arXiv admin note: text overlap with arXiv:2009.0970
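The consecutive-decoding idea can be made concrete with a small sketch: the single decoder is trained on a target sequence that is the source transcript followed by the translation. A minimal Python sketch follows, assuming hypothetical IDs for the separator and end-of-sequence tokens; COSTT's exact target construction may differ.

    import torch
    import torch.nn.functional as F

    SEP_ID, EOS_ID = 3, 2  # hypothetical special-token ids

    def make_consecutive_target(transcript_ids, translation_ids):
        """Build the single-decoder target: transcript, then a separator,
        then the translation, so one decoder learns both tasks in order."""
        return torch.cat([
            transcript_ids,          # source transcript first
            torch.tensor([SEP_ID]),  # boundary between the two tasks
            translation_ids,         # target translation second
            torch.tensor([EOS_ID]),
        ])

    # Training then uses ordinary cross-entropy over the whole sequence,
    # e.g. F.cross_entropy(logits.view(-1, vocab_size), target.view(-1)).

Because the target is plain text on both sides of the separator, extra parallel text corpora (transcript/translation pairs without audio) can supplement the decoder's training, which is the benefit the abstract points to.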
"Listen, Understand and Translate": Triple Supervision Decouples End-to-end Speech-to-text Translation
An end-to-end speech-to-text translation (ST) model takes audio in a source
language and outputs text in a target language. Existing methods are
limited by the amount of parallel data. Can we build a system that fully
utilizes the signals in a parallel ST corpus? We are inspired by the human
understanding system, which is composed of auditory perception and cognitive
processing. In this paper, we propose Listen-Understand-Translate (LUT), a unified framework
with triple supervision signals to decouple the end-to-end speech-to-text
translation task. LUT guides the acoustic encoder to extract as much
information as possible from the auditory input. In addition, LUT utilizes a pre-trained
BERT model to enforce the upper encoder to produce as much semantic information
as possible, without extra data. We perform experiments on a diverse set of
speech translation benchmarks, including LibriSpeech English-French, IWSLT
English-German, and TED English-Chinese. Our results demonstrate that LUT
achieves state-of-the-art performance, outperforming previous methods. The
code is available at https://github.com/dqqcasia/st.
Comment: Accepted by AAAI 2021
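The triple supervision can be sketched as three losses trained jointly. Below is a minimal PyTorch sketch, assuming a CTC loss ties the acoustic encoder to the transcript, an MSE loss pulls the semantic encoder's states toward frozen BERT embeddings of the transcript, and cross-entropy trains the translation decoder; the loss weights, the MSE distance, and the BERT checkpoint name are hypothetical choices, not necessarily the paper's.

    import torch
    import torch.nn.functional as F
    from transformers import BertModel

    # Frozen BERT acts only as a teacher for the semantic encoder.
    bert = BertModel.from_pretrained("bert-base-uncased").eval()
    for p in bert.parameters():
        p.requires_grad = False

    def triple_loss(ctc_log_probs, transcript_ids, input_lens, transcript_lens,
                    semantic_states, dec_logits, translation_ids,
                    w_ctc=1.0, w_sem=1.0, w_mt=1.0):
        # 1) "Listen": CTC aligns the acoustic encoder's output
        #    (ctc_log_probs, shape (T, N, C)) with the transcript.
        l_ctc = F.ctc_loss(ctc_log_probs, transcript_ids,
                           input_lens, transcript_lens)
        # 2) "Understand": match BERT's contextual embeddings of the
        #    transcript (semantic_states must share BERT's hidden size
        #    and sequence length in this sketch).
        with torch.no_grad():
            teacher = bert(transcript_ids).last_hidden_state
        l_sem = F.mse_loss(semantic_states, teacher)
        # 3) "Translate": standard cross-entropy on the decoder's logits,
        #    with dec_logits of shape (N, L, vocab).
        l_mt = F.cross_entropy(dec_logits.transpose(1, 2), translation_ids)
        return w_ctc * l_ctc + w_sem * l_sem + w_mt * l_mt

Since BERT is used only as a frozen teacher on the transcripts already present in the ST corpus, this style of supervision adds semantic guidance without requiring any extra data, which is the abstract's central claim.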