Neural System Combination for Machine Translation
Neural machine translation (NMT) has become a new approach to machine
translation and generates much more fluent results compared to statistical
machine translation (SMT).
However, SMT is usually better than NMT in translation adequacy. It is
therefore a promising direction to combine the advantages of both NMT and SMT.
In this paper, we propose a neural system combination framework leveraging
multi-source NMT, which takes as input the outputs of NMT and SMT systems and
produces the final translation.
Extensive experiments on the Chinese-to-English translation task show that
our model achieves a significant improvement of 5.3 BLEU points over the best
single system output and 3.4 BLEU points over the state-of-the-art traditional
system combination methods.
Comment: Accepted as a short paper by ACL-201
Consecutive Decoding for Speech-to-text Translation
Speech-to-text translation (ST), which directly translates the source
language speech to the target language text, has attracted intensive attention
recently. However, the combination of speech recognition and machine
translation in a single model poses a heavy burden on the direct cross-modal
cross-lingual mapping. To reduce the learning difficulty, we propose
COnSecutive Transcription and Translation (COSTT), an integral approach for
speech-to-text translation. The key idea is to generate source transcript and
target translation text with a single decoder. This benefits model training,
since additional large parallel text corpora can be fully exploited to enhance
speech translation training. Our method is verified on three mainstream
datasets: the Augmented LibriSpeech English-French dataset, the TED
English-German dataset, and the TED English-Chinese dataset. Experiments show that
our proposed COSTT outperforms the previous state-of-the-art methods. The code
is available at https://github.com/dqqcasia/st.
Comment: Accepted by AAAI 2021. arXiv admin note: text overlap with
arXiv:2009.0970
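The single-decoder idea in the COSTT abstract above (transcript first, then translation, in one output sequence) can be illustrated with a minimal sketch. This is not the authors' implementation: the `<sep>` and `<eos>` token names and both helper functions are hypothetical, chosen only to show how one decoder target could carry both texts and how they could be separated again.

```python
# Hypothetical special tokens; the abstract does not name any.
SEP = "<sep>"  # marks the boundary between transcript and translation
EOS = "<eos>"  # marks the end of the whole sequence


def build_consecutive_target(transcript_tokens, translation_tokens):
    """Join transcript and translation into one decoder target sequence."""
    return list(transcript_tokens) + [SEP] + list(translation_tokens) + [EOS]


def split_consecutive_output(decoded_tokens):
    """Recover (transcript, translation) from a decoded token sequence."""
    tokens = list(decoded_tokens)
    # Drop the end-of-sequence marker if the decoder produced one.
    if tokens and tokens[-1] == EOS:
        tokens = tokens[:-1]
    # Without a separator, treat everything as transcript.
    if SEP not in tokens:
        return tokens, []
    cut = tokens.index(SEP)
    return tokens[:cut], tokens[cut + 1:]


target = build_consecutive_target(["hello", "world"], ["bonjour", "le", "monde"])
transcript, translation = split_consecutive_output(target)
```

Because the second half of such a sequence is ordinary text conditioned on earlier text, a text-only parallel corpus could in principle be formatted the same way, which is one plausible reading of how extra parallel data helps.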
"Listen, Understand and Translate": Triple Supervision Decouples End-to-end Speech-to-text Translation
An end-to-end speech-to-text translation (ST) system takes audio in a source
language and outputs text in a target language. Existing methods are
limited by the amount of parallel corpus data. Can we build a system to fully
utilize signals in a parallel ST corpus? We are inspired by the human understanding
system, which is composed of auditory perception and cognitive processing. In
this paper, we propose Listen-Understand-Translate (LUT), a unified framework
with triple supervision signals to decouple the end-to-end speech-to-text
translation task. LUT is able to guide the acoustic encoder to extract as much
information as possible from the auditory input. In addition, LUT utilizes a pre-trained
BERT model to enforce the upper encoder to produce as much semantic information
as possible, without extra data. We perform experiments on a diverse set of
speech translation benchmarks, including LibriSpeech English-French, IWSLT
English-German and TED English-Chinese. Our results demonstrate that LUT achieves
state-of-the-art performance, outperforming previous methods. The code is
available at https://github.com/dqqcasia/st.
Comment: Accepted by AAAI 202
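The "triple supervision" in the LUT abstract above implies three training signals (acoustic, semantic, translation) optimized together. The abstract does not say how they are combined, so the sketch below uses a generic multi-task weighted sum as a stand-in. The function name, the weights, and their default values are all assumptions, not details from the paper.

```python
def combine_losses(acoustic_loss, semantic_loss, translation_loss,
                   w_acoustic=1.0, w_semantic=1.0, w_translation=1.0):
    """Generic weighted sum of three supervision signals.

    The weights are hypothetical hyperparameters; the abstract does not
    specify how LUT balances its three objectives.
    """
    return (w_acoustic * acoustic_loss
            + w_semantic * semantic_loss
            + w_translation * translation_loss)


# Example: down-weight the semantic term relative to the other two.
total = combine_losses(1.0, 2.0, 3.0, w_semantic=0.5)
```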
Proceedings of the COLING 2004 Post Conference Workshop on Multilingual Linguistic Ressources MLR2004
In an ever expanding information society, most information systems are now facing the "multilingual challenge". Multilingual language resources play an essential role in modern information systems. Such resources need to provide information on many languages in a common framework and should be (re)usable in many applications (for automatic or human use). Many centres have been involved in national and international projects dedicated to building harmonised language resources and creating expertise in the maintenance and further development of standardised linguistic data. These resources include dictionaries, lexicons, thesauri, word-nets, and annotated corpora developed along the lines of best practices and recommendations. However, since the late 90's, most efforts in scaling up these resources remain the responsibility of the local authorities, usually, with very low funding (if any) and few opportunities for academic recognition of this work. Hence, it is not surprising that many of the resource holders and developers have become reluctant to give free access to the latest versions of their resources, and their actual status is therefore currently rather unclear. The goal of this workshop is to study problems involved in the development, management and reuse of lexical resources in a multilingual context. Moreover, this workshop provides a forum for reviewing the present state of language resources. The workshop is meant to bring to the international community qualitative and quantitative information about the most recent developments in the area of linguistic resources and their use in applications. The impressive number of submissions (38) to this workshop and in other workshops and conferences dedicated to similar topics proves that dealing with multilingual linguistic resources has become a very hot problem in the Natural Language Processing community.
To cope with the number of submissions, the workshop organising committee decided to accept 16 papers from 10 countries based on the reviewers' recommendations. Six of these papers will be presented in a poster session. The papers constitute a representative selection of current trends in research on Multilingual Language Resources, such as multilingual aligned corpora, bilingual and multilingual lexicons, and multilingual speech resources. The papers also represent a characteristic set of approaches to the development of multilingual language resources, such as automatic extraction of information from corpora, combination and re-use of existing resources, online collaborative development of multilingual lexicons, and use of the Web as a multilingual language resource. The development and management of multilingual language resources is a long-term activity in which collaboration among researchers is essential. We hope that this workshop will gather many researchers involved in such developments and will give them the opportunity to discuss, exchange and compare their approaches and strengthen their collaborations in the field. The organisation of this workshop would have been impossible without the hard work of the program committee who managed to provide accurate reviews on time, on a rather tight schedule. We would also like to thank the Coling 2004 organising committee that made this workshop possible. Finally, we hope that this workshop will yield fruitful results for all participants.