Average Token Delay: A Duration-aware Latency Metric for Simultaneous Translation
Simultaneous translation is a task in which the translation begins before the
end of an input speech segment. Its evaluation should be conducted based on
latency in addition to quality, and for users, the smallest possible amount of
latency is preferable. Most existing metrics measure latency based on the start
timings of partial translations and ignore their duration. This means such
metrics do not penalize the latency caused by long translation output, which
delays the comprehension of users and subsequent translations. In this work, we
propose a novel latency evaluation metric for simultaneous translation called
Average Token Delay (ATD) that focuses on the duration of partial
translations. We demonstrate its effectiveness through analyses simulating
user-side latency based on Ear-Voice Span (EVS). In our experiment, ATD had the
highest correlation with EVS among baseline latency metrics under most
conditions.
Comment: Extended version of the paper (doi: 10.21437/Interspeech.2023-933)
which appeared in INTERSPEECH 202
E2E Refined Dataset
Although the well-known MR-to-text E2E dataset has been used by many
researchers, its MR-text pairs include many deletion/insertion/substitution
errors. Since such errors affect the quality of MR-to-text systems, they must
be fixed as much as possible. Therefore, we developed a refined dataset and
Python programs that convert the original E2E dataset into the refined
dataset.
Comment: 4 page
Towards Machine Speech-to-speech Translation
There has been a good deal of research on machine speech-to-speech translation (S2ST) in Japan, and this article presents that work along with our own recent research on automatic simultaneous speech translation. An S2ST system is basically composed of three modules: large-vocabulary continuous automatic speech recognition (ASR), machine text-to-text translation (MT), and text-to-speech synthesis (TTS). All these modules need to be multilingual and thus require multilingual speech and text corpora for training models. S2ST performance has been drastically improved by deep learning and large training corpora, but many issues still remain, such as simultaneity, paralinguistics, context and situation dependency, intention, and cultural dependency. This article presents current ongoing research and discusses issues with a view to next-generation speech-to-speech translation.
An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation
Training of neural machine translation (NMT) models usually uses mini-batches
for efficiency purposes. During the mini-batched training process, it is
necessary to pad shorter sentences in a mini-batch to be equal in length to the
longest sentence therein for efficient computation. Previous work has noted
that sorting the corpus based on the sentence length before making mini-batches
reduces the amount of padding and increases the processing speed. However,
despite the fact that mini-batch creation is an essential step in NMT training,
widely used NMT toolkits implement disparate strategies for doing so, which
have not been empirically validated or compared. This work investigates
mini-batch creation strategies with experiments over two different datasets.
Our results suggest that the choice of a mini-batch creation strategy has a
large effect on NMT training and some length-based sorting strategies do not
always work well compared with simple shuffling.
Comment: 8 pages, accepted to the First Workshop on Neural Machine Translatio
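
The padding effect described in this abstract can be illustrated with a minimal sketch. It is not the authors' experimental code: the function names `make_batches` and `padding_tokens`, the batch size of 32, and the synthetic sentence lengths are all assumptions made for illustration. It counts only pad tokens, so it says nothing about training quality, which is where the abstract reports that length-based sorting does not always help.

```python
import random


def make_batches(lengths, batch_size, sort_by_length):
    """Group sentence lengths into mini-batches, optionally sorting by length first."""
    order = sorted(lengths) if sort_by_length else list(lengths)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padding_tokens(batches):
    """Count the pad tokens needed to extend every sentence to its batch's longest."""
    return sum(max(batch) * len(batch) - sum(batch) for batch in batches)


# Synthetic corpus (assumption): 1000 sentences with lengths between 5 and 50 tokens.
rng = random.Random(0)
lengths = [rng.randint(5, 50) for _ in range(1000)]

unsorted_batches = make_batches(lengths, 32, sort_by_length=False)
sorted_batches = make_batches(lengths, 32, sort_by_length=True)

print("padding without sorting:", padding_tokens(unsorted_batches))
print("padding with length sorting:", padding_tokens(sorted_batches))
```

Sorting places sentences of similar length in the same batch, so each batch's longest sentence is close to the others and fewer pad tokens are needed. The trade-off studied in the paper is that such batches are no longer random samples of the corpus.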