Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
Simultaneous speech translation (SimulST) is a challenging task aiming to
translate streaming speech before the complete input is observed. A SimulST
system generally includes two components: the pre-decision that aggregates the
speech information and the policy that decides to read or write. While recent
works have proposed various strategies to improve the pre-decision, they mainly
adopt the fixed wait-k policy, leaving adaptive policies rarely explored.
This paper proposes to model the adaptive policy by adapting the Continuous
Integrate-and-Fire (CIF). Compared with monotonic multihead attention (MMA),
our method has the advantage of simpler computation, superior quality at low
latency, and better generalization to long utterances. We conduct experiments
on the MuST-C V2 dataset and show the effectiveness of our approach.
Comment: Submitted to INTERSPEECH 202
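The fixed wait-k policy mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard formulation (read k source units first, then alternate one write with one read until the source is exhausted); the function name `wait_k_actions` and the "R"/"W" labels are hypothetical, and in speech translation the "source units" would be the segments produced by the pre-decision rather than text tokens.

```python
from typing import List


def wait_k_actions(num_source: int, num_target: int, k: int) -> List[str]:
    """Return a READ ("R") / WRITE ("W") schedule for a fixed wait-k policy.

    Illustrative only: names and labels are assumptions, not taken from the paper.
    """
    actions: List[str] = []
    read = 0
    written = 0
    while written < num_target:
        # Stay k units ahead of the output while source units remain.
        if read < min(written + k, num_source):
            actions.append("R")
            read += 1
        else:
            actions.append("W")
            written += 1
    return actions


schedule = "".join(wait_k_actions(num_source=4, num_target=4, k=2))
print(schedule)
```

The schedule depends only on k and the sequence lengths, never on the content of the input, which is what separates a fixed policy from the adaptive policies the paper studies.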
Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation
Boosted by the simultaneous translation shared task at IWSLT 2020, promising
end-to-end online speech translation approaches were recently proposed. They
consist of incrementally encoding a speech input (in a source language) and
decoding the corresponding text (in a target language) with the best possible
trade-off between latency and translation quality. This paper investigates two
key aspects of end-to-end simultaneous speech translation: (a) how to
efficiently encode the continuous speech flow, and (b) how to segment the speech flow
in order to alternate optimally between reading (R: encoding input) and writing
(W: decoding output) operations. We extend our previously proposed end-to-end
online decoding strategy and show that while replacing BLSTM with ULSTM encoding
degrades performance in offline mode, it actually improves both efficiency and
performance in online mode. We also measure the impact of different methods to
segment the speech signal (using fixed interval boundaries, oracle word
boundaries or randomly set boundaries) and show that our best end-to-end online
decoding strategy is surprisingly the one that alternates R/W operations on
fixed-size blocks on our English-German speech translation setup.
Comment: Accepted for presentation at Interspeech 202
Visualization: the missing factor in Simultaneous Speech Translation
Simultaneous speech translation (SimulST) is the task in which output
generation has to be performed on partial, incremental speech input. In recent
years, SimulST has become popular due to the spread of cross-lingual
application scenarios, like international live conferences and streaming
lectures, in which on-the-fly speech translation can facilitate users' access
to audio-visual content. In this paper, we analyze the characteristics of the
SimulST systems developed so far, discussing their strengths and weaknesses. We
then concentrate on the evaluation framework required to properly assess
systems' effectiveness. To this end, we raise the need for a broader
performance analysis, also including the user experience standpoint. SimulST
systems, indeed, should be evaluated not only in terms of quality/latency
measures, but also via task-oriented metrics accounting, for instance, for the
visualization strategy adopted. In light of this, we highlight the goals the
community has achieved and what is still missing.
Comment: Accepted at CLIC-it 202
A Discussion on Building Practical NLP Leaderboards: The Case of Machine Translation
Recent advances in AI and ML applications have benefited from rapid progress
in NLP research. Leaderboards have emerged as a popular mechanism to track and
accelerate progress in NLP through competitive model development. While this
has increased interest and participation, the over-reliance on single,
accuracy-based metrics has shifted focus away from other important metrics that
might be equally pertinent to consider in real-world contexts. In this paper,
we offer a preliminary discussion of the risks associated with focusing
exclusively on accuracy metrics and draw on recent discussions to highlight
prescriptive suggestions on how to develop more practical and effective
leaderboards that can better reflect the real-world utility of models.
Comment: pre-print: comments and suggestions welcome
Learning Coupled Policies for Simultaneous Machine Translation using Imitation Learning
We present a novel approach to efficiently learn a simultaneous translation
model with coupled programmer-interpreter policies. First, we present an
algorithmic oracle to produce oracle READ/WRITE actions for training bilingual
sentence-pairs using the notion of word alignments. These oracle actions are
designed to capture enough information from the partial input before writing
the output. Next, we perform a coupled scheduled sampling to effectively
mitigate the exposure bias when learning both policies jointly with imitation
learning. Experiments on six language-pairs show our method outperforms strong
baselines in terms of translation quality while keeping the translation delay
low.
Comment: 9 pages
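As a rough illustration of deriving READ/WRITE actions from word alignments, the sketch below reads source words until every source word aligned to the next target word has been seen, then writes. This is one plausible reading of an alignment-based oracle, not necessarily the authors' algorithm; the function name `oracle_actions`, the 0-based `(source_index, target_index)` alignment format, and the rule that at least one source word is read before the first write are all assumptions.

```python
from typing import Iterable, List, Tuple


def oracle_actions(
    num_source: int,
    num_target: int,
    alignments: Iterable[Tuple[int, int]],
) -> List[str]:
    """Derive a READ/WRITE sequence from word alignments (illustrative sketch)."""
    # needed[t] = number of source words that must be read before writing target t.
    # Assumption: unaligned target words need only the first source word.
    needed = [min(1, num_source)] * num_target
    for src, tgt in alignments:
        needed[tgt] = max(needed[tgt], src + 1)

    actions: List[str] = []
    read = 0
    for tgt in range(num_target):
        # Read until all source words aligned to this target word are available.
        while read < needed[tgt]:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions


print(oracle_actions(3, 3, [(0, 0), (2, 1), (1, 2)]))
```

Because the read counter never decreases, a target word aligned to an earlier source position is written immediately, which keeps the delay as low as the alignments allow.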