356 research outputs found
Automatic Quality Estimation for ASR System Combination
Recognizer Output Voting Error Reduction (ROVER) has been widely used for
system combination in automatic speech recognition (ASR). In order to select
the most appropriate words to insert at each position in the output
transcriptions, some ROVER extensions rely on critical information such as
confidence scores and other ASR decoder features. This information, which is not always available, depends heavily on the decoding process and sometimes tends to overestimate the real quality of the recognized words. In this paper
we propose a novel variant of ROVER that takes advantage of ASR quality
estimation (QE) for ranking the transcriptions at "segment level" instead of:
i) relying on confidence scores, or ii) feeding ROVER with randomly ordered
hypotheses. We first introduce an effective set of features to compensate for
the absence of ASR decoder information. Then, we apply QE techniques to perform
accurate hypothesis ranking at segment-level before starting the fusion
process. The evaluation is carried out on two different tasks, in which we
respectively combine hypotheses coming from independent ASR systems and
multi-microphone recordings. In both tasks, it is assumed that the ASR decoder
information is not available. The proposed approach significantly outperforms standard ROVER and is competitive with two strong oracles that exploit prior knowledge about the real quality of the hypotheses to be combined. Compared to standard ROVER, the absolute WER improvements in the two evaluation scenarios range from 0.5% to 7.3%.
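To make the ranking step concrete, here is a minimal sketch in Python, assuming a hypothetical `predict_wer` QE regressor and per-hypothesis feature vectors (neither is specified in the abstract): hypotheses are ordered by predicted WER before being fed to ROVER, replacing confidence-based or random orderings.

```python
def rank_hypotheses(hypotheses, qe_features, predict_wer):
    """Order ASR hypotheses for one segment by QE-predicted WER (best first).

    hypotheses  -- transcription strings from different systems/microphones
    qe_features -- one feature vector per hypothesis (no decoder info needed)
    predict_wer -- trained QE regressor mapping features to an estimated WER
    """
    scored = sorted(zip(qe_features, hypotheses), key=lambda p: predict_wer(p[0]))
    return [hyp for _, hyp in scored]  # lower predicted WER comes first
```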
DNN adaptation by automatic quality estimation of ASR hypotheses
In this paper we propose to exploit the automatic Quality Estimation (QE) of
ASR hypotheses to perform the unsupervised adaptation of a deep neural network
modeling acoustic probabilities. Our hypothesis is that significant improvements can be achieved by: i) automatically transcribing the evaluation
data we are currently trying to recognise, and ii) selecting from it a subset
of "good quality" instances based on the word error rate (WER) scores predicted
by a QE component. To validate this hypothesis, we run several experiments on
the evaluation data sets released for the CHiME-3 challenge. First, we operate
in oracle conditions in which manual transcriptions of the evaluation data are
available, thus allowing us to compute the "true" sentence WER. In this
scenario, we perform the adaptation with variable amounts of data, which are
characterised by different levels of quality. Then, we move to realistic
conditions in which the manual transcriptions of the evaluation data are not
available. In this case, the adaptation is performed on data selected according
to the WER scores "predicted" by a QE component. Our results indicate that: i)
QE predictions allow us to closely approximate the adaptation results obtained
in oracle conditions, and ii) the overall ASR performance based on the proposed
QE-driven adaptation method is significantly better than the strong, most recent CHiME-3 baseline. Comment: Computer Speech & Language, December 201
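A minimal sketch of the QE-driven selection step, with assumed names (`predict_wer` and the 0.2 threshold stand in for the paper's QE component and its operating point, which the abstract does not specify):

```python
def select_adaptation_data(utterances, predict_wer, wer_threshold=0.2):
    """Keep automatically transcribed utterances whose QE-predicted sentence
    WER is below a threshold, i.e. the "good quality" subset used to adapt
    the acoustic DNN without manual transcriptions."""
    selected = []
    for utt in utterances:
        if predict_wer(utt.features) < wer_threshold:
            # the ASR hypothesis acts as a (noisy) supervision target
            selected.append((utt.audio, utt.asr_hypothesis))
    return selected
```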
Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English
The necessity of using a fixed-size word vocabulary in order to control the
model complexity in state-of-the-art neural machine translation (NMT) systems
is an important bottleneck for performance, especially for morphologically rich
languages. Conventional methods that aim to overcome this problem by using
sub-word or character-level representations solely rely on statistics and
disregard the linguistic properties of words, which disrupts the word structure and causes semantic and syntactic losses. In this paper, we
propose a new vocabulary reduction method for NMT, which can reduce the vocabulary of a given input corpus at any desired rate while also considering the
morphological properties of the language. Our method is based on unsupervised
morphology learning and can be, in principle, used for pre-processing any
language pair. We also present an alternative word segmentation method based on
supervised morphological analysis, which aids us in measuring the accuracy of
our model. We evaluate our method on a Turkish-to-English NMT task, where the
input language is morphologically rich and agglutinative. We analyze different
representation methods in terms of translation accuracy as well as the semantic
and syntactic properties of the generated output. Our method obtains a
significant improvement of 2.3 BLEU points over the conventional vocabulary
reduction technique, showing that it can provide better accuracy in open
vocabulary translation of morphologically rich languages. Comment: The 20th Annual Conference of the European Association for Machine Translation (EAMT), Research Paper, 12 pages
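As an illustration of morphology-aware segmentation (a toy stand-in, not the paper's model: the suffix inventory below is hand-written, whereas the paper induces morphology in an unsupervised way):

```python
def segment(word, suffixes):
    """Greedily strip known suffixes from the right, keeping a root.
    The "@@" marker flags bound morphemes, as in common subword notation."""
    parts = []
    stripped = True
    while stripped:
        stripped = False
        for suf in sorted(suffixes, key=len, reverse=True):
            if word.endswith(suf) and len(word) - len(suf) >= 2:
                parts.insert(0, "@@" + suf)
                word = word[:-len(suf)]
                stripped = True
                break
    return [word] + parts

# Toy Turkish example with an assumed suffix set:
print(segment("evlerimizden", {"ler", "imiz", "den"}))
# -> ['ev', '@@ler', '@@imiz', '@@den']  (house-PL-POSS.1PL-ABL)
```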
Who Are We Talking About? Handling Person Names in Speech Translation
Recent work has shown that systems for speech translation (ST), similarly to those for automatic speech recognition (ASR), poorly handle person names. This shortcoming not only leads to errors that can seriously distort the meaning of the input, but also hinders the adoption of such systems in application scenarios (like computer-assisted interpreting) where the translation of named entities, like person names, is crucial. In this paper, we first analyse the outputs of ASR/ST systems to identify the reasons for failures in person name transcription/translation. Besides the frequency in the training data, we pinpoint the nationality of the referred person as a key factor. We then mitigate the problem by creating multilingual models, and further improve our ST systems by forcing them to jointly generate transcripts and translations, prioritising the former over the latter. Overall, our solutions result in a relative improvement in token-level person name accuracy of 47.8% on average for three language pairs (en->es,fr,it).
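For clarity, one plausible way to compute token-level person name accuracy (an assumed formulation for illustration; the paper's exact scorer may differ):

```python
def person_name_accuracy(hypotheses, name_tokens):
    """Fraction of annotated person-name tokens that the system output
    reproduces; name_tokens[i] lists the person-name tokens expected in
    the i-th reference."""
    correct = total = 0
    for hyp, names in zip(hypotheses, name_tokens):
        hyp_tokens = set(hyp.lower().split())
        for name in names:
            total += 1
            correct += name.lower() in hyp_tokens
    return correct / total if total else 0.0
```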
Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources
Translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system mostly depends on parallel data, and phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system, not by adding more parallel data but by using external morphological resources. A set of new phrase associations is added to the translation and reordering models; each of them corresponds to a morphological variation of the source, target, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations, and the results showed performance improvements in terms of automatic scores (BLEU and Meteor) and a reduction of out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.
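A sketch of the expansion idea under assumed helpers (`inflections` would come from an external morphological resource; `difflib`'s ratio is a simple stand-in for the paper's morphosyntactic similarity score):

```python
from difflib import SequenceMatcher

def expand_phrase_table(phrase_table, inflections, min_sim=0.75):
    """Generate new (source, target, scores) phrase entries from the
    morphological variants of existing source phrases, keeping only
    variants that remain string-similar to the original."""
    new_entries = []
    for src, tgt, scores in phrase_table:
        for variant in inflections.get(src, []):
            if SequenceMatcher(None, src, variant).ratio() >= min_sim:
                new_entries.append((variant, tgt, scores))  # reuse scores
    return new_entries
```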
Building Multilingual Named Entity Annotated Corpora Exploiting Parallel Corpora
Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora (AEPC 2010). Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk. NEALT Proceedings Series, Vol. 10 (2010), pp. 24-33. © 2010 The editors and contributors. Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia): http://hdl.handle.net/10062/15893
Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary
We propose a method to transfer knowledge across neural machine translation
(NMT) models by means of a shared dynamic vocabulary. Our approach allows us to extend an initial model for a given language pair to cover new languages by adapting its vocabulary as new data become available (i.e., introducing
new vocabulary items if they are not included in the initial model). The
parameter transfer mechanism is evaluated in two scenarios: i) to adapt a trained single-language-pair NMT system to work with a new language pair, and ii) to
continuously add new language pairs to grow to a multilingual NMT system. In
both scenarios, our goal is to improve the translation performance, while
minimizing the training convergence time. Preliminary experiments spanning five
languages with different training data sizes (i.e., 5k and 50k parallel
sentences) show a significant performance gain ranging from +3.85 up to +13.63
BLEU in different language directions. Moreover, when compared with training an
NMT model from scratch, our transfer-learning approach allows us to reach higher performance after training for at most 4% of the total training steps. Comment: Published at the International Workshop on Spoken Language
Translation (IWSLT), 201
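A sketch of the vocabulary-adaptive parameter transfer (PyTorch is used for illustration; the abstract does not prescribe this code): embedding rows for tokens shared with the initial model are copied, while rows for new tokens are freshly initialised.

```python
import torch

def transfer_embeddings(old_emb, old_vocab, new_vocab, dim):
    """Build the embedding matrix for a grown vocabulary.

    old_emb   -- (|old_vocab|, dim) tensor from the initial model
    old_vocab -- dict token -> row index in old_emb
    new_vocab -- dict token -> row index in the new model
    """
    new_emb = torch.empty(len(new_vocab), dim).normal_(std=0.01)
    for token, idx in new_vocab.items():
        if token in old_vocab:
            new_emb[idx] = old_emb[old_vocab[token]]  # transfer known token
    return new_emb
```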
Over-Generation Cannot Be Rewarded: Length-Adaptive Average Lagging for Simultaneous Speech Translation
Simultaneous speech translation (SimulST) systems aim at generating their output with the lowest possible latency, which is normally computed in terms of Average Lagging (AL). In this paper, we highlight that, despite its widespread adoption, AL provides underestimated scores for systems that generate longer predictions than the corresponding references. We also show that this problem has practical relevance, as recent SimulST systems indeed have a tendency to over-generate. As a solution, we propose LAAL (Length-Adaptive Average Lagging), a modified version of the metric that takes the over-generation phenomenon into account and allows for unbiased evaluation of both under- and over-generating systems.
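In code, the metric could look as follows (a sketch consistent with the description above; `delays[i]` is the amount of source read before emitting target token i, and the ideal-policy rate uses max(reference, prediction) length so that over-generation is no longer rewarded):

```python
def length_adaptive_average_lagging(delays, src_len, ref_len):
    """LAAL: Average Lagging with a length-adaptive ideal-policy rate."""
    hyp_len = len(delays)
    rate = src_len / max(ref_len, hyp_len)  # ideal source per target token
    # tau: first target position emitted after the whole source was read
    tau = next((i for i, d in enumerate(delays, 1) if d >= src_len), hyp_len)
    return sum(delays[i] - i * rate for i in range(tau)) / tau
```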
CTC-based Compression for Direct Speech Translation
Previous studies demonstrated that a dynamic phone-informed compression of
the input audio is beneficial for speech translation (ST). However, they
required a dedicated model for phone recognition and did not test this solution
for direct ST, in which a single model translates the input audio into the
target language without intermediate representations. In this work, we propose
the first method able to perform a dynamic compression of the input in direct ST
models. In particular, we exploit the Connectionist Temporal Classification
(CTC) to compress the input sequence according to its phonetic characteristics.
Our experiments demonstrate that our solution brings a 1.3-1.5 BLEU improvement
over a strong baseline on two language pairs (English-Italian and English-German), while also reducing the memory footprint by more than 10%. Comment: Accepted at EACL202
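A sketch of the compression step, with assumed tensor shapes (PyTorch for illustration; architectural details are not given in the abstract): consecutive encoder states whose CTC argmax predicts the same label are averaged into a single vector, shortening the sequence.

```python
import torch

def ctc_compress(encoder_states, ctc_logits):
    """Collapse runs of frames sharing the same CTC prediction.

    encoder_states -- (T, d) hidden states of the speech encoder
    ctc_logits     -- (T, V) per-frame CTC scores (blank included)
    Returns a (T', d) sequence with T' <= T.
    """
    labels = ctc_logits.argmax(dim=-1)
    compressed, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            compressed.append(encoder_states[start:t].mean(dim=0))
            start = t
    return torch.stack(compressed)
```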
Does Simultaneous Speech Translation need Simultaneous Models?
In simultaneous speech translation (SimulST), finding the best trade-off
between high translation quality and low latency is a challenging task. To meet
the latency constraints posed by the different application scenarios, multiple
dedicated SimulST models are usually trained and maintained, incurring high computational costs. In this paper, motivated by the increased social and
environmental impact caused by these costs, we investigate whether a single
model trained offline can serve not only the offline but also the simultaneous
task without the need for any additional training or adaptation. Experiments on
en->{de, es} indicate that, aside from facilitating the adoption of
well-established offline techniques and architectures without affecting
latency, the offline solution achieves similar or better translation quality compared to the same model trained in simultaneous settings, and is competitive with the SimulST state of the art.
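One way such an offline model could be driven incrementally, as a sketch (a generic wait-k-style loop; `stream.read` and `model.translate` are assumed interfaces, not the paper's actual decoding strategy):

```python
def simultaneous_decode(model, stream, k=3, chunk_sec=0.5):
    """Apply an offline-trained ST model in a simultaneous setting:
    read k audio chunks first, then alternate writing one token and
    reading one chunk, re-decoding the available prefix each time."""
    audio, output = [], []
    for _ in range(k):                        # initial wait-k reads
        audio.append(stream.read(chunk_sec))
    while not stream.finished():
        hypothesis = model.translate(audio, prefix=output)
        if len(hypothesis) > len(output):
            output.append(hypothesis[len(output)])  # write one token
        else:
            audio.append(stream.read(chunk_sec))    # read more audio
    return output + model.translate(audio, prefix=output)[len(output):]
```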