Automatic Quality Estimation for ASR System Combination
Recognizer Output Voting Error Reduction (ROVER) has been widely used for
system combination in automatic speech recognition (ASR). In order to select
the most appropriate words to insert at each position in the output
transcriptions, some ROVER extensions rely on critical information such as
confidence scores and other ASR decoder features. This information, which is
not always available, highly depends on the decoding process and sometimes
tends to overestimate the real quality of the recognized words. In this paper
we propose a novel variant of ROVER that takes advantage of ASR quality
estimation (QE) for ranking the transcriptions at "segment level" instead of:
i) relying on confidence scores, or ii) feeding ROVER with randomly ordered
hypotheses. We first introduce an effective set of features to compensate for
the absence of ASR decoder information. Then, we apply QE techniques to perform
accurate hypothesis ranking at segment-level before starting the fusion
process. The evaluation is carried out on two different tasks, in which we
respectively combine hypotheses coming from independent ASR systems and
multi-microphone recordings. In both tasks, it is assumed that the ASR decoder
information is not available. The proposed approach significantly outperforms
standard ROVER and it is competitive with two strong oracles that exploit
prior knowledge about the real quality of the hypotheses to be combined.
Compared to standard ROVER, the absolute WER improvements in the two
evaluation scenarios range from 0.5% to 7.3%.
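The ranking step described in this abstract can be illustrated with a minimal sketch. It assumes the hypotheses are already word-aligned and of equal length, which real ROVER achieves with a dynamic-programming alignment that is omitted here. The names `rank_by_predicted_quality`, `vote_aligned`, and the `predict_wer` callable are hypothetical, and the toy scores stand in for a trained QE model; this is not the authors' implementation.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Hypothesis = Tuple[str, ...]


def rank_by_predicted_quality(
    hyps: Sequence[Hypothesis],
    predict_wer: Callable[[Hypothesis], float],
) -> List[Hypothesis]:
    """Order hypotheses from best to worst predicted WER (lower is better)."""
    # predict_wer is a stand-in for a segment-level QE model (assumption).
    return sorted(hyps, key=predict_wer)


def vote_aligned(ranked: Sequence[Hypothesis]) -> List[str]:
    """Majority vote per word slot; ties go to the best-ranked hypothesis."""
    output = []
    for slot in zip(*ranked):
        counts = Counter(slot)
        top = max(counts.values())
        # Scan in rank order so the earliest (best-ranked) word wins a tie.
        output.append(next(word for word in slot if counts[word] == top))
    return output


# Toy usage: three pre-aligned hypotheses with made-up predicted WER values.
hyps = [("the", "cat", "sat"), ("a", "cat", "sad"), ("the", "dog", "sat")]
toy_scores = {hyps[0]: 0.3, hyps[1]: 0.1, hyps[2]: 0.2}
ranked = rank_by_predicted_quality(hyps, toy_scores.get)
combined = vote_aligned(ranked)
```

The sketch shows why the input order matters: when votes tie, the word from the hypothesis ranked first is kept, so a reliable ranking replaces the tie-breaking role that confidence scores would otherwise play.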
Referenceless Quality Estimation for Natural Language Generation
Traditional automatic evaluation measures for natural language generation
(NLG) use costly human-authored references to estimate the quality of a system
output. In this paper, we propose a referenceless quality estimation (QE)
approach based on recurrent neural networks, which predicts a quality score for
an NLG system output by comparing it to the source meaning representation only.
Our method outperforms traditional metrics and a constant baseline in most
respects; we also show that synthetic data helps to increase correlation
results by 21% compared to the base system. Our results are comparable to
results obtained in similar QE tasks despite the more challenging setting.
Comment: Accepted as a regular paper to 1st Workshop on Learning to Generate
Natural Language (LGNL), Sydney, 10 August 201
Exploring Prediction Uncertainty in Machine Translation Quality Estimation
Machine Translation Quality Estimation is a notoriously difficult task, which
lessens its usefulness in real-world translation environments. Such scenarios
can be improved if quality predictions are accompanied by a measure of
uncertainty. However, models in this task are traditionally evaluated only in
terms of point estimate metrics, which do not take prediction uncertainty into
account. We investigate probabilistic methods for Quality Estimation that can
provide well-calibrated uncertainty estimates and evaluate them in terms of
their full posterior predictive distributions. We also show how this posterior
information can be useful in an asymmetric risk scenario, which aims to capture
typical situations in translation workflows.
Comment: Proceedings of CoNLL 201
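To make the asymmetric risk idea concrete, here is a small sketch under stated assumptions: the predictive distribution for a quality score is Gaussian, and the loss is an asymmetric linear one in which underestimates and overestimates carry different costs. Under that loss the risk-minimizing point prediction is a quantile of the predictive distribution rather than its mean. The function name `risk_aware_estimate` and its parameters are illustrative and are not taken from the paper.

```python
from statistics import NormalDist


def risk_aware_estimate(
    mean: float,
    std: float,
    under_cost: float,
    over_cost: float = 1.0,
) -> float:
    """Point prediction minimizing expected asymmetric linear loss.

    The loss is under_cost * (y - pred) when pred < y and
    over_cost * (pred - y) when pred > y. Its minimizer is the quantile
    under_cost / (under_cost + over_cost) of the predictive distribution,
    assumed Gaussian here.
    """
    quantile = under_cost / (under_cost + over_cost)
    return NormalDist(mean, std).inv_cdf(quantile)


# Toy usage: same predictive mean, different uncertainty.
confident = risk_aware_estimate(mean=0.4, std=0.05, under_cost=3.0)
uncertain = risk_aware_estimate(mean=0.4, std=0.20, under_cost=3.0)
```

With equal costs the estimate reduces to the mean. When underestimates are costlier, the estimate shifts upward, and it shifts further the wider the predictive distribution is, which is where a well-calibrated uncertainty estimate pays off.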
Sentence-level quality estimation for MT system combination
This paper provides the system description of the Dublin City University system combination module for our participation in the system combination task in the Second Workshop on Applying Machine Learning Techniques to Optimize the Division of Labour in Hybrid MT (ML4HMT-12). We incorporated a sentence-level quality score, obtained by sentence-level Quality Estimation (QE), as meta information guiding system combination. Instead of using BLEU or (minimum average) TER, we select a backbone for the confusion network using the estimated quality score. For the Spanish-English data, our strategy improved 0.89 BLEU points absolute compared to the best single score and 0.20 BLEU points absolute compared to the standard system combination strateg
- …