108,213 research outputs found
Automatic Quality Estimation for ASR System Combination
Recognizer Output Voting Error Reduction (ROVER) has been widely used for
system combination in automatic speech recognition (ASR). In order to select
the most appropriate words to insert at each position in the output
transcriptions, some ROVER extensions rely on critical information such as
confidence scores and other ASR decoder features. This information, which is
not always available, highly depends on the decoding process and sometimes
tends to over estimate the real quality of the recognized words. In this paper
we propose a novel variant of ROVER that takes advantage of ASR quality
estimation (QE) for ranking the transcriptions at "segment level" instead of:
i) relying on confidence scores, or ii) feeding ROVER with randomly ordered
hypotheses. We first introduce an effective set of features to compensate for
the absence of ASR decoder information. Then, we apply QE techniques to perform
accurate hypothesis ranking at segment-level before starting the fusion
process. The evaluation is carried out on two different tasks, in which we
respectively combine hypotheses coming from independent ASR systems and
multi-microphone recordings. In both tasks, it is assumed that the ASR decoder
information is not available. The proposed approach significantly outperforms
standard ROVER and it is competitive with two strong oracles that e xploit
prior knowledge about the real quality of the hypotheses to be combined.
Compared to standard ROVER, the abs olute WER improvements in the two
evaluation scenarios range from 0.5% to 7.3%
Prosody-Based Automatic Segmentation of Speech into Sentences and Topics
A crucial step in processing speech audio data for information extraction,
topic detection, or browsing/playback is to segment the input into sentence and
topic units. Speech segmentation is challenging, since the cues typically
present for segmenting text (headers, paragraphs, punctuation) are absent in
spoken language. We investigate the use of prosody (information gleaned from
the timing and melody of speech) for these tasks. Using decision tree and
hidden Markov modeling techniques, we combine prosodic cues with word-based
approaches, and evaluate performance on two speech corpora, Broadcast News and
Switchboard. Results show that the prosodic model alone performs on par with,
or better than, word-based statistical language models -- for both true and
automatically recognized words in news speech. The prosodic model achieves
comparable performance with significantly less training data, and requires no
hand-labeling of prosodic events. Across tasks and corpora, we obtain a
significant improvement over word-only models using a probabilistic combination
of prosodic and lexical information. Inspection reveals that the prosodic
models capture language-independent boundary indicators described in the
literature. Finally, cue usage is task and corpus dependent. For example, pause
and pitch features are highly informative for segmenting news speech, whereas
pause, duration and word-based cues dominate for natural conversation.Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2),
Special Issue on Accessing Information in Spoken Audio, September 200
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic
Topic and background knowledge effects on performance in speaking assessment
This study explores the extent to which topic and background knowledge of topic affect spoken
performance in a high-stakes speaking test. It is argued that evidence of a substantial influence may introduce construct-irrelevant variance and undermine test fairness. Data were collected from 81 non-native speakers of English who performed on 10 topics across three task types. Background knowledge and general language proficiency were measured using self-report questionnaires and C-tests respectively. Score data were analysed using many-facet Rasch measurement and multiple regression. Findings showed that for two of the three task types, the topics used in the study generally exhibited difficulty measures which were statistically distinct. However, the size of the differences in topic difficulties was too small to have a large practical effect on scores. Participants’ different levels of background knowledge were shown to have a systematic effect on performance. However, these statistically significant differences also failed to translate into practical significance. Findings hold implications for speaking performance assessment
- …