Improving multiple-crowd-sourced transcriptions using a speech recogniser
This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. These transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Multiple crowd-sourced transcriptions are therefore often combined to form one transcription of higher quality. However, the state of the art is essentially a form of majority voting, which requires at least three transcriptions for each utterance. This paper shows how to refine this approach to work with only two transcriptions. It then introduces a method that uses a speech recogniser (bootstrapped on a simple combination scheme) to combine transcriptions. When only two crowd-sourced transcriptions are available, on a noisy data set this improves the word error rate measured against gold-standard transcriptions by 21% relative.
This paper reports on research supported by Cambridge English, University of Cambridge. This is the accepted manuscript of a paper that will be published in the Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. It is currently under an infinite embargo.
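To make the terms in this abstract concrete, the following is a minimal Python sketch of word-level majority voting, word error rate (WER), and relative improvement. It is not the paper's method: it assumes the transcriptions are already aligned word for word and have equal length, and the function names and example sentences are hypothetical. It illustrates why plain voting needs at least three transcriptions: with two, any disagreement is a tie.

```python
from collections import Counter


def majority_vote(transcriptions):
    """Pick the strict-majority word in each slot, or None on a tie.

    Assumption: every transcription is a list of words that has already
    been aligned, so all lists have the same length.
    """
    if len({len(t) for t in transcriptions}) != 1:
        raise ValueError("transcriptions must be pre-aligned to equal length")
    result = []
    for slot in zip(*transcriptions):
        word, count = Counter(slot).most_common(1)[0]
        # Require a strict majority; otherwise the slot stays unresolved.
        result.append(word if count > len(slot) / 2 else None)
    return result


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer, new_wer):
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


three = [["the", "cat", "sat"], ["the", "cat", "sit"], ["a", "cat", "sat"]]
two = three[:2]
voted_three = majority_vote(three)  # every slot has a strict majority
voted_two = majority_vote(two)      # the last slot is a tie, so it is None
```

With only two inputs, the disagreement between "sat" and "sit" cannot be resolved by counting alone, which is the gap the abstract says the paper addresses.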
Sherris?, Dorothy Tindal and Michael Terry at Hill Cottage, Armidale, New South Wales, May 1922/
Title devised by cataloguer from accompanying information.; Part of the collection: Michael Terry collection of negatives of his expeditions and travels, 1918-1971.; Condition: Loss.; Also available online at: http://nla.gov.au/nla.pic-vn6248470; Also available as a photograph: PIC Album 866
Rule-based grapheme-to-phoneme method for the Greek
This paper describes a trainable method for generating letter-to-sound rules for the Greek language, used to produce the pronunciation of out-of-vocabulary words. Several approaches to grapheme-to-phoneme conversion have been adopted over the years, such as hand-seeded rules, finite state transducers, neural networks and HMMs; nevertheless, it has been shown that the most reliable method is a rule-based one. Our approach is based on a semi-automatically pre-transcribed lexicon, from which we derived rules for automatic transcription. The efficiency and robustness of our method are demonstrated by experiments on out-of-vocabulary words, which yielded over 98% accuracy at the word level.
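As an illustration of what a rule-based grapheme-to-phoneme converter does, here is a toy Python sketch using longest-match rewrite rules. It is an assumption-laden example, not the rule set derived in the paper: the table is hand-written, covers only a few unaccented lower-case Greek letters and digraphs, ignores context and stress, and the names `RULES` and `g2p` are hypothetical.

```python
# Toy, hand-written rule table (an assumption for illustration only):
# a few common Modern Greek digraphs plus some single letters.
RULES = {
    "ου": "u", "αι": "e", "ει": "i", "οι": "i",
    "α": "a", "ε": "e", "η": "i", "ι": "i", "ο": "o", "υ": "i", "ω": "o",
    "κ": "k", "λ": "l", "μ": "m", "ν": "n", "π": "p", "ρ": "r",
    "σ": "s", "ς": "s", "τ": "t",
}
MAX_RULE_LEN = max(len(grapheme) for grapheme in RULES)


def g2p(word):
    """Convert a word to a phoneme list, trying the longest rule first."""
    phonemes = []
    i = 0
    while i < len(word):
        for n in range(MAX_RULE_LEN, 0, -1):
            chunk = word[i:i + n]
            if len(chunk) == n and chunk in RULES:
                phonemes.append(RULES[chunk])
                i += n
                break
        else:
            # No rule matched: the toy table does not cover this symbol.
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phonemes


examples = {w: g2p(w) for w in ["και", "που", "νερο"]}
```

Trying digraphs before single letters is what lets "ου" map to one phoneme instead of two. A system like the one described would need many more rules, including context-dependent ones.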