47 research outputs found

    IMPROVING MULTIPLE-CROWD-SOURCED TRANSCRIPTIONS USING A SPEECH RECOGNISER

    This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. Such transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Multiple crowd-sourced transcriptions are therefore commonly combined to form one transcription of higher quality. However, the state of the art is essentially a form of majority voting, which requires at least three transcriptions for each utterance. This paper shows how to refine this approach to work with only two transcriptions. It then introduces a method that uses a speech recogniser (bootstrapped on a simple combination scheme) to combine transcriptions. When only two crowd-sourced transcriptions are available, on a noisy data set this reduces the word error rate against gold-standard transcriptions by 21% relative.
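
The abstract reports its result as a relative reduction in word error rate (WER). The following is a minimal sketch of how WER and a relative improvement are conventionally computed, assuming whitespace tokenisation and unit cost for substitutions, insertions and deletions. The function names and the example sentences are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's combination method.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: one substitution and one deletion in six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition, a "21% relative" improvement means that `relative_improvement(baseline_wer, new_wer)` is about 0.21, whatever the absolute WER values are.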

    Sherris?, Dorothy Tindal and Michael Terry at Hill Cottage, Armidale, New South Wales, May 1922/

    Title devised by cataloguer from accompanying information; Part of the collection: Michael Terry collection of negatives of his expeditions and travels, 1918-1971; Condition: Loss; Also available online at: http://nla.gov.au/nla.pic-vn6248470; Also available as a photograph: PIC Album 866

    Rule-based grapheme-to-phoneme method for the Greek

    This paper describes a trainable method for generating letter-to-sound rules for the Greek language, used to produce the pronunciation of out-of-vocabulary words. Several approaches to grapheme-to-phoneme conversion have been adopted over the years, such as hand-seeded rules, finite state transducers, neural networks and HMMs. Nevertheless, the rule-based approach has proved to be the most reliable. Our approach is based on a semi-automatically pre-transcribed lexicon, from which we derived rules for automatic transcription. The efficiency and robustness of our method are demonstrated by experiments on out-of-vocabulary words, which resulted in over 98% accuracy on a word-based criterion.
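
To illustrate what applying letter-to-sound rules looks like, here is a minimal sketch of a greedy, longest-match rule applier for Greek. It is not the paper's method: the rules below are a small hand-written table chosen for illustration, whereas the paper derives its rules from a pre-transcribed lexicon. The table names, the simplified phoneme symbols, and the decision to strip all accents (which ignores stress and the diaeresis) are assumptions of this sketch.

```python
import unicodedata

# Hypothetical, deliberately incomplete rule tables (simplified phoneme symbols).
DIGRAPH_RULES = {
    "ου": "u", "αι": "e", "ει": "i", "οι": "i",
    "μπ": "b", "ντ": "d", "γκ": "g",
}
LETTER_RULES = {
    "α": "a", "β": "v", "γ": "ɣ", "δ": "ð", "ε": "e", "ζ": "z",
    "η": "i", "θ": "θ", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "ks", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "i", "φ": "f", "χ": "x", "ψ": "ps",
    "ω": "o",
}


def strip_accents(word: str) -> str:
    """Remove combining marks; a simplification that discards stress."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_phonemes(word: str) -> str:
    """Apply two-letter rules before one-letter rules, left to right."""
    text = strip_accents(word.lower())
    output = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPH_RULES:
            output.append(DIGRAPH_RULES[pair])
            i += 2
        else:
            # Characters without a rule are passed through unchanged.
            output.append(LETTER_RULES.get(text[i], text[i]))
            i += 1
    return "".join(output)


print(to_phonemes("καλημέρα"))
```

Trying the longer rule first is what lets "ου" map to a single vowel instead of two. A realistic rule set would also need context-sensitive rules, for example for voicing and for digraphs broken by a diaeresis, which this sketch omits.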

    Short-time instantaneous frequency and bandwidth features for speech recognition


    Spectral moment features augmented by low order cepstral coefficients for robust ASR
