
47 research outputs found

Recommended from our members

Improving multiple-crowd-sourced transcriptions using a speech recogniser

Author: Gales MJF
Knill KM
Tsiakoulis P
Van Dalen RC
Publication venue: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publication date: 01/04/2015
Field of study

This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. These transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Often, multiple crowd-sourced transcriptions are combined to form one transcription of higher quality. However, the state of the art is to use essentially a form of majority voting, which requires at least three transcriptions for each utterance. This paper shows how to refine this approach to work with only two transcriptions. It then introduces a method that uses a speech recogniser (bootstrapped on a simple combination scheme) to combine transcriptions. When only two crowd-sourced transcriptions are available, on a noisy data set this improves the word error rate to gold-standard transcriptions by 21 % relative. This paper reports on research supported by Cambridge English, University of Cambridge. This is the accepted manuscript of a paper that will be published in the Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. It is currently under an infinite embargo.

Apollo (Cambridge)

IMPROVING MULTIPLE-CROWD-SOURCED TRANSCRIPTIONS USING A SPEECH RECOGNISER

Author: K M Knill
M J F Gales
P Tsiakoulis
R C Van Dalen
Publication venue
Publication date: 03/04/2020
Field of study

This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. These transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Often, multiple crowd-sourced transcriptions are combined to form one transcription of higher quality. However, the state of the art is to use essentially a form of majority voting, which requires at least three transcriptions for each utterance. This paper shows how to refine this approach to work with only two transcriptions. It then introduces a method that uses a speech recogniser (bootstrapped on a simple combination scheme) to combine transcriptions. When only two crowd-sourced transcriptions are available, on a noisy data set this improves the word error rate to gold-standard transcriptions by 21 % relative.

CiteSeerX

On the Effect of Fundamental Frequency on Amplitude and Frequency Modulation Patterns in Speech Resonances

Author: Potamianos A
Tsiakoulis P
Publication venue
Publication date: 01/01/2010
Field of study

CUED - Cambridge University Engineering Department

On the effect of fundamental frequency on amplitude and frequency modulation patterns in speech resonances

Author: Potamianos A
Tsiakoulis P
Publication venue
Publication date: 01/01/2010
Field of study

DSpace@NTUA (National Technical Univ. of Athens)

STATISTICAL ANALYSIS OF AMPLITUDE MODULATION IN SPEECH SIGNALS USING AN AM-FM MODEL

Author: Potamianos A
Tsiakoulis P
Publication venue
Publication date: 01/01/2009
Field of study

CUED - Cambridge University Engineering Department

Sherris?, Dorothy Tindal and Michael Terry at Hill Cottage, Armidale, New South Wales, May 1922/

Author: Potamianos A
Tsiakoulis P
Publication venue
Publication date: 01/01/1922
Field of study

Title devised by cataloguer from accompanying information.; Part of the collection: Michael Terry collection of negatives of his expeditions and travels, 1918-1971.; Condition: Loss.; Also available online at: http://nla.gov.au/nla.pic-vn6248470; Also available as a photograph: PIC Album 866

CiteSeerX

Crossref

National Library of Australia Digital Object Repository

DSpace@NTUA (National Technical Univ. of Athens)

Rule-based grapheme-to-phoneme method for the Greek

Author: Chalamandaris A
Raptis S
Tsiakoulis P
Publication venue
Publication date: 01/12/2005
Field of study

This paper describes a trainable method for generating letter-to-sound rules for the Greek language, for producing the pronunciation of out-of-vocabulary words. Several approaches have been adopted over the years for grapheme-to-phoneme conversion, such as hand-seeded rules, finite state transducers, neural networks, and HMMs; nevertheless, it has been proved that the most reliable method is a rule-based one. Our approach is based on a semi-automatically pre-transcribed lexicon, from which we derived rules for automatic transcription. The efficiency and robustness of our method are demonstrated by experiments on out-of-vocabulary words, which resulted in over 98% accuracy on a word-based criterion.

CUED - Cambridge University Engineering Department

Short-time instantaneous frequency and bandwidth features for speech recognition

Author: Dimitriadis D
Potamianos A
Tsiakoulis P
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2009
Field of study

DSpace@NTUA (National Technical Univ. of Athens)

Spectral moment features augmented by low order cepstral coefficients for robust ASR

Author: Dimitriadis D
Potamianos A
Tsiakoulis P
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2010
Field of study

DSpace@NTUA (National Technical Univ. of Athens)