
    Improving Multiple-Crowd-Sourced Transcriptions Using a Speech Recogniser

    ABSTRACT This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. These transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Multiple crowd-sourced transcriptions are therefore often combined to form one transcription of higher quality. However, the state of the art is essentially a form of majority voting, which requires at least three transcriptions for each utterance. This paper shows how to refine this approach to work with only two transcriptions. It then introduces a method that uses a speech recogniser (bootstrapped on a simple combination scheme) to combine transcriptions. When only two crowd-sourced transcriptions are available, on a noisy data set this improves the word error rate against gold-standard transcriptions by 21% relative.
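
    To make the baseline concrete, the sketch below illustrates word-level majority voting over three crowd-sourced transcriptions, the kind of combination scheme the paper refines. It is a simplified assumption-laden illustration, not the paper's algorithm: each transcription is aligned pairwise against the first one using Python's difflib, and words that disagree with that anchor are treated as gaps.

        # Illustrative only: ROVER-style word-level voting over three transcriptions.
        # Alignment is simplified: every transcription is aligned to the first one
        # (the anchor), and substitutions that disagree with the anchor become gaps.
        from difflib import SequenceMatcher

        GAP = ""  # placeholder for a position with no aligned word

        def align_to_anchor(anchor, other):
            """Expand `other` onto the anchor's word positions, GAP where unmatched."""
            aligned = [GAP] * len(anchor)
            matcher = SequenceMatcher(a=anchor, b=other, autojunk=False)
            for block in matcher.get_matching_blocks():
                for k in range(block.size):
                    aligned[block.a + k] = other[block.b + k]
            return aligned

        def majority_vote(transcriptions):
            """Combine word lists by voting per anchor position (ties keep the anchor word)."""
            anchor = transcriptions[0]
            rows = [anchor] + [align_to_anchor(anchor, t) for t in transcriptions[1:]]
            combined = []
            for words in zip(*rows):
                best = max(set(words), key=lambda w: (words.count(w), w == words[0]))
                if best != GAP:
                    combined.append(best)
            return combined

        hyps = [
            "the cat sat on the mat".split(),
            "the cat sat on a mat".split(),
            "a cat sat on the mat".split(),
        ]
        print(" ".join(majority_vote(hyps)))  # -> "the cat sat on the mat"

    With only two transcriptions, position-wise voting of this kind cannot break ties, which is the gap the paper's recogniser-based combination is designed to address.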

    HMM-based Speech Synthesis from Audio Book Data

    In contrast to hand-crafted speech databases, which contain short out-of-context sentences in a fairly unemphatic speech style, audio books contain rich prosody, including intonation contours, pitch accents and phrasing patterns, which is a good prerequisite for building a natural-sounding synthetic voice. This paper gives an overview of the steps involved in building a synthetic voice from audio book data. After an introduction to the theory of HMM-based speech synthesis, the properties of the speech database are described in detail. It is argued that it is necessary to model specific properties of the database, such as higher-pitched speech or questions, to achieve a better-quality synthetic voice. Furthermore, the acoustic modelling of these properties is explained in detail. Finally, the synthetic voice is evaluated on the basis of an online listening test.
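
    As a rough illustration of what modelling specific properties of the database can look like in practice, the hedged sketch below assigns coarse style labels (question, high-pitch, neutral) to audio book utterances so that such subsets could be modelled separately. The Utterance class, the threshold, and the assumption that mean-F0 values come from an external pitch tracker are all hypothetical, not taken from the paper.

        # Minimal sketch (assumptions, not the paper's implementation): tag audio
        # book utterances with simple style labels such as "question" or
        # "high-pitch", so an HMM-based synthesiser could model them separately.
        from dataclasses import dataclass

        @dataclass
        class Utterance:
            text: str
            mean_f0: float  # Hz, assumed precomputed by an external pitch tracker

        def context_label(utt, speaker_mean_f0, high_pitch_ratio=1.2):
            """Return a coarse style label used to partition the training data."""
            if utt.text.strip().endswith("?"):
                return "question"
            if utt.mean_f0 > high_pitch_ratio * speaker_mean_f0:
                return "high-pitch"
            return "neutral"

        utts = [
            Utterance("Where are you going?", 210.0),
            Utterance("He whispered softly.", 150.0),
            Utterance("She cried out in alarm!", 245.0),
        ]
        speaker_mean = sum(u.mean_f0 for u in utts) / len(utts)
        for u in utts:
            print(context_label(u, speaker_mean), "->", u.text)

    In HMM-based synthesis, labels of this kind would typically be folded into the context features used when clustering the acoustic models.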