Detecting gross alignment errors in the Spoken British National Corpus
The paper presents methods for evaluating the accuracy of alignments between
transcriptions and audio recordings. The methods have been applied to the
Spoken British National Corpus, which is an extensive and varied corpus of
natural unscripted speech. Early results show good agreement with human ratings
of alignment accuracy. The methods also provide an indication of the location
of likely alignment problems; this should allow efficient manual examination of
large corpora. Automatic checking of such alignments is crucial when analysing
any very large corpus, since even the best current speech alignment systems
will occasionally make serious errors. The methods described here use a hybrid
approach based on statistics of the speech signal itself, statistics of the
labels being evaluated, and statistics linking the two.
Comment: Four pages, 3 figures. Presented at "New Tools and Methods for
Very-Large-Scale Phonetics Research", University of Pennsylvania, January
28-31, 201
Precision of Phoneme Boundaries Derived using Hidden Markov Models
Some phoneme boundaries correspond to abrupt changes in the acoustic signal. Others are less clear-cut because the transition from one phoneme to the next is gradual. This paper compares the phoneme boundaries identified by a large number of different alignment systems, using different signal representations and Hidden Markov Model structures. The variability of the different boundaries is analysed statistically, with the boundaries grouped in terms of the broad phonetic classes of the respective phonemes. The mutual consistency between the boundaries from the various systems is analysed to identify which classes of phoneme boundary can be identified reliably by an automatic labelling system, and which are ill-defined and ambiguous. The results presented here provide a starting point for future development of techniques for objective comparisons between systems without giving undue weight to variations in those phoneme boundaries which are inherently ambiguous. Such techniques should improve the efficiency with which new alignment and HMM training algorithms can be developed.
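The grouping of boundary variability by broad phonetic class could be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's procedure: the function name, the class labels ("stop", "vowel", "glide"), the boundary times, and the choice of standard deviation summarised by a median are all hypothetical, since the abstract does not say which statistics were used.

```python
from collections import defaultdict
from statistics import median, pstdev


def boundary_spread_by_class(boundaries):
    """Summarise cross-system disagreement per broad-class pair.

    `boundaries` is a list of (left_class, right_class, times), where
    `times` holds the boundary time in seconds reported by each
    alignment system for the same phoneme transition.
    Returns {(left_class, right_class): median spread in seconds}.
    """
    spreads = defaultdict(list)
    for left, right, times in boundaries:
        if len(times) < 2:
            continue  # a spread needs at least two systems
        # Population standard deviation across systems (an assumed
        # measure of variability, not taken from the paper).
        spreads[(left, right)].append(pstdev(times))
    return {pair: median(vals) for pair, vals in spreads.items()}


# Made-up data: three boundaries, each placed by four hypothetical systems.
example = [
    ("stop", "vowel", [0.100, 0.101, 0.099, 0.100]),
    ("stop", "vowel", [0.500, 0.502, 0.498, 0.500]),
    ("vowel", "glide", [1.200, 1.230, 1.170, 1.200]),
]

result = boundary_spread_by_class(example)
for pair, spread in sorted(result.items(), key=lambda kv: kv[1]):
    print(pair, f"{spread * 1000:.1f} ms")
```

In this invented data the systems agree closely on the stop-to-vowel boundaries and disagree more on the vowel-to-glide boundary, which is the kind of contrast between abrupt and gradual transitions that the abstract describes.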