
    Exploring British Accents: Modeling the Trap-Bath Split with Functional Data Analysis

    The sound of our speech is influenced by the places we come from. Great Britain contains a wide variety of distinctive accents which are of interest to linguists. In particular, the "a" vowel in words like "class" is pronounced differently in the North and the South. Speech recordings of this vowel can be represented as formant curves or as Mel-frequency cepstral coefficient curves. Functional data analysis and generalized additive models offer techniques to model the variation in these curves. Our first aim is to model the difference between typical Northern and Southern vowels by training two classifiers on the North-South Class Vowels dataset collected for this paper (Koshy 2020). Our second aim is to visualize geographical variation of accents in Great Britain. For this we use speech recordings from a second dataset, the British National Corpus (BNC) audio edition (Coleman et al. 2012). The trained models are used to predict the accents of speakers in the BNC, and we then model the geographical patterns in these predictions using a soap film smoother. This work demonstrates a flexible and interpretable approach to modeling phonetic accent variation in speech recordings.
    Comment: 36 pages, 24 figures
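The core idea of classifying accents from curves can be illustrated with a toy sketch. This is not the paper's GAM/FDA pipeline or the NSCV data: each vowel recording is represented as a synthetic formant trajectory sampled on a common time grid (all values below are hypothetical), and a simple nearest-centroid rule classifies whole curves by their L2 distance.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 50)  # normalised time grid for all curves

def make_curve(base, slope, noise=0.0, seed=None):
    """Synthetic formant-like trajectory (Hz) over the vowel's duration."""
    rng = np.random.default_rng(seed)
    return base + slope * t + noise * rng.standard_normal(t.size)

# Hypothetical training curves: the two accent groups are separated by a
# roughly constant offset in the formant trajectory.
north = np.stack([make_curve(1700, -100, 30, seed=s) for s in range(5)])
south = np.stack([make_curve(1200, -100, 30, seed=s) for s in range(5)])

# Mean curve per class acts as a functional centroid.
centroids = {"North": north.mean(axis=0), "South": south.mean(axis=0)}

def classify(curve):
    # Nearest centroid under the L2 distance between entire curves.
    return min(centroids, key=lambda k: np.linalg.norm(curve - centroids[k]))

test_curve = make_curve(1650, -100, 30, seed=42)
print(classify(test_curve))  # prints "North": far closer to the North centroid
```

A real functional classifier would first smooth the sampled curves onto a basis (the role played by the GAM machinery here), but the distance-between-curves intuition is the same.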

    MAP Prediction of Formant Frequencies and Voicing Class from MFCC Vectors in Noise

    No full text
    Novel methods are presented for predicting formant frequencies and voicing class from mel-frequency cepstral coefficients (MFCCs). It is shown how Gaussian mixture models (GMMs) can be used to model the relationship between formant frequencies and MFCCs. Using such models and an input MFCC vector, a maximum a posteriori (MAP) prediction of formant frequencies can be made. The specific relationship each speech sound has between MFCCs and formant frequencies is exploited by using state-specific GMMs within a framework of a set of hidden Markov models (HMMs). Formant prediction accuracy and voicing prediction for speaker-independent male speech are evaluated on both a constrained-vocabulary connected digits database and a large-vocabulary database. Experimental results show that for HMM–GMM prediction on the connected digits database, voicing class prediction error is less than 3.5%; less than 1.8% of frames have formant frequency percentage errors greater than 20%; and the mean percentage error of the remaining frames is less than 3.7%. Further experiments show prediction accuracy under noisy conditions. For example, at a signal-to-noise ratio (SNR) of 0 dB, voicing class prediction error increases to 9.4%; less than 4.3% of frames have formant frequency percentage errors over 20%; and the formant frequency percentage error for the remaining frames is less than 5.7%.
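The MAP step described above can be sketched in miniature. This is a minimal illustration with invented parameters, not the paper's trained models: the joint density of (MFCC feature x, formant frequency y) is modelled by a two-component Gaussian mixture, and the prediction of y given an observed x is the conditional Gaussian mean of the mixture component with the highest posterior responsibility. For simplicity x is one-dimensional here, whereas a real MFCC vector has many coefficients.

```python
import numpy as np

# Hypothetical joint GMM over [x, y] with two components.
weights = np.array([0.5, 0.5])
means = np.array([[-1.0, 500.0],   # e.g. a low-formant speech state
                  [ 1.0, 900.0]])  # e.g. a high-formant speech state
covs = np.array([[[0.25, 2.0], [2.0, 400.0]],
                 [[0.25, 2.0], [2.0, 400.0]]])

def map_formant(x):
    """Predict y from x via the most responsible mixture component."""
    # Posterior responsibility of each component given x, using the
    # marginal Gaussian density of x under that component.
    resp = np.array([
        w * np.exp(-0.5 * (x - m[0]) ** 2 / c[0, 0]) / np.sqrt(2 * np.pi * c[0, 0])
        for w, m, c in zip(weights, means, covs)
    ])
    k = int(np.argmax(resp))
    m, c = means[k], covs[k]
    # Conditional Gaussian mean: mu_y + Sigma_yx / Sigma_xx * (x - mu_x)
    return m[1] + c[1, 0] / c[0, 0] * (x - m[0])

print(map_formant(-1.0))  # component 0 dominates: prints 500.0
```

Using per-HMM-state GMMs, as in the paper, amounts to selecting which of several such mixtures to apply at each frame, so the x-to-y mapping can differ between speech sounds.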