Large-Scale Analysis of Formant Frequency Estimation Variability in Conversational Telephone Speech*


We quantitatively investigate how the telephone channel and regional dialect might affect formant frequency estimates extracted with tools commonly used in law enforcement. Both are important factors in forensic phonetics. In 90% of forensic cases, the speech sample in question is recorded after transmission via telephone (Byrne and Foulkes 2004). In addition, quantifiable norms of dialect-dependent features are necessary for forensic examiners to assess whether a given acoustic feature is speaker-specific or commonly found in the speaker’s dialect (Rose 2002). Past studies have analyzed how the telephone channel and regional dialects might influence formant frequency estimates, but the number of speakers is often limited: at most 20 subjects for channel studies (Byrne and Foulkes 2004; Kunzel 2001), and at most 439 speakers for American English dialects (Labov, Ash, and Boberg 2006). To the best of our knowledge, our work is the largest-scale study on these topics. Formant frequency estimates in spontaneous conversational speech from more than 3,600 native American English speakers were extracted using the default settings in Wavesurfer (Talkin 1987). We show that estimates of F1 are higher in cellular channels than in landline channels, while F2 in general shows the opposite trend. We also characterized vowel shift trends in norther

