352 research outputs found
Speaker matters: Natural inter-speaker variation affects 4-month-olds' perception of audio-visual speech
First Published September 27, 2019
In the language development literature, studies often make inferences about infants' speech perception abilities based on their responses to a single speaker. However, there can be significant natural variability across speakers in how speech is produced (i.e., inter-speaker differences). The current study examined whether inter-speaker differences can affect infants' ability to detect a mismatch between the auditory and visual components of vowels. Using an eye-tracker, 4.5-month-old infants were tested on auditory-visual (AV) matching for two vowels (/i/ and /u/). Critically, infants were tested with two speakers who naturally differed in how distinctively they articulated the two vowels within and across the categories. Only infants who watched and listened to the speaker whose visual articulations of the two vowels were most distinct from one another were sensitive to AV mismatch. This speaker also produced a visually more distinct /i/ as compared to the other speaker. This finding suggests that infants are sensitive to the distinctiveness of AV information across speakers, and that when making inferences about infants' perceptual abilities, characteristics of the speaker should be taken into account.
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This research was funded by the grant PSI2014-5452-P from the Spanish Ministry of Economy and Competitiveness to M.M. The authors also acknowledge financial support from the "Severo Ochoa Program for Centers/Units of Excellence in R&D" (SEV-2015-490) and from the Basque Government "Programa Predoctoral" to J.P
Emergence of the vowel space in very young children with Down syndrome: An exploratory case study
The current study presents the preliminary results of an investigation into the development of the vowel space in one female child with Down syndrome (DS). Vowel productions at five points in time, ranging from 1;0 to 3;8 years of age, have been analysed to produce age-specific F1-F2 vowel plots and to calculate metrics quantifying changes in their size and dimensions. The results show that changes in DS vowel space area and shape are non-systematic, lacking the definite developmental trajectories present in the productions of typically developing children. An explanation of outcomes using the DIVA model of speech acquisition is proposed
A computational model of the relationship between speech intelligibility and speech acoustics
Speech intelligibility measures how well a speaker can be understood by a listener. Traditional measures of intelligibility, such as word accuracy, are not sufficient to reveal the reasons for intelligibility degradation. This dissertation investigates the underlying sources of intelligibility degradation from the perspectives of both the speaker and the listener. Measures of segmental phoneme errors and suprasegmental lexical boundary errors are developed to reveal the perceptual strategies of the listener. A comprehensive set of automated acoustic measures is developed to quantify variations in the acoustic signal along three perceptual dimensions: articulation, prosody, and vocal quality. The developed measures have been validated on a dysarthric speech dataset spanning various degrees of severity. Multiple regression analysis is employed to show that the developed measures can predict perceptual ratings reliably. The relationship between the acoustic measures and the listening errors is investigated to show the interaction between speech production and perception. The hypothesis is that segmental phoneme errors are mainly caused by imprecise articulation, while suprasegmental lexical boundary errors are due to unreliable phonemic information as well as abnormal rhythm and prosody patterns. To test the hypothesis, within-speaker variations are simulated in different speaking modes. Significant changes have been detected in both the acoustic signals and the listening errors. Results of the regression analysis support the hypothesis by showing that changes in the articulation-related acoustic features are important in predicting changes in listening phoneme errors, while changes in both the articulation- and prosody-related features are important in predicting changes in lexical boundary errors. Moreover, a significant correlation has been achieved in the cross-validation experiment, which indicates that it is possible to predict intelligibility variations from the acoustic signal.
Dissertation/Thesis. Doctoral Dissertation, Speech and Hearing Science, 201
The Organization of Lexicons: a Cross-Linguistic Analysis of Monosyllabic Words
Lexicons utilize only a fraction of licit structures. Different theories predict that lexicons prioritize either contrastiveness or structural economy. Study 1 finds that the monosyllabic lexicon of Mandarin is no more distinctive than a randomly sampled baseline using the phonological inventory. Study 2 finds that the lexicons of Mandarin and American English have fewer phonotactically complex words than the random baseline: words tend not to have multiple low-probability components. This suggests that phonological constraints can have superadditive penalties for combined violations, consistent with e.g. Albright (ms.)
The production and perception of peripheral geminate/singleton coronal stop contrasts in Arabic
Gemination is typologically common word-medially but is rare at the periphery of the word (word-initially and -finally). In line with this observation, prior research on production and perception of gemination has focused primarily on medial gemination. Much less is known about the production and perception of peripheral gemination. This PhD thesis reports on comprehensive articulatory, acoustic and perceptual investigations of geminate-singleton contrasts according to the position of the contrast in the word and in the utterance. The production component of the project investigated the articulatory and acoustic features of medial and peripheral gemination of voiced and voiceless coronal stops in Modern Standard Arabic and regional Arabic vernacular dialects, as produced by speakers from two disparate and geographically distant countries, Morocco and Lebanon. The perceptual experiment investigated how standard and dialectal Arabic gemination contrasts in each word position were categorised and discriminated by three groups of non-native listeners, each differing in their native language experience with gemination at different word positions. The first experiment used ultrasound and acoustic recordings to address the extent to which word-initial gemination in Moroccan and Lebanese dialectal Arabic is maintained, as well as the articulatory and acoustic variability of the contrast according to the position of the gemination contrast in the utterance (initial vs. medial) and between the two dialects. The second experiment compared the production of word-medial and -final gemination in Modern Standard Arabic as produced by Moroccan and Lebanese speakers. The aim of the perceptual experiment was to disentangle the contribution of phonological and phonetic effects of the listeners' native languages on the categorisation and discrimination of non-lexical Moroccan gemination by three groups of non-native listeners varying in their phonological (native Lebanese group and heritage Lebanese group, for whom Moroccan is unintelligible, i.e., a non-native language) and phonetic-only (native English group) experience with gemination across the three word positions. The findings in this thesis constitute important contributions about positional and dialectal effects on the production and perception of gemination contrasts, going beyond medial gemination (which was mainly included as control) and illuminating in particular the typologically rare peripheral gemination
Analysis of speech and tongue motion in normal and post-glossectomy speaker using cine MRI
Objective: Since the tongue is the oral structure responsible for mastication, pronunciation, and swallowing functions, patients who undergo glossectomy can be affected in various aspects of these functions. The vowel /i/ uses the tongue shape, whereas /u/ uses tongue and lip shapes. The purpose of this study is to investigate the morphological changes of the tongue and the adaptation of pronunciation using cine MRI for speech of patients who undergo glossectomy. Material and Methods: Twenty-three controls (11 males and 12 females) and 13 patients (eight males and five females) volunteered to participate in the experiment. The patients underwent glossectomy surgery for T1 or T2 lateral lingual tumors. The speech tasks "a souk" and "a geese" were spoken by all subjects, providing data for the vowels /u/ and /i/. Cine MRI and speech acoustics were recorded and measured to compare the changes in the tongue with vowel acoustics after surgery. 2D measurements were made of the interlip distance, tongue-palate distance, tongue position (anterior-posterior and superior-inferior), tongue height on the left and right sides, and pharynx size. Vowel formants F1, F2, and F3 were measured. Results: The patients had significantly lower F2/F1 ratios (F=5.911, p=0.018), and lower F3/F1 ratios that approached significance. This was seen primarily in the /u/ data. Patients had flatter tongue shapes than controls, with a greater effect seen in /u/ than /i/. Conclusion: The patients showed complex adaptation motion in order to preserve the acoustic integrity of the vowels, and the tongue modified cavity size relationships to maintain the value of the formant frequencies
The articulatory and acoustic characteristics of Polish sibilants and their consequences for diachronic change
The study is concerned with the relative synchronic stability of three contrastive sibilant fricatives /s ʂ ɕ/ in Polish. Tongue movement data were collected from nine first-language Polish speakers producing symmetrical real and non-word CVCV sequences in three vowel contexts. A Gaussian model was used to classify the sibilants from spectral information in the noise and from formant frequencies at vowel onset. The physiological analysis showed an almost complete separation between /s ʂ ɕ/ on tongue-tip parameters. The acoustic analysis showed that the greater energy at higher frequencies distinguished /s/ in the fricative noise from the other two sibilant categories. The most salient information at vowel onset was for /ɕ/, which also had a strong palatalizing effect on the following vowel. Whereas either the noise or vowel onset was largely sufficient for the identification of /s ɕ/ respectively, both sets of cues were necessary to separate /ʂ/ from /s ɕ/. The greater synchronic instability of /ʂ/ may derive from its high articulatory complexity coupled with its comparatively low acoustic salience. The data also suggest that the relatively late stage of /ʂ/ acquisition by children may come about because of the weak acoustic information in the vowel for its distinction from /s/