
    Spectral Dynamics in L1 and L2 Vowel Perception

    This paper presents a study of L1 and L2 vowel perception by Polish learners of English. A forced-choice identification task was carried out employing the Silent Center (SC) paradigm (e.g. Strange et al. 1983), in which listeners are presented with different portions of a vowel. Due to differences in the vowel systems of the two languages, it was hypothesized that stimulus type should have minimal effects on L1 Polish vowel perception, since Polish vowels are relatively stable in quality. In L2 English, depending on proficiency level, listeners were expected to adopt a more dynamic approach to vowel identification and show higher accuracy rates on the SC tokens; that is, they were expected to attend more to dynamic formant cues, or vowel inherent spectral change (VISC; see e.g. Morrison and Assmann 2013), in vowel perception. Identification accuracy results were for the most part consistent with these hypotheses. Implications of VISC for the notion of cross-language phonetic similarity, crucial to models of L2 speech acquisition, are also discussed.
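    To illustrate the Silent Center manipulation, the following is a minimal sketch, not the authors' stimulus-preparation procedure: the keep_frac proportion and the abrupt cut are assumptions, and published SC studies typically specify edge durations in milliseconds and apply amplitude ramps.

        import numpy as np

        def silent_center(vowel: np.ndarray, keep_frac: float = 0.2) -> np.ndarray:
            # Keep the vowel's onset and offset transitions and silence the
            # middle, leaving only the dynamic formant cues at the edges.
            n = len(vowel)
            edge = int(n * keep_frac)          # samples kept at each end (assumed proportion)
            out = np.zeros_like(vowel)
            out[:edge] = vowel[:edge]          # initial transition
            out[n - edge:] = vowel[n - edge:]  # final transition
            return out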

    Model of the Classification of English Vowels by Spanish Speakers

    A number of models of single-language vowel classification based on formant representations have been proposed. We propose a new model that explicitly predicts vowel perception by second language (L2) learners based on the phonological map of their native language (L1). The model represents the vowels using polar coordinates in the F1-F2 formant space. Boundaries bisect the angles made by two adjacent category centroids. An L2 vowel is classified with the closest L1 vowel with a probability based on the angular difference between the L2 vowel and the L1 vowel boundary. The polar coordinate model is compared with other vowel classification models, such as the quadratic discriminant analysis method used by Hillenbrand and Gayvert [J. Speech Hear. Res., 36, 694-700, 1993] and the logistic regression analysis method adopted by Nearey [J. Phonetics, 18, 347-373, 1990]. All models were trained on Spanish vowel data and tested on English vowels. The results were compared with behavioral data obtained by Flege [Q. J. Exp. Psych., 43A(3), 701-731, 1991] for Spanish monolingual speakers identifying English vowels. The polar coordinate model outperformed the other models in matching the behavioral data most closely. National Institute on Deafness and Other Communication Disorders (R29 02852); Alfred P. Sloan Foundation.
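    The classification rule described above can be sketched roughly as follows. The origin of the polar map and the Spanish centroid values are hypothetical placeholders (the abstract does not specify them), and the sketch returns only the angularly nearest L1 category rather than the boundary-based identification probabilities of the published model.

        import numpy as np

        # Hypothetical Spanish (L1) category centroids in the F1-F2 plane, in Hz.
        L1_CENTROIDS = {
            "i": (300, 2300), "e": (450, 1900), "a": (700, 1300),
            "o": (450, 900),  "u": (300, 750),
        }
        ORIGIN = (500, 1500)  # assumed centre of the vowel space (not given in the abstract)

        def vowel_angle(f1, f2):
            # Angle of a token around the assumed origin of the F1-F2 plane.
            return np.arctan2(f2 - ORIGIN[1], f1 - ORIGIN[0])

        def classify(f1, f2):
            # Assign an L2 token to the angularly closest L1 centroid; the
            # published model converts the token's angular distance to the
            # bisecting category boundary into a probability, omitted here.
            theta = vowel_angle(f1, f2)
            dist = {v: abs(np.angle(np.exp(1j * (theta - vowel_angle(*c)))))
                    for v, c in L1_CENTROIDS.items()}
            return min(dist, key=dist.get)

        print(classify(400, 2000))  # an English /ɪ/-like token → "i" with these placeholders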

    Rhythm and Vowel Quality in Accents of English

    In a sample of 27 speakers of Scottish Standard English, two notoriously variable consonantal features are investigated: the contrast of /ʍ/ and /w/, and non-prevocalic /r/, the latter both in terms of its presence or absence and the phonetic form it takes, if present. The pattern of realisation of non-prevocalic /r/ largely confirms previously reported findings. But there are a number of surprising results regarding the merger of /ʍ/ and /w/ and the loss of non-prevocalic /r/: while the former is more likely to happen in younger speakers and females, the latter seems more likely in older speakers and males. This is suggestive of a change in progress leading to a loss of the /ʍ/-/w/ contrast, while the variation found in non-prevocalic /r/ follows an almost inverse sociolinguistic pattern that does not suggest any such change and is, additionally, largely explicable in language-internal terms. One phenomenon requiring further investigation is the curious effect that direct contact with Southern English accents seems to have on non-prevocalic /r/: innovation on the structural level (i.e. loss) and conservatism on the realisational level (i.e. increased incidence of [r] and [ɾ]) appear to be conditioned by the same sociolinguistic factors.

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady-State Vowel Categorization

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624).

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady-State Vowel Identification

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. Such a transformation enables speech to be understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624).

    Speech characteristics of monozygotic twins and a same-sex sibling: an acoustic case study of coarticulation patterns in read speech

    This case study reports on an acoustic investigation of the motor speech characteristics of a set of young adult male monozygotic (MZ) twins, comparing them to those of an age- and sex-matched sibling who participated in the study two years later in order to match for demographic factors. Coarticulation patterns were investigated from read samples of consonant-vowel sequences in monosyllabic words containing a variety of consonants and vowels, by examining F2 vowel onsets and F2 vowel targets plotted as F2 locus equations. Between-sibling differences were assessed using a number of statistical tests. Results indicated that the MZ twins displayed F2 parameters and coarticulation patterns that were more similar to each other's than to those of their age- and sex-matched sibling. The results of this case study therefore suggest that acoustic phonetic parameters used to index coarticulation patterns have the potential to profile some of the similarities and differences in the speech characteristics of genetically related individuals.
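    A locus-equation analysis of the kind described can be sketched as follows; the F2 values are invented placeholders, not data from the study.

        import numpy as np

        def locus_equation(f2_onsets_hz, f2_targets_hz):
            # Fit F2_onset = slope * F2_target + intercept across CV tokens of
            # one consonant; a steeper slope is conventionally read as a
            # greater degree of consonant-vowel coarticulation.
            slope, intercept = np.polyfit(f2_targets_hz, f2_onsets_hz, 1)
            return slope, intercept

        # Invented F2 measurements (Hz) for one speaker's /d/ + five vowels:
        onsets  = [1750, 1700, 1600, 1500, 1450]
        targets = [2200, 1900, 1500, 1100,  950]
        print(locus_equation(onsets, targets))  # slope ≈ 0.24 for these values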

    Production and Perception of English Word-Final Stops by Malay Speakers

    A few influential speech studies carried out using established speech learning models have confirmed that analysis of the first language (L1) and second language (L2) at a phonemic level provides only a partial view of the deeper relationships between languages in contact. Studies focusing on cross-language phonetic differences as a causative factor in L2 learner difficulties have therefore been proposed to understand L2 learners' speech production and how listeners respond perceptually to the phonetic properties of L2. This paper presents a study of the production and perception of final stops by learners of English (L2) whose first language is Malay (L1). A total of 23 students, comprising 16 male and 7 female Malay subjects with normal hearing and speech development, participated in this study. A short interview was conducted to gain background information about each subject, to introduce them to the study, and to inform them about the recording process, the materials to be used in the recording session, and how the materials should be handled during recording. Acoustic measurements of selected segments occurring in word-final positions (via spectrographic analysis, syllable rhyme duration and phonation) were taken. Results of the voicing contrast realisation in Malay-accented English and Malaysian listeners' perceptual identification/discrimination abilities with final voiced/voiceless stops in Malay and English are presented and discussed. The findings revealed that the Malay students' realisation of final stops in L2 is largely identical to their L1. In addition, the results showed that accurate perception may not always lead to accurate production.

    Effect of formant frequency spacing on perceived gender in pre-pubertal children's voices

    Background: It is usually possible to identify the sex of a pre-pubertal child from their voice, despite the absence of sex differences in fundamental frequency at these ages. While it has been suggested that the overall spacing between formants (formant frequency spacing, ΔF) is a key component of the expression and perception of sex in children's voices, the effect of its continuous variation on sex and gender attribution has not yet been investigated.

    Methodology/Principal findings: In the present study we manipulated voice ΔF of eight-year-olds (two boys and two girls) along continua covering the observed variation of this parameter in pre-pubertal voices, and assessed the effect of this variation on adult ratings of speakers' sex and gender in two separate experiments. In the first experiment (sex identification), adults were asked to categorise each voice as either male or female. The resulting identification function exhibited a gradual slope from male to female voice categories. In the second experiment (gender rating), adults rated the voices on a continuum from “masculine boy” to “feminine girl”, gradually decreasing their masculinity ratings as ΔF increased.

    Conclusions/Significance: These results indicate that the role of ΔF in voice gender perception, which has been reported in adult voices, extends to pre-pubertal children's voices: variation in ΔF affects not only the perceived sex but also the perceived masculinity or femininity of the speaker. We discuss the implications of these observations for the expression and perception of gender in children's voices, given the absence of anatomical dimorphism in overall vocal tract length before puberty.
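    For readers unfamiliar with ΔF, it can be estimated from measured formants under a uniform-tube approximation; the following is a minimal sketch with invented formant values, showing one common estimation method rather than necessarily the one used in the study.

        import numpy as np

        def delta_f(formants_hz):
            # Under a uniform tube closed at one end, Fi ≈ (2i - 1) * ΔF / 2,
            # so ΔF is the slope of a through-origin regression of the
            # measured formants Fi on (2i - 1) / 2.
            f = np.asarray(formants_hz, dtype=float)
            x = (2.0 * np.arange(1, len(f) + 1) - 1.0) / 2.0
            return float(np.dot(x, f) / np.dot(x, x))

        # Invented formant measurements (Hz) for a child's vowel:
        print(delta_f([1000, 3000, 5000, 7000]))  # → 2000.0 Hz

    Since ΔF = c / (2L) for a tube of length L, a larger ΔF corresponds to a shorter vocal tract, which is why ΔF is a plausible cue to speaker size and sex.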

    Changes in the McGurk Effect Across Phonetic Contexts

    To investigate the process underlying audiovisual speech perception, the McGurk illusion was examined across a range of phonetic contexts. Two major changes were found. First, the frequency of illusory /g/ fusion percepts increased relative to the frequency of illusory /d/ fusion percepts as the vowel context was shifted from /i/ to /a/ to /u/. This trend could not be explained by biases present in perception of the unimodal visual stimuli. However, the change found in the McGurk fusion effect across vowel environments did correspond systematically with changes in second formant frequency patterns across contexts. Second, the order of consonants in illusory combination percepts was found to depend on syllable type. This may be due to differences occurring across syllable contexts in the time courses of inputs from the two modalities, as delaying the auditory track of a vowel-consonant stimulus resulted in a change in the order of the consonants perceived. Taken together, these results suggest that the speech perception system either fuses audiovisual inputs into a visually compatible percept with a second formant pattern similar to that of the acoustic stimulus, or interleaves the information from the different modalities, at a phonemic or subphonemic level, based on their relative arrival times. National Institutes of Health (R01 DC02852).