
    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Categorization

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)

    Time and information in perceptual adaptation to speech

    Presubmission manuscript and supplementary files (stimuli, stimulus presentation code, data, data analysis code). Perceptual adaptation to a talker enables listeners to efficiently resolve the many-to-many mapping between variable speech acoustics and abstract linguistic representations. However, models of speech perception have not delved into the variety or the quantity of information necessary for successful adaptation, nor how adaptation unfolds over time. In three experiments using speeded classification of spoken words, we explored how the quantity (duration), quality (phonetic detail), and temporal continuity of talker-specific context contribute to facilitating perceptual adaptation to speech. In single- and mixed-talker conditions, listeners identified phonetically confusable target words in isolation or preceded by carrier phrases of varying lengths and phonetic content, spoken by the same talker as the target word. Word identification was always slower in mixed-talker conditions than in single-talker ones. However, interference from talker variability decreased as the duration of preceding speech increased but was not affected by the amount of preceding talker-specific phonetic information. Furthermore, efficiency gains from adaptation depended on temporal continuity between preceding speech and the target word. These results suggest that perceptual adaptation to speech may be understood via models of auditory streaming, where perceptual continuity of an auditory object (e.g., a talker) facilitates allocation of attentional resources, resulting in more efficient perceptual processing. NIH NIDCD (R03DC014045)

    Asymmetric discrimination of non-speech tonal analogues of vowels

    Published in final edited form as: J Exp Psychol Hum Percept Perform. 2019 February; 45(2): 285–300. doi:10.1037/xhp0000603. Directional asymmetries reveal a universal bias in vowel perception favoring extreme vocalic articulations, which lead to acoustic vowel signals with dynamic formant trajectories and well-defined spectral prominences due to the convergence of adjacent formants. The present experiments investigated whether this bias reflects speech-specific processes or general properties of spectral processing in the auditory system. Toward this end, we examined whether analogous asymmetries in perception arise with non-speech tonal analogues that approximate some of the dynamic and static spectral characteristics of naturally produced /u/ vowels executed with more versus less extreme lip gestures. We found a qualitatively similar but weaker directional effect with two-component tones varying in both the dynamic changes and proximity of their spectral energies. In subsequent experiments, we pinned down the phenomenon using tones that varied in one or both of these two acoustic characteristics. We found comparable asymmetries with tones that differed exclusively in their spectral dynamics, and no asymmetries with tones that differed exclusively in their spectral proximity or both spectral features. We interpret these findings as evidence that dynamic spectral changes are a critical cue for eliciting asymmetries in non-speech tone perception, but that the potential contribution of general auditory processes to asymmetries in vowel perception is limited. Accepted manuscript

    Transfer, similarity or lack of awareness? Inconsistencies of German learners in the pronunciation of LOT, THOUGHT, STRUT, PALM and BATH

    The current study presents acoustic analyses of non-high back vowels and low central vowels in the lexical sets LOT, THOUGHT, STRUT, PALM and BATH as pronounced by German learners of English. The main objective is to show that learners of English at university level are highly inconsistent in approximating the vowels of their self-chosen target accents, British English (BrE) and American English (AmE). To that end, the acoustic qualities of the learners' English vowels are compared to their native German vowels and to the vowels of native speakers of BrE and AmE. In order to facilitate statements about the effect of increased experience, the study differentiates between students in their first year at university and those in their third year or later. The results obtained are highly variable: in some cases the learners transfer their L1 vowels to English, other cases show clear approximations to the target vowels, while still other cases document the production of new vowels found neither in German nor in English. However, close approximation to the target vowels only sometimes correlates with higher proficiency. This might be an indicator of a low level of awareness of systematic differences between the BrE and AmE vowel systems. But the data also indicate that the more advanced learners produce more distinct AmE BATH vowels and BrE THOUGHT vowels than the less advanced learners, which points to a partial increase of awareness resulting from increased experience. All in all, it seems that raising the awareness of differences between target accents in L2 instruction is necessary if the envisaged goal is for learners to reach near-native pronunciation.

    Cross-Linguistic Influence in the Bilingual Mental Lexicon: Evidence of Cognate Effects in the Phonetic Production and Processing of a Vowel Contrast.

    The present study examines cognate effects in the phonetic production and processing of the Catalan back mid-vowel contrast (/o/-/ɔ/) by 24 early and highly proficient Spanish-Catalan bilinguals in Majorca (Spain). Participants completed a picture-naming task and a forced-choice lexical decision task in which they were presented with either words (e.g., /bɔsk/ "forest") or non-words based on real words, but with the alternate mid-vowel pair in stressed position ((*)/bosk/). The same cognate and non-cognate lexical items were included in the production and lexical decision experiments. The results indicate that even though these early bilinguals maintained the back mid-vowel contrast in their productions, they had great difficulty identifying non-words and real words based on the identity of the Catalan mid-vowel. The analyses revealed language dominance and cognate effects: Spanish-dominants exhibited higher error rates than Catalan-dominants, and production and lexical decision accuracy were also affected by cognate status. The present study contributes to the discussion of the organization of early bilinguals' dominant and non-dominant sound systems, and proposes that exemplar-theoretic approaches can be extended to include bilingual lexical connections that account for the interactions between the phonetic and lexical levels of early bilingual individuals.

    Production and perception of speaker-specific phonetic detail at word boundaries

    Experiments show that learning about familiar voices affects speech processing in many tasks. However, most studies focus on isolated phonemes or words and do not explore which phonetic properties are learned about or retained in memory. This work investigated inter-speaker phonetic variation involving word boundaries, and its perceptual consequences. A production experiment found significant variation in the extent to which speakers used a number of acoustic properties to distinguish junctural minimal pairs, e.g. 'So he diced them' vs. 'So he'd iced them'. A perception experiment then tested intelligibility in noise of the junctural minimal pairs before and after familiarisation with a particular voice. Subjects who heard the same voice during testing as during the familiarisation period showed significantly more improvement in identification of words and syllable constituents around word boundaries than those who heard different voices. These data support the view that perceptual learning about the particular pronunciations associated with individual speakers helps listeners to identify syllabic structure and the location of word boundaries.

    Forming New Vowel Categories in Second Language Speech: The Case of Polish Learners' Production of English /I/ and /e/

    The paper concentrates on the formation of L2 English vowel categories in the speech of Polish learners. More specifically, it compares the distribution of two English categories, /I/ and /e/, relative to neighbouring Polish vowels. 43 participants recorded Polish and English vowels in a /bVt/ context. The first two formants were measured at the vowel midpoint and plotted on a vowel plane. The results reveal that while a separate /I/ category is formed fairly effectively in Polish learners' pronunciation of English, the category of /e/ is almost completely subsumed by the Polish vowel /ɛ/.

    Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed

    The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.

    Effects of Palatal Expansion on Speech Production

    Introduction: Rapid palatal expanders (RPEs) are a commonly used orthodontic adjunct for the treatment of posterior crossbites. RPEs are cemented to bilateral posterior teeth across the palate and thus may interfere with proper tongue movement and linguopalatal contact. The purpose of this study was to identify what specific effect RPEs have on speech sound production for the child and early adolescent orthodontic patient. Materials and Methods: RPEs were treatment planned for patients seeking orthodontics at Marquette University. Speech recordings were made using a phonetically balanced reading passage (“The Caterpillar”) at 3 time points: 1) before RPE placement; 2) immediately after cementation; and 3) 10-14 days post appliance delivery. Measures of vocal tract resonance (formant center frequencies) were obtained for vowels, and measures of noise distribution (spectral moments) were obtained for consonants. Two-way repeated-measures ANOVA was used along with post-hoc tests for statistical analysis. Results: For the vowel /i/, the first formant increased and the second formant decreased, indicating a more inferior and posterior tongue position. For /e/, only the second formant decreased, indicating a more posterior tongue position. The formants did not return to baseline within the two-week study period. For the fricatives /s/, //, /t/, and /k/, a significant shift from high to low frequencies indicated distortion upon appliance placement. Of these, only /t/ fully returned to baseline during the study period. Conclusion: Numerous phonemes were distorted upon RPE placement, which indicated altered speech sound production. For most phonemes, it takes longer than two weeks for speech to return to baseline, if it returns at all. Clinically, the results of this study will help with pre-treatment and interdisciplinary counseling for orthodontic patients receiving palatal expanders.

    Responding to accents after experiencing interactive or mediated speech

    Very little is known about how speakers learn about and/or respond to speech experienced without the possibility for interaction. This paper reports an experiment which considers the effects of two kinds of exposure to speech (interactive or non-interactive mediated) on Scottish English speakers’ responses to another accent (Southern British English), for two processing tasks, phonological awareness and speech production. Only marginal group effects are found according to exposure type. The main findings show a difference between subjects according to exposure type before exposure, and individual shifts in responses to speech according to exposure type.