
    Acoustic Space Movement Planning in a Neural Model of Motor Equivalent Vowel Production

    Recent evidence suggests that speakers utilize an acoustic-like reference frame for the planning of speech movements. DIVA, a computational model of speech acquisition and motor equivalent speech production, has previously been shown to provide explanations for a wide range of speech production data using a constriction-based reference frame for movement planning. This paper extends the previous work by investigating an acoustic-like planning frame in the DIVA modeling framework. During a babbling phase, the model self-organizes targets in the planning space for each of ten vowels and learns a mapping from desired movement directions in this planning space into appropriate articulator velocities. Simulation results verify that after babbling the model is capable of producing easily recognizable vowel sounds using an acoustic planning space consisting of the formants F1 and F2. The model successfully reaches all vowel targets from any initial vocal tract configuration, even in the presence of constraints such as a blocked jaw.
    Office of Naval Research (N00014-91-J-4100, N00014-92-J-4015); Air Force Office of Scientific Research (F49620-92-J-0499)
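
    As a rough illustration of the planning step described above, the sketch below converts a desired movement direction in (F1, F2) space into articulator velocities through the pseudoinverse of a locally linear forward map, the kind of mapping the model learns during babbling. The Jacobian values and the three abstract articulators are invented for illustration; this is a minimal caricature of the mechanism, not the published DIVA implementation.

        import numpy as np

        # Toy forward Jacobian (invented values): rows are (F1, F2) in Hz,
        # columns are three abstract articulators (e.g. jaw, tongue body, lips).
        J = np.array([[120.0, -40.0,  15.0],
                      [-60.0, 150.0,  80.0]])

        def articulator_velocity(formants, target, J, blocked=None, gain=0.1):
            # Map the desired direction in the (F1, F2) planning space into
            # articulator velocities via the pseudoinverse of the forward map.
            Jc = J.copy()
            if blocked is not None:
                Jc[:, blocked] = 0.0   # a blocked articulator contributes nothing
            return gain * np.linalg.pinv(Jc) @ (target - formants)

        current = np.array([500.0, 1500.0])   # current (F1, F2)
        target = np.array([300.0, 2300.0])    # an /i/-like vowel target
        print(articulator_velocity(current, target, J))             # unconstrained
        print(articulator_velocity(current, target, J, blocked=0))  # "jaw" blocked

    With the first column zeroed, the pseudoinverse redistributes the commanded movement to the remaining articulators, which is the motor-equivalent behavior the simulations test.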

    Articulatory Tradeoffs Reduce Acoustic Variability During American English /r/ Production

    Acoustic and articulatory recordings reveal that speakers utilize systematic articulatory tradeoffs to maintain acoustic stability when producing the phoneme /r/. Distinct articulator configurations used to produce /r/ in various phonetic contexts show systematic tradeoffs between the cross-sectional areas of different vocal tract sections. Analysis of acoustic and articulatory variabilities reveals that these tradeoffs act to reduce acoustic variability, thus allowing large contextual variations in vocal tract shape; these contextual variations in turn apparently reduce the amount of articulatory movement required. These findings contrast with the widely held view that speaking involves a canonical vocal tract shape target for each phoneme.
    National Institute on Deafness and Other Communication Disorders (1R29-DC02852-02, 5R01-DC01925-04, 1R03-C2576-01); National Science Foundation (IRI-9310518)
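
    The logic of the variability analysis can be shown with a toy simulation: when two cross-sectional areas covary negatively (a tradeoff), an acoustic measure that depends on both stays more stable than when the areas vary independently. The linear "acoustic" map and all numbers below are invented stand-ins, not the paper's data or measurements.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 1000
        a1 = rng.normal(1.0, 0.2, n)                # area of section 1 (arbitrary units)

        traded = 2.0 - a1 + rng.normal(0, 0.02, n)  # section 2 trades off against section 1
        indep = rng.normal(1.0, 0.2, n)             # control: section 2 varies independently

        acoustic_traded = 300.0 * (a1 + traded)     # toy acoustic output (F3-like, Hz)
        acoustic_indep = 300.0 * (a1 + indep)

        print(np.corrcoef(a1, traded)[0, 1])        # strong negative correlation: the tradeoff
        print(acoustic_traded.std(), acoustic_indep.std())  # tradeoff shrinks acoustic SD

    Large articulatory variation paired with small acoustic variation, as in the traded configuration, is the signature the paper reports for /r/.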

    Open challenges in understanding development and evolution of speech forms: The roles of embodied self-organization, motivation and active exploration

    This article discusses open scientific challenges for understanding the development and evolution of speech forms, as a commentary on Moulin-Frier et al. (2015). Based on the analysis of mathematical models of the origins of speech forms, with a focus on their assumptions, we study the fundamental question of how speech can be formed out of non-speech, at both developmental and evolutionary scales. In particular, we emphasize the importance of embodied self-organization, as well as the role of mechanisms of motivation and active curiosity-driven exploration in speech formation. Finally, we discuss an evolutionary-developmental perspective on the origins of speech.

    Training a Vocal Tract Synthesiser to imitate speech using Distal Supervised Learning

    Imitation is a powerful mechanism by which both animals and people can learn useful behavior by copying the actions of others. We adopt this approach as a means to control an articulatory speech synthesizer. The goal of our project is to build a system that can learn to mimic speech using its own vocal tract. We approach this task by training an inverse mapping between the synthesizer’s control parameters and their auditory consequences. In this paper we compare the direct estimation of this inverse model with the distal supervised learning scheme proposed by Jordan & Rumelhart (1992). Both approaches involve a babbling phase, which is used to learn the auditory consequences of the articulatory controls. We show that both schemes perform well on speech generated by the synthesizer itself, when no normalization is needed, but that distal learning provides slightly better performance with speech generated by a real human subject.
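
    The contrast between the two schemes can be sketched in a linear toy model: both learn from the same babbling pairs, but the direct method regresses controls on sounds, while distal learning trains the inverse model by sending the acoustic error back through a frozen forward model. The matrix A below stands in for the synthesizer; nothing here reproduces the paper's synthesizer, data, or results.

        import numpy as np

        rng = np.random.default_rng(1)
        A = rng.normal(size=(2, 4)) / 2.0           # "synthesizer": 4 controls -> 2 formants

        X = rng.normal(size=(500, 4))               # babbling: random control settings...
        Y = X @ A.T                                 # ...and their auditory consequences

        # Forward model, learned from the babbling pairs (here by least squares).
        Wf = np.linalg.lstsq(X, Y, rcond=None)[0].T

        # Scheme 1, direct inverse: regress controls on sounds.
        Wi_direct = np.linalg.lstsq(Y, X, rcond=None)[0].T

        # Scheme 2, distal supervised learning (Jordan & Rumelhart, 1992):
        # train the inverse by passing the acoustic error back through the
        # frozen forward model.
        Wi_distal = np.zeros((4, 2))
        Yt = rng.normal(size=(200, 4)) @ A.T        # target sounds to imitate
        for _ in range(3000):
            err = Yt @ Wi_distal.T @ Wf.T - Yt      # error heard at the "ear"
            Wi_distal -= 0.02 * Wf.T @ err.T @ Yt / len(Yt)

        for Wi in (Wi_direct, Wi_distal):
            print(np.abs(Yt @ Wi.T @ A.T - Yt).mean())  # imitation error via the true map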

    KLAIR: A virtual infant for spoken language acquisition research

    Recent research into the acquisition of spoken language has stressed the importance of learning through embodied linguistic interaction with caregivers rather than through passive observation. However, the necessity of interaction makes experimental work on the simulation of infant speech acquisition difficult because of the technical complexity of building real-time embodied systems. In this paper we present KLAIR: a software toolkit for building simulations of spoken language acquisition through interactions with a virtual infant. The main part of KLAIR is a sensori-motor server that supplies a client machine learning application with a virtual infant on screen that can see, hear and speak. By encapsulating the real-time complexities of audio and video processing within a server that will run on a modern PC, we hope that KLAIR will encourage and facilitate more experimental research into spoken language acquisition through interaction.
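
    The division of labor described above can be pictured with a schematic client loop: the sensori-motor server owns the real-time audio and video, while the learning application exchanges percepts and motor commands with it over a connection. Everything below (the endpoint, the JSON-lines message format, the field names) is hypothetical, chosen only to illustrate the server/client split; KLAIR's actual API and wire protocol are documented with the toolkit.

        import json
        import socket

        def choose_motor_command(percept):
            # Placeholder policy; a real client would put its learning model here.
            return {"articulators": [0.0] * 8}        # hypothetical message format

        def run_client(host="localhost", port=5000):  # hypothetical endpoint
            with socket.create_connection((host, port)) as s:
                f = s.makefile("rw")
                while True:
                    percept = json.loads(f.readline())       # e.g. auditory/visual features
                    command = choose_motor_command(percept)  # client-side learning step
                    f.write(json.dumps(command) + "\n")      # e.g. articulator targets
                    f.flush()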

    Effect of Visual Input on Vowel Production in English Speakers

    This study analyzes whether a model of speech perception and production should include a visual component by comparing the jaw opening, advancement, and rounding of American English and non-English vowels produced in the presence and absence of a visual stimulus. Surprisingly, jaw opening did not change between conditions, but the presence of the visual stimulus was a significant factor in participants’ vowel advancement for non-English vowels. This may be explained by lip rounding, but further research is required to develop a full understanding of the impact of visual input on vowel production for use in teaching and learning languages.

    Pre-Low Raising in Japanese Pitch Accent

    Japanese has been observed to have 2 versions of the H tone, the higher of which is associated with an accented mora. However, the distinction between these 2 versions surfaces only in context, not in isolation, leading to a long-standing debate over whether there is 1 H tone or 2. This article reports evidence that the higher version may result from a pre-low raising mechanism rather than being inherently higher. The evidence is based on an analysis of F0 in words that varied in length, accent condition and syllable structure, produced by native speakers of Japanese at 2 speech rates. The data indicate a clear separation between effects that are due to mora-level preplanning and those that are mechanical. These results are discussed in terms of mechanisms of laryngeal control during tone production, and highlight the importance of articulation as a link between phonology and surface acoustics.

    Error Detection and Correction During Object Naming in Individuals with Aphasia

    Aphasia is a neurogenic communication disorder that occurs following a left hemisphere stroke and commonly co-occurs with apraxia of speech (AOS). Individuals with aphasia typically make errors in lexical retrieval and have difficulty detecting and correcting them. While there is ample research on how errors occur, few researchers go on to examine error detection and subsequent correction in this population. Given this need for research, we took a pre-existing data set of 23 individuals with aphasia grouped by presence of AOS (nine with comorbid AOS) and coded their spoken responses on the Object Naming subtest of the Western Aphasia Battery-Revised to characterize the types of errors made, as well as whether those errors were detected and corrected. Groups did not differ in total number of errors; however, participants with AOS produced more late-stage errors than participants without AOS, meaning they made errors that occurred after the level of lemma selection (i.e., phonemic paraphasias and neologisms). In this sample, people with aphasia were generally able to detect their errors, though the presence of AOS impacted their ability to correct them.