55 research outputs found
Acoustic Space Movement Planning in a Neural Model of Motor Equivalent Vowel Production
Recent evidence suggests that speakers utilize an acoustic-like reference frame for the planning of speech movements. DIVA, a computational model of speech acquisition and motor equivalent speech production, has previously been shown to provide explanations for a wide range of speech production data using a constriction-based reference frame for movement planning. This paper extends the previous work by investigating an acoustic-like planning frame in the DIVA modeling framework. During a babbling phase, the model self-organizes targets in the planning space for each of ten vowels and learns a mapping from desired movement directions in this planning space into appropriate articulator velocities. Simulation results verify that after babbling the model is capable of producing easily recognizable vowel sounds using an acoustic planning space consisting of the formants F1 and F2. The model successfully reaches all vowel targets from any initial vocal tract configuration, even in the presence of constraints such as a blocked jaw. Office of Naval Research (N00014-91-J-4100, N00014-92-J-4015); Air Force Office of Scientific Research (F49620-92-J-0499)
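The core control idea in this abstract, mapping a desired movement direction in acoustic (F1, F2) space to articulator velocities, can be sketched with a Jacobian pseudoinverse. The toy linear forward model, its coefficients, and the three-articulator setup below are illustrative assumptions for the sketch, not the DIVA model itself:

```python
import numpy as np

# Toy stand-in for a vocal-tract forward model: maps 3 articulator
# positions to (F1, F2). A real vocal tract is nonlinear; this linear
# map is an assumption made purely for illustration.
A = np.array([[800.0, -300.0, 150.0],
              [-200.0, 900.0, 400.0]])

def formants(artic):
    return A @ artic

def step_toward_target(artic, target, gain=0.1):
    """Move the articulators so the formants move toward the target.

    The desired direction in acoustic (F1, F2) space is mapped to
    articulator velocities through the pseudoinverse of the forward
    map. With 3 articulators and 2 formants the system is redundant,
    so many articulator movements realize the same acoustic change
    (motor equivalence): blocking one articulator still leaves
    directions that reach the target.
    """
    error = target - formants(artic)        # desired acoustic direction
    J_pinv = np.linalg.pinv(A)              # pseudoinverse of the Jacobian
    return artic + gain * (J_pinv @ error)  # articulator velocity step

# Drive the articulators toward an /i/-like (F1, F2) target in Hz.
artic = np.zeros(3)
target = np.array([300.0, 2300.0])
for _ in range(200):
    artic = step_toward_target(artic, target)
```

Because the pseudoinverse picks one of infinitely many articulator velocities for each acoustic direction, the same scheme reaches the target from any starting configuration, which is the property the simulations above verify.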
Articulatory Tradeoffs Reduce Acoustic Variability During American English /r/ Production
Acoustic and articulatory recordings reveal that speakers utilize systematic articulatory tradeoffs to maintain acoustic stability when producing the phoneme /r/. Distinct articulator configurations used to produce /r/ in various phonetic contexts show systematic tradeoffs between the cross-sectional areas of different vocal tract sections. Analysis of acoustic and articulatory variabilities reveals that these tradeoffs act to reduce acoustic variability, thus allowing large contextual variations in vocal tract shape; these contextual variations in turn apparently reduce the amount of articulatory movement required. These findings contrast with the widely held view that speaking involves a canonical vocal tract shape target for each phoneme. National Institute on Deafness and Other Communication Disorders (1R29-DC02852-02, 5R01-DC01925-04, 1R03-C2576-01); National Science Foundation (IRI-9310518)
Open challenges in understanding development and evolution of speech forms: The roles of embodied self-organization, motivation and active exploration
This article discusses open scientific challenges for understanding the development and evolution of speech forms, as a commentary on Moulin-Frier et al. (2015). Based on the analysis of mathematical models of the origins of speech forms, with a focus on their assumptions, we study the fundamental question of how speech can be formed out of non-speech, at both developmental and evolutionary scales. In particular, we emphasize the importance of embodied self-organization, as well as the role of mechanisms of motivation and active curiosity-driven exploration in speech formation. Finally, we discuss an evolutionary-developmental perspective on the origins of speech.
Training a Vocal Tract Synthesiser to imitate speech using Distal Supervised Learning
Imitation is a powerful mechanism by which both animals and people can learn useful behavior by copying the actions of others. We adopt this approach as a means to control an articulatory speech synthesizer. The goal of our project is to build a system that can learn to mimic speech using its own vocal tract. We approach this task by training an inverse mapping between the synthesizer’s control parameters and their auditory consequences. In this paper we compare the direct estimation of this inverse model with the distal supervised learning scheme proposed by Jordan & Rumelhart (1992). Both of these approaches involve a babbling phase, which is used to learn the auditory consequences of the articulatory controls. We show that both schemes perform well on speech generated by the synthesizer itself, when no normalization is needed, but that distal learning provides slightly better performance on speech generated by a real human subject.
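The distal supervised learning scheme this abstract compares against direct inverse estimation can be sketched in a few lines: babble to collect (articulation, sound) pairs, fit a forward model, then train the inverse model by passing its output through the frozen forward model and descending the error measured in auditory space. The linear synthesizer, its dimensions, and the learning rate below are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical articulatory synthesizer: 4 control parameters -> 3
# auditory features. The true mapping is hidden from the learner; the
# linear form is an assumption made for this sketch only.
W_true = 0.5 * rng.normal(size=(3, 4))

def synth(params):
    return params @ W_true.T

# 1. Babbling phase: random articulations paired with their sounds.
P_babble = rng.normal(size=(500, 4))
S_babble = synth(P_babble)

# 2. Fit a forward model s ~= p @ W_fwd.T to the babbling data.
W_fwd = np.linalg.lstsq(P_babble, S_babble, rcond=None)[0].T

# 3. Distal learning (after Jordan & Rumelhart 1992): train an inverse
# model (sound -> params) by pushing its output through the frozen
# forward model and minimizing the error in *auditory* space.
targets = synth(rng.normal(size=(200, 4)))   # reachable target sounds
W_inv = np.zeros((4, 3))
lr = 0.05

def auditory_error(W_inv):
    pred = (targets @ W_inv.T) @ W_fwd.T     # what the inverse would say
    return np.mean((pred - targets) ** 2)

err_before = auditory_error(W_inv)
for _ in range(3000):
    pred = (targets @ W_inv.T) @ W_fwd.T
    # Gradient of the mean squared auditory error w.r.t. W_inv,
    # backpropagated through the frozen forward model W_fwd.
    grad = ((pred - targets) @ W_fwd).T @ targets / len(targets)
    W_inv -= lr * grad
err_after = auditory_error(W_inv)
```

The key contrast with direct inverse estimation is where the error lives: the distal scheme scores the inverse model by how the *sound* comes out, which is what lets it cope with targets, such as another speaker's voice, that the synthesizer can only approximate.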
Long-term and persistent vocal plasticity in adult bats.
Bats exhibit a diverse and complex vocabulary of social communication calls, some of which are believed to be learned during development. This ability to produce learned, species-specific vocalizations - a rare trait in the animal kingdom - requires a high degree of vocal plasticity. Bats live extremely long lives in highly complex and dynamic social environments, which suggests that they might also retain a high degree of vocal plasticity in adulthood, much as humans do. Here, we report persistent vocal plasticity in adult bats (Rousettus aegyptiacus) following exposure to broad-band acoustic perturbation. Our results show that adult bats can not only modify distinct parameters of their vocalizations, but that these changes persist even after noise cessation - in some cases lasting several weeks or months. Combined, these findings underscore the potential importance of bats as a model organism for studies of vocal plasticity, including in adulthood.
KLAIR: A virtual infant for spoken language acquisition research
Recent research into the acquisition of spoken language has stressed the importance of learning through embodied linguistic interaction with caregivers rather than through passive observation. However the necessity of interaction makes experimental work into the simulation of infant speech acquisition difficult because of the technical complexity of building real-time embodied systems. In this paper we present KLAIR: a software toolkit for building simulations of spoken language acquisition through interactions with a virtual infant. The main part of KLAIR is a sensori-motor server that supplies a client machine learning application with a virtual infant on screen that can see, hear and speak. By encapsulating the real-time complexities of audio and video processing within a server that will run on a modern PC, we hope that KLAIR will encourage and facilitate more experimental research into spoken language acquisition through interaction. Copyright © 2009 ISCA
Effect of Visual Input on Vowel Production in English Speakers
This study analyzes whether there should be a visual component to a model of speech perception and production by comparing the jaw opening, advancement, and rounding of American English and non-English vowels in the presence and absence of a visual stimulus. Surprisingly, the visual stimulus did not change jaw opening, but it was found to be a significant factor in participants’ vowel advancement for non-English vowels. This may be explained by lip rounding, but further research is needed to fully understand the impact of visual input on vowel production before it can be applied to language teaching and learning.
Pre-Low Raising in Japanese Pitch Accent
Japanese has been observed to have 2 versions of the H tone, the higher of which is associated with an accented mora. However, the distinction of these 2 versions only surfaces in context but not in isolation, leading to a long-standing debate over whether there is 1 H tone or 2. This article reports evidence that the higher version may result from a pre-low raising mechanism rather than being inherently higher. The evidence is based on an analysis of F0 of words that varied in length, accent condition and syllable structure, produced by native speakers of Japanese at 2 speech rates. The data indicate a clear separation between effects that are due to mora-level preplanning and those that are mechanical. These results are discussed in terms of mechanisms of laryngeal control during tone production, and highlight the importance of articulation as a link between phonology and surface acoustics.
Error Detection and Correction During Object Naming in Individuals with Aphasia
Aphasia is a neurogenic communication disorder that occurs following a left hemisphere stroke and commonly co-occurs with apraxia of speech (AOS). Individuals with aphasia typically make errors in their lexical retrieval and have difficulty detecting and correcting them. While there is ample research on how errors occur, few researchers go as far as to look at error detection and subsequent correction in this population. Given this need for research, we took a pre-existing data set of 23 individuals with aphasia grouped for presence of AOS (nine with comorbid AOS) and coded their spoken responses on the Object Naming subtest of the Western Aphasia Battery-Revised to characterize the types of errors made, as well as whether those errors were detected and corrected. Groups did not differ for total number of errors; however, participants with AOS produced more late-stage errors than the participants without AOS, meaning they made errors that occurred after the level of lemma selection (i.e., phonemic paraphasias and neologisms). In this sample, people with aphasia were generally able to detect their errors, though the presence of AOS impacted their ability to correct them.