A split-gesture, competitive, coupled oscillator model of syllable structure predicts the emergence of edge gemination and degemination
The phonological mechanisms responsible for the emergence of edge geminates in phonological processes like the Italian Raddoppiamento (Fono-)Sintattico (RS) are an open issue. Previous analyses of Italian treat gemination of (i) word-initial consonants, (ii) morpheme-final consonants, and (iii) word-final consonants as separate processes brought about by dedicated rules/constraints. We argue that these edge gemination processes result from the same, independently established principles. Through computational simulation of the split-gesture, competitive, coupled oscillator model of syllable structure of Articulatory Phonology, we show that increases in closure duration typical of geminates arise from changes to consonant/vowel couplings. Word-initial gemination follows from coupling of a closure gesture to a preceding vowel across a word boundary. Word-final gemination follows from coupling of a release gesture to a following vowel. In both cases, the posited structures reflect changes in syllabification hypothesized in previous work. The model simulations also predict different durations for resyllabified edge geminates and medial lexical geminates, in line with experimental findings on the topic. Changes to consonant/vowel couplings also account for the opposite effect: word-initial degemination. Thus, the coupled oscillator model of Articulatory Phonology, originally developed to model intergestural timing, predicts the emergence of edge gemination/degemination.
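The planning level of such models is often formalized as Kuramoto-style coupled phase oscillators whose stable relative phases determine gestural timing. The following is a minimal illustrative sketch of that general mechanism, not the authors' implementation; the oscillator frequencies, coupling strengths, and target phases are assumptions chosen for demonstration.

```python
import numpy as np

def simulate(coupling, omega, steps=20000, dt=0.001, seed=0):
    """Integrate Kuramoto-style phase dynamics.

    coupling maps (i, j) -> (strength, target_phase), contributing
    d(theta_i)/dt += strength * sin(theta_j - theta_i + target_phase),
    whose stable point is theta_i - theta_j = target_phase.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, size=len(omega))
    for _ in range(steps):
        dtheta = np.array(omega, dtype=float)
        for (i, j), (k, target) in coupling.items():
            dtheta[i] += k * np.sin(theta[j] - theta[i] + target)
        theta = theta + dt * dtheta
    return theta

# Two planning oscillators (e.g. a closure gesture and a vowel gesture),
# coupled in-phase in both directions: the stable relative phase is ~0,
# i.e. synchronous gesture onset. Using target_phase = np.pi instead
# would yield anti-phase (sequential) coordination.
omega = [2 * np.pi * 2.0, 2 * np.pi * 2.0]          # both at 2 Hz
inphase = {(0, 1): (8.0, 0.0), (1, 0): (8.0, 0.0)}
theta = simulate(inphase, omega)
rel = float(np.angle(np.exp(1j * (theta[0] - theta[1]))))  # wrapped relative phase
```

Changing which oscillators are coupled (e.g. coupling a closure to the preceding versus the following vowel) changes the settled relative phases, and hence the predicted closure durations, which is the kind of manipulation the abstract describes.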
Categoriality and continuity in prosodic prominence
Prosody has been characterised as a "half-tamed savage", shaped by both discrete, categorical aspects and gradient, continuous phenomena. This book is concerned with the relation between the "wild" and the "tamed" sides of prosodic prominence. It reviews problems that arise from a strict separation of categorical and continuous representations in models of phonetics and phonology, and it explores the potential role of descriptions aimed at reconciling the two domains. In doing so, the book offers an introduction to dynamical systems, a framework that has been studied extensively in recent decades to model speech production and perception. The acoustic and articulatory data presented in this book show that categorical and continuous modulations used to enhance prosodic prominence are deeply intertwined and even exhibit a kind of symbiosis. A multi-dimensional dynamical model of prosodic prominence is sketched, based on the empirical data, combining tonal and articulatory aspects of prosodic focus marking. The model demonstrates how categorical and continuous aspects can be integrated in a joint theoretical treatment that overcomes a strict separation of phonetics and phonology.
Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion
Acoustic-to-articulatory inversion, the estimation of articulatory kinematics from an acoustic waveform, is a challenging but important problem. Accurate estimation of articulatory movements has the potential for significant impact on our understanding of speech production, on our capacity to assess and treat pathologies in a clinical setting, and on speech technologies such as computer aided pronunciation assessment and audio-video synthesis. However, because of the complex and speaker-specific relationship between articulation and acoustics, existing approaches for inversion do not generalize well across speakers. As acquiring speaker-specific kinematic data for training is not feasible in many practical applications, this remains an important and open problem. This paper proposes a novel approach to acoustic-to-articulatory inversion, Parallel Reference Speaker Weighting (PRSW), which requires no kinematic data for the target speaker and a small amount of acoustic adaptation data. PRSW hypothesizes that acoustic and kinematic similarities are correlated and uses speaker-adapted articulatory models derived from acoustically derived weights. The system was assessed using a 20-speaker data set of synchronous acoustic and Electromagnetic Articulography (EMA) kinematic data. Results demonstrate that by restricting the reference group to a subset consisting of speakers with strong individual speaker-dependent inversion performance, the PRSW method is able to attain kinematic-independent acoustic-to-articulatory inversion performance nearly matching that of the speaker-dependent model, with an average correlation of 0.62 versus 0.63. This indicates that given a sufficiently complete and appropriately selected reference speaker set for adaptation, it is possible to create effective articulatory models without kinematic training data
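The core PRSW idea — weight reference speakers' articulatory models by their acoustic similarity to the target speaker, then combine them — can be sketched as follows. The distance measure, the softmax-style weighting, and the per-speaker linear inversion models are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_refs, acoustic_dim, artic_dim = 5, 13, 6

# Per-reference-speaker data: a mean acoustic feature vector (stand-in
# for an adapted acoustic model) and a linear acoustic-to-articulatory
# model W (stand-in for a trained speaker-dependent inversion model).
ref_means = rng.normal(size=(n_refs, acoustic_dim))
ref_models = rng.normal(size=(n_refs, artic_dim, acoustic_dim))

def prsw_weights(target_mean, ref_means, beta=1.0):
    """Acoustically derived weights: closer reference speakers get more mass."""
    d = np.linalg.norm(ref_means - target_mean, axis=1)
    w = np.exp(-beta * d)
    return w / w.sum()

def invert(acoustic_frame, weights, ref_models):
    """Weighted combination of the reference speakers' inversions."""
    preds = ref_models @ acoustic_frame      # shape (n_refs, artic_dim)
    return weights @ preds                   # shape (artic_dim,)

# Target speaker acoustically close to reference speaker 2: that
# speaker's inversion model should dominate the combination.
target_mean = ref_means[2] + 0.1 * rng.normal(size=acoustic_dim)
w = prsw_weights(target_mean, ref_means)
frame = rng.normal(size=acoustic_dim)
est = invert(frame, w, ref_models)
```

The abstract's key finding — that restricting the reference set to speakers with strong individual inversion performance helps — corresponds here to pruning rows of `ref_models` before computing the weights.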
Neural Modeling and Imaging of the Cortical Interactions Underlying Syllable Production
This paper describes a neural model of speech acquisition and production that accounts for a wide range of acoustic, kinematic, and neuroimaging data concerning the control of speech movements. The model is a neural network whose components correspond to regions of the cerebral cortex and cerebellum, including premotor, motor, auditory, and somatosensory cortical areas. Computer simulations of the model verify its ability to account for compensation to lip and jaw perturbations during speech. Specific anatomical locations of the model's components are estimated, and these estimates are used to simulate fMRI experiments of simple syllable production with and without jaw perturbations.
Funding: National Institute on Deafness and Other Communication Disorders (R01 DC02852, R01 DC01925).
A silent speech system based on permanent magnet articulography and direct synthesis
In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies.
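The conversion step above amounts to fitting a generative joint model over parallel (PMA, acoustic) features and then inferring acoustics given articulation. As a drastically simplified stand-in for the mixture of factor analysers, this sketch fits a single joint Gaussian and converts via the conditional mean — the same inference step, minus the mixture components and the low-rank factor structure; all data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic parallel training data: acoustic features as a noisy linear
# function of PMA features (purely illustrative stand-in for recordings).
n, d_pma, d_ac = 2000, 4, 3
A = rng.normal(size=(d_ac, d_pma))
X = rng.normal(size=(n, d_pma))                  # PMA features
Y = X @ A.T + 0.1 * rng.normal(size=(n, d_ac))   # parallel acoustics

# Fit a joint Gaussian over stacked (PMA, acoustic) vectors.
Z = np.hstack([X, Y])
mu = Z.mean(axis=0)
S = np.cov(Z, rowvar=False)
Sxx, Sxy = S[:d_pma, :d_pma], S[:d_pma, d_pma:]

def convert(x):
    """Conditional mean E[acoustic | PMA] under the joint Gaussian."""
    return mu[d_pma:] + (x - mu[:d_pma]) @ np.linalg.solve(Sxx, Sxy)

x_test = rng.normal(size=d_pma)
y_hat = convert(x_test)
```

A mixture of factor analysers generalizes this by mixing several such conditional predictors, each with low-rank covariance, which is what lets the full system capture non-linear articulatory-to-acoustic behaviour.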
Computational and Robotic Models of Early Language Development: A Review
We review computational and robotics models of early language learning and development. We first explain why and how these models are used to better understand how children learn language. We argue that they provide concrete theories of language learning as a complex dynamic system, complementing traditional methods in psychology and linguistics. We review different modeling formalisms, grounded in techniques from machine learning and artificial intelligence such as Bayesian and neural network approaches. We then discuss their role in understanding several key mechanisms of language development: cross-situational statistical learning, embodiment, situated social interaction, intrinsically motivated learning, and cultural evolution. We conclude by discussing future challenges for research, including modeling of large-scale empirical data about language acquisition in real-world environments.
Keywords: early language learning, computational and robotic models, machine learning, development, embodiment, social interaction, intrinsic motivation, self-organization, dynamical systems, complexity.
Comment: to appear in International Handbook on Language Development, ed. J. Horst and J. von Koss Torkildsen, Routledge.
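Cross-situational statistical learning, one of the mechanisms this review covers, can be sketched as simple co-occurrence counting: each scene pairs several words with several candidate referents, and the learner resolves the ambiguity by aggregating evidence across scenes. The toy lexicon below is an illustrative assumption, not data from the review.

```python
from collections import defaultdict

# Each scene: a set of heard words and a set of visible referents.
# No single scene disambiguates any word; the aggregate does.
scenes = [
    ({"ball", "dog"}, {"BALL", "DOG"}),
    ({"ball", "cup"}, {"BALL", "CUP"}),
    ({"dog", "cup"},  {"DOG", "CUP"}),
]

counts = defaultdict(lambda: defaultdict(int))
for words, referents in scenes:
    for w in words:
        for r in referents:
            counts[w][r] += 1    # every word co-occurs with every referent

def best_referent(word):
    """Pick the referent most often co-present with the word."""
    return max(counts[word], key=counts[word].get)
```

Here "ball" co-occurs with BALL in two scenes but with DOG and CUP in only one each, so `best_referent("ball")` resolves to BALL; Bayesian and neural-network formalisms mentioned in the abstract refine this same statistical core.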
Cortical Dynamics of Language
The human capability for fluent speech profoundly directs inter-personal communication and, by extension, self-expression. Language is lost in millions of people each year due to trauma, stroke, neurodegeneration, and neoplasms, with devastating impact on social interaction and quality of life. The following investigations were designed to elucidate the neurobiological foundation of speech production, building towards a universal cognitive model of language in the brain. Understanding the dynamical mechanisms supporting cortical network behavior will significantly advance the understanding of how both focal and disconnection injuries yield neurological deficits, informing the development of therapeutic approaches.
Learning to Produce Speech with an Altered Vocal Tract: The Role of Auditory Feedback
Modifying the vocal tract alters a speaker’s previously learned acoustic–articulatory relationship. This study investigated the contribution of auditory feedback to the process of adapting to vocal-tract modifications. Subjects said the word /tɑs/ while wearing a dental prosthesis that extended the length of their maxillary incisor teeth. The prosthesis affected /s/ productions, and the subjects were asked to learn to produce "normal" /s/’s. They alternately received normal auditory feedback and noise that masked their natural feedback during productions. Acoustic analysis of the speakers’ /s/ productions showed that the distribution of energy across the spectrum moved toward that of normal, unperturbed production with increased experience with the prosthesis. However, the acoustic analysis did not show any significant differences in learning dependent on auditory feedback. By contrast, when naive listeners were asked to rate the quality of the speakers’ utterances, productions made when auditory feedback was available were rated as closer to the subjects’ normal productions than those made when feedback was masked. The perceptual analysis showed that speakers were able to use auditory information to partially compensate for the vocal-tract modification. Furthermore, utterances produced during the masked conditions also improved over a session, demonstrating that the compensatory articulations were learned and remained available after auditory feedback was removed.
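A standard summary of how energy is distributed across a fricative's spectrum is the spectral centroid (amplitude-weighted mean frequency); the sketch below is a generic version of that measurement, not the study's exact analysis pipeline, and uses synthetic noise in place of recorded /s/ tokens.

```python
import numpy as np

def spectral_centroid(signal, sr):
    """Amplitude-weighted mean frequency of the magnitude spectrum (Hz)."""
    mag = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float((freqs * mag).sum() / mag.sum())

# Synthetic frication-like token: white noise given a crude high-pass
# emphasis (first difference), pushing energy toward high frequencies
# as in a typical alveolar /s/.
sr = 16000
rng = np.random.default_rng(0)
noise = rng.normal(size=4096)           # flat-spectrum baseline
shaped = np.diff(noise, prepend=0.0)    # high-frequency-weighted token
```

Tracking this centroid across productions would show the kind of spectral shift toward the unperturbed /s/ that the abstract reports.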