32 research outputs found

    Neural Modeling and Imaging of the Cortical Interactions Underlying Syllable Production

    This paper describes a neural model of speech acquisition and production that accounts for a wide range of acoustic, kinematic, and neuroimaging data concerning the control of speech movements. The model is a neural network whose components correspond to regions of the cerebral cortex and cerebellum, including premotor, motor, auditory, and somatosensory cortical areas. Computer simulations of the model verify its ability to account for compensation to lip and jaw perturbations during speech. Specific anatomical locations of the model's components are estimated, and these estimates are used to simulate fMRI experiments of simple syllable production with and without jaw perturbations. Funded by the National Institute on Deafness and Other Communication Disorders (R01 DC02852, R01 DC01925).
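
    To make the feedback-compensation idea concrete, here is a minimal sketch (not the published model; the target value, perturbation size, and gain are illustrative assumptions) of how an auditory error signal can drive online correction of a produced value under a sustained perturbation:

```python
import numpy as np

def simulate_compensation(target=500.0, perturbation=50.0, gain=0.1, steps=100):
    """Toy feedback controller: the produced value (e.g. F1 in Hz) is
    corrected in proportion to the auditory error, i.e. the mismatch
    between the intended target and the perturbed feedback."""
    state = target                        # produced value before perturbation
    trace = []
    for _ in range(steps):
        feedback = state + perturbation   # what the speaker "hears"
        error = target - feedback         # auditory error signal
        state += gain * error             # feedback-driven correction
        trace.append(state)
    return np.array(trace)

# The produced value settles near target - perturbation (~450 Hz),
# so the heard feedback returns to the 500 Hz target.
print(simulate_compensation()[-1])
```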

    A review of data collection practices using electromagnetic articulography

    This paper reviews data collection practices in electromagnetic articulography (EMA) studies, with a focus on sensor placement. It consists of three parts: in the first part, we introduce electromagnetic articulography as a method. In the second part, we focus on existing data collection practices. Our overview is based on a literature review of 905 publications from a large variety of journals and conferences, identified through a systematic keyword search in Google Scholar. The review shows that experimental designs vary greatly, which in turn may limit researchers' ability to compare results across studies. In the third part of this paper we describe an EMA data collection procedure which includes an articulatory-driven strategy for determining where to position sensors on the tongue without causing discomfort to the participant. We also evaluate three approaches reported in the literature for preparing (NDI Wave) EMA sensors, with respect to how long the sensors remain attached to the tongue: 1) attaching out-of-the-box sensors, 2) attaching sensors coated in latex, and 3) attaching sensors coated in latex with an additional latex flap. Results indicate no clear general effect of sensor preparation type on adhesion duration. A subsequent exploratory analysis reveals that sensors with the additional flap tend to adhere for shorter times than the other two types, but that this pattern is inverted for the most posterior tongue sensor.
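
    As a hedged illustration of the kind of exploratory comparison described above, the sketch below contrasts adhesion durations across the three sensor preparation types with a Kruskal-Wallis test; all durations are invented for illustration and are not the paper's data:

```python
from scipy import stats

# Hypothetical adhesion durations in minutes, one list per preparation type.
plain = [55, 62, 48, 70, 59]
latex = [60, 66, 52, 74, 63]
latex_flap = [45, 50, 41, 58, 47]

# Non-parametric test for a difference among the three groups.
h, p = stats.kruskal(plain, latex, latex_flap)
print(f"H = {h:.2f}, p = {p:.3f}")
```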

    Measuring Pre-Speech Articulation

    Abstract: What do speakers do when they start to talk? This thesis concentrates on the articulatory aspects of this problem, and offers methodological and experimental results on tongue movement, captured using Ultrasound Tongue Imaging (UTI). Speech initiation occurs at the start of every utterance. An understanding of the timing relationship between articulatory initiation (which occurs first) and acoustic initiation (that is, the start of audible speech) has implications for speech production theories, the methodological design and interpretation of speech production experiments, and clinical studies of speech production. Two novel automated techniques for detecting articulatory onsets in UTI data were developed, both based on Euclidean distance, and verified against manually annotated data. The latter technique rests on a novel way of identifying the region of the tongue that is first to initiate movement. Data from three speech production experiments are analysed in this thesis. The first experiment is picture naming recorded with UTI; it is used to explore behavioural variation at the beginning of an utterance, and to test and develop analysis tools for articulatory data. The second experiment also uses UTI recordings, but it is specifically designed to exclude any pre-speech movements of the articulators which are not directly related to the linguistic content of the utterance itself (that is, which are not expected to be present in every full repetition of the utterance), in order to study undisturbed speech initiation. The materials systematically varied the phonetic onsets of the monosyllabic target words and the vowel nucleus, and also provided an acoustic measure of the duration of the syllable rhyme. Statistical models analysed the timing relationships between articulatory onset, the acoustic durations of the sound segments, and the acoustic duration of the rhyme. Finally, to test a discrepancy between the results of the second UTI experiment and findings in the literature based on data recorded with Electromagnetic Articulography (EMA), a third experiment measured a single speaker using both methods and matched materials. Using the global Pixel Difference and Scanline-based Pixel Difference analysis methods developed and verified in the first half of the thesis, the main experimental findings were as follows. First, pre-utterance silent articulation is timed in inverse correlation with the acoustic duration of the onset consonant and in positive correlation with the acoustic duration of the rhyme of the first word. Because of the latter correlation, it should be considered part of the first word. Second, the comparison of UTI and EMA failed to replicate the discrepancy. Instead, EMA was found to produce longer reaction times independent of utterance type. Keywords: speech initiation, pre-speech articulation, delayed naming, ultrasound tongue imaging, electromagnetic articulography, automated methods.
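
    A minimal sketch of the global Pixel Difference idea (the threshold rule and frame shapes are assumptions, not the thesis's exact implementation): the Euclidean distance between consecutive ultrasound frames spikes when the tongue starts to move, and the first frame whose distance exceeds a baseline-derived threshold is taken as the articulatory onset.

```python
import numpy as np

def articulatory_onset(frames, baseline_frames=10, k=3.0):
    """frames: array of shape (n_frames, height, width).
    Returns the index of the first frame whose Euclidean distance to the
    previous frame exceeds mean + k * std of the baseline distances."""
    flat = frames.reshape(len(frames), -1).astype(float)
    dist = np.linalg.norm(np.diff(flat, axis=0), axis=1)  # frame-to-frame distance
    mu, sigma = dist[:baseline_frames].mean(), dist[:baseline_frames].std()
    above = np.nonzero(dist > mu + k * sigma)[0]
    return int(above[0]) + 1 if above.size else None      # +1: diff offset
```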

    Online control of articulation based on auditory feedback in normal speech and stuttering: behavioral and modeling studies

    Thesis (Ph.D.)--Harvard-MIT Program in Health Sciences and Technology, February 2012. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 185-209). Articulation of multisyllabic speech requires a high degree of accuracy in controlling the spatial (positional) and the temporal parameters of articulatory movements. In stuttering, a disorder of speech fluency, failures to meet these control requirements occur frequently, leading to dysfluencies such as sound repetitions and prolongations. Currently, little is known about the sensorimotor mechanisms underlying the control of multisyllabic articulation and how they break down in stuttering. This dissertation is focused on the interaction between multisyllabic articulation and auditory feedback (AF), the perception of one's own speech sounds during speech production, which has been shown previously to play important roles in quasi-static articulations as well as in the mechanisms of stuttering. To investigate this topic empirically, we developed a digital signal processing platform for introducing flexible online perturbations of time-varying formants in speakers' AF during speech production. This platform was used in a series of perturbation experiments aimed at separately elucidating the role of AF in controlling the spatial and temporal parameters of multisyllabic articulation. Under these perturbations of AF, normal subjects showed small but significant and specific online adjustments in the spatial and temporal parameters of articulation, which provided the first evidence for a role of AF in the online fine-tuning of articulatory trajectories. To model and explain these findings, we designed and tested sqDIVA, a computational model for the sensory feedback-based control of speech movement timing. Test results indicated that this new model accurately accounted for the spatiotemporal compensation patterns observed in the perturbation experiments. In addition, we investigated empirically how AF-based online speech motor control differs between people who stutter (PWS) and normal speakers. The PWS group showed compensatory responses significantly smaller in magnitude and slower in onset compared to the control subjects' responses. This under-compensation to AF perturbation was observed for both quasi-static vowels and multisyllabic speech, and for both the spatial and temporal control of articulation. This abnormal sensorimotor performance supports the hypothesis that stuttering involves deficits in the rapid internal transformations between the auditory and motor domains, with important implications for the neural basis of this disorder. by Shanqing Cai. Ph.D.
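
    The dissertation's real-time platform is not described in enough detail here to reproduce, but the sketch below shows one standard offline way to perturb formants: LPC analysis of a speech frame, scaling of the pole angles (which carry the formant frequencies), and resynthesis from the residual. The function name, shift ratio, and LPC order are illustrative assumptions.

```python
import numpy as np
import librosa
from scipy.signal import lfilter

def shift_formants(frame, shift_ratio=1.2, order=12):
    """Scale the formant frequencies of one windowed speech frame by
    shift_ratio, via LPC pole manipulation."""
    a = librosa.lpc(frame.astype(float), order=order)  # LPC coeffs, a[0] == 1
    roots = np.roots(a)
    upper = [r for r in roots if np.imag(r) > 0]       # one pole per conjugate pair
    shifted = [np.abs(r) * np.exp(1j * min(np.angle(r) * shift_ratio,
                                           np.pi - 1e-3))  # stay below Nyquist
               for r in upper]
    real = roots[np.abs(np.imag(roots)) < 1e-9].real
    a_new = np.real(np.poly(np.concatenate([shifted, np.conj(shifted), real])))
    residual = lfilter(a, [1.0], frame)                # inverse filter
    return lfilter([1.0], a_new, residual)             # resynthesize with shifted poles
```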

    Vowel nasalization in German


    Fundamental frequency modelling: an articulatory perspective with target approximation and deep learning

    Current statistical parametric speech synthesis (SPSS) approaches typically aim at state/frame-level acoustic modelling, which leads to a problem of frame-by-frame independence. Besides that, whichever learning technique is used, hidden Markov model (HMM), deep neural network (DNN) or recurrent neural network (RNN), the fundamental idea is to set up a direct mapping from linguistic to acoustic features. Although progress is frequently reported, this idea is questionable in terms of biological plausibility. This thesis aims at addressing the above issues by integrating dynamic mechanisms of human speech production as a core component of F0 generation, and thus developing a more human-like F0 modelling paradigm. By introducing an articulatory F0 generation model – target approximation (TA) – between text and speech that controls syllable-synchronised F0 generation, contextual F0 variations are processed in two separate yet integrated stages: linguistic to motor, and motor to acoustic. With the goal of demonstrating that human speech movement can be considered as a dynamic process of target approximation, and that the TA model is a valid F0 generation model to be used at the motor-to-acoustic stage, a TA-based pitch control experiment is conducted first to simulate the subtle human behaviour of online compensation for pitch-shifted auditory feedback. Then, the TA parameters are collectively controlled by linguistic features via a deep or recurrent neural network (DNN/RNN) at the linguistic-to-motor stage. We trained the systems on a Mandarin Chinese dataset consisting of both statements and questions. The TA-based systems generally outperformed the baseline systems in both objective and subjective evaluations. Furthermore, the number of required linguistic features was reduced, first to syllable-level features only (with DNN) and then with all positional information removed (with RNN). Fewer linguistic features as input, with a limited number of TA parameters as output, led to less training data and lower model complexity, which in turn led to more efficient training and faster synthesis.
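
    For reference, target approximation has a compact closed form in the published quantitative TA (qTA) model (Prom-on, Xu & Thipakorn, 2009), which the sketch below follows: F0 converges on a linear pitch target as a third-order critically damped system, with the transient coefficients set from the pitch, velocity, and acceleration transferred at the syllable boundary. The parameter values in the example are illustrative, not taken from the thesis.

```python
import numpy as np

def qta_contour(t, m, b, lam, p0, v0, a0):
    """F0(t) = (m*t + b) + (c1 + c2*t + c3*t**2) * exp(-lam*t):
    a linear target m*t + b approached at rate lam, starting from
    initial pitch p0, velocity v0, and acceleration a0."""
    c1 = p0 - b
    c2 = v0 + c1 * lam - m
    c3 = (a0 + 2 * lam * c2 - lam**2 * c1) / 2
    return (m * t + b) + (c1 + c2 * t + c3 * t**2) * np.exp(-lam * t)

# One 200 ms syllable: F0 (in semitones) falls from 95 toward a flat
# target of 90, carrying zero initial velocity and acceleration.
t = np.linspace(0, 0.2, 200)
f0 = qta_contour(t, m=0.0, b=90.0, lam=40.0, p0=95.0, v0=0.0, a0=0.0)
```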

    Magnetic resonance imaging in the study of European Portuguese speech production

    Master's thesis in Speech and Hearing Sciences (Mestrado em Ciências da Fala e da Audição). Magnetic Resonance Imaging (MRI) is a powerful diagnostic tool and has been used successfully to acquire information in speech production studies. Because it does not use ionising radiation, it is considered a safe imaging technique; this, together with its multiplanar capability, good contrast resolution of soft tissues, and the possibility of 3D modelling, makes MRI one of the most promising imaging methods in the area of speech research. There are several MRI speech production studies for different languages, but no systematic MRI study exists for European Portuguese (EP). The main goal of this study was to acquire an MRI database of EP sounds, comprising images of static productions (2D and 3D) as well as images obtained from dynamic productions during real-time acquisition; a further goal was the validation of a 3D image acquisition method. The 2D images yielded vocal tract configurations in the midsagittal plane for a large part of the EP sounds, including all nasal sounds, with an SNR and spatial resolution that made it possible (1) to observe most of the articulators with clarity, (2) to extract contours and articulatory parameters, and (3) to observe coarticulatory effects in stops and fricatives. From the 3D acquisition, area functions were obtained together with quantitative parameters such as nasal opening, Velum Port Opening Quotient (VPOQ), and pharyngeal areas for the nasal and oral vowels and nasal consonants of EP. Real-time images, obtained at five frames per second, provided preliminary information on the dynamics of the articulators (mainly tongue movements) during speech production. The database obtained with this work also supported the development of semi-automatic segmentation tools for extracting information from MR images.

    The role of linguistic contrasts in the auditory feedback control of speech

    Thesis (Ph.D. in Speech and Hearing Bioscience and Technology)--Harvard-MIT Division of Health Sciences and Technology, 2010. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 165-180). Speakers use auditory feedback to monitor their own speech, ensuring that the intended output matches the observed output. By altering the acoustic feedback signal before it reaches the speaker's ear, we can induce auditory errors: differences between what is expected and what is heard. This dissertation investigates the neural mechanisms responsible for the detection and consequent correction of these auditory errors. Linguistic influences on feedback control were assessed in two experiments employing auditory perturbation. In a behavioral experiment, subjects spoke four-word sentences while the fundamental frequency (F0) of the stressed word was perturbed either upwards or downwards, causing the word to sound more or less stressed. Subjects adapted by altering both the F0 and the intensity contrast between stressed and unstressed words, even though intensity remained unperturbed. An integrated model of prosodic control is proposed in which F0 and intensity are modulated together to achieve a stress target. In a second experiment, functional magnetic resonance imaging was used to measure neural responses to speech with and without auditory perturbation. Subjects were found to compensate more for formant shifts that resulted in a phonetic category change than for formant shifts that did not, despite the identical magnitudes of the shifts. Furthermore, the extent of neural activation in superior temporal and inferior frontal regions was greater for cross-category than for within-category shifts, evidence that a stronger cortical error signal accompanies a linguistically relevant acoustic change. Taken together, these results demonstrate that auditory feedback control is sensitive to linguistic contrasts learned through auditory experience. by Caroline A. Niziolek. Ph.D. in Speech and Hearing Bioscience and Technology.
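
    The experiment's online F0 perturbation cannot be reproduced from the abstract alone, but an offline equivalent of shifting a word's fundamental frequency up or down can be sketched with librosa (the file name and shift size are illustrative; librosa's n_steps is in semitones, so 0.5 corresponds to 50 cents):

```python
import librosa

# Hypothetical recording of the stressed word.
y, sr = librosa.load("stressed_word.wav", sr=None)

y_up = librosa.effects.pitch_shift(y, sr=sr, n_steps=0.5)     # +50 cents
y_down = librosa.effects.pitch_shift(y, sr=sr, n_steps=-0.5)  # -50 cents
```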

    Pediatric Responses to Fundamental and Formant Frequency Altered Auditory Feedback: A Scoping Review

    Purpose: The ability to hear ourselves speak has been shown to play an important role in the development and maintenance of fluent and coherent speech. Despite this, little is known about the developing speech motor control system throughout childhood, in particular if and how vocal and articulatory control may differ throughout development. A scoping review was undertaken to identify and describe the full range of studies investigating responses to frequency-altered auditory feedback in pediatric populations, and their contributions to our understanding of the development of auditory feedback control and sensorimotor learning in childhood and adolescence. Method: Relevant studies were identified through a comprehensive search strategy of six academic databases for studies that included (a) real-time perturbation of frequency in auditory input, (b) an analysis of immediate effects on speech, and (c) participants aged 18 years or younger. Results: Twenty-three articles met the inclusion criteria. Across studies, there was a wide variety of designs, outcomes, and measures used. Manipulations included fundamental frequency (9 studies), formant frequency (12), the frequency centroid of fricatives (1), and both fundamental and formant frequencies (1). Study designs included contrasts across childhood, between children and adults, and between typical, pediatric clinical, and adult populations. Measures primarily explored acoustic properties of speech responses (latency, magnitude, and variability). Some studies additionally examined the association of these acoustic responses with clinical measures (e.g., stuttering severity and reading ability) and with neural measures using electrophysiology and magnetic resonance imaging. Conclusion: Findings indicated that children above 4 years of age generally compensated in the direction opposite to the manipulation, although in several cases not as effectively as adults. Overall, results varied greatly due to the broad range of manipulations and designs used, making generalization challenging. Differences found between age groups in the features of the compensatory vocal responses, the latency of responses, vocal variability, and perceptual abilities suggest that maturational changes may be occurring in the speech motor control system, affecting the extent to which auditory feedback is used to modify internal sensorimotor representations. The varied findings suggest that vocal control develops prior to articulatory control. Future studies with multiple outcome measures, manipulations, and more expansive age ranges are needed to elucidate these findings.
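
    Compensation magnitude in such studies is typically reported in cents relative to baseline; below is a minimal sketch of that computation (the variable names and frequency values are illustrative, not data from any reviewed study):

```python
import numpy as np

def cents(f, f_ref):
    """Distance of frequency f from reference f_ref, in cents."""
    return 1200.0 * np.log2(f / f_ref)

baseline_f0 = 250.0      # unperturbed F0 in Hz
perturbed_heard = 264.9  # feedback shifted up by ~100 cents
response_f0 = 245.0      # produced F0 during the perturbation

shift = cents(perturbed_heard, baseline_f0)   # ~ +100 cents
response = cents(response_f0, baseline_f0)    # ~ -35 cents (opposing direction)
print(f"compensation: {100 * -response / shift:.0f}% of the shift")
```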