1,683 research outputs found

    Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion

    Acoustic-to-articulatory inversion, the estimation of articulatory kinematics from an acoustic waveform, is a challenging but important problem. Accurate estimation of articulatory movements has the potential for significant impact on our understanding of speech production, on our capacity to assess and treat pathologies in a clinical setting, and on speech technologies such as computer-aided pronunciation assessment and audio-video synthesis. However, because of the complex and speaker-specific relationship between articulation and acoustics, existing approaches to inversion do not generalize well across speakers. As acquiring speaker-specific kinematic data for training is not feasible in many practical applications, this remains an important and open problem. This paper proposes a novel approach to acoustic-to-articulatory inversion, Parallel Reference Speaker Weighting (PRSW), which requires no kinematic data from the target speaker and only a small amount of acoustic adaptation data. PRSW hypothesizes that acoustic and kinematic similarities are correlated and uses acoustically derived weights to build speaker-adapted articulatory models. The system was assessed using a 20-speaker data set of synchronous acoustic and Electromagnetic Articulography (EMA) kinematic data. Results demonstrate that by restricting the reference group to a subset of speakers with strong individual speaker-dependent inversion performance, the PRSW method attains kinematic-independent acoustic-to-articulatory inversion performance nearly matching that of the speaker-dependent model, with an average correlation of 0.62 versus 0.63. This indicates that, given a sufficiently complete and appropriately selected reference speaker set for adaptation, it is possible to create effective articulatory models without kinematic training data.
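
    A minimal sketch of the weighting idea described in the abstract, under simplifying assumptions: reference speakers are represented here by mean acoustic and articulatory parameter vectors and weighted by acoustic closeness, whereas the actual PRSW system estimates weights over parallel HMM parameters. All function and variable names are illustrative, not from the paper.

    # Hedged sketch of Parallel Reference Speaker Weighting (PRSW):
    # reference models are simplified to mean parameter vectors.
    import numpy as np

    def prsw_weights(target_acoustic, ref_acoustic, temperature=1.0):
        """Weight reference speakers by acoustic similarity to the target.

        target_acoustic : (d,) mean acoustic features from the target
                          speaker's small adaptation set
        ref_acoustic    : (n_refs, d) per-reference-speaker acoustic means
        """
        dists = np.linalg.norm(ref_acoustic - target_acoustic, axis=1)
        # Closer reference speakers get larger weights (softmax of -distance).
        w = np.exp(-dists / temperature)
        return w / w.sum()

    def adapted_articulatory_model(weights, ref_articulatory):
        """Weighted combination of reference articulatory model parameters.

        ref_articulatory : (n_refs, p) articulatory parameters per reference
        """
        return weights @ ref_articulatory

    # Example with random stand-in data for 20 reference speakers.
    rng = np.random.default_rng(0)
    refs_ac, refs_art = rng.normal(size=(20, 39)), rng.normal(size=(20, 12))
    target = rng.normal(size=39)
    w = prsw_weights(target, refs_ac)
    adapted = adapted_articulatory_model(w, refs_art)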

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) workshop was established in 1999 out of the strongly felt need to share know-how, objectives, and results among areas that until then had seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years, the initial topics have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy.

    Speaker Independent Acoustic-to-Articulatory Inversion

    Acoustic-to-articulatory inversion, the determination of articulatory parameters from acoustic signals, is a difficult but important problem for many speech processing applications, such as automatic speech recognition (ASR) and computer-aided pronunciation training (CAPT). In recent years, several approaches have been successfully implemented for speaker-dependent models with parallel acoustic and kinematic training data. However, in many practical applications inversion is needed for new speakers for whom no articulatory data are available. To address this problem, this dissertation introduces a novel speaker adaptation approach called Parallel Reference Speaker Weighting (PRSW), based on parallel acoustic and articulatory Hidden Markov Models (HMMs). The approach uses a robust normalized articulatory space and palate-referenced articulatory features, combined with speaker-weighted adaptation, to form an inversion mapping for new speakers that can accurately estimate articulatory trajectories. The proposed PRSW method is evaluated on the newly collected Marquette Electromagnetic Articulography – Mandarin-Accented English (EMA-MAE) corpus using 20 native English speakers. Cross-speaker inversion results show that, given a good selection of reference speakers with consistent acoustic and articulatory patterns, the PRSW approach yields good speaker-independent inversion performance even without kinematic training data.
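
    The following is a hedged sketch of one kind of palate-referenced articulatory feature mentioned above: the vertical distance from a tongue sensor to the speaker's palate trace. The representation of the palate as (x, z) points sampled along the hard palate, and all names, are assumptions for illustration, not the dissertation's exact formulation.

    # Illustrative palate-referenced feature: how far below the palate a
    # tongue sensor sits at its current front-back position.
    import numpy as np

    def palate_referenced_height(sensor_xz, palate_xz):
        """sensor_xz : (2,) sensor position (front-back x, vertical z)
        palate_xz : (m, 2) palate trace points ordered by increasing x"""
        # Interpolate the palate height at the sensor's x position, then
        # measure the vertical gap between palate and sensor.
        palate_z = np.interp(sensor_xz[0], palate_xz[:, 0], palate_xz[:, 1])
        return palate_z - sensor_xz[1]

    # Example: a sensor roughly 8 mm below the palate at x = 12 mm.
    palate = np.array([[0.0, 20.0], [10.0, 22.0], [20.0, 21.0], [30.0, 18.0]])
    print(palate_referenced_height(np.array([12.0, 14.0]), palate))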

    Articulatory Kinematics During Stop Closure in Speakers with Parkinson’s Disease

    Purpose: The goal of this exploratory study was (a) to investigate the differences in articulatory movements during the closure phase of bilabial stop consonants with respect to distance, displacement, and timing of motion between individuals with Parkinson’s Disease (PD) and healthy controls; and (b) to investigate changes in the articulatory movements of speakers with PD when they voluntarily vary their degree of speech intelligibility. Methods: Six participants, 4 PD and 2 healthy control (HC) speakers, participated in this study. The stimulus was a sentence containing several bilabial stop consonants (i.e., “Buy Bobby a puppy”). Movement data were collected using the Wave Speech Research System (NDI, Canada). Movement measures included the duration, distance, displacement, and speed of the tongue front, tongue back, upper lip, lower lip, and jaw. Results: Speakers with PD and HC speakers produced observable articulatory differences during the stop closure of bilabial stops. Generally, the PD group produced smaller articulatory movements and had longer closure durations than the HC group. Regarding changes in speaking mode, the two groups made observable but different articulatory changes during the stop closure. For more clear speech, both groups made greater articulatory movements and decreased the stop closure duration. For less clear speech, the HC group showed reduced articulatory movements and longer closure durations, whereas the PD group made greater articulatory movements and longer closure durations. Discussion: The findings of this study revealed several articulatory differences during the stop closure between the two speaker groups. In more clear speaking conditions, speakers in the PD group could successfully compensate for reduced articulatory movement by producing exaggerated lower lip and jaw movements. These findings support the use of more clear speaking modifications as a therapeutic technique to elicit better articulatory movement among speakers with PD. However, it also appears that the PD group had difficulty producing fine motor articulatory changes (e.g., less clear speech).
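
    As a hedged illustration of the kinematic measures named in the Methods (duration, distance, displacement, speed), the sketch below computes them from a sampled 3-D sensor trajectory over a closure interval. It is generic example code, not the study's analysis software, and the synthetic trajectory is invented.

    # Illustrative computation of closure duration, cumulative path distance,
    # net displacement, and peak speed from a 3-D articulator trajectory.
    import numpy as np

    def closure_kinematics(traj, fs):
        """traj: (n, 3) sensor positions in mm during the closure interval;
        fs: sampling rate in Hz."""
        steps = np.diff(traj, axis=0)                   # per-sample motion (mm)
        step_len = np.linalg.norm(steps, axis=1)
        return {
            "duration_s": (len(traj) - 1) / fs,
            "distance_mm": step_len.sum(),              # total path length
            "displacement_mm": np.linalg.norm(traj[-1] - traj[0]),  # start-to-end
            "peak_speed_mm_s": (step_len * fs).max(),
        }

    # Example with a synthetic lower-lip trajectory sampled at 100 Hz.
    t = np.linspace(0, 0.2, 21)
    traj = np.stack([np.zeros_like(t), np.zeros_like(t),
                     5 * np.sin(np.pi * t / 0.2)], axis=1)
    print(closure_kinematics(traj, fs=100))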

    Evidence for active control of tongue lateralization in Australian English /l/

    Research on the temporal dynamics of /l/ production has focused primarily on mid-sagittal tongue movements. This study reports how known variations in the timing of mid-sagittal gestures are related to para-sagittal dynamics in /l/ formation in Australian English (AusE), using three-dimensional electromagnetic articulography (3D EMA). The articulatory analyses show (1) consistent with past work, the temporal lag between tongue tip and tongue body gestures identified in the mid-sagittal plane changes across different syllable positions and vowel contexts; (2) the lateral channel is largely formed by tilting the tongue to the left/right side of the oral cavity as opposed to curving the tongue within the coronal plane; and, (3) the timing of lateral channel formation relative to the tongue body gesture is consistent across syllable positions and vowel contexts, even as the temporal lag between tongue tip and tongue body gestures varies. This last result is particularly informative with respect to theoretical hypotheses regarding gestural control for /l/s, as it suggests that lateral channel formation is actively controlled as opposed to resulting as a passive consequence of tongue stretching. These results are interpreted as evidence that the formation of the lateral channel is a primary articulatory goal of /l/ production in AusE
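
    Two quantities the above analysis turns on can be illustrated with a short, hedged sketch: a coronal-plane tilt estimated from the heights of left and right parasagittal tongue sensors, and the lag between tongue tip and tongue body gesture onsets. Names, sensor separation, and sign conventions are assumptions for illustration only.

    # Illustrative tongue tilt and gestural lag computations (not the study's code).
    import numpy as np

    def coronal_tilt_deg(z_left, z_right, sensor_separation_mm):
        """Tilt of the tongue surface toward one side, in degrees.
        Positive when the left edge sits higher than the right."""
        return np.degrees(np.arctan2(z_left - z_right, sensor_separation_mm))

    def gesture_lag_ms(tip_onset_s, body_onset_s):
        """Lag between tongue tip and tongue body gesture onsets (ms);
        positive when the tip gesture starts after the body gesture."""
        return 1000.0 * (tip_onset_s - body_onset_s)

    print(coronal_tilt_deg(z_left=12.0, z_right=8.5, sensor_separation_mm=20.0))
    print(gesture_lag_ms(tip_onset_s=0.215, body_onset_s=0.180))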

    ANALYSIS OF VOCAL FOLD KINEMATICS USING HIGH SPEED VIDEO

    Vocal folds are the twin infoldings of mucous membrane stretched horizontally across the larynx. They vibrate, modulating the constant airflow initiated from the lungs. The pulsating pressure wave blowing through the glottis is thus the source for voiced speech production. The study of vocal fold dynamics during voicing is critical for the treatment of voice pathologies. Since the vocal folds vibrate at 100–350 cycles per second, their visual inspection is currently done by stroboscopy, which merges information from multiple cycles to present an apparent motion. High Speed Digital Laryngeal Imaging (HSDLI), with a temporal resolution of up to 10,000 frames per second, has been established as better suited for assessing vocal fold vibratory function through direct recording. But widespread use of HSDLI is limited by a lack of consensus on modalities such as which features should be examined. Image processing techniques that circumvent the tedious and time-consuming effort of examining large volumes of recordings still have room for improvement. Fundamental questions, such as the frame rate or resolution required for the recordings, are still not adequately answered. HSDLI also cannot provide absolute physical measurements of anatomical features and vocal fold displacement. This work addresses these challenges through improved signal processing. A vocal fold edge extraction technique with subpixel accuracy, suited even for the hard-to-record pediatric population, is developed first. The algorithm, which is equally applicable to pediatric and adult subjects, is implemented to facilitate user inspection and intervention. Objective features describing fold dynamics, extracted from the edge displacement waveform, are proposed and analyzed on a diverse dataset of healthy males, females, and children. The sampling and quantization noise present in the recordings is analyzed, and methods to mitigate it are investigated. A customized Kalman smoothing and spline interpolation of the displacement waveform is found to improve the stability of feature estimation. The relationship between frame rate, spatial resolution, and vibration required for efficient capture of information is derived. Finally, to address the inability to obtain absolute physical measurements, a structured light projection system calibrated with respect to the endoscope is prototyped.
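
    The thesis describes a customized Kalman smoothing plus spline interpolation of the edge displacement waveform; the sketch below is only a simplified stand-in for that idea, using a one-pass random-walk Kalman filter and cubic-spline upsampling on a synthetic waveform. Parameters and signal are invented for illustration.

    # Simplified stand-in: suppress quantization-like noise in a displacement
    # waveform with a 1-D random-walk Kalman filter, then upsample with a spline.
    import numpy as np
    from scipy.interpolate import CubicSpline

    def kalman_filter_1d(y, process_var=0.05, meas_var=0.5):
        x, p = float(y[0]), 1.0          # state estimate and its variance
        out = np.empty(len(y))
        for i, z in enumerate(y):
            p = p + process_var          # predict (random-walk model)
            k = p / (p + meas_var)       # Kalman gain
            x = x + k * (z - x)          # update with the noisy measurement
            p = (1.0 - k) * p
            out[i] = x
        return out

    # Synthetic 200 Hz glottal-cycle displacement sampled at 4000 fps with
    # quantization-like noise, smoothed and upsampled 4x.
    fps = 4000
    t = np.arange(0, 0.02, 1 / fps)
    disp = np.abs(np.sin(2 * np.pi * 200 * t)) \
        + np.round(np.random.normal(0, 0.6, t.size)) * 0.05
    smooth = kalman_filter_1d(disp)
    t_fine = np.linspace(t[0], t[-1], t.size * 4)
    disp_fine = CubicSpline(t, smooth)(t_fine)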

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) was established in 1999 out of the strongly felt need to share know-how, objectives, and results among areas that until then had seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years, the initial topics have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty years of uninterrupted and successful research in the field of voice analysis.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) was established in 1999 out of the strongly felt need to share know-how, objectives, and results among areas that until then had seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the newborn to the adult and elderly. Over the years, the initial topics have grown and spread into other fields of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis.

    Subthalamic Nucleus and Sensorimotor Cortex Activity During Speech Production

    The sensorimotor cortex is somatotopically organized to represent the vocal tract articulators such as lips, tongue, larynx, and jaw. How speech and articulatory features are encoded at the subcortical level, however, remains largely unknown. We analyzed LFP recordings from the subthalamic nucleus (STN) and simultaneous electrocorticography recordings from the sensorimotor cortex of 11 human subjects (1 female) with Parkinson's disease during implantation of deep-brain stimulation (DBS) electrodes while they read aloud three-phoneme words. The initial phonemes involved either articulation primarily with the tongue (coronal consonants) or the lips (labial consonants). We observed significant increases in high-gamma (60–150 Hz) power in both the STN and the sensorimotor cortex that began before speech onset and persisted for the duration of speech articulation. As expected from previous reports, in the sensorimotor cortex, the primary articulators involved in the production of the initial consonants were topographically represented by high-gamma activity. We found that STN high-gamma activity also demonstrated specificity for the primary articulator, although no clear topography was observed. In general, subthalamic high-gamma activity varied along the ventral–dorsal trajectory of the electrodes, with greater high-gamma power recorded in the dorsal locations of the STN. Interestingly, the majority of significant articulator-discriminative activity in the STN occurred before that in sensorimotor cortex. These results demonstrate that articulator-specific speech information is contained within high-gamma activity of the STN, but with different spatial and temporal organization compared with similar information encoded in the sensorimotor cortex.
    Authors: Chrabaszcz, Anna (University of Pittsburgh, United States); Neumann, Wolf Julian (Universität zu Berlin, Germany); Stretcu, Otilia (University of Pittsburgh, United States); Lipski, Witold J. (University of Pittsburgh, United States); Dastolfo Hromack, Christina A. (University of Pittsburgh, United States); Bush, Alan (University of Pittsburgh, United States; Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Física de Buenos Aires, Universidad de Buenos Aires, Argentina); Wang, Dengyu (Tsinghua University, China; University of Pittsburgh, United States); Crammond, Donald J. (University of Pittsburgh, United States); Shaiman, Susan (University of Pittsburgh, United States); Dickey, Michael W. (University of Pittsburgh, United States); Holt, Lori L. (University of Pittsburgh, United States); Turner, Robert S. (University of Pittsburgh, United States); Fiez, Julie A. (University of Pittsburgh, United States); Richardson, R. Mark (University of Pittsburgh, United States)
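
    The high-gamma (60–150 Hz) power measure referred to above is conventionally obtained by band-pass filtering and taking the Hilbert envelope. The sketch below shows that generic approach on synthetic data; it is not claimed to be the authors' exact analysis pipeline, and all parameter choices are illustrative.

    # Generic high-gamma power envelope: zero-phase band-pass, then Hilbert envelope.
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def high_gamma_power(x, fs, band=(60.0, 150.0), order=4):
        """x: single-channel LFP/ECoG recording; fs: sampling rate in Hz."""
        b, a = butter(order, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
        filtered = filtfilt(b, a, x)           # zero-phase band-pass filter
        return np.abs(hilbert(filtered)) ** 2  # instantaneous power envelope

    # Example: 2 s of synthetic data at 1000 Hz with a burst of 100 Hz activity.
    fs = 1000
    t = np.arange(0, 2.0, 1 / fs)
    lfp = np.random.normal(0, 1, t.size) \
        + np.where((t > 0.8) & (t < 1.2), 2 * np.sin(2 * np.pi * 100 * t), 0.0)
    power = high_gamma_power(lfp, fs)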