1,683 research outputs found

    Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion

    Acoustic-to-articulatory inversion, the estimation of articulatory kinematics from an acoustic waveform, is a challenging but important problem. Accurate estimation of articulatory movements has the potential for significant impact on our understanding of speech production, on our capacity to assess and treat pathologies in a clinical setting, and on speech technologies such as computer-aided pronunciation assessment and audio-video synthesis. However, because of the complex and speaker-specific relationship between articulation and acoustics, existing approaches to inversion do not generalize well across speakers. As acquiring speaker-specific kinematic data for training is not feasible in many practical applications, this remains an important and open problem. This paper proposes a novel approach to acoustic-to-articulatory inversion, Parallel Reference Speaker Weighting (PRSW), which requires no kinematic data from the target speaker and only a small amount of acoustic adaptation data. PRSW hypothesizes that acoustic and kinematic similarities are correlated and uses acoustically derived weights to build speaker-adapted articulatory models. The system was assessed using a 20-speaker data set of synchronous acoustic and Electromagnetic Articulography (EMA) kinematic data. Results demonstrate that by restricting the reference group to a subset of speakers with strong individual speaker-dependent inversion performance, the PRSW method attains kinematic-independent acoustic-to-articulatory inversion performance nearly matching that of the speaker-dependent model, with an average correlation of 0.62 versus 0.63. This indicates that, given a sufficiently complete and appropriately selected reference speaker set for adaptation, it is possible to create effective articulatory models without kinematic training data.
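
    A minimal sketch of the weighting idea described in the abstract, under simplifying assumptions: reference speakers are represented here by mean acoustic and articulatory parameter vectors and weighted by acoustic closeness, whereas the actual PRSW system estimates weights over parallel HMM parameters. All function and variable names are illustrative, not from the paper.

    # Hedged sketch of Parallel Reference Speaker Weighting (PRSW):
    # reference models are simplified to mean parameter vectors.
    import numpy as np

    def prsw_weights(target_acoustic, ref_acoustic, temperature=1.0):
        """Weight reference speakers by acoustic similarity to the target.

        target_acoustic : (d,) mean acoustic features from the target
                          speaker's small adaptation set
        ref_acoustic    : (n_refs, d) per-reference-speaker acoustic means
        """
        dists = np.linalg.norm(ref_acoustic - target_acoustic, axis=1)
        # Closer reference speakers get larger weights (softmax of -distance).
        w = np.exp(-dists / temperature)
        return w / w.sum()

    def adapted_articulatory_model(weights, ref_articulatory):
        """Weighted combination of reference articulatory model parameters.

        ref_articulatory : (n_refs, p) articulatory parameters per reference
        """
        return weights @ ref_articulatory

    # Example with random stand-in data for 20 reference speakers.
    rng = np.random.default_rng(0)
    refs_ac, refs_art = rng.normal(size=(20, 39)), rng.normal(size=(20, 12))
    target = rng.normal(size=39)
    w = prsw_weights(target, refs_ac)
    adapted = adapted_articulatory_model(w, refs_art)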

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) workshop was established in 1999 out of the strongly felt need to share know-how, objectives, and results among areas that until then had seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years, the initial topics have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy.

    Speaker Independent Acoustic-to-Articulatory Inversion

    Acoustic-to-articulatory inversion, the determination of articulatory parameters from acoustic signals, is a difficult but important problem for many speech processing applications, such as automatic speech recognition (ASR) and computer-aided pronunciation training (CAPT). In recent years, several approaches have been successfully implemented for speaker-dependent models with parallel acoustic and kinematic training data. However, in many practical applications inversion is needed for new speakers for whom no articulatory data are available. To address this problem, this dissertation introduces a novel speaker adaptation approach called Parallel Reference Speaker Weighting (PRSW), based on parallel acoustic and articulatory Hidden Markov Models (HMMs). The approach uses a robust normalized articulatory space and palate-referenced articulatory features, combined with speaker-weighted adaptation, to form an inversion mapping for new speakers that can accurately estimate articulatory trajectories. The proposed PRSW method is evaluated on the newly collected Marquette Electromagnetic Articulography – Mandarin-Accented English (EMA-MAE) corpus using 20 native English speakers. Cross-speaker inversion results show that, given a good selection of reference speakers with consistent acoustic and articulatory patterns, the PRSW approach yields good speaker-independent inversion performance even without kinematic training data.
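
    The following is a hedged sketch of one kind of palate-referenced articulatory feature mentioned above: the vertical distance from a tongue sensor to the speaker's palate trace. The representation of the palate as (x, z) points sampled along the hard palate, and all names, are assumptions for illustration, not the dissertation's exact formulation.

    # Illustrative palate-referenced feature: how far below the palate a
    # tongue sensor sits at its current front-back position.
    import numpy as np

    def palate_referenced_height(sensor_xz, palate_xz):
        """sensor_xz : (2,) sensor position (front-back x, vertical z)
        palate_xz : (m, 2) palate trace points ordered by increasing x"""
        # Interpolate the palate height at the sensor's x position, then
        # measure the vertical gap between palate and sensor.
        palate_z = np.interp(sensor_xz[0], palate_xz[:, 0], palate_xz[:, 1])
        return palate_z - sensor_xz[1]

    # Example: a sensor roughly 8 mm below the palate at x = 12 mm.
    palate = np.array([[0.0, 20.0], [10.0, 22.0], [20.0, 21.0], [30.0, 18.0]])
    print(palate_referenced_height(np.array([12.0, 14.0]), palate))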

    Articulatory Kinematics During Stop Closure in Speakers with Parkinson’s Disease

    Purpose: The goal of this exploratory study was (a) to investigate the differences in articulatory movements during the closure phase of bilabial stop consonants with respect to distance, displacement, and timing of motion between individuals with Parkinson’s Disease (PD) and healthy controls; and (b) to investigate changes in the articulatory movements of speakers with PD when they voluntarily vary their degree of speech intelligibility. Methods: Six participants, 4 PD and 2 healthy control (HC) speakers, participated in this study. The stimulus was a sentence containing several bilabial stop consonants (i.e., “Buy Bobby a puppy”). Movement data were collected using the Wave Speech Research System (NDI, Canada). Movement measures included the duration, distance, displacement, and speed of the tongue front, tongue back, upper lip, lower lip, and jaw. Results: Speakers with PD and HC speakers produced observable articulatory differences during the stop closure of bilabial stops. Generally, the PD group produced smaller articulatory movements and had longer closure durations than the HC group. Regarding changes in speaking mode, the two groups made observable but different articulatory changes during the stop closure. For more clear speech, both groups made greater articulatory movements and decreased the stop closure duration. For less clear speech, the HC group showed reduced articulatory movements and longer closure durations, whereas the PD group made greater articulatory movements and longer closure durations. Discussion: The findings of this study revealed several articulatory differences during the stop closure between the two speaker groups. In more clear speaking conditions, speakers in the PD group could successfully compensate for reduced articulatory movement by producing exaggerated lower lip and jaw movements. These findings support the use of more clear speaking modifications as a therapeutic technique to elicit better articulatory movement among speakers with PD. However, it also appears that the PD group had difficulty producing fine motor articulatory changes (e.g., less clear speech).
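
    As a hedged illustration of the kinematic measures named in the Methods (duration, distance, displacement, speed), the sketch below computes them from a sampled 3-D sensor trajectory over a closure interval. It is generic example code, not the study's analysis software, and the synthetic trajectory is invented.

    # Illustrative computation of closure duration, cumulative path distance,
    # net displacement, and peak speed from a 3-D articulator trajectory.
    import numpy as np

    def closure_kinematics(traj, fs):
        """traj: (n, 3) sensor positions in mm during the closure interval;
        fs: sampling rate in Hz."""
        steps = np.diff(traj, axis=0)                   # per-sample motion (mm)
        step_len = np.linalg.norm(steps, axis=1)
        return {
            "duration_s": (len(traj) - 1) / fs,
            "distance_mm": step_len.sum(),              # total path length
            "displacement_mm": np.linalg.norm(traj[-1] - traj[0]),  # start-to-end
            "peak_speed_mm_s": (step_len * fs).max(),
        }

    # Example with a synthetic lower-lip trajectory sampled at 100 Hz.
    t = np.linspace(0, 0.2, 21)
    traj = np.stack([np.zeros_like(t), np.zeros_like(t),
                     5 * np.sin(np.pi * t / 0.2)], axis=1)
    print(closure_kinematics(traj, fs=100))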

    Evidence for active control of tongue lateralization in Australian English /l/

    Research on the temporal dynamics of /l/ production has focused primarily on mid-sagittal tongue movements. This study reports how known variations in the timing of mid-sagittal gestures are related to para-sagittal dynamics in /l/ formation in Australian English (AusE), using three-dimensional electromagnetic articulography (3D EMA). The articulatory analyses show (1) consistent with past work, the temporal lag between tongue tip and tongue body gestures identified in the mid-sagittal plane changes across different syllable positions and vowel contexts; (2) the lateral channel is largely formed by tilting the tongue to the left/right side of the oral cavity as opposed to curving the tongue within the coronal plane; and, (3) the timing of lateral channel formation relative to the tongue body gesture is consistent across syllable positions and vowel contexts, even as the temporal lag between tongue tip and tongue body gestures varies. This last result is particularly informative with respect to theoretical hypotheses regarding gestural control for /l/s, as it suggests that lateral channel formation is actively controlled as opposed to resulting as a passive consequence of tongue stretching. These results are interpreted as evidence that the formation of the lateral channel is a primary articulatory goal of /l/ production in AusE
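
    Two quantities the above analysis turns on can be illustrated with a short, hedged sketch: a coronal-plane tilt estimated from the heights of left and right parasagittal tongue sensors, and the lag between tongue tip and tongue body gesture onsets. Names, sensor separation, and sign conventions are assumptions for illustration only.

    # Illustrative tongue tilt and gestural lag computations (not the study's code).
    import numpy as np

    def coronal_tilt_deg(z_left, z_right, sensor_separation_mm):
        """Tilt of the tongue surface toward one side, in degrees.
        Positive when the left edge sits higher than the right."""
        return np.degrees(np.arctan2(z_left - z_right, sensor_separation_mm))

    def gesture_lag_ms(tip_onset_s, body_onset_s):
        """Lag between tongue tip and tongue body gesture onsets (ms);
        positive when the tip gesture starts after the body gesture."""
        return 1000.0 * (tip_onset_s - body_onset_s)

    print(coronal_tilt_deg(z_left=12.0, z_right=8.5, sensor_separation_mm=20.0))
    print(gesture_lag_ms(tip_onset_s=0.215, body_onset_s=0.180))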

    ANALYSIS OF VOCAL FOLD KINEMATICS USING HIGH SPEED VIDEO

    Vocal folds are the twin infoldings of mucous membrane stretched horizontally across the larynx. They vibrate, modulating the constant airflow initiated from the lungs. The pulsating pressure wave blowing through the glottis is thus the source for voiced speech production. The study of vocal fold dynamics during voicing is critical for the treatment of voice pathologies. Since the vocal folds vibrate at 100–350 cycles per second, their visual inspection is currently done by stroboscopy, which merges information from multiple cycles to present an apparent motion. High Speed Digital Laryngeal Imaging (HSDLI), with a temporal resolution of up to 10,000 frames per second, has been established as better suited for assessing vocal fold vibratory function through direct recording. But widespread use of HSDLI is limited by a lack of consensus on modalities such as which features should be examined. Image processing techniques that circumvent the tedious and time-consuming effort of examining large volumes of recordings still have room for improvement. Fundamental questions, such as the frame rate or resolution required for the recordings, are still not adequately answered. HSDLI also cannot provide absolute physical measurements of anatomical features and vocal fold displacement. This work addresses these challenges through improved signal processing. A vocal fold edge extraction technique with subpixel accuracy, suited even for the hard-to-record pediatric population, is developed first. The algorithm, which is equally applicable to pediatric and adult subjects, is implemented to facilitate user inspection and intervention. Objective features describing fold dynamics, extracted from the edge displacement waveform, are proposed and analyzed on a diverse dataset of healthy males, females, and children. The sampling and quantization noise present in the recordings is analyzed, and methods to mitigate it are investigated. A customized Kalman smoothing and spline interpolation of the displacement waveform is found to improve the stability of feature estimation. The relationship between frame rate, spatial resolution, and vibration required for efficient capture of information is derived. Finally, to address the inability to obtain absolute physical measurements, a structured light projection system calibrated with respect to the endoscope is prototyped.
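
    The thesis describes a customized Kalman smoothing plus spline interpolation of the edge displacement waveform; the sketch below is only a simplified stand-in for that idea, using a one-pass random-walk Kalman filter and cubic-spline upsampling on a synthetic waveform. Parameters and signal are invented for illustration.

    # Simplified stand-in: suppress quantization-like noise in a displacement
    # waveform with a 1-D random-walk Kalman filter, then upsample with a spline.
    import numpy as np
    from scipy.interpolate import CubicSpline

    def kalman_filter_1d(y, process_var=0.05, meas_var=0.5):
        x, p = float(y[0]), 1.0          # state estimate and its variance
        out = np.empty(len(y))
        for i, z in enumerate(y):
            p = p + process_var          # predict (random-walk model)
            k = p / (p + meas_var)       # Kalman gain
            x = x + k * (z - x)          # update with the noisy measurement
            p = (1.0 - k) * p
            out[i] = x
        return out

    # Synthetic 200 Hz glottal-cycle displacement sampled at 4000 fps with
    # quantization-like noise, smoothed and upsampled 4x.
    fps = 4000
    t = np.arange(0, 0.02, 1 / fps)
    disp = np.abs(np.sin(2 * np.pi * 200 * t)) \
        + np.round(np.random.normal(0, 0.6, t.size)) * 0.05
    smooth = kalman_filter_1d(disp)
    t_fine = np.linspace(t[0], t[-1], t.size * 4)
    disp_fine = CubicSpline(t, smooth)(t_fine)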

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) was established in 1999 out of the strongly felt need to share know-how, objectives, and results among areas that until then had seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years, the initial topics have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty years of uninterrupted and successful research in the field of voice analysis.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) was established in 1999 out of the strongly felt need to share know-how, objectives, and results among areas that until then had seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the newborn to the adult and elderly. Over the years, the initial topics have grown and spread into other fields of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis.

    Subthalamic Nucleus and Sensorimotor Cortex Activity During Speech Production

    The sensorimotor cortex is somatotopically organized to represent the vocal tract articulators such as lips, tongue, larynx, and jaw. How speech and articulatory features are encoded at the subcortical level, however, remains largely unknown. We analyzed LFP recordings from the subthalamic nucleus (STN) and simultaneous electrocorticography recordings from the sensorimotor cortex of 11 human subjects (1 female) with Parkinson's disease during implantation of deep-brain stimulation (DBS) electrodes while they read aloud three-phoneme words. The initial phonemes involved either articulation primarily with the tongue (coronal consonants) or the lips (labial consonants). We observed significant increases in high-gamma (60–150 Hz) power in both the STN and the sensorimotor cortex that began before speech onset and persisted for the duration of speech articulation. As expected from previous reports, in the sensorimotor cortex, the primary articulators involved in the production of the initial consonants were topographically represented by high-gamma activity. We found that STN high-gamma activity also demonstrated specificity for the primary articulator, although no clear topography was observed. In general, subthalamic high-gamma activity varied along the ventral–dorsal trajectory of the electrodes, with greater high-gamma power recorded in the dorsal locations of the STN. Interestingly, the majority of significant articulator-discriminative activity in the STN occurred before that in sensorimotor cortex. These results demonstrate that articulator-specific speech information is contained within high-gamma activity of the STN, but with different spatial and temporal organization compared with similar information encoded in the sensorimotor cortex.
    Authors: Chrabaszcz, Anna (University of Pittsburgh, United States); Neumann, Wolf Julian (Universität zu Berlin, Germany); Stretcu, Otilia (University of Pittsburgh, United States); Lipski, Witold J. (University of Pittsburgh, United States); Dastolfo Hromack, Christina A. (University of Pittsburgh, United States); Bush, Alan (University of Pittsburgh, United States; Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Física de Buenos Aires, Universidad de Buenos Aires, Argentina); Wang, Dengyu (Tsinghua University, China; University of Pittsburgh, United States); Crammond, Donald J. (University of Pittsburgh, United States); Shaiman, Susan (University of Pittsburgh, United States); Dickey, Michael W. (University of Pittsburgh, United States); Holt, Lori L. (University of Pittsburgh, United States); Turner, Robert S. (University of Pittsburgh, United States); Fiez, Julie A. (University of Pittsburgh, United States); Richardson, R. Mark (University of Pittsburgh, United States)
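
    The high-gamma (60–150 Hz) power measure referred to above is conventionally obtained by band-pass filtering and taking the Hilbert envelope. The sketch below shows that generic approach on synthetic data; it is not claimed to be the authors' exact analysis pipeline, and all parameter choices are illustrative.

    # Generic high-gamma power envelope: zero-phase band-pass, then Hilbert envelope.
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def high_gamma_power(x, fs, band=(60.0, 150.0), order=4):
        """x: single-channel LFP/ECoG recording; fs: sampling rate in Hz."""
        b, a = butter(order, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
        filtered = filtfilt(b, a, x)           # zero-phase band-pass filter
        return np.abs(hilbert(filtered)) ** 2  # instantaneous power envelope

    # Example: 2 s of synthetic data at 1000 Hz with a burst of 100 Hz activity.
    fs = 1000
    t = np.arange(0, 2.0, 1 / fs)
    lfp = np.random.normal(0, 1, t.size) \
        + np.where((t > 0.8) & (t < 1.2), 2 * np.sin(2 * np.pi * 100 * t), 0.0)
    power = high_gamma_power(lfp, fs)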