Tracking Articulator Movements Using Orientation Measurements
This paper introduces a new method to track articulator movements, specifically jaw position and angle, using five-degree-of-freedom (5 DOF) orientation data. The approach uses a quaternion rotation method to accomplish jaw tracking during speech using a single sensor on the mandibular incisor. Data were collected using the NDI Wave Speech Research System for one pilot subject performing various speech tasks. The degree of jaw rotation from the proposed approach is compared with a traditional geometric calculation. Results show that the quaternion-based method is able to describe the jaw angle trajectory and gives a more accurate and smoother estimation of jaw kinematics.
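As a rough illustration of the quaternion idea, the jaw's rotation away from a rest posture can be read off directly from two unit quaternions, without decomposing into Euler angles. The function below is a minimal sketch, not the paper's pipeline; the rest and open quaternions are made-up example values.

```python
# Hypothetical sketch: jaw rotation angle from per-sample unit quaternions.
# The relative rotation angle between two unit quaternions q_ref and q_t is
# theta = 2 * arccos(|q_ref . q_t|), which needs no Euler-angle decomposition.
import numpy as np

def jaw_angle_deg(q_ref, q_t):
    """Angle (degrees) of the relative rotation between two unit quaternions."""
    q_ref = np.asarray(q_ref, dtype=float)
    q_t = np.asarray(q_t, dtype=float)
    # Normalize defensively, then clip the dot product into arccos's domain.
    q_ref /= np.linalg.norm(q_ref)
    q_t /= np.linalg.norm(q_t)
    dot = np.clip(abs(np.dot(q_ref, q_t)), -1.0, 1.0)
    return np.degrees(2.0 * np.arccos(dot))

# Identity (rest) posture vs. a 30-degree rotation about the x-axis:
q_rest = np.array([1.0, 0.0, 0.0, 0.0])  # (w, x, y, z) convention
q_open = np.array([np.cos(np.radians(15)), np.sin(np.radians(15)), 0.0, 0.0])
print(round(jaw_angle_deg(q_rest, q_open), 1))  # 30.0
```

Applying this per sample to an incisor-sensor quaternion stream yields a jaw-angle trajectory directly, which is one reason a quaternion formulation can be smoother than a geometric angle computed from noisy point positions.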
Vowel Production in Mandarin Accented English and American English: Kinematic and Acoustic Data from the Marquette University Mandarin Accented English Corpus
Few electromagnetic articulography (EMA) datasets are publicly available, and none have focused systematically on non-native accented speech. We introduce a kinematic-acoustic database of speech from 40 gender- and dialect-balanced participants producing either upper-Midwestern American English (AE) as L1 or Mandarin Accented English (MAE) as L2 (Beijing or Shanghai dialect base). The Marquette University EMA-MAE corpus will be released publicly to help advance research in areas such as pronunciation modeling, acoustic-articulatory inversion, L1-L2 comparisons, pronunciation error detection, and accent modification training. EMA data were collected at a 400 Hz sampling rate with synchronous audio using the NDI Wave System. Articulatory sensors were placed on the midsagittal lips, lower incisors, and tongue blade and dorsum, as well as on the lip corner and lateral tongue body. Sensors provide five-degree-of-freedom measurements including three-dimensional sensor position and two-dimensional orientation (pitch and roll). In the current work we analyze kinematic and acoustic variability between L1 and L2 vowels. We address the hypothesis that MAE is characterized by larger differences in the articulation of back vowels than front vowels and by smaller vowel spaces compared to AE. The current results provide a seminal comparison of the kinematics and acoustics of vowel production between MAE and AE speakers.
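The "smaller vowel space" hypothesis is commonly quantified as the area of the polygon spanned by corner vowels in the F1-F2 plane. The sketch below shows that computation with the shoelace formula; the formant values are rough textbook averages for an adult male talker, not measurements from this corpus.

```python
# Illustrative vowel-space-area computation (not the paper's analysis).
def polygon_area(points):
    """Shoelace formula: area of a polygon given ordered (x, y) vertices."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Approximate (F1, F2) in Hz for corner vowels /i/, /ae/, /a/, /u/,
# listed in order around the vowel quadrilateral:
ae_space = [(270, 2290), (660, 1720), (730, 1090), (300, 870)]
print(polygon_area(ae_space))  # 411500.0 (Hz^2)
```

Comparing this area between L1 and L2 speaker groups is one standard way to test a vowel-space-reduction hypothesis; the same formula applies to kinematic tongue-position spaces.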
An information theoretic approach to the functional classification of neurons
A population of neurons typically exhibits a broad diversity of responses to sensory inputs. The intuitive notion of functional classification is that cells can be clustered so that most of the diversity is captured in the identity of the clusters rather than by individuals within clusters. We show how this intuition can be made precise using information theory, without any need to introduce a metric on the space of stimuli or responses. Applied to the retinal ganglion cells of the salamander, this approach recovers classical results, but also provides clear evidence for subclasses beyond those identified previously. Further, we find that each of the ganglion cells is functionally unique, and that even within the same subclass only a few spikes are needed to reliably distinguish between cells.
Comment: 13 pages, 4 figures. To appear in Advances in Neural Information Processing Systems (NIPS) 1
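The core quantity behind such a metric-free classification is the mutual information between cluster identity and response, estimable from a joint count table alone. The sketch below shows that estimate in its simplest plug-in form; the paper's actual estimator and clustering procedure are more sophisticated.

```python
# Sketch of the central quantity: I(cluster; response) from joint counts.
# This is the naive plug-in estimator, for illustration only.
import numpy as np

def mutual_information_bits(counts):
    """I(X;Y) in bits from a 2-D contingency table of joint counts."""
    p = counts / counts.sum()
    px = p.sum(axis=1, keepdims=True)  # marginal over rows (clusters)
    py = p.sum(axis=0, keepdims=True)  # marginal over columns (responses)
    nz = p > 0                          # skip zero cells (0 * log 0 = 0)
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

# Two clusters with perfectly distinct response patterns carry 1 bit:
table = np.array([[10.0, 0.0], [0.0, 10.0]])
print(mutual_information_bits(table))  # 1.0
```

Clustering to maximize this quantity captures the intuition in the abstract: a good partition is one where knowing the cluster label conveys most of the information that the full cell identity would.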
Sensorimotor Adaptation of Speech Using Real-time Articulatory Resynthesis
Sensorimotor adaptation is an important focus in the study of motor learning for non-disordered speech, but has yet to be studied substantially for speech rehabilitation. Speech adaptation is typically elicited experimentally using LPC resynthesis to modify the sounds that speakers hear themselves producing. This method requires that the participant be able to produce a robust speech-acoustic signal and is therefore not well suited for talkers with dysarthria. We have developed a novel technique using electromagnetic articulography (EMA) to drive an articulatory synthesizer. The acoustic output of the articulatory synthesizer can be perturbed experimentally to study auditory feedback effects on sensorimotor learning. This work aims to compare sensorimotor adaptation effects using our articulatory resynthesis method with effects from an established, acoustic-only method. Results suggest that the articulatory resynthesis method can elicit speech adaptation, but that the articulatory effects of the two methods differ.
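A common perturbation in such auditory-feedback paradigms is a formant shift applied to the resynthesized signal before playback. The fragment below is only a schematic of that step; the scaling rule and values are illustrative assumptions, not the study's actual manipulation.

```python
# Illustrative feedback-perturbation step: scale F1 of the resynthesized
# feedback before playback, leaving higher formants unchanged.
def perturb_formants(formants_hz, f1_scale=1.5):
    """Return a copy of the formant list with the first formant scaled."""
    out = list(formants_hz)
    out[0] = out[0] * f1_scale  # hypothetical 50% upward F1 shift
    return out

print(perturb_formants([500.0, 1500.0, 2500.0]))  # [750.0, 1500.0, 2500.0]
```

In an articulatory-resynthesis setup, the same kind of perturbation is applied to the synthesizer's output rather than to the talker's own acoustic signal, which is what makes the method usable when the talker cannot produce a robust signal.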
Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion
Acoustic-to-articulatory inversion, the estimation of articulatory kinematics from an acoustic waveform, is a challenging but important problem. Accurate estimation of articulatory movements has the potential for significant impact on our understanding of speech production, on our capacity to assess and treat pathologies in a clinical setting, and on speech technologies such as computer-aided pronunciation assessment and audio-video synthesis. However, because of the complex and speaker-specific relationship between articulation and acoustics, existing approaches for inversion do not generalize well across speakers. As acquiring speaker-specific kinematic data for training is not feasible in many practical applications, this remains an important and open problem. This paper proposes a novel approach to acoustic-to-articulatory inversion, Parallel Reference Speaker Weighting (PRSW), which requires no kinematic data for the target speaker and only a small amount of acoustic adaptation data. PRSW hypothesizes that acoustic and kinematic similarities are correlated, and builds speaker-adapted articulatory models from acoustically derived weights. The system was assessed using a 20-speaker data set of synchronous acoustic and Electromagnetic Articulography (EMA) kinematic data. Results demonstrate that by restricting the reference group to a subset consisting of speakers with strong individual speaker-dependent inversion performance, the PRSW method is able to attain kinematic-independent acoustic-to-articulatory inversion performance nearly matching that of the speaker-dependent model, with an average correlation of 0.62 versus 0.63. This indicates that given a sufficiently complete and appropriately selected reference speaker set for adaptation, it is possible to create effective articulatory models without kinematic training data.
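The reference-speaker-weighting idea can be sketched in a few lines: score each reference speaker's acoustic similarity to the target, turn the scores into normalized weights, and form the target's articulatory model as the weighted combination of reference models. The similarity measure, softmax weighting, and model parameterization below are placeholders, not the paper's formulation.

```python
# Hedged sketch of reference-speaker weighting (details are assumptions).
import numpy as np

def prsw_model(target_acoustic, ref_acoustics, ref_models, temperature=1.0):
    """Combine reference articulatory models with weights derived from
    acoustic distances between the target and each reference speaker."""
    dists = np.linalg.norm(ref_acoustics - target_acoustic, axis=1)
    logits = -dists / temperature          # closer speakers get larger weight
    w = np.exp(logits - logits.max())      # numerically stable softmax
    w /= w.sum()
    return w @ ref_models, w               # weighted sum of model parameters

# Toy example: target acoustics nearly match reference speaker 0.
refs_ac = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
refs_mod = np.array([[1.0], [2.0], [3.0]])  # one "model parameter" each
model, w = prsw_model(np.array([0.1, 0.0]), refs_ac, refs_mod)
print(w.argmax())  # 0
```

Note that no kinematic data from the target speaker enters the computation; only the acoustic adaptation data determines the weights, which is the sense in which the method is kinematic-independent.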
The Electromagnetic Articulography Mandarin Accented English (EMA-MAE) Corpus of Acoustic and 3D Articulatory Kinematic Data
There is a significant need for more comprehensive electromagnetic articulography (EMA) datasets that can provide matched acoustic and articulatory kinematic data with good spatial and temporal resolution. The Marquette University Electromagnetic Articulography Mandarin Accented English (EMA-MAE) corpus provides kinematic and acoustic data from 40 gender- and dialect-balanced speakers representing 20 Midwestern standard American English L1 speakers and 20 Mandarin Accented English (MAE) L2 speakers, half from the Beijing dialect region and half from the Shanghai dialect region. Three-dimensional EMA data were collected at a 400 Hz sampling rate using the NDI Wave system, with articulatory sensors on the midsagittal lips, lower incisors, and tongue blade and dorsum, plus the lateral lip corner and tongue body. Sensors provide three-dimensional position data as well as two-dimensional orientation data representing the orientation of the sensor plane. Data have been corrected for head movement relative to a fixed reference sensor and also adjusted using a biteplate calibration system to place the data in an articulatory working space relative to each subject's individual midsagittal and maxillary occlusal planes. Speech materials include isolated words chosen to focus on specific contrasts between the English and Mandarin languages, as well as sentences and paragraphs for continuous speech, totaling approximately 45 minutes of data per subject. A beta version of the EMA-MAE corpus is now available, and the full corpus is in preparation for public release to help advance research in areas such as pronunciation modeling, acoustic-articulatory inversion, L1-L2 comparisons, pronunciation error detection, and accent modification training.
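The head-movement correction described above is, at its core, a rigid-body change of coordinate frame: each articulator sample is re-expressed relative to the fixed reference sensor. The sketch below shows only that core step under an assumed per-frame rotation and origin; the corpus's full pipeline (including biteplate calibration to the occlusal plane) is more involved.

```python
# Hedged sketch of head-movement correction as a rigid change of frame.
import numpy as np

def to_head_frame(points, R_head, t_head):
    """Map world-frame sensor positions into the head (reference) frame.

    R_head: 3x3 matrix whose columns are the head-frame axes expressed in
            world coordinates; t_head: world position of the head origin.
    """
    return (points - t_head) @ R_head

# Toy example: head origin at (1, 2, 3), no rotation.
R = np.eye(3)
t = np.array([1.0, 2.0, 3.0])
pts = np.array([[1.0, 2.0, 3.0],   # coincides with the head origin
                [2.0, 2.0, 3.0]])  # 1 unit along the head x-axis
print(to_head_frame(pts, R, t))    # [[0. 0. 0.] [1. 0. 0.]]
```

After this step, a second fixed transform (estimated once from the biteplate recording) can place the data in a speaker-specific frame aligned to the midsagittal and maxillary occlusal planes, so positions are comparable across subjects.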
Palate-referenced Articulatory Features for Acoustic-to-Articulator Inversion
The selection of effective articulatory features is an important component of tasks such as acoustic-to-articulator inversion and articulatory synthesis. Although it is common to use direct articulatory sensor measurements as feature variables, this approach fails to incorporate important physiological information such as palate height and shape, and thus is not as representative of the vocal tract cross-section as desired. We introduce a set of articulatory feature variables that are palate-referenced and normalized with respect to the articulatory working space in order to improve the quality of the vocal tract representation. These features include the normalized horizontal positions plus the normalized palatal height of two midsagittal and one lateral tongue sensor, as well as normalized lip separation and lip protrusion. The quality of the feature representation is evaluated subjectively by comparing the variances and vowel separation in the working space, and quantitatively through measurement of acoustic-to-articulator inversion error. Results indicate that the palate-referenced features have reduced variance and increased separation between vowel spaces, and substantially lower inversion error than direct sensor measures.
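A palate-referenced feature of the kind described can be pictured as the distance from a tongue sensor to the palate trace directly above it, normalized by the speaker's working-space height. The sketch below is hypothetical: the linear interpolation of the palate trace and the choice of normalization constant are assumptions for illustration, not the paper's exact definitions.

```python
# Hypothetical palate-referenced feature (illustrative definitions only).
import bisect

def palate_height_at(palate_xs, palate_zs, x):
    """Linearly interpolate the palate trace height at horizontal position x.

    palate_xs must be sorted ascending; x outside the range is extrapolated
    from the nearest segment.
    """
    i = bisect.bisect_left(palate_xs, x)
    i = min(max(i, 1), len(palate_xs) - 1)
    x0, x1 = palate_xs[i - 1], palate_xs[i]
    z0, z1 = palate_zs[i - 1], palate_zs[i]
    return z0 + (z1 - z0) * (x - x0) / (x1 - x0)

def normalized_palatal_height(sensor_x, sensor_z, palate_xs, palate_zs,
                              space_height):
    """0 = sensor touching the palate; 1 = a full working space below it."""
    gap = palate_height_at(palate_xs, palate_zs, sensor_x) - sensor_z
    return gap / space_height

# Flat palate at z = 30 mm, tongue sensor at z = 20 mm, 20 mm working space:
print(normalized_palatal_height(5.0, 20.0, [0.0, 10.0, 20.0],
                                [30.0, 30.0, 30.0], 20.0))  # 0.5
```

Because the feature is expressed relative to each speaker's own palate and working space, the same numeric value means roughly the same constriction degree across speakers, which is the property that direct sensor coordinates lack.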