Vowel Production in Mandarin Accented English and American English: Kinematic and Acoustic Data from the Marquette University Mandarin Accented English Corpus
Few electromagnetic articulography (EMA) datasets are publicly available, and none have focused systematically on non-native accented speech. We introduce a kinematic-acoustic database of speech from 40 gender- and dialect-balanced participants producing upper-Midwestern American English (AE) as L1 or Mandarin Accented English (MAE) as L2 (Beijing or Shanghai dialect base). The Marquette University EMA-MAE corpus will be released publicly to help advance research in areas such as pronunciation modeling, acoustic-articulatory inversion, L1-L2 comparisons, pronunciation error detection, and accent modification training. EMA data were collected at a 400 Hz sampling rate with synchronous audio using the NDI Wave System. Articulatory sensors were placed on the midsagittal lips, lower incisors, and tongue blade and dorsum, as well as on the lip corner and lateral tongue body. The sensors provide five-degree-of-freedom measurements: three-dimensional sensor position and two-dimensional orientation (pitch and roll). In the current work we analyze kinematic and acoustic variability between L1 and L2 vowels. We address the hypothesis that, compared to AE, MAE is characterized by larger differences in the articulation of back vowels than of front vowels and by smaller vowel spaces. The current results provide a seminal comparison of the kinematics and acoustics of vowel production between MAE and AE speakers.
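The abstract above does not say how vowel space size was measured. One common acoustic measure is the area of the polygon spanned by the corner vowels in the F1-F2 plane, and the sketch below computes that area with the shoelace formula. The function name `vowel_space_area` and all formant values are hypothetical placeholders for illustration, not data or methods from the corpus.

```python
from typing import Sequence, Tuple


def vowel_space_area(corners: Sequence[Tuple[float, float]]) -> float:
    """Area of the polygon whose vertices are corner vowels.

    Each vertex is an (F2, F1) pair in Hz. Vertices must be listed in
    order around the polygon (clockwise or counter-clockwise).
    Uses the shoelace formula; this is an assumed metric, not one
    named in the abstract.
    """
    n = len(corners)
    if n < 3:
        raise ValueError("need at least three corner vowels")
    twice_area = 0.0
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


if __name__ == "__main__":
    # Hypothetical (F2, F1) values in Hz, chosen only to show the call shape.
    speaker_a = [(2300.0, 300.0), (1900.0, 700.0), (1100.0, 750.0), (900.0, 350.0)]
    speaker_b = [(2100.0, 350.0), (1800.0, 650.0), (1200.0, 700.0), (1000.0, 400.0)]
    print("speaker A area (Hz^2):", vowel_space_area(speaker_a))
    print("speaker B area (Hz^2):", vowel_space_area(speaker_b))
```

Comparing the two areas gives a single number per speaker, which is one way a "smaller vowel space" claim could be checked; a kinematic analogue would use sensor positions in place of formants.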
Feedforward and feedback control in apraxia of speech: effects of noise masking on vowel production
PURPOSE: This study was designed to test two hypotheses about apraxia of speech (AOS) derived from the Directions Into Velocities of Articulators (DIVA) model (Guenther et al., 2006): the feedforward system deficit hypothesis and the feedback system deficit hypothesis. METHOD: The authors used noise masking to minimize auditory feedback during speech. Six speakers with AOS and aphasia, 4 with aphasia without AOS, and 2 groups of speakers without impairment (younger and older adults) participated. Acoustic measures of vowel contrast, variability, and duration were analyzed. RESULTS: Younger, but not older, speakers without impairment showed significantly reduced vowel contrast with noise masking. Relative to older controls, the AOS group showed longer vowel durations overall (regardless of masking condition) and a greater reduction in vowel contrast under masking conditions. There were no significant differences in variability. Three of the 6 speakers with AOS demonstrated the group pattern. Speakers with aphasia without AOS did not differ from controls in contrast, duration, or variability. CONCLUSION: The greater reduction in vowel contrast with masking noise for the AOS group is consistent with the feedforward system deficit hypothesis but not with the feedback system deficit hypothesis; however, effects were small and not present in all individual speakers with AOS. Theoretical implications and alternative interpretations of these findings are discussed. R01 DC002852 - NIDCD NIH HHS; R01 DC007683 - NIDCD NIH HH
Distributed video coding in wireless multimedia sensor network for multimedia broadcasting
Recent developments in Distributed Video Coding (DVC) provide promising theoretical support for realizing the infrastructure of a Wireless Multimedia Sensor Network (WMSN), which is composed of autonomous hardware for capturing and transmitting quality audio-visual content. Implementing DVC in a WMSN can better address the energy constraints of the sensor nodes, because the DVC encoder has lower computational complexity. In this paper, a practical DVC scheme, pixel-domain Wyner-Ziv (PDWZ) video coding with slice structure and adaptive rate selection (ARS), is proposed to solve certain problems that arise when applying DVC to WMSN. First, the proposed slice structure extends the feasibility of PDWZ to work with any interleaver size used in the Slepian-Wolf turbo codec for heterogeneous applications. Second, based on the slice structure, an adaptive code rate selection is proposed that aims to reduce the system delay incurred by feedback requests. The simulation results show an enhancement in R-D performance and perceptual quality. The system delay caused by frequent feedback is also greatly reduced, which supports low-latency WMSN operation and facilitates QoS management.
HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders
This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments. Unlike conventional approaches that employ cascading frameworks to remove undesirable noise first and then restore missing signal components, our model performs these tasks in parallel using two heterogeneous decoder networks. Based on the U-Net style encoder-decoder framework, we attach an additional decoder so that each decoder network performs noise suppression or restoration separately. We carefully design each decoder architecture to operate appropriately depending on its objectives. Additionally, we improve performance by leveraging a learnable weighting factor that aggregates the two decoder output waveforms. Experimental results with objective metrics across various environments clearly demonstrate the effectiveness of our approach over single-decoder or multi-stage systems for the general speech restoration task. Comment: Accepted by INTERSPEECH 202
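The abstract mentions a learnable weighting factor that aggregates the two decoder waveforms but does not give its form. The sketch below assumes one plausible form, a convex combination gated by a sigmoid of a single scalar, and shows one gradient-descent step on a mean-squared-error loss. The names `blend` and `grad_step`, the sigmoid gate, and the loss are all assumptions for illustration, not the HD-DEMUCS implementation.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def blend(suppressed: List[float], restored: List[float], w: float) -> List[float]:
    """Mix two waveforms as a * suppressed + (1 - a) * restored, a = sigmoid(w).

    The sigmoid keeps the mixing weight in (0, 1); this gating is an
    assumption, since the abstract only says the factor is learnable.
    """
    if len(suppressed) != len(restored):
        raise ValueError("waveforms must have the same length")
    a = sigmoid(w)
    return [a * s + (1.0 - a) * r for s, r in zip(suppressed, restored)]


def grad_step(suppressed: List[float], restored: List[float],
              target: List[float], w: float, lr: float = 0.1) -> float:
    """One gradient-descent update of w on the mean squared error to target."""
    if len(target) != len(suppressed):
        raise ValueError("target must match the waveform length")
    a = sigmoid(w)
    mixed = blend(suppressed, restored, w)
    # d(mixed)/dw = (s - r) * a * (1 - a), by the chain rule through the sigmoid.
    grad = sum(2.0 * (m - t) * (s - r)
               for m, t, s, r in zip(mixed, target, suppressed, restored))
    grad = grad / len(target) * a * (1.0 - a)
    return w - lr * grad


if __name__ == "__main__":
    # Toy waveforms; the target equals the suppression output, so training
    # should push the weight toward that decoder.
    y_sup, y_res, y_ref = [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]
    w = 0.0
    for _ in range(5):
        w = grad_step(y_sup, y_res, y_ref, w)
    print("learned w:", w, "mixing weight:", sigmoid(w))
```

In a real model the scalar would be trained jointly with both decoders by an autodiff framework; the hand-written gradient here only makes the idea of a learnable mixing weight concrete.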