
    The Electromagnetic Articulography Mandarin Accented English (EMA-MAE) Corpus of Acoustic and 3D Articulatory Kinematic Data

    There is a significant need for more comprehensive electromagnetic articulography (EMA) datasets that provide matched acoustic and articulatory kinematic data with good spatial and temporal resolution. The Marquette University Electromagnetic Articulography Mandarin Accented English (EMA-MAE) corpus provides kinematic and acoustic data from 40 gender- and dialect-balanced speakers: 20 Midwestern standard American English L1 speakers and 20 Mandarin Accented English (MAE) L2 speakers, half from the Beijing dialect region and half from the Shanghai dialect region. Three-dimensional EMA data were collected at a 400 Hz sampling rate using the NDI Wave system, with articulatory sensors on the midsagittal lips, lower incisors, and tongue blade and dorsum, plus the lateral lip corner and tongue body. Sensors provide three-dimensional position data as well as two-dimensional orientation data representing the orientation of the sensor plane. Data have been corrected for head movement relative to a fixed reference sensor and also adjusted using a biteplate calibration system to place the data in an articulatory working space relative to each subject's individual midsagittal and maxillary occlusal planes. Speech materials include isolated words chosen to focus on specific contrasts between the English and Mandarin languages, as well as sentences and paragraphs for continuous speech, totaling approximately 45 minutes of data per subject. A beta version of the EMA-MAE corpus is now available, and the full corpus is in preparation for public release to help advance research in areas such as pronunciation modeling, acoustic-articulatory inversion, L1-L2 comparisons, pronunciation error detection, and accent modification training.
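    The biteplate-referenced normalization described above amounts to re-expressing head-corrected sensor positions in a subject-specific coordinate frame defined by the maxillary occlusal and midsagittal planes. A minimal sketch of such a change of basis (the function names and calibration points below are illustrative, not the corpus's actual processing pipeline):

```python
import numpy as np

def occlusal_frame(origin, occlusal_pt, midsagittal_pt):
    """Build an orthonormal articulatory frame (hypothetical helper).

    origin: reference point on the biteplate (e.g. at the upper incisors).
    occlusal_pt: a second point lying on the maxillary occlusal plane.
    midsagittal_pt: a point in the midsagittal plane above the occlusal plane.
    Returns a 3x3 rotation matrix whose rows are the frame axes.
    """
    x = occlusal_pt - origin            # anterior-posterior axis in the occlusal plane
    x = x / np.linalg.norm(x)
    v = midsagittal_pt - origin
    z = v - np.dot(v, x) * x            # superior-inferior axis, orthogonalized against x
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)                  # lateral axis completes a right-handed frame
    return np.stack([x, y, z])

def to_articulatory_space(points, origin, R):
    """Express head-corrected sensor positions (N, 3) in the subject frame."""
    return (points - origin) @ R.T
```

In this frame, sensor coordinates become directly comparable across subjects regardless of how each subject sat in the field generator's workspace.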

    Coupling between the laryngeal and supralaryngeal systems

    Thesis (B.Sc.)--University of Hong Kong, 2010. A dissertation submitted in partial fulfillment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, June 30, 2010. Includes bibliographical references (p. 27-30). The present study investigated the coupling between the laryngeal and supralaryngeal systems in speech production. The interrelationship between the two systems was examined by studying the possible interaction between tone production (the laryngeal system) and articulation (the supralaryngeal system). Sixty (30 male and 30 female) native Cantonese speakers participated in the study. The first and second formant frequencies (F1 and F2) of the four vowels /i, u, ?, ?/ produced at the six Cantonese lexical tones (high-level, high-rising, mid-level, low-falling, low-rising, and low-level) were obtained. Results revealed that, regardless of vowel, significant articulatory changes occurred when vowels were produced at different tones. However, the pattern of differences was not systematic across vowels. A gender difference was also noted: male and female speakers showed different patterns of articulatory change. These findings reveal a coupling effect between the laryngeal and supralaryngeal systems.
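    F1 and F2 measurements of this kind are typically obtained by linear predictive coding (LPC). A minimal sketch of LPC-based formant estimation for a single analysis frame (a generic textbook method, not the study's actual analysis settings):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_formants(frame, sr, order=12):
    """Estimate formant frequencies (Hz) from one speech frame via LPC.

    Fits predictor coefficients from the autocorrelation sequence, then
    converts the complex roots of the prediction polynomial to frequencies.
    """
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])    # forward predictor coefficients
    roots = np.roots(np.concatenate(([1.0], -a)))    # A(z) = 1 - sum a_k z^-k
    roots = roots[np.imag(roots) > 0]                # one of each conjugate pair
    freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
    return freqs[freqs > 90]                         # discard near-DC roots
```

The first two returned frequencies approximate F1 and F2; tracking them across tones is what reveals the articulatory differences reported above.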

    Quantification of vocal tract configuration of laryngectomees by acoustic reflection technology (ART)

    This study compared the vocal tract configuration, including length and volume, of alaryngeal and laryngeal speakers. Thirty alaryngeal speakers and 30 laryngeal speakers were recruited. Pharyngometry, an acoustic reflection technology (ART), was used to measure the vocal tract parameters of the participants. Results showed no significant difference in vocal tract length or volume between the alaryngeal and laryngeal speakers. This finding suggests that differences in formant frequencies during vowel production by alaryngeal and laryngeal speakers may be due to factors other than vocal tract configuration, and that the independence of the source and the filter (Fant, 1960; Pickett, 1999) may not hold for alaryngeal speakers.
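    The group comparison reported here is a standard two-sample test on per-speaker vocal tract measures. An illustrative sketch with simulated numbers (the values below are invented for demonstration only; the study's actual measurements are not reproduced here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-speaker vocal tract lengths (cm), 30 per group.
alaryngeal_vtl = rng.normal(17.0, 1.2, 30)
laryngeal_vtl = rng.normal(17.1, 1.2, 30)

# Welch's t-test: compare group means without assuming equal variances.
t, p = stats.ttest_ind(alaryngeal_vtl, laryngeal_vtl, equal_var=False)
print(f"Welch t = {t:.2f}, p = {p:.3f}")
# A p-value above the chosen alpha (e.g. 0.05) is consistent with the
# study's finding of no significant length difference between groups.
```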

    Speaker Independent Acoustic-to-Articulatory Inversion

    Acoustic-to-articulatory inversion, the determination of articulatory parameters from acoustic signals, is a difficult but important problem for many speech processing applications, such as automatic speech recognition (ASR) and computer-aided pronunciation training (CAPT). In recent years, several approaches have been successfully implemented for speaker-dependent models with parallel acoustic and kinematic training data. In many practical applications, however, inversion is needed for new speakers for whom no articulatory data are available. To address this problem, this dissertation introduces a novel speaker adaptation approach called Parallel Reference Speaker Weighting (PRSW), based on parallel acoustic and articulatory Hidden Markov Models (HMMs). The approach combines a robust normalized articulatory space and palate-referenced articulatory features with speaker-weighted adaptation to form an inversion mapping for new speakers that can accurately estimate articulatory trajectories. The proposed PRSW method is evaluated on the newly collected Marquette Electromagnetic Articulography Mandarin Accented English (EMA-MAE) corpus using 20 native English speakers. Cross-speaker inversion results show that, given a good selection of reference speakers with consistent acoustic and articulatory patterns, PRSW achieves good speaker-independent inversion performance even without kinematic training data.
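    At its core, reference speaker weighting builds a new speaker's model as a weighted combination of reference speakers' models, with the weights estimated from the new speaker's acoustics alone. A heavily simplified sketch of that idea over mean feature vectors (the real PRSW operates on HMM distribution parameters; the unconstrained least-squares step with clipping here is a crude stand-in for the actual constrained estimation):

```python
import numpy as np

def prsw_weights(target_acoustic, ref_acoustic):
    """Estimate per-reference-speaker weights (simplified sketch).

    target_acoustic: (d,) mean acoustic feature vector of the new speaker.
    ref_acoustic: (K, d) mean acoustic feature vectors of K reference speakers.
    Solves a least-squares combination, then clips and renormalizes so the
    weights are nonnegative and sum to one.
    """
    w, *_ = np.linalg.lstsq(ref_acoustic.T, target_acoustic, rcond=None)
    w = np.clip(w, 0.0, None)
    return w / w.sum()

def predict_articulatory(weights, ref_articulatory):
    """Apply the acoustically estimated weights to the references'
    articulatory means, yielding an articulatory model for the new
    speaker without any kinematic data from that speaker."""
    return weights @ ref_articulatory   # (K,) @ (K, m) -> (m,)
```

The key property, mirrored in the abstract's conclusion, is that only acoustic data from the new speaker enter the weight estimation; the articulatory side comes entirely from the references.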

    The articulatory and acoustic characteristics of Polish sibilants and their consequences for diachronic change

    The study is concerned with the relative synchronic stability of the three contrastive sibilant fricatives /s ʂ ɕ/ in Polish. Tongue movement data were collected from nine first-language Polish speakers producing symmetrical real and non-word CVCV sequences in three vowel contexts. A Gaussian model was used to classify the sibilants from spectral information in the noise and from formant frequencies at vowel onset. The physiological analysis showed an almost complete separation among /s ʂ ɕ/ on tongue-tip parameters. The acoustic analysis showed that greater energy at higher frequencies in the fricative noise distinguished /s/ from the other two sibilant categories. The most salient information at vowel onset was for /ɕ/, which also had a strong palatalizing effect on the following vowel. Whereas either the noise or the vowel onset was largely sufficient for the identification of /s/ and /ɕ/ respectively, both sets of cues were necessary to separate /ʂ/ from /s ɕ/. The greater synchronic instability of /ʂ/ may derive from its high articulatory complexity coupled with its comparatively low acoustic salience. The data also suggest that the relatively late acquisition of /ʂ/ by children may come about because of the weak acoustic information in the vowel for its distinction from /s/.
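    A Gaussian classifier of the kind described fits one Gaussian per sibilant category to acoustic features and assigns each token to the category with the highest log-likelihood. A minimal sketch with synthetic two-dimensional features (the class labels and cluster positions are illustrative only, not the study's measurements):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussians(X, y):
    """Fit one full-covariance Gaussian (mean, cov) per class label."""
    return {c: (X[y == c].mean(axis=0), np.cov(X[y == c].T))
            for c in np.unique(y)}

def classify(models, x):
    """Assign x to the class with the highest Gaussian log-likelihood."""
    return max(models, key=lambda c: multivariate_normal.logpdf(x, *models[c]))

# Synthetic feature clusters standing in for, e.g., spectral centroid
# and an onset-formant measure for three sibilant categories.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, (50, 2)) for m in ([0, 0], [3, 0], [0, 3])])
y = np.repeat(np.array(["s", "sz", "si"]), 50)
models = fit_gaussians(X, y)
```

Classification accuracy from such a model, computed separately on noise features and on vowel-onset features, is what quantifies the acoustic salience of each contrast.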

    A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images

    Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is, however, limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. Imaging the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos with synchronized audio for 75 subjects performing linguistically motivated speech tasks, alongside the corresponding first-ever public-domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each subject. Comment: 27 pages, 6 figures, 5 tables; submitted to Nature Scientific Data.