282 research outputs found

    Intraspeaker Comparisons of Acoustic and Articulatory Variability in American English /r/ Productions

    Full text link
    The purpose of this report is to test the hypothesis that speakers utilize an acoustic, rather than articulatory, planning space for speech production. It has been well-documented that many speakers of American English use different tongue configurations to produce /r/ in different phonetic contexts. The acoustic planning hypothesis suggests that although the /r/ configuration varies widely in different contexts, the primary acoustic cue for /r/, a dip in the F3 trajectory, will be less variable due to tradeoffs in articulatory variability, or trading relations, that help maintain a relatively constant F3 trajectory across phonetic contexts. Acoustic data and EMMA articulatory data from seven speakers producing /r/ in different phonetic contexts were analyzed. Visual inspection of the EMMA data at the point of F3 minimum revealed that each speaker appeared to use at least two of three trading relation strategies that would be expected to reduce F3 variability. Articulatory covariance measures confirmed that all seven speakers utilized a trading relation between tongue back height and tongue back horizontal position, six speakers utilized a trading relation between tongue tip height and tongue back height, and the speaker who did not use this latter strategy instead utilized a trading relation between tongue tip height and tongue back horizontal position. Estimates of F3 variability with and without the articulatory covariances indicated that F3 variability would be much higher for all speakers if the articulatory covariances were not utilized. These conclusions were further supported by a comparison of measured F3 variability to F3 variabilities estimated from the pellet data with and without articulatory covariances. In all subjects, the actual F3 variance was significantly lower than the F3 variance estimated without articulatory covariances, further supporting the conclusion that the articulatory trading relations were being used to reduce F3 variability.
Together, these results strongly suggest that the neural control mechanisms underlying speech production make elegant use of trading relations between articulators to maintain a relatively invariant acoustic trace for /r/ across phonetic contexts.
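The variance comparison described above can be illustrated numerically: when an acoustic cue depends on several articulators, negatively covarying articulators (trading relations) reduce the cue's variance relative to an estimate that zeroes out the covariances. The following is a minimal sketch with made-up articulator covariances and made-up F3 sensitivity weights, not values from the study:

```python
import numpy as np

# Hypothetical articulator dimensions (tongue tip height, tongue back
# height, tongue back horizontal position) with a built-in trading
# relation: the last two covary negatively. All numbers are illustrative.
rng = np.random.default_rng(0)
cov_true = np.array([[1.0,  0.1,  0.1],
                     [0.1,  1.0, -0.8],
                     [0.1, -0.8,  1.0]])
X = rng.multivariate_normal(np.zeros(3), cov_true, size=500)

# Assumed local sensitivities of F3 to each articulator (Hz per unit).
w = np.array([40.0, 60.0, 55.0])

C = np.cov(X, rowvar=False)
var_with_cov = w @ C @ w                        # full covariance propagation
var_without_cov = w @ np.diag(np.diag(C)) @ w   # covariances zeroed out

# The negative covariance term makes the full estimate much smaller,
# mirroring the paper's with/without-covariance comparison.
print(var_with_cov, var_without_cov)
```

Under a first-order (linear) approximation, dropping the off-diagonal covariance terms is exactly the "without articulatory covariances" estimate the abstract describes.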

    Analyzing liquids

    Get PDF

    Reading aloud begins when the computation of phonology is complete.

    Get PDF

    Back from the future: Nonlinear anticipation in adults' and children's speech

    Get PDF
    Purpose: This study examines the temporal organization of vocalic anticipation in German children from 3 to 7 years of age and in adults. The main objective was to test for non-linear processes in vocalic anticipation, which may result from the interaction between lingual gestural goals for individual vowels and those for their neighbors over time. Method: Ultrasound imaging was employed to record tongue movement at five time points throughout short utterances of the form V1#CV2. Vocalic anticipation was examined with Generalized Additive Modeling, an analytical approach allowing for the estimation of both linear and non-linear influences on anticipatory processes. Results: Both adults and children exhibit non-linear patterns of vocalic anticipation over time, with the degree and extent of vocalic anticipation varying as a function of the individual consonants and vowels assembled. However, noticeable developmental discrepancies were found, with vocalic anticipation being present earlier in children's utterances at 3-5 years of age in comparison to adults and, to some extent, 7-year-old children. Conclusions: A narrowing of speech production organization, from large chunks in kindergarten to more contextually specified organizations, seems to occur from kindergarten to primary school to adulthood, although variation in the temporal overlap of lingual gestures for consecutive segments is already present in the youngest cohorts. In adults, non-linear anticipatory patterns over time suggest a strong differentiation between the gestural goals for consecutive segments. In children, this differentiation is not yet mature: vowels show greater prominence over time and seem activated more in phase with those of previous segments relative to adults.

    Vowel Production in Mandarin Accented English and American English: Kinematic and Acoustic Data from the Marquette University Mandarin Accented English Corpus

    Get PDF
    Few electromagnetic articulography (EMA) datasets are publicly available, and none have focused systematically on non-native accented speech. We introduce a kinematic-acoustic database of speech from 40 (gender and dialect balanced) participants producing upper-Midwestern American English (AE) L1 or Mandarin Accented English (MAE) L2 (Beijing or Shanghai dialect base). The Marquette University EMA-MAE corpus will be released publicly to help advance research in areas such as pronunciation modeling, acoustic-articulatory inversion, L1-L2 comparisons, pronunciation error detection, and accent modification training. EMA data were collected at a 400 Hz sampling rate with synchronous audio using the NDI Wave System. Articulatory sensors were placed on the midsagittal lips, lower incisors, and tongue blade and dorsum, as well as on the lip corner and lateral tongue body. Sensors provide five degree-of-freedom measurements including three-dimensional sensor position and two-dimensional orientation (pitch and roll). In the current work we analyze kinematic and acoustic variability between L1 and L2 vowels. We address the hypothesis that MAE is characterized by larger differences in the articulation of back vowels than front vowels and smaller vowel spaces compared to AE. The current results provide a seminal comparison of the kinematics and acoustics of vowel production between MAE and AE speakers
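One standard way to quantify the "smaller vowel spaces" hypothesis mentioned above is to compute the area of the convex hull of each speaker group's mean (F1, F2) vowel positions. This is a minimal sketch with invented formant values, not measurements from the EMA-MAE corpus:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical mean (F1, F2) values in Hz for four corner vowels of two
# illustrative speaker groups -- made-up numbers for demonstration only.
ae_vowels = np.array([[300, 2300],   # /i/
                      [650, 1800],   # /ae/
                      [700, 1100],   # /a/
                      [350,  900]])  # /u/
mae_vowels = np.array([[350, 2100],
                       [600, 1700],
                       [650, 1200],
                       [400, 1000]])

def vowel_space_area(formants):
    """Area of the convex hull of (F1, F2) points, a common vowel-space metric."""
    # For 2-D input, ConvexHull.volume is the polygon area (.area is the perimeter).
    return ConvexHull(formants).volume

print(vowel_space_area(ae_vowels), vowel_space_area(mae_vowels))
```

Comparing hull areas between L1 and L2 groups gives a single scalar per speaker that can then be tested statistically.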

    Vowel nasalization in German

    Get PDF

    Articulatory Tradeoffs Reduce Acoustic Variability During American English /r/ Production

    Full text link
    Acoustic and articulatory recordings reveal that speakers utilize systematic articulatory tradeoffs to maintain acoustic stability when producing the phoneme /r/. Distinct articulator configurations used to produce /r/ in various phonetic contexts show systematic tradeoffs between the cross-sectional areas of different vocal tract sections. Analysis of acoustic and articulatory variabilities reveals that these tradeoffs act to reduce acoustic variability, thus allowing large contextual variations in vocal tract shape; these contextual variations in turn apparently reduce the amount of articulatory movement required. These findings contrast with the widely held view that speaking involves a canonical vocal tract shape target for each phoneme.
    National Institute on Deafness and Other Communication Disorders (1R29-DC02852-02, 5R01-DC01925-04, 1R03-C2576-0l); National Science Foundation (IRI-9310518)

    ARTICULATORY INFORMATION FOR ROBUST SPEECH RECOGNITION

    Get PDF
    Current Automatic Speech Recognition (ASR) systems fail to perform nearly as well as humans do, due to their lack of robustness against speech variability and noise contamination. The goal of this dissertation is to investigate these critical robustness issues, put forth different ways to address them, and finally present an ASR architecture based upon these robustness criteria. Acoustic variations adversely affect the performance of current phone-based ASR systems, in which speech is modeled as `beads-on-a-string', where the beads are the individual phone units. While phone units are distinctive in the cognitive domain, they vary in the physical domain, and their variation occurs due to a combination of factors including speech style and speaking rate; a phenomenon commonly known as `coarticulation'. Traditional ASR systems address such coarticulatory variations by using contextualized phone units such as triphones. Articulatory phonology accounts for coarticulatory variations by modeling speech as a constellation of constricting actions known as articulatory gestures. In such a framework, speech variations such as coarticulation and lenition are accounted for by gestural overlap in time and gestural reduction in space. To realize a gesture-based ASR system, articulatory gestures have to be inferred from the acoustic signal. At the initial stage of this research, a study was performed using synthetically generated speech to obtain a proof-of-concept that articulatory gestures can indeed be recognized from the speech signal. It was observed that having vocal tract constriction trajectories (TVs) as an intermediate representation facilitated the gesture recognition task from the speech signal. Presently no natural speech database contains articulatory gesture annotation; hence an automated iterative time-warping architecture is proposed that can annotate any natural speech database with articulatory gestures and TVs.
Two natural speech databases, X-ray microbeam and Aurora-2, were annotated, where the former was used to train a TV estimator and the latter was used to train a Dynamic Bayesian Network (DBN) based ASR architecture. The DBN architecture used two sets of observations: (a) acoustic features in the form of mel-frequency cepstral coefficients (MFCCs) and (b) TVs (estimated from the acoustic speech signal). In this setup the articulatory gestures were modeled as hidden random variables, hence eliminating the necessity for explicit gesture recognition. Word recognition results using the DBN architecture indicate that articulatory representations not only can help to account for coarticulatory variations but can also significantly improve the noise robustness of the ASR system.
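The core idea of treating articulatory state as hidden while observing synchronized acoustic and articulatory feature streams can be sketched in a much-simplified form: concatenate the per-frame MFCC and TV features into one observation vector and decode the hidden state sequence with Viterbi. This is a toy HMM stand-in with invented numbers, not the dissertation's DBN:

```python
import numpy as np

# Toy setup: 2 hidden "gestural" states, each emitting a 3-dimensional
# observation vector [mfcc1, mfcc2, tv] from a spherical Gaussian.
# All parameters are made up for illustration.
rng = np.random.default_rng(1)
n_states = 2
means = np.array([[0.0, 0.0, -1.0],   # state 0
                  [3.0, 3.0,  1.0]])  # state 1
var = 0.25

# Simulate a frame sequence: state 0 for 10 frames, then state 1 for 10.
true_states = np.array([0] * 10 + [1] * 10)
obs = means[true_states] + rng.normal(0, np.sqrt(var), size=(20, 3))

log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_init = np.log(np.array([0.5, 0.5]))

def log_emission(x):
    # Spherical Gaussian log-likelihood per state (additive constants dropped).
    return -np.sum((x - means) ** 2, axis=1) / (2 * var)

# Viterbi decoding over the concatenated acoustic+articulatory observations.
T = len(obs)
delta = np.zeros((T, n_states))
back = np.zeros((T, n_states), dtype=int)
delta[0] = log_init + log_emission(obs[0])
for t in range(1, T):
    scores = delta[t - 1][:, None] + log_trans   # scores[i, j]: from i to j
    back[t] = np.argmax(scores, axis=0)
    delta[t] = scores[back[t], np.arange(n_states)] + log_emission(obs[t])

path = np.zeros(T, dtype=int)
path[-1] = np.argmax(delta[-1])
for t in range(T - 2, -1, -1):
    path[t] = back[t + 1, path[t + 1]]

print((path == true_states).mean())  # fraction of frames decoded correctly
```

In the actual architecture the hidden layer is a richer DBN over gesture variables rather than a single HMM state, but the principle, inferring hidden articulatory structure from joint acoustic and TV observations, is the same.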

    The interaction between articulation and tones in Cantonese

    Get PDF
    "A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, June 30, 2009."Thesis (B.Sc)--University of Hong Kong, 2009.Includes bibliographical references (p. 27-30).published_or_final_versionSpeech and Hearing SciencesBachelorBachelor of Science in Speech and Hearing Science