327 research outputs found

    Model-based exploration of linking between vowel articulatory space and acoustic space

    While the acoustic vowel space has been extensively studied in previous research, little is known about the high-dimensional articulatory space of vowels. Articulatory imaging techniques are limited to tracking only a few key articulators, leaving the rest unmonitored. In the present study, we developed a detailed articulatory space by training a 3D articulatory synthesizer to learn eleven British English vowels. An analysis-by-synthesis strategy was used to acoustically optimize vocal tract parameters representing twenty articulatory dimensions. The results show that tongue height and retraction, larynx location, and lip roundness are the most perceptually distinctive articulatory dimensions. Yet even for these dimensions there is a fair amount of articulatory overlap between vowels, unlike in the fine-grained acoustic space. This method opens up the possibility of using modelling to investigate the link between speech production and perception.
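
    The analysis-by-synthesis strategy described above can be pictured as a small optimization loop: propose articulatory parameters, synthesize, compare the resulting formants with the target vowel, and update. The sketch below illustrates only the idea; `synth_formants` is a hypothetical stand-in for the 3D synthesizer (here a toy linear map so the script runs), not the model used in the study.

    ```python
    # Minimal sketch of analysis-by-synthesis, not the authors' code.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 20))          # toy articulatory-to-formant map

    def synth_formants(params):
        """Hypothetical synthesizer: 20 articulatory params -> (F1, F2, F3) in Hz."""
        return 500.0 + 100.0 * W @ params

    def formant_error(params, target):
        """Acoustic cost: squared distance between synthesized and target formants."""
        return np.sum((synth_formants(params) - target) ** 2)

    target = np.array([300.0, 2300.0, 3000.0])    # an /i/-like formant target
    x0 = np.zeros(20)                             # neutral vocal tract configuration
    fit = minimize(formant_error, x0, args=(target,), method="L-BFGS-B")
    print("residual:", fit.fun)
    print("optimized articulatory params:", np.round(fit.x, 2))
    ```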

    The relation between acoustic and articulatory variation in vowels : data from American and Australian English

    In studies of dialect variation, the articulatory nature of vowels is sometimes inferred from formant values using the following heuristic: F1 is inversely correlated with tongue height and F2 is inversely correlated with tongue backness. This study compared vowel formants and corresponding lingual articulation in two dialects of English, standard North American English and Australian English. Five speakers of North American English and four speakers of Australian English were recorded producing multiple repetitions of ten monophthongs embedded in the /sVd/ context. Simultaneous articulatory data were collected using electromagnetic articulography. Results show that there are significant correlations between tongue position and formants in the direction predicted by the heuristic, but also that the relations implied by the heuristic break down under specific conditions. Articulatory vowel spaces, based on tongue dorsum (TD) position, and acoustic vowel spaces, based on formants, show systematic misalignment, due in part to the influence of other articulatory factors, including lip rounding and tongue curvature, on formant values. Incorporating these dimensions into our dialect comparison yields a richer description and a more robust understanding of how vowel formant patterns are reproduced within and across dialects.
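
    The heuristic at the heart of this study amounts to a sign prediction on two correlations: tongue height against F1, and tongue backness against F2, both expected to be negative. A minimal sketch of how such a check can be run on per-token measurements, using synthetic placeholder data rather than the EMA recordings:

    ```python
    # Sketch of testing the formant heuristic; data are synthetic placeholders.
    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(1)
    n = 100
    tongue_height = rng.uniform(0, 1, n)      # normalized TD height per token
    tongue_backness = rng.uniform(0, 1, n)    # normalized TD retraction per token
    # Toy data generated to follow the heuristic, plus measurement noise:
    f1 = 800 - 500 * tongue_height + rng.normal(0, 40, n)
    f2 = 2400 - 1200 * tongue_backness + rng.normal(0, 80, n)

    for name, articulator, formant in [("height vs F1", tongue_height, f1),
                                       ("backness vs F2", tongue_backness, f2)]:
        r, p = pearsonr(articulator, formant)
        print(f"{name}: r = {r:.2f}, p = {p:.3g}")   # expect r < 0 under the heuristic
    ```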

    Beyond the average: embracing speaker individuality in the dynamic modeling of the acoustic-articulatory relationship

    This paper explores the acoustic-articulatory relationship while considering individual differences in speech production. We aimed to determine whether there is a causal relationship between tongue movements and the contours of the first and second formant frequencies (F1 and F2), employing a hierarchical Bayesian continuous-time dynamic model, which allows for a more direct connection between the measured acoustic and articulatory variables and theories involving dynamicity. The results show predictive tendencies for both formants: anteroposterior and vertical tongue movements may predict changes in F1, with raising predicting an increase and retraction a decrease, and with tongue fronting and tongue height inversely predicting F2. Further, the modeled individual differences showed similar global tendencies, except for the rate of change of F2. Overall, this study provides valuable insights into the relationship between tongue articulatory variables and formant contours while accounting for between-speaker variability.
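
    As a rough intuition for the continuous-time approach, a formant contour can be written as an ordinary differential equation whose rate of change is driven by an articulatory trajectory. The toy sketch below integrates such a first-order model with Euler steps; it is not the authors' hierarchical Bayesian model, and the coefficients and trajectory are invented for illustration.

    ```python
    # Toy continuous-time dynamic model: dF2/dt = a*(F2 - baseline) + b*fronting.
    import numpy as np

    dt = 0.005                                  # 5 ms frames
    t = np.arange(0, 0.3, dt)
    fronting = np.sin(2 * np.pi * 3 * t)        # toy tongue-fronting trajectory
    a, b = -8.0, 400.0                          # invented drift and coupling terms

    f2 = np.empty_like(t)
    f2[0] = 1500.0                              # baseline F2 in Hz
    for i in range(1, len(t)):                  # Euler integration of the ODE
        df2 = a * (f2[i - 1] - 1500.0) + b * fronting[i - 1]
        f2[i] = f2[i - 1] + dt * df2
    print("F2 range (Hz):", f2.min().round(1), "to", f2.max().round(1))
    ```

    In the actual study the drift and coupling parameters would be estimated hierarchically per speaker, which is what lets the model separate global tendencies from individual differences.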

    Modeling of oropharyngeal articulatory adaptation to compensate for the acoustic effects of nasalization

    Hypernasality is one of the most detrimental speech disturbances leading to declines in speech intelligibility. Velopharyngeal inadequacy, which is associated with anatomic defects such as cleft palate or with neuromuscular disorders that affect velopharyngeal function, is the primary cause of hypernasality. A simulation study by Rong and Kuehn [J. Speech Lang. Hear. Res. 55(5), 1438–1448 (2012)] demonstrated that properly adjusted oropharyngeal articulation can reduce nasality for vowels synthesized with an articulatory model [Mermelstein, J. Acoust. Soc. Am. 53(4), 1070–1082 (1973)]. In this study, a speaker-adaptive articulatory model was developed to simulate speaker-customized oropharyngeal articulatory adaptation that compensates for the acoustic effects of nasalization on /a/, /i/, and /u/. The results demonstrated that (1) the oropharyngeal articulatory adaptation effectively counteracted the effects of nasalization on the second lowest formant frequency (F2) and partially compensated for the effects of nasalization on the vowel space (e.g., shifting and constriction of the vowel space), and (2) the articulatory adaptation strategies generated by the speaker-adaptive model may be more efficacious in counteracting the acoustic effects of nasalization than those generated by the standard articulatory model of Rong and Kuehn. The findings indicate the potential of oropharyngeal articulatory adaptation as a means to correct maladaptive articulatory behaviors and to reduce nasality.
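
    The compensation idea can be reduced to a one-dimensional caricature: nasal coupling perturbs F2, and an oropharyngeal adjustment is chosen to pull F2 back toward the oral target. The sketch below uses a hypothetical `f2_model` with toy magnitudes, not the Mermelstein model or the speaker-adaptive model of the study.

    ```python
    # One-dimensional caricature of articulatory compensation for nasalization.
    from scipy.optimize import minimize_scalar

    def f2_model(constriction, nasal_shift):
        """Toy F2 (Hz): base value + articulatory term - nasalization shift."""
        return 1200.0 + 600.0 * constriction - nasal_shift

    target_f2 = 1200.0        # F2 of the intended oral vowel
    nasal_shift = 150.0       # toy acoustic effect of velopharyngeal opening

    # Find the oropharyngeal adjustment that restores the oral F2:
    fit = minimize_scalar(lambda c: (f2_model(c, nasal_shift) - target_f2) ** 2,
                          bounds=(-1.0, 1.0), method="bounded")
    print(f"compensatory constriction: {fit.x:.3f}")   # ~ +0.25 in this toy setup
    ```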

    Let the agents do the talking: On the influence of vocal tract anatomy on speech during ontogeny

    Speaker-Specific Adaptation of Maeda Synthesis Parameters for Auditory Feedback

    The Real-time Articulatory Speech Synthesizer (RASS) is a research tool in the Marquette Speech and Swallowing lab that simultaneously collects acoustic and articulatory data from human participants. The system is used to study acoustic-to-articulatory inversion, the articulatory-to-acoustic synthesis mapping, and the effects of real-time acoustic feedback. Electromagnetic articulography (EMA) is used to collect position data via sensors placed in a subject's mouth. These kinematic data are then converted into a set of synthesis parameters that control an articulatory speech synthesizer, which in turn generates an acoustic waveform matching the associated kinematics. Independently of RASS, the synthesized acoustic waveform can be further modified before it is returned to the subject, creating the opportunity for involuntary learning through controlled acoustic feedback. To maximize the impact of involuntary learning, the characteristics of the synthetically generated speech need to closely match those of the participant. A number of synthesis parameters cannot be directly controlled by subjects' articulatory movements, such as fundamental frequency and parameters corresponding to physiological measures such as vocal tract length and overall vocal tract size. The goal of this work is to develop a mechanism for automatically determining the RASS internal synthesis parameters that most closely match a subject's acoustic characteristics, ultimately increasing the system's positive effect on involuntary learning. The methods detailed in this thesis examine the effects of altering both time-independent and time-dependent synthesis parameters to increase the acoustic similarity between subjects' real and synthesized speech. The fundamental frequency and first two formant values are studied across multiple vowels to determine the time-independent parameter settings. Time-dependent parameter analysis is performed using a real-time parameter-tracking configuration. The results provide a way of adapting the Maeda synthesis parameters in RASS to be speaker-specific and of individualizing the study of auditory feedback. This investigation will allow researchers to better customize the RASS system for individual subjects and alter involuntary learning outcomes.
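
    For the time-independent parameters, the matching problem can be sketched as a one-parameter search: pick the vocal-tract length factor whose synthesized F1/F2 values sit closest to the speaker's measured vowel formants. `synth` below is a hypothetical stand-in for the Maeda synthesizer in RASS, using the rough rule that formant frequencies scale inversely with vocal tract length; all numbers are illustrative.

    ```python
    # Sketch of time-independent parameter matching over a length factor.
    import numpy as np

    measured = {"a": (730, 1090), "i": (270, 2290), "u": (300, 870)}    # speaker (F1, F2), Hz
    reference = {"a": (700, 1200), "i": (300, 2200), "u": (320, 900)}   # synthesizer defaults

    def synth(vowel, length_factor):
        """Hypothetical synthesizer output: formants scale ~ 1 / vocal-tract length."""
        f1, f2 = reference[vowel]
        return f1 / length_factor, f2 / length_factor

    def cost(k):
        """Total squared formant mismatch across the corner vowels."""
        return sum(((np.array(synth(v, k)) - np.array(measured[v])) ** 2).sum()
                   for v in measured)

    factors = np.linspace(0.8, 1.2, 81)         # candidate length factors
    best = min(factors, key=cost)
    print(f"best length factor: {best:.3f}")
    ```

    The time-dependent parameters would instead be fit frame by frame against the speaker's running acoustics, which is what the parameter-tracking configuration in the thesis addresses.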

    Speech Communication

    Contains table of contents for Part IV, table of contents for Section 1, an introduction, reports on seven research projects, and a list of publications. Funding: C.J. Lebel Fellowship; Dennis Klatt Memorial Fund; National Institutes of Health Grants T32-DC00005, R01-DC00075, F32-DC00015, R01-DC00266, P01-DC00361, and R01-DC00776; National Science Foundation Grants IRI 89-10561, IRI 88-05680, and INT 90-2471.

    Articulation in time : Some word-initial segments in Swedish

    Speech is both dynamic and distinctive at the same time. This implies a certain contradiction, one that has occupied researchers in phonetics and phonology for decades. The present dissertation assumes that articulation behaves as a function of time and that phonological structures can be found in the resulting dynamical systems. Electromagnetic articulography (EMA) is used to measure mechanical movements in Swedish speakers. The results show that tonal context affects articulatory coordination. Acceleration appears to divide the movements of the jaw and lips into intervals of postures and active movements, and these intervals are affected differently by the tonal context. Furthermore, a bilabial consonant is shorter if the next consonant is also made with the lips. A hypothesis of a correlation between acoustic segment duration and acceleration is presented. The dissertation highlights the importance of time for how speech ultimately sounds; particularly significant is the combination of articulatory timing and articulatory duration.
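
    The acceleration-based division into postures and active movements can be sketched as a simple threshold rule on the second derivative of an articulator's position signal. The sketch below uses a toy jaw trajectory and an arbitrary 20% threshold; both are assumptions for illustration, not the dissertation's procedure.

    ```python
    # Sketch: segment an articulator trajectory by acceleration magnitude.
    import numpy as np

    fs = 200.0                                   # EMA sampling rate (Hz), assumed
    t = np.arange(0, 1.0, 1 / fs)
    jaw = 2.0 * np.sin(2 * np.pi * 2 * t)        # toy jaw position signal (mm)

    velocity = np.gradient(jaw, 1 / fs)          # first derivative (mm/s)
    acceleration = np.gradient(velocity, 1 / fs) # second derivative (mm/s^2)

    # Frames with low |acceleration| count as "posture", the rest as movement:
    posture = np.abs(acceleration) < 0.2 * np.abs(acceleration).max()
    print(f"posture frames: {posture.sum()} / {posture.size}")
    ```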

    Lexical Stress Realization in Mandarin Second Language Learners of English: An Acoustic and Articulatory Study

    This dissertation investigated the acoustic and articulatory correlates of lexical stress in Mandarin second language (L2) learners of English, as well as in first language (L1) speakers. The study used minimal pairs contrasting in stress location (e.g., OBject versus obJECT) obtained from a publicly available Mandarin-accented English electromagnetic articulography corpus. In the acoustic domain, the use of acoustic parameters (duration, intensity, F0, and vowel quality) was measured in stressed and unstressed vowels. In the articulatory domain, positional information for the tongue tip (TT), tongue dorsum (TD), upper lip (UL), lower lip (LL), and jaw (JAW) was retrieved from the concurrent vowel data. Finally, the acoustic-articulatory correlation was computed and compared both within and across groups. The acoustic analysis demonstrated that L2 speakers significantly differentiated stressed from unstressed vowels using all suprasegmental cues, while vowel quality was used only to a very limited extent in the L2 group. In the articulatory analysis, Mandarin L2 speakers demonstrated an extremely limited lexical stress effect: a significant difference as a function of lexical stress was noted only in the vertical dimension of low-back vowels. The acoustic-articulatory correlation results revealed a relatively weaker correlation in L2 speakers than in L1 speakers. In the L2 group, certain articulators, such as TD and JAW, demonstrated a stronger correlation than LL and TT.
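
    The suprasegmental cues in the acoustic analysis can be quantified per minimal pair as stressed-to-unstressed ratios (duration, F0) or differences (intensity, since dB is already log-scaled). A minimal sketch with illustrative numbers, not values from the corpus:

    ```python
    # Sketch of per-pair stress cue quantification; numbers are illustrative.
    stressed = {"duration_ms": 142.0, "intensity_dB": 68.0, "f0_Hz": 190.0}
    unstressed = {"duration_ms": 95.0, "intensity_dB": 63.0, "f0_Hz": 165.0}

    for cue in stressed:
        if cue == "intensity_dB":                # dB: report a difference
            print(f"{cue}: +{stressed[cue] - unstressed[cue]:.1f} dB")
        else:                                    # duration, F0: report a ratio
            print(f"{cue}: ratio {stressed[cue] / unstressed[cue]:.2f}")
    ```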

    Acoustic articulatory evidence for quantal vowel categories : the features [low] and [back]

    Thesis (Ph.D.)--Harvard-MIT Division of Health Sciences and Technology, 2009. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 139-142). In recent years, research in human speech communication has suggested that the inventory of sound units observed in vowels across languages is strongly influenced by the acoustic properties of the human subglottal system. That is, there is a discrete set of possible vowel features constrained by the interaction of the acoustic/articulatory properties of the vowels and a small set of attributes observed in the subglottal region. This thesis tests the hypothesis that subglottal resonances govern vowel feature boundaries for three populations: adult speakers of English, adult speakers of Korean, and children learning English. First, we explored the relations among the F1 of vowels, the first subglottal resonance (SubF1), and the feature [low] in English. For the diphthong [??], F1 peaks showed an acoustic irregularity near the speaker's SubF1. For monophthongs, analysis of F1 frequency distributions shows a boundary between [+low] and [-low] vowels at the speakers' SubF1. Second, we studied the relations among the F2 of Korean vowels, SubF2, and the feature [back], to test whether the relation between subglottal resonances and the feature boundary, demonstrated earlier for English, also applies to other languages. Results show that the F2 boundary between [back] and [front] vowels was placed near SubF2 in Korean, as in English. Third, we explored the development of vowel formants in relation to subglottal resonances for 10 children in the age range of 2;6-3;9 years, using the database of Imbrie (2005). Results show that at the earlier ages formant values deviated from the expected relations, but during the six-month period in which the measurements were made there was considerable movement toward the expected values. The transition to the expected relations appeared to occur by the age of 3 years for most of these children, in a developmental pattern inconsistent with an account in terms of simple anatomical increase. These three sets of observations provide evidence that subglottal resonances play a role in defining vowel feature boundaries, as predicted by Stevens' (1972) hypothesis that contrastive phonological features in human languages have arisen from quantal discontinuities in articulatory-acoustic space. By Youngsook Jung. Ph.D.
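
    The boundary claim tested in the thesis can be stated as a one-line classifier: a vowel is [+low] when its F1 lies above the speaker's first subglottal resonance (SubF1), and [-low] otherwise. A sketch with illustrative formant and resonance values:

    ```python
    # Sketch of the SubF1 boundary rule for the feature [low]; values illustrative.
    sub_f1 = 600.0                                  # speaker's SubF1 (Hz), assumed
    vowels = {"i": 280.0, "e": 450.0, "ae": 700.0, "a": 750.0, "u": 310.0}

    for vowel, f1 in vowels.items():
        feature = "+low" if f1 > sub_f1 else "-low"  # boundary at SubF1
        print(f"/{vowel}/: F1 = {f1:.0f} Hz -> [{feature}]")
    ```

    The analogous rule for [back] compares F2 against SubF2, which is the relation the Korean data were used to test.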