39 research outputs found

    An approach to explaining formants (Story, 2024)

    Purpose: This tutorial is a description of a possible approach to teaching the concept of formants to students in a speech science course, at either the undergraduate or graduate level. The approach is to explain formants as prominent regions of energy in the output spectrum envelope radiated at the lips, and how they arise as the superposition of vocal tract resonances on a source signal. Standing waves associated with vocal tract resonances are briefly explained and standing wave animations are provided. Animations of the temporal variation of the vocal tract, vocal tract resonances, spectra, and spectrograms, along with audio samples, are included to provide dynamic demonstrations of the concept of formants.

    Conclusions: The explanations, accompanying demonstrations, and suggested activities are intended to provide a launching point for understanding formants and how they can be measured, analyzed, and interpreted. As a result, participants should be able to describe the meaning of the term “formant” as it relates to a spectrum and a spectrogram, explain the difference between formants and vocal tract resonances, explain how vocal tract resonances combined with the voice source generate formants, and identify formants in both narrow-band and wide-band spectrograms and track their time-varying patterns with a formant tracking algorithm.

    Supplemental Material S1. Standing wave in neutral vocal tract configuration for the first resonance.
    Supplemental Material S2. Standing wave in neutral vocal tract configuration for the second resonance.
    Supplemental Material S3. Standing wave in neutral vocal tract configuration for the third resonance.
    Supplemental Material S4. Pressure distribution in neutral vocal tract configuration at 1000 Hz, off resonance.
    Supplemental Material S5. Animation of the temporal variation of the components of the source-filter representation during production of “Hello, how are you.” The animation also includes an audio track that is a slowed version of the phrase generated by the TubeTalker model.
    Supplemental Material S6. Audio file containing the real-time voice source signal (glottal flow wave) generated during the TubeTalker simulation of “Hello, how are you.”
    Supplemental Material S7. Audio file containing the real-time output pressure signal generated during the TubeTalker simulation of “Hello, how are you.”
    Supplemental Material S8. Animation of the temporal variation of the vocal tract in two representations during production of “Hello, how are you.” In the upper inset plot, the vocal tract is shown in tubular form, and in the main plot in the middle the vocal tract is shown in a pseudo-midsagittal form. The lower inset plot shows the simultaneous temporal variation of the frequency response function (resonances). The animation also includes an audio track that is a slowed version of the phrase generated by the TubeTalker model.
    Supplemental Material S9. Animation of the temporal variation of the frequency response function in three dimensions (time, frequency, amplitude) during production of “Hello, how are you.” There is a delay in the middle of the animation to allow the viewer to see the full history, and then the view rotates into a traditional spectrographic perspective. The animation also includes an audio track that is a slowed version of the phrase generated by the TubeTalker model.
    Supplemental Material S10. Animation of the temporal variation of narrow-band spectra in three dimensions (time, frequency, amplitude) during production of “Hello, how are you.” There is a delay in the middle of the animation to allow the viewer to see the full history, and then the view rotates into a traditional spectrographic perspective. The animation also includes an audio track that is a slowed version of the phrase generated by the TubeTalker model.

    Story, B. H. (2024). An approach to explaining formants. Perspectives of the ASHA Special Interest Groups. Advance online publication. https://doi.org/10.1044/2023_PERSP-23-00200
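
    The standing-wave resonances of a neutral vocal tract mentioned in this abstract can be illustrated with the textbook quarter-wavelength approximation. The sketch below is not taken from the tutorial or from TubeTalker. It assumes a uniform lossless tube closed at the glottis and open at the lips, with an illustrative length of 17.5 cm and a speed of sound of 350 m/s. The function name and both default values are hypothetical choices made for this example.

```python
def uniform_tube_resonances(length_m=0.175, speed_of_sound=350.0, count=3):
    """Resonance frequencies (Hz) of a uniform tube closed at one end, open at the other.

    Uses the quarter-wavelength relation F_n = (2n - 1) * c / (4 * L).
    The default length and speed of sound are assumed illustrative values,
    not parameters reported in the abstract above.
    """
    if length_m <= 0 or speed_of_sound <= 0:
        raise ValueError("length and speed of sound must be positive")
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


# First three resonances under the assumed tube length and sound speed.
resonances = uniform_tube_resonances()
print(resonances)
```

    Under these assumed values the first three resonances fall near 500, 1500, and 2500 Hz, so a 1000 Hz excitation lies midway between the first two. This is one way to read the "off resonance" case listed as Supplemental Material S4.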

    A model of speech production based on the acoustic relativity of the vocal tract

    A model is described in which the effects of articulatory movements to produce speech are generated by specifying relative acoustic events along a time axis. These events consist of directional changes of the vocal tract resonance frequencies that, when associated with a temporal event function, are transformed, via acoustic sensitivity functions, into time-varying modulations of the vocal tract shape. Because the time course of the events may be considerably overlapped in time, coarticulatory effects are automatically generated. Production of sentence-level speech with the model is demonstrated with audio samples and vocal tract animations. (C) 2019 Acoustical Society of America. 6 month embargo; published online: 17 October 2019. This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at [email protected]

    An acoustically-driven vocal tract model for stop consonant production

    The purpose of this study was to further develop a multi-tier model of the vocal tract area function in which the modulations of shape to produce speech are generated by the product of a vowel substrate and a consonant superposition function. The new approach consists of specifying input parameters for a target consonant as a set of directional changes in the resonance frequencies of the vowel substrate. Using calculations of acoustic sensitivity functions, these "resonance deflection patterns" are transformed into time-varying deformations of the vocal tract shape without any direct specification of location or extent of the consonant constriction along the vocal tract. The configurations of the constrictions and expansions generated by this process were shown to be physiologically realistic and to produce speech sounds that are easily identifiable as the target consonants. This model is a useful enhancement for area function-based synthesis and can serve as a tool for understanding how the vocal tract is shaped by a talker during speech production. (C) 2016 Elsevier B.V. All rights reserved. NIH [R01-DC011275]; NSF [BCS-1145011]. 24 month embargo; available online 9 December 2016.

    Influence of Left–Right Asymmetries on Voice Quality in Simulated Paramedian Vocal Fold Paralysis

    Purpose: The purpose of this study was to determine the vocal fold structural and vibratory symmetries that are important to vocal function and voice quality in a simulated paramedian vocal fold paralysis. Method: A computational kinematic speech production model was used to simulate an exemplar "voice" on the basis of asymmetric settings of parameters controlling glottal configuration. These parameters were then altered individually to determine their effect on maximum flow declination rate, spectral slope, cepstral peak prominence, harmonics-to-noise ratio, and perceived voice quality. Results: Asymmetry of each of the 5 vocal fold parameters influenced vocal function and voice quality; measured change was greatest for adduction and bulging. Increasing the symmetry of all parameters improved voice, and the best voice occurred with overcorrection of adduction, followed by bulging, nodal point ratio, starting phase, and amplitude of vibration. Conclusions: Although vocal process adduction and edge bulging asymmetries are most influential in voice quality for simulated vocal fold motion impairment, amplitude of vibration and starting phase asymmetries are also perceptually important. These findings are consistent with the current surgical approach to vocal fold motion impairment, where goals include medializing the vocal process and straightening concave edges. The results also explain many of the residual postoperative voice limitations.

    Vocal tract modes based on multiple area function sets from one speaker
