14 research outputs found

    Influence of Left–Right Asymmetries on Voice Quality in Simulated Paramedian Vocal Fold Paralysis

    Purpose: The purpose of this study was to determine the vocal fold structural and vibratory symmetries that are important to vocal function and voice quality in a simulated paramedian vocal fold paralysis. Method: A computational kinematic speech production model was used to simulate an exemplar "voice" on the basis of asymmetric settings of parameters controlling glottal configuration. These parameters were then altered individually to determine their effect on maximum flow declination rate, spectral slope, cepstral peak prominence, harmonics-to-noise ratio, and perceived voice quality. Results: Asymmetry of each of the 5 vocal fold parameters influenced vocal function and voice quality; measured change was greatest for adduction and bulging. Increasing the symmetry of all parameters improved voice, and the best voice occurred with overcorrection of adduction, followed by bulging, nodal point ratio, starting phase, and amplitude of vibration. Conclusions: Although vocal process adduction and edge bulging asymmetries are most influential in voice quality for simulated vocal fold motion impairment, amplitude of vibration and starting phase asymmetries are also perceptually important. These findings are consistent with the current surgical approach to vocal fold motion impairment, where goals include medializing the vocal process and straightening concave edges. The results also explain many of the residual postoperative voice limitations.

    Perceptual consequences of changes in epilaryngeal area and shape

    The influence of epilaryngeal area on glottal flow and the acoustic signal has been described [Titze, J. Acoust. Soc. Am. 123, 2733–2749 (2008)], but it is not known how (or whether) changes in epilaryngeal area influence perceived voice quality. This study examined these relationships in a kinematic vocal tract model. Epilaryngeal constrictions and expansions were simulated at the levels of the aryepiglottic folds and the ventricular folds in the context of four glottal configurations representing normal vibration to severe vocal fold paralysis, for the three corner vowels /a/, /i/, and /u/. Minimum and maximum glottal flow, maximum flow declination rate, spectral slope, cepstral peak prominence, and the harmonics-to-noise ratio were measured, and listeners completed a perceptual sort-and-rate task for all samples. Epilaryngeal constriction and expansion caused salient differences in voice quality. The location of constriction was also perceptible. Vowels simulated with aryepiglottic constriction demonstrated lower maximum airflow and less noise than the other epilaryngeal shapes, and listeners consistently perceived them as distinct from other stimuli. Acoustic differences decreased with increasing severity of simulated paralysis. Results of epilaryngeal constriction and expansion were similar for /a/ and /i/, and produced slightly different patterns for /u/.

    Modeling the voice source in terms of spectral slopes

    A psychoacoustic model of the voice source spectrum is proposed. The model is characterized by four spectral slope parameters: the difference in amplitude between the first two harmonics (H1–H2), the second and fourth harmonics (H2–H4), the fourth harmonic and the harmonic nearest 2 kHz in frequency (H4–2 kHz), and the harmonic nearest 2 kHz and that nearest 5 kHz (2 kHz–5 kHz). As a step toward model validation, experiments were conducted to establish the acoustic and perceptual independence of these parameters. In experiment 1, the model was fit to a large number of voice sources. Results showed that parameters are predictable from one another, but that these relationships are due to overall spectral roll-off. Two additional experiments addressed the perceptual independence of the source parameters. Listener sensitivity to H1–H2, H2–H4, and H4–2 kHz did not change as a function of the slope of an adjacent component, suggesting that sensitivity to these components is robust. Listener sensitivity to changes in spectral slope from 2 kHz to 5 kHz depended on complex interactions between spectral slope, spectral noise levels, and H4–2 kHz. It is concluded that the four parameters represent non-redundant acoustic and perceptual aspects of voice quality.

    Re-Training of Convolutional Neural Networks for Glottis Segmentation in Endoscopic High-Speed Videos

    Endoscopic high-speed video (HSV) systems for visualization and assessment of vocal fold dynamics in the larynx are diverse and technically advancing. To account for the resulting “concept shifts” in neural network (NN)-based image processing, re-training of already trained and deployed NNs is necessary to allow for sufficiently accurate image processing for new recording modalities. We propose and discuss several re-training approaches for convolutional neural networks (CNN) used for HSV image segmentation. Our baseline CNN was trained on the BAGLS data set (58,750 images). The new BAGLS-RT data set consists of an additional 21,050 images from previously unused HSV systems, light sources, and different spatial resolutions. Results showed that increasing data diversity by means of preprocessing already improves the segmentation accuracy (mIoU + 6.35%). Subsequent re-training further increases segmentation performance (mIoU + 2.81%). For re-training, finetuning with dynamic knowledge distillation showed the most promising results. Data variety for training and additional re-training is a helpful tool to boost HSV image segmentation quality. However, when performing re-training, the phenomenon of catastrophic forgetting should be kept in mind, i.e., adaptation to new data while forgetting already learned knowledge.
