14 research outputs found
Recommended from our members
Frailty Syndrome, Cognition, and Dysphonia in the Elderly
Purpose. The purpose of the current study is to determine the relation of frailty syndrome to acoustic measures of voice quality and voice-related handicap. Methods. Seventy-three adults (52 community-dwelling participants and 21 assisted living residents) age 60 and older completed frailty screening, acoustic assessment, cognitive screening, and the Voice Handicap Index-10 (VHI-10). Factor analysis was used to consolidate acoustic measures. Statistical analysis included multiple regression, analysis of variance, and Tukey post-hoc tests with alpha of 0.05. Results. Montreal Cognitive Assessment (MoCA) and exhaustion explained 28% of the variance in VHI-10. MoCA and sex explained 27% of the variance in factor 1 (spectral ratio), age and MoCA explained 13% of the variance in factor 2 (cepstral peak prominence for speech), and slowness explained 10% of the variance in factor 3 (cepstral peak prominence for sustained /a/). There were statistically significant differences in two measures across frailty groups: VHI-10 and MoCA. Acoustic factor scores did not differ significantly among frailty groups (P > 0.05). Conclusions. Voice-related handicap and cognitive status differed among robust and frail older adults, yet vocal function measures did not. The components of frailty most related to VHI-10 were exhaustion and weight loss rather than slowness, weakness, or inactivity. Based on these findings, routine screening of physical frailty and cognition is recommended as part of a complete voice evaluation for older adults. 12 month embargo; published online: 25 July 2018. This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at [email protected]
Influence of Left–Right Asymmetries on Voice Quality in Simulated Paramedian Vocal Fold Paralysis
Purpose: The purpose of this study was to determine the vocal fold structural and vibratory symmetries that are important to vocal function and voice quality in a simulated paramedian vocal fold paralysis. Method: A computational kinematic speech production model was used to simulate an exemplar "voice" on the basis of asymmetric settings of parameters controlling glottal configuration. These parameters were then altered individually to determine their effect on maximum flow declination rate, spectral slope, cepstral peak prominence, harmonics-to-noise ratio, and perceived voice quality. Results: Asymmetry of each of the 5 vocal fold parameters influenced vocal function and voice quality; measured change was greatest for adduction and bulging. Increasing the symmetry of all parameters improved voice, and the best voice occurred with overcorrection of adduction, followed by bulging, nodal point ratio, starting phase, and amplitude of vibration. Conclusions: Although vocal process adduction and edge bulging asymmetries are most influential in voice quality for simulated vocal fold motion impairment, amplitude of vibration and starting phase asymmetries are also perceptually important. These findings are consistent with the current surgical approach to vocal fold motion impairment, where goals include medializing the vocal process and straightening concave edges. The results also explain many of the residual postoperative voice limitations.
Common Terminology and Acoustic Measures for Human Voice and Birdsong
The zebra finch is used as a model to study the neural circuitry of auditory-guided human vocal production. The terminology of birdsong production and acoustic analysis, however, differs from human voice production, making it difficult for voice researchers of either species to navigate the literature from the other. The purpose of this research note is to identify common terminology and measures to better compare information across species. Terminology used in the birdsong literature will be mapped onto terminology used in the human voice production literature. Measures typically used to quantify the percepts of pitch, loudness, and quality will be described. Measures common to the literature in both species will be made from the songs of 3 middle-age birds using Praat and Song Analysis Pro. Two measures, cepstral peak prominence (CPP) and Wiener entropy (WE), will be compared to determine if they provide similar information. Similarities and differences in terminology and acoustic analyses are presented. A core set of measures including frequency, frequency variability within a syllable, intensity, CPP, and WE are proposed for future studies. CPP and WE are related yet provide unique information about the syllable structure. Using a core set of measures familiar to both human voice and birdsong researchers, along with both CPP and WE, will allow characterization of similarities and differences among birds. Standard terminology and measures will improve accessibility of the birdsong literature to human voice researchers and vice versa. University of Arizona startup funds; University of Arizona Undergraduate Biological Research Program.
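As a rough illustration of one of the measures named above, Wiener entropy is commonly defined as the logarithm of the ratio of the geometric mean to the arithmetic mean of a power spectrum. The sketch below assumes that textbook definition with a natural logarithm; the function name `wiener_entropy` and the toy spectra are hypothetical, and the analysis software used in the study may scale or window the spectrum differently.

```python
import math


def wiener_entropy(power_spectrum):
    """Return log(geometric mean / arithmetic mean) of a power spectrum.

    Assumes the common textbook definition with a natural logarithm.
    The value is 0 for a perfectly flat (noise-like) spectrum and becomes
    more negative as energy concentrates in a few bins (tone-like).
    """
    values = list(power_spectrum)
    if not values or any(p <= 0 for p in values):
        raise ValueError("power spectrum must be non-empty and strictly positive")
    # The log of the geometric mean is the mean of the logs.
    log_geometric_mean = sum(math.log(p) for p in values) / len(values)
    arithmetic_mean = sum(values) / len(values)
    return log_geometric_mean - math.log(arithmetic_mean)


# Hypothetical toy spectra: one flat, one dominated by a single bin.
flat_spectrum = [1.0] * 8
peaked_spectrum = [1e-6] * 7 + [1.0]

flat_we = wiener_entropy(flat_spectrum)      # flat spectrum: at the upper bound of 0
peaked_we = wiener_entropy(peaked_spectrum)  # tone-like spectrum: strongly negative
```

Because the measure depends only on how evenly energy is spread across bins, it carries different information from a periodicity measure such as CPP, which is consistent with the abstract's remark that the two are related but not redundant.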
Middle age, a key time point for changes in birdsong and human voice
Voice changes caused by natural aging and neurodegenerative diseases are prevalent in the aging population and diminish quality of life. Most treatments involve behavioral interventions that target the larynx because of a limited understanding of central brain mechanisms. The songbird offers a unique entry point into studying age-related changes in vocalizations because of a well-characterized neural circuitry for song that shares homology to human vocal control areas. Previously we established a translational dictionary for evaluating acoustic features of birdsong in the context of human voice measurements. In the present study, we conduct extensive analyses of birdsongs from young, middle-aged, and old male zebra finches. Our findings show that birdsongs become louder with age, and changes in periodic energy occur at middle age but are transient; songs appear to stabilize in old birds. Furthermore, faster songs are detected in finches at middle age compared with young and old finches. Vocal disorders in humans emerge at middle age, but the underlying brain pathologies are not well identified. The current findings will motivate future investigations using the songbird model to identify possible brain mechanisms involved in human vocal disorders of aging. University of Arizona; Undergraduate Biology Research Program. Supplementary data available in the University of Arizona Research Data Repository.
Perceptual consequences of changes in epilaryngeal area and shape
The influence of epilaryngeal area on glottal flow and the acoustic signal has been described [Titze, J. Acoust. Soc. Am. 123, 2733–2749 (2008)], but it is not known how (or whether) changes in epilaryngeal area influence perceived voice quality. This study examined these relationships in a kinematic vocal tract model. Epilaryngeal constrictions and expansions were simulated at the levels of the aryepiglottic folds and the ventricular folds in the context of four glottal configurations representing normal vibration to severe vocal fold paralysis, for the three corner vowels /a/, /i/, and /u/. Minimum and maximum glottal flow, maximum flow declination rate, spectral slope, cepstral peak prominence, and the harmonics-to-noise ratio were measured, and listeners completed a perceptual sort-and-rate task for all samples. Epilaryngeal constriction and expansion caused salient differences in voice quality. The location of constriction was also perceivable. Vowels simulated with aryepiglottic constriction demonstrated lower maximum airflow and less noise than the other epilaryngeal shapes, and listeners consistently perceived them as distinct from other stimuli. Acoustic differences decreased with increasing severity of simulated paralysis. Results of epilaryngeal constriction and expansion were similar for /a/ and /i/, and produced slightly different patterns for /u/.
Modeling the voice source in terms of spectral slopes
A psychoacoustic model of the voice source spectrum is proposed. The model is characterized by four spectral slope parameters: the difference in amplitude between the first two harmonics (H1–H2), the second and fourth harmonics (H2–H4), the fourth harmonic and the harmonic nearest 2 kHz in frequency (H4–2 kHz), and the harmonic nearest 2 kHz and that nearest 5 kHz (2 kHz–5 kHz). As a step toward model validation, experiments were conducted to establish the acoustic and perceptual independence of these parameters. In experiment 1, the model was fit to a large number of voice sources. Results showed that parameters are predictable from one another, but that these relationships are due to overall spectral roll-off. Two additional experiments addressed the perceptual independence of the source parameters. Listener sensitivity to H1–H2, H2–H4, and H4–2 kHz did not change as a function of the slope of an adjacent component, suggesting that sensitivity to these components is robust. Listener sensitivity to changes in spectral slope from 2 kHz to 5 kHz depended on complex interactions between spectral slope, spectral noise levels, and H4–2 kHz. It is concluded that the four parameters represent non-redundant acoustic and perceptual aspects of voice quality.
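The four slope parameters described above are simple differences between harmonic levels, so they can be sketched in a few lines. The example below is only an illustration under stated assumptions: harmonic levels are taken as already measured in dB, the function name `source_slope_parameters` is hypothetical, and "nearest harmonic" is resolved by rounding the target frequency divided by the fundamental. It is not the authors' fitting procedure.

```python
def source_slope_parameters(harmonic_levels_db, f0_hz):
    """Compute H1-H2, H2-H4, H4-2 kHz, and 2 kHz-5 kHz in dB.

    harmonic_levels_db[k - 1] is assumed to hold the level (dB) of the
    k-th harmonic of a source with fundamental frequency f0_hz.
    """
    def level(k):
        return harmonic_levels_db[k - 1]

    def nearest_harmonic(freq_hz):
        # Harmonic number whose frequency is closest to freq_hz (assumption:
        # plain rounding is an acceptable tie-breaking rule here).
        return max(1, round(freq_hz / f0_hz))

    k_2k = nearest_harmonic(2000.0)
    k_5k = nearest_harmonic(5000.0)
    if len(harmonic_levels_db) < max(4, k_5k):
        raise ValueError("not enough harmonics to reach 5 kHz")

    return {
        "H1-H2": level(1) - level(2),
        "H2-H4": level(2) - level(4),
        "H4-2kHz": level(4) - level(k_2k),
        "2kHz-5kHz": level(k_2k) - level(k_5k),
    }


# Hypothetical source: f0 = 100 Hz, each harmonic 1 dB weaker than the last.
toy_levels_db = [-float(k) for k in range(1, 51)]
slopes = source_slope_parameters(toy_levels_db, f0_hz=100.0)
```

In this toy spectrum every parameter is driven by the same uniform roll-off, which mirrors the abstract's point that the parameters can be predictable from one another through overall spectral roll-off even though each one describes a different frequency region.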
Re-Training of Convolutional Neural Networks for Glottis Segmentation in Endoscopic High-Speed Videos
Endoscopic high-speed video (HSV) systems for visualization and assessment of vocal fold dynamics in the larynx are diverse and technically advancing. To consider resulting “concept shifts” for neural network (NN)-based image processing, re-training of already trained and used NNs is necessary to allow for sufficiently accurate image processing for new recording modalities. We propose and discuss several re-training approaches for convolutional neural networks (CNN) being used for HSV image segmentation. Our baseline CNN was trained on the BAGLS data set (58,750 images). The new BAGLS-RT data set consists of an additional 21,050 images from previously unused HSV systems, light sources, and different spatial resolutions. Results showed that increasing data diversity by means of preprocessing already improves the segmentation accuracy (mIoU + 6.35%). Subsequent re-training further increases segmentation performance (mIoU + 2.81%). For re-training, finetuning with dynamic knowledge distillation showed the most promising results. Data variety for training and additional re-training is a helpful tool to boost HSV image segmentation quality. However, when performing re-training, the phenomenon of catastrophic forgetting should be kept in mind, i.e., adaptation to new data while forgetting already learned knowledge.
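The mIoU figures quoted above refer to mean intersection over union, a standard overlap score for segmentation masks. A minimal sketch follows, assuming binary glottis masks stored as nested lists of 0/1 and a per-image average; the names `iou` and `mean_iou` and the choice to score two empty masks as 1.0 are assumptions for illustration, not details taken from the study.

```python
def iou(pred_mask, target_mask):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred_mask, target_mask):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks (e.g. a closed glottis) count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(pred_masks, target_masks):
    """Average the per-image IoU over a set of predicted/reference mask pairs."""
    scores = [iou(p, t) for p, t in zip(pred_masks, target_masks)]
    if not scores:
        raise ValueError("at least one mask pair is required")
    return sum(scores) / len(scores)


# Hypothetical 2x2 masks: one partial overlap, one exact match.
predictions = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
references = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
score = mean_iou(predictions, references)  # (0.5 + 1.0) / 2
```

Tracking this score separately on the original and the newly added recordings is one simple way to notice catastrophic forgetting, since a drop on the original data after re-training would show up directly.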