17 research outputs found

    Development of Parameters towards Voice Bifurcations

    No full text
    Pathological vocal folds are known to exhibit multiple oscillation patterns, depending on tissue imbalance, subglottal pressure level, and other factors. This includes mid-phonation changes due to bifurcations in the underlying voice source system. Knowing when changes in oscillation pattern occur is helpful in the assessment of voice disorders, and this knowledge could be transformed into useful objective measures. Mid-phonation bifurcations can occur in rapid succession; hence, a fast classification of oscillation pattern is critical to minimize the averaging of data across bifurcations. This paper proposes frequency-ratio based short-term measures, named harmonic disturbance factor (HDF) and biphonic index (BI), towards the detection of the bifurcations. For the evaluation of HDF and BI, a frequency selection algorithm for glottal source signals is devised, and its efficacy is demonstrated with the glottal area waveforms of four cases, representing the wide range of oscillatory behaviors. The HDF and BI exhibit clear transitions when the voice bifurcations are apparent in the spectrograms. The presented proof-of-concept experiment’s outcomes warrant a larger scale study to formalize the parameters of the frequency selection algorithm.

    Influence of Analyzed Sequence Length on Parameters in Laryngeal High-Speed Videoendoscopy

    No full text
    Laryngeal high-speed videoendoscopy (HSV) allows objective quantification of vocal fold vibratory characteristics. However, it is unknown how the analyzed sequence length affects some of the computed parameters. To examine whether varying sequence lengths influence parameter calculation, 20 HSV recordings of healthy females during sustained phonation were investigated. The clinically prevalent Photron Fastcam MC2 camera with a frame rate of 4000 fps and a spatial resolution of 512 × 256 pixels was used to collect HSV data. The glottal area waveform (GAW), describing the increase and decrease of the area between the vocal folds during phonation, was extracted. Based on the GAW, 16 perturbation parameters were computed for sequences of 5, 10, 20, 50 and 100 consecutive cycles. Statistical analysis was performed using SPSS Statistics, version 21. Only three parameters (18.8%) were statistically significantly influenced by changing sequence lengths. Of these parameters, one changed until 10 cycles were reached, one until 20 cycles were reached and one, namely Amplitude Variability Index (AVI), changed between almost all groups of different sequence lengths. Moreover, visually observable, but not statistically significant, changes within parameters were observed. These changes were often most prominent between shorter sequence lengths. Hence, we suggest using a minimum sequence length of at least 20 cycles and discarding the parameter AVI.

    Dependencies and Ill-designed Parameters Within High-speed Videoendoscopy and Acoustic Signal Analysis

    No full text
    Objective. The phonatory process is often judged during sustained phonation by analyzing the acoustic voice signal and the vocal fold vibrations. Many formulas and parameters have been suggested for qualifying the characteristics of the acoustic signal and the vocal fold vibrations during sustained phonation. These parameters are directly computed from the acoustic signal and the endoscopic glottal area waveform (GAW). The GAW is calculated from laryngeal high-speed videoendoscopy (HSV) recordings and describes the increase and decrease of the glottal area during the phonation process, that is, the opening and closing of the two oscillating vocal folds over time. However, some of the parameters have strong mathematical dependencies with one another and some are ill-defined. The purpose of this study is to identify mathematical dependencies between parameters with the aim of reducing their number and suggesting which parameters may best describe the properties of the GAW and the acoustic signal. Methods. In this preliminary investigation, 20 frequently used parameters are examined: 10 GAW only and 10 both GAW and acoustic parameters. Results. In total 13 parameters can be neglected because of mathematical dependencies. In addition, nine of these parameters show problematic features that range from unexpected behavior to ill definition. Conclusions. Reducing the number of parameters appears to be necessary to standardize vocal fold function analysis. This may lead to better comparability of research results from different studies.

    Interdependencies between acoustic and high-speed videoendoscopy parameters.

    No full text
    In voice research, uncovering relations between the oscillating vocal folds, being the sound source of phonation, and the resulting perceived acoustic signal is of great interest. This is especially the case in the context of voice disorders, such as functional dysphonia (FD). We investigated 250 high-speed videoendoscopy (HSV) recordings with simultaneously recorded acoustic signals (124 healthy females, 60 FD females, 44 healthy males, 22 FD males). 35 glottal area waveform (GAW) parameters and 14 acoustic parameters were calculated for each recording. Linear and non-linear relations between GAW and acoustic parameters were investigated using Pearson correlation coefficients (PCC) and distance correlation coefficients (DCC). Further, norm values for parameters obtained from 250 ms long sustained phonation data (vowel /i/) were provided. 26 PCCs in females (5.3%) and 8 in males (1.6%) were found to be statistically significant (|corr.| ≥ 0.3). Only minor differences were found between PCCs and DCCs, indicating presence of weak non-linear dependencies between parameters. Fundamental frequency was involved in the majority of all relevant PCCs between GAW and acoustic parameters (19 in females and 7 in males). The most distinct difference between correlations in females and males was found for the parameter Period Variability Index. The study shows only weak relations between investigated acoustic and GAW-parameters. This indicates that the reduction of the complex 3D glottal dynamics to the 1D-GAW may erase laryngeal dynamic characteristics that are reflected within the acoustic signal. Hence, other GAW parameters, 2D-, 3D-laryngeal dynamics and vocal tract parameters should be further investigated towards potential correlations to the acoustic signal.
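The abstract above compares Pearson correlation (linear dependence) with distance correlation (any dependence). The following is a minimal standard-library sketch of the two sample statistics, not the study's own analysis code; the function names and the toy data are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centered by row, column and grand mean."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # matrix is symmetric, so row means equal column means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (value in [0, 1]) of two 1-D sequences."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n**2
    dvar2_x = sum(a[i][j] ** 2 for i, j in pairs) / n**2
    dvar2_y = sum(b[i][j] ** 2 for i, j in pairs) / n**2
    denom = math.sqrt(dvar2_x * dvar2_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


# Toy data (hypothetical): a purely quadratic relation has zero Pearson
# correlation on symmetric inputs, but a clearly non-zero distance correlation.
x = [-2, -1, 0, 1, 2]
y_quadratic = [v * v for v in x]
y_linear = [2 * v + 1 for v in x]
```

On data like `y_quadratic`, a large gap between the two coefficients would point to a non-linear dependency; the abstract reports only minor gaps between PCCs and DCCs.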

    Influence of spatial camera resolution in high-speed videoendoscopy on laryngeal parameters

    Get PDF
    In laryngeal high-speed videoendoscopy (HSV) the area between the vibrating vocal folds during phonation is of interest, being referred to as glottal area waveform (GAW). Varying camera resolution may influence parameters computed on the GAW and hence hinder the comparability between examinations. This study investigates the influence of spatial camera resolution on quantitative vocal fold vibratory function parameters obtained from the GAW. In total 40 HSV recordings during sustained phonation (20 healthy males and 20 healthy females) were investigated. A clinically used Photron Fastcam MC2 camera with a frame rate of 4000 fps and a spatial resolution of 512x256 pixels was applied. This initial resolution was reduced by pixel averaging to (1) a resolution of 256x128 and (2) a resolution of 128x64 pixels, yielding three sets of recordings. The GAW was extracted and in total 50 vocal fold vibratory parameters representing different features of the GAW were computed. Statistical analysis was performed using SPSS Statistics, version 21. 15 parameters showing strong mathematical dependencies with other parameters were excluded from the main analysis but are given in the Supporting Information. Data analysis revealed clear influence of spatial resolution on GAW parameters. Fundamental period measures and period perturbation measures were the least affected. Amplitude perturbation measures and mechanical measures were most strongly influenced. Most glottal dynamic characteristics and symmetry measures deviated significantly. Most energy perturbation measures changed significantly in males but were mostly unaffected in females. In females 18 of 35 remaining parameters (51%) and in males 22 parameters (63%) changed significantly between spatial resolutions. This work represents the first step in studying the impact of video resolution on quantitative HSV parameters. Clear influences of spatial camera resolution on computed parameters were found. The study results suggest avoiding the use of the most strongly affected parameters. Further, the use of cameras with high resolution is recommended to analyze GAW measures in HSV data.
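The resolution reduction "by pixel averaging" described above can be sketched as block averaging. This is an illustrative assumption about the procedure (non-overlapping 2x2 blocks, each replaced by its mean), not the authors' code; `downsample_by_averaging` is a hypothetical helper name.

```python
def downsample_by_averaging(frame, factor=2):
    """Reduce a grayscale frame (list of rows) by averaging factor x factor blocks.

    Assumption: blocks do not overlap and both frame dimensions are divisible
    by `factor`, as is the case for 512x256 -> 256x128 -> 128x64.
    """
    rows, cols = len(frame), len(frame[0])
    if rows % factor or cols % factor:
        raise ValueError("frame dimensions must be divisible by factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [frame[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Toy 4x4 frame (hypothetical intensities) reduced to 2x2.
frame = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [10, 10, 0, 0],
    [10, 10, 0, 0],
]
half = downsample_by_averaging(frame)
```

Applying the function twice to a 512x256 frame would give the 256x128 and 128x64 versions named in the abstract.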

    Correcting nonsquare pixels (Larsen et al., 2023)

    No full text
    Purpose: This research note illustrates the effects of video data with nonsquare pixels on the pixel-based measures obtained from videofluoroscopic swallow studies (VFSS). Method: Six pixel-based distance and area measures were obtained from two different videofluoroscopic study units, both yielding videos with nonsquare pixels with different pixel aspect ratios (PARs). The swallowing measures were obtained from the original VFSS videos and from the videos after their pixels were squared. Results: The results demonstrated significant multivariate effects both in video type (original vs. squared) and in the interaction between video type and sample (two video recordings of different patients, different PARs, and opposing tilt angles of the external reference). A wide range of variabilities was observed on the pixel-based measures between original and squared videos with the percent deviation ranging from 0.1% to 9.1% with the maximum effect size of 7.43. Conclusions: This research note demonstrates the effect of disregarding PAR on distance and area pixel-based parameters. In addition, we present a multilevel roadmap to prevent possible measurement errors that could occur. At the planning stage, the PAR of the video source should be identified, and, at the analysis stage, video data should be prescaled prior to analysis with PAR-unaware software. No methodology in prior absolute or relative pixel-based studies reports adjustment to the PAR prior to measurements or identifies the PAR as a possible source of variation within the literature. Addressing PAR will improve the precision and stability of pixel-based VFSS findings and improve comparability within and across clinical and research settings. Supplemental Material S1. Obtain pixel aspect ratio (PAR) using MediaInfo (v22.03). Supplemental Material S2. Estimating pixel aspect ratio (PAR) using ImageJ (v1.53p). Supplemental Material S3. Setting up video scaling to square pixels in HandBrake (v1.5.1). Supplemental Material S4. Premeasurement image scaling in ImageJ (v1.53p). Larsen, D., Ikuma, T., Neubig, L., Kist, A. M., Leonard, R., McWhorter, A. J., & Kunduk, M. (2023). Pixel-based swallow measurements: Correcting nonsquare pixels. Journal of Speech, Language, and Hearing Research. Advance online publication. https://doi.org/10.1044/2022_JSLHR-22-00306
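To illustrate why the pixel aspect ratio matters for the distance and area measures above, here is a minimal sketch that rescales measurements to square-pixel units. It assumes PAR is defined as pixel width divided by pixel height, so only the horizontal axis is stretched; the function names are hypothetical and this is not the workflow from the supplemental materials.

```python
import math


def corrected_distance(p1, p2, par):
    """Distance between two (x, y) pixel coordinates in square-pixel units.

    Assumption: par = pixel width / pixel height, so horizontal offsets are
    multiplied by par and vertical offsets are left unchanged.
    """
    dx = (p2[0] - p1[0]) * par
    dy = p2[1] - p1[1]
    return math.hypot(dx, dy)


def corrected_area(pixel_count, par):
    """Area of a region of `pixel_count` nonsquare pixels, in square-pixel units."""
    return pixel_count * par


# With a hypothetical PAR of 1.1, a horizontal 10-pixel distance is
# underestimated by about 9% if the PAR is ignored, while a vertical
# 10-pixel distance is unaffected.
horizontal = corrected_distance((0, 0), (10, 0), 1.1)
vertical = corrected_distance((0, 0), (0, 10), 1.1)
```

Because the error depends on the direction of the measured segment, ignoring PAR does not cancel out in relative measures when the reference and the target are oriented differently.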

    Re-Training of Convolutional Neural Networks for Glottis Segmentation in Endoscopic High-Speed Videos

    No full text
    Endoscopic high-speed video (HSV) systems for visualization and assessment of vocal fold dynamics in the larynx are diverse and technically advancing. To account for the resulting “concept shifts” in neural network (NN)-based image processing, re-training of already trained and used NNs is necessary to allow for sufficiently accurate image processing for new recording modalities. We propose and discuss several re-training approaches for convolutional neural networks (CNN) being used for HSV image segmentation. Our baseline CNN was trained on the BAGLS data set (58,750 images). The new BAGLS-RT data set consists of an additional 21,050 images from previously unused HSV systems, light sources, and different spatial resolutions. Results showed that increasing data diversity by means of preprocessing already improves the segmentation accuracy (mIoU + 6.35%). Subsequent re-training further increases segmentation performance (mIoU + 2.81%). For re-training, finetuning with dynamic knowledge distillation showed the most promising results. Data variety for training and additional re-training is a helpful tool to boost HSV image segmentation quality. However, when performing re-training, the phenomenon of catastrophic forgetting should be kept in mind, i.e., adaptation to new data while forgetting already learned knowledge.
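The segmentation gains above are reported as mean intersection over union (mIoU). The following is a minimal sketch of that metric for binary glottis masks, assuming the mean is taken per image and that two empty masks count as a perfect match; the abstract does not state either convention, and the function names are illustrative.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks given as lists of rows."""
    inter = union = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: both masks empty (e.g. closed glottis) scores 1.0.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds, truths):
    """Average the per-image IoU over a set of predicted and reference masks."""
    scores = [iou(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


# Two toy 2x2 masks (hypothetical): one partially correct, one perfect.
pred_a, truth_a = [[1, 1], [0, 0]], [[1, 0], [0, 0]]
pred_b, truth_b = [[0, 1], [0, 1]], [[0, 1], [0, 1]]
score = mean_iou([pred_a, pred_b], [truth_a, truth_b])
```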