    Quantitative analysis of videokymography in normal and pathological vocal folds: a preliminary study.

    Videokymography (VKG) captures high-speed images of the vocal folds independently of the periodicity of the acoustic signal. The aim of this study was to preliminarily assess a software package that can objectively measure specific parameters of vocal fold vibration. From August 2009 to December 2010, we prospectively evaluated 40 subjects (Group A, 18 normal subjects; Group B, 14 patients with benign lesions of the middle third of the vocal fold, such as polyps and cysts; Group C, 8 patients treated by endoscopic excision of benign vocal fold lesions) by videoendoscopy, videolaryngostroboscopy, and VKG. A VKG camera was coupled to a 70° telescope and video was recorded during phonation. Images were objectively analyzed by a post-processing software tool (VKG-Analyser) with a user-friendly interface developed by our group. Three parameters were considered: the ratio between the vibration amplitude of one vocal fold and that of the contralateral fold (Ramp), the ratio between the vibration period of one vocal fold and that of the opposite fold (Rper), and the ratio between the durations of the open and closed phases within a glottal cycle (Roc). Mean values for Ramp, Rper, and Roc were 1.05, 1.04, and 1.35 in Group A; 1.63, 0.92, and 0.97 in Group B; and 1.13, 0.91, and 1.85 in Group C, respectively. Quantitative analysis of videokymograms with the tool presented here, VKG-Analyser, is useful for the objective evaluation of the vibratory pattern in normal and pathologic vocal folds. Important future developments of this tool for the study of both physiologic and pathologic patterns of vocal fold vibration can be expected.
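
    The three ratios above are simple per-cycle quotients, so their computation can be sketched compactly, assuming per-fold amplitude and period values and open/closed phase durations have already been extracted from the kymogram. The function and variable names below are illustrative and are not taken from VKG-Analyser.

        # Minimal sketch: the three VKG ratios for one glottal cycle.
        # Inputs are assumed to be pre-extracted from a kymographic line scan;
        # names are illustrative, not VKG-Analyser's actual interface.
        def vkg_ratios(amp_left, amp_right, period_left, period_right, t_open, t_closed):
            ramp = amp_left / amp_right        # amplitude of one fold vs. the contralateral
            rper = period_left / period_right  # period of one fold vs. the opposite one
            roc = t_open / t_closed            # open-phase vs. closed-phase duration
            return ramp, rper, roc

        # Example: a near-symmetric cycle, close to the Group A means reported above
        print(vkg_ratios(1.05, 1.00, 5.2, 5.0, 3.0, 2.2))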

    Assessment of vocal folds phonation by means of computer analysis of laryngovideostroboscopic images – a pilot study

    Introduction. Computer image analysis techniques enable new ways of visualising the glottis during phonation and the derivation of objective parameters of vocal fold vibration, supporting the otolaryngologist/phoniatrician in a more precise diagnosis of the voice organ. Aim. Application of image analysis algorithms for the qualitative and quantitative description of phonatory vocal fold vibrations. Materials and methods. Videostroboscopic examinations of the glottis were carried out in 15 individuals divided into three groups of five subjects each: patients with diagnosed vocal nodules, patients with glottal insufficiency, and subjects with normal voice. Digital image pre-processing and segmentation algorithms were applied. Glottal area signals were derived for consecutive phonation cycles, and glottovibrograms were built to provide a spatio-temporal visualisation of the vibrating vocal folds. Results. Geometric parameters of the glottal area were determined for each image in the videostroboscopic sequence. Averaged glottal width profiles in the closure phase of the phonatory cycle were computed for each group of examined patients. Conclusions. The pilot study confirmed the usefulness of the developed image analysis methods for precise imaging and quantitative assessment of phonatory vocal fold vibrations from videostroboscopic recordings.
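
    The processing chain described above (segmentation, glottal area signal, glottovibrogram) can be sketched as follows. The fixed dark-region threshold is an illustrative assumption standing in for the paper's segmentation algorithm, and the synthetic frames merely exercise the pipeline.

        import numpy as np

        def segment_glottis(frame, threshold=0.2):
            # Binary glottis mask; assumes the glottis is the darkest image region
            return frame < threshold

        def glottal_area_signal(frames):
            # Glottal area (pixel count) for each frame of the sequence
            return np.array([segment_glottis(f).sum() for f in frames])

        def glottovibrogram(frames):
            # Rows x time image of per-row glottal widths: a spatio-temporal
            # visualisation of the vibrating vocal folds
            widths = [segment_glottis(f).sum(axis=1) for f in frames]
            return np.stack(widths, axis=1)

        # Synthetic demo: 60 frames of a dark elliptical "glottis" opening and closing
        yy, xx = np.mgrid[0:64, 0:64]
        frames = []
        for phase in np.linspace(0, 2 * np.pi, 60):
            half_width = 6 * max(np.sin(phase), 0) + 1e-3
            mask = ((xx - 32) / half_width) ** 2 + ((yy - 32) / 20) ** 2 < 1
            frames.append(np.where(mask, 0.05, 0.8))  # dark glottis, bright tissue
        print(glottal_area_signal(frames).max(), glottovibrogram(frames).shape)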

    Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network

    The objective investigation of the dynamic properties of vocal fold vibrations demands the recording and quantitative analysis of laryngeal high-speed video (HSV). Quantifying vocal fold vibration patterns requires, as a first step, segmentation of the glottal area within each video frame, from which the vibrating edges of the vocal folds are usually derived. Consequently, the outcome of any further vibration analysis depends on the quality of this initial segmentation. In this work we propose, for the first time, a procedure to fully automatically segment not only the time-varying glottal area but also the vocal fold tissue directly from laryngeal HSV using a deep Convolutional Neural Network (CNN) approach. Eighteen different CNN configurations were trained and evaluated on a total of 13,000 HSV frames obtained from 56 healthy and 74 pathologic subjects. The segmentation quality of the best-performing CNN model, which uses Long Short-Term Memory (LSTM) cells to also take the temporal context into account, was investigated in depth on 15 test video sequences comprising 100 consecutive images each. The Dice Coefficient (DC) and the precision of four anatomical landmark positions were used as performance measures. Over all test data, a mean DC of 0.85 was obtained for the glottis, and 0.91 and 0.90 for the right and left vocal fold, respectively. The grand average precision of the identified landmarks amounts to 2.2 pixels and is in the same range as comparable manual expert segmentations, which can be regarded as the gold standard. The method proposed here requires no user interaction and overcomes the limitations of current semiautomatic or computationally expensive approaches. It thus also allows for the analysis of long HSV sequences and holds the promise of facilitating the objective analysis of vocal fold vibrations in clinical routine. The dataset used here, including the ground truth, will be provided freely to all scientific groups to allow quantitative benchmarking of segmentation approaches in the future.
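
    The Dice Coefficient used as a performance measure above has a standard definition, DC = 2|A ∩ B| / (|A| + |B|) for a predicted and a reference mask. The sketch below reproduces that standard formula, not the paper's own evaluation code.

        import numpy as np

        def dice_coefficient(pred, target, eps=1e-8):
            # DC = 2|A ∩ B| / (|A| + |B|) for two binary masks
            pred, target = pred.astype(bool), target.astype(bool)
            intersection = np.logical_and(pred, target).sum()
            return 2.0 * intersection / (pred.sum() + target.sum() + eps)

        # Two partially overlapping 30x30 squares: the overlap is 20x20 = 400 px,
        # so DC = 2*400 / (900 + 900) ≈ 0.444
        a = np.zeros((64, 64), dtype=bool); a[10:40, 10:40] = True
        b = np.zeros((64, 64), dtype=bool); b[20:50, 20:50] = True
        print(round(dice_coefficient(a, b), 3))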

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 out of the keenly felt need to share know-how, objectives, and results between areas that until then had seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years, the initial topics have grown to encompass other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy.

    Modeling and imaging of the vocal fold vibration for voice health.

    Evaluation of High-Speed Videoendoscopy for Bayesian Inference on Reduced Order Vocal Fold Models

    The ability to use our voice arises through a complex biomechanical process known as phonation. The study of this process is interesting not only because of the complex physical phenomena involved, but also because phonation disorders can make the everyday task of using one's voice difficult. Clinical studies of phonation aim to help diagnose such disorders using various measurement techniques, such as microphone recordings, video of the vocal folds, and perceptual sound quality measures. In contrast, scientific investigations of phonation have focused on understanding the physical phenomena behind phonation using simplified physical and numerical models constructed with representative, population-based parameters. A particularly useful class of models, reduced-order numerical models, are simplified representations of the vocal folds with low computational complexity that allow broad parameter changes to be investigated.

    To bring the physical understanding of phonation from these models into clinical use, patient-specific parameters are needed. Because vocal fold parameters and other structures involved in phonation are difficult to measure directly, inverse analysis techniques must be employed. These techniques estimate the parameters of a model by finding values that lead to model outputs that compare well with measured outputs. When the measured outputs are patient-specific measurements, these techniques can produce patient-specific model parameters. This is complicated, however, by the fact that measurements are uncertain, which leads to uncertainty in the inferred parameters. The uncertainty in the parameters provides a way to judge how much confidence clinicians should place in them: large measurement errors can result in high uncertainties (and vice versa), guiding clinicians on whether or not to trust the estimated parameters. Bayesian inference is an inverse analysis technique that accounts for the inherent uncertainty in measurements within a probabilistic framework. Applying Bayesian inference to reduced-order models and clinical measurements allows patient-specific model parameters, with associated uncertainties, to be inferred.

    A promising clinical measurement for use in Bayesian inference is high-speed videoendoscopy, in which high-speed video is taken of the vocal folds in motion. This captures the time-varying motion of the vocal folds, from which many quantitative measurements can be derived, for example the glottal width (distance between the vocal folds) or glottal area (area between the vocal folds). High-speed videoendoscopy is subject to variable imaging parameters; in particular, the frame rate, the spatial resolution, and tilted views of the camera can all modify the resulting video, changing the uncertainty in the derived measurements. To investigate the effect of these three imaging parameters on Bayesian inference applied to high-speed videoendoscopy, a simulated high-speed videoendoscopy experiment was conducted. Using a reduced-order model with known parameters, a set of enlarged artificial vocal folds was driven in slow motion. These were imaged by a consumer DSLR camera, where the slow motion increased the effective frame rate and the enlarged vocal folds increased the effective spatial resolution, to a fidelity much greater than that of typical high-speed videos of the vocal folds. This allowed investigation of the three parameters: tilted views were investigated by physically tilting the camera, while variable frame rates and spatial resolutions were investigated by numerically downsampling the original recording. Bayesian inference was conducted on these simulated high-speed videos, by measuring the distance between the vocal folds (the glottal width), in order to determine the parameters of the same reduced-order model driving the artificial vocal folds. This provided a reference against which the estimated parameters could be compared. The changes in the estimated parameters were then investigated as the angle of view, frame rate, and spatial resolution were modified.

    From the experiment, the effects of frame rate, spatial resolution, and angle of view were quantified relative to a reference video. Uncertainty in the estimates increased linearly with the downsampling factor of the frame rate: a frame rate half that of the reference video yields an uncertainty on the estimated parameters that is twice as large. Spatial resolution affects the level of uncertainty through the edge detection technique used to extract quantitative data (the glottal width in this study). As the spatial resolution was downsampled, the error from the edge detection algorithm increased linearly with the downsampling factor, which led to the same linear increase in the uncertainty of the estimate. Different edge detection algorithms, however, will likely degrade differently as image resolution decreases; while in this study it was preferable to decrease spatial resolution rather than frame rate, more general conclusions depend on the specific edge detection technique used. The angle of view was found to bias estimates as a result of projecting the vocal folds (glottis) onto an offset image plane, much as viewing a coin from an angle yields increasingly narrow ellipses rather than a circle. This decreased the measured glottal width, which biased the estimated parameters. To account for this bias, it is suggested that the angle of view be treated as an uncertain parameter, which increases the uncertainty in the quantitative measures from high-speed video; alternatively, the angle of view can be estimated as an additional parameter.
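
    The core inference idea can be illustrated on a toy stand-in for a reduced-order model. In the sketch below, a single amplitude parameter A maps to a glottal width trace w(t) = A|sin(2πft)|, a camera tilt shrinks the observed width by cos(θ) as in the coin analogy above, and a grid-based posterior with a flat prior recovers A from noisy measurements. All numbers and the one-parameter model are illustrative assumptions, not the thesis's actual model.

        import numpy as np

        rng = np.random.default_rng(0)
        f, true_A, tilt = 100.0, 2.0, np.deg2rad(20)   # Hz, mm, camera tilt angle
        t = np.arange(0, 0.05, 1 / 4000)               # 4 kHz effective frame rate
        sigma = 0.1                                    # measurement noise (mm)

        # Simulated measurements: projection onto the tilted image plane
        # shrinks the true width by cos(tilt) before noise is added
        w_meas = true_A * np.abs(np.sin(2 * np.pi * f * t)) * np.cos(tilt)
        w_meas = w_meas + rng.normal(0, sigma, t.size)

        # Grid posterior over A with a flat prior, (wrongly) ignoring the tilt
        A_grid = np.linspace(0.5, 4.0, 400)
        w_model = A_grid[:, None] * np.abs(np.sin(2 * np.pi * f * t))[None, :]
        log_like = -0.5 * ((w_meas - w_model) ** 2).sum(axis=1) / sigma**2
        posterior = np.exp(log_like - log_like.max())
        posterior /= posterior.sum()

        # The MAP estimate lands near true_A * cos(20°) ≈ 1.88 rather than 2.0,
        # reproducing the projection bias that motivates treating the view
        # angle as an uncertain or estimated parameter
        print(A_grid[posterior.argmax()])

    Halving the frame rate in this sketch (downsampling t and w_meas by two) visibly widens the posterior, echoing the growth of estimation uncertainty with frame-rate downsampling reported above.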

    Acoustic and videoendoscopic techniques to improve voice assessment via relative fundamental frequency

    Quantitative measures of laryngeal muscle tension are needed to improve assessment and track clinical progress. Although relative fundamental frequency (RFF) shows promise as an acoustic estimate of laryngeal muscle tension, it is not yet transferable to the clinic. The purpose of this work was to refine algorithmic estimation of RFF and to enhance knowledge of the physiological underpinnings of RFF. The first study used a large database of voice samples collected from 227 speakers with voice disorders and 256 typical speakers to evaluate the effects of fundamental frequency estimation techniques and voice sample characteristics on algorithmic RFF estimation. By refining fundamental frequency estimation using the Auditory Sawtooth Waveform Inspired Pitch Estimator-Prime (Auditory-SWIPE′) algorithm and accounting for sample characteristics via the acoustic measure of pitch strength, algorithmic errors related to the accuracy and precision of RFF were reduced by 88.4% and 17.3%, respectively. The second study sought to characterize the physiological factors influencing acoustic outputs of RFF estimation. A group of 53 speakers with voice disorders and 69 typical speakers each produced the utterance /ifi/ while simultaneous recordings were collected using a microphone and a flexible nasendoscope. Acoustic features calculated from the microphone signal were examined in reference to the physiological initiation and termination of vocal fold vibration. The features that corresponded to these transitions were then implemented in the RFF algorithm, leading to significant improvements in the precision of the algorithm in reflecting the underlying physiological mechanisms for voicing offsets (p < .001, V = .60) and onsets (p < .001, V = .54) when compared to manual RFF estimation. The third study further elucidated the physiological underpinnings of RFF by examining the contribution of vocal fold abduction to RFF during intervocalic voicing offsets. Vocal fold abductory patterns were compared to RFF values in a subset of speakers from the second study, comprising young adults, older adults, and older adults with Parkinson's disease. Abductory patterns did not differ significantly among the three groups; however, vocal fold abduction was observed to play a significant role in measures of RFF at voicing offset. By improving algorithmic estimation and elucidating aspects of the underlying physiology affecting RFF, this work adds to the utility of RFF for use in conjunction with current clinical techniques to assess laryngeal muscle tension.
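
    In the literature, RFF is typically defined per vocal cycle as the instantaneous fundamental frequency expressed in semitones relative to a steady-state reference cycle. The sketch below follows that common definition; the choice of reference cycle and the example F0 values are illustrative assumptions, not the exact procedure used in these studies.

        import numpy as np

        def rff_semitones(cycle_f0s, ref_f0):
            # RFF per cycle: 12 * log2(F0_cycle / F0_reference), in semitones
            return 12.0 * np.log2(np.asarray(cycle_f0s, dtype=float) / ref_f0)

        # Ten instantaneous F0 values approaching a voicing offset, where F0
        # typically drifts as vocal fold tension and contact change (values in Hz)
        offset_f0s = [200, 199, 198, 198, 197, 196, 194, 192, 189, 185]
        reference = offset_f0s[0]  # steady-state cycle farthest from the offset
        print(np.round(rff_semitones(offset_f0s, reference), 2))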

    On the design of visual feedback for the rehabilitation of hearing-impaired speech
