
    Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network

    The objective investigation of the dynamic properties of vocal fold vibrations demands the recording and subsequent quantitative analysis of laryngeal high-speed video (HSV). Quantifying the vocal fold vibration patterns requires, as a first step, the segmentation of the glottal area within each video frame, from which the vibrating edges of the vocal folds are usually derived. Consequently, the outcome of any further vibration analysis depends on the quality of this initial segmentation. In this work we propose, for the first time, a procedure to fully automatically segment not only the time-varying glottal area but also the vocal fold tissue directly from laryngeal HSV using a deep Convolutional Neural Network (CNN) approach. Eighteen different CNN configurations were trained and evaluated on a total of 13,000 HSV frames obtained from 56 healthy and 74 pathologic subjects. The segmentation quality of the best-performing CNN model, which uses Long Short-Term Memory (LSTM) cells to also take the temporal context into account, was investigated in depth on 15 test video sequences comprising 100 consecutive images each. The Dice Coefficient (DC) and the precision of four anatomical landmark positions were used as performance measures. Over all test data, a mean DC of 0.85 was obtained for the glottis, and 0.91 and 0.90 for the right and left vocal fold (VF), respectively. The grand average precision of the identified landmarks amounts to 2.2 pixels and is in the same range as comparable manual expert segmentations, which can be regarded as the gold standard. The method proposed here requires no user interaction and overcomes the limitations of current semiautomatic or computationally expensive approaches. Thus, it also allows for the analysis of long HSV sequences and holds the promise of facilitating the objective analysis of vocal fold vibrations in clinical routine. The dataset used here, including the ground truth, will be provided freely to all scientific groups to allow quantitative benchmarking of segmentation approaches in the future.
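
    The Dice Coefficient named in this abstract as a performance measure can be illustrated with a minimal sketch. The function name `dice_coefficient`, the toy masks, and the convention of scoring two empty masks as 1.0 are assumptions made for illustration; none of this is taken from the paper's code or dataset.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice coefficient 2|A n B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        # Both masks are empty: scored as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / total


# Toy 4x4 masks (made-up data, not from the paper's dataset).
truth = np.array([[0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 0]])
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 1, 0],
                 [0, 0, 1, 0]])

score = dice_coefficient(pred, truth)
print(f"Dice coefficient: {score:.3f}")
```

    Both toy masks contain six pixels and share five of them, so the score is 10/12, or roughly 0.83.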

    Automatic Segmentation of Glottal Space from Video Images Based on Mathematical Morphology and the Hough Transform

    Vocal disorders arise directly from the physical shape of the vocal cords. Videostroboscopic imaging provides doctors with valuable information about the physical shape of the vocal cords and about the way these cords move. Segmentation of the glottal space is necessary in order to characterize morphological disorders of the vocal folds. One of the main problems with existing methods is their low level of accuracy. To solve this problem, this article presents an automatic method based on Mathematical Morphology edge detection and the Hough transform to extract the glottal space from videostroboscopic images. Our method was compared with the histogram and active contours methods, and the findings showed that it yields better results. DOI: http://dx.doi.org/10.11591/ijece.v2i2.21

    Segmentation of the glottal space from laryngeal images using the watershed transform

    The present work describes a new method for the automatic detection of the glottal space from laryngeal images obtained either with high-speed or with conventional video cameras attached to a laryngoscope. The detection combines several established techniques from the field of digital image processing. The image is segmented with a watershed transform followed by region merging, while the final decision is taken using a simple linear predictor. This scheme has successfully segmented the glottal space in all the test images used. The method can be considered a generalist approach to the segmentation of the glottal space because, in contrast with other methods found in the literature, it requires neither initialization nor strict environmental conditions extracted from the images to be processed. Therefore, the main advantage is that the user does not have to outline the region of interest with a mouse click. Some a priori knowledge about the glottal space is still needed, but this knowledge can be considered weak compared to the environmental conditions fixed in former works.

    A high-speed quantitative analysis of vocal fold vibration in normal and dysphonic subjects

    A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, June 30, 2007. Thesis (B.Sc.), University of Hong Kong, 2007. Also available in print.

    Artificial intelligence in clinical endoscopy: Insights in the field of videomics

    Artificial intelligence is increasingly seen as a useful tool in medicine. Specifically, these technologies aim to extract insights from complex datasets that cannot easily be analyzed by conventional statistical methods. While promising results have been obtained for various -omics datasets, radiological images, and histopathologic slides, the analysis of videoendoscopic frames still represents a major challenge. In this context, videomics represents a burgeoning field wherein several methods of computer vision are systematically used to organize unstructured data from frames obtained during diagnostic videoendoscopy. Recent studies have focused on five broad tasks of increasing complexity: quality assessment of endoscopic images, classification of pathologic and nonpathologic frames, detection of lesions inside frames, segmentation of pathologic lesions, and in-depth characterization of neoplastic lesions. Herein, we present a broad overview of the field, with a focus on conceptual key points and future perspectives.

    ANALYSIS OF VOCAL FOLD KINEMATICS USING HIGH SPEED VIDEO

    Vocal folds are the twin infoldings of the mucous membrane stretched horizontally across the larynx. They vibrate, modulating the constant air flow initiated from the lungs. The pulsating pressure wave blowing through the glottis is thus the source for voiced speech production. The study of vocal fold dynamics during voicing is critical for the treatment of voice pathologies. Since the vocal folds move at 100 - 350 cycles per second, their visual inspection is currently done by stroboscopy, which merges information from multiple cycles to present an apparent motion. High Speed Digital Laryngeal Imaging (HSDLI), with a temporal resolution of up to 10,000 frames per second, has been established as better suited for assessing the vocal fold vibratory function through direct recording. But the widespread use of HSDLI is limited by a lack of consensus on modalities such as the features to be examined. The development of image processing techniques that circumvent the need for the tedious and time-consuming examination of large volumes of recordings also has room for improvement. Fundamental questions, such as the required frame rate or resolution for the recordings, are still not adequately answered. HSDLI cannot provide absolute physical measurements of the anatomical features and vocal fold displacement. This work addresses these challenges through improved signal processing. A vocal fold edge extraction technique with subpixel accuracy, suited even for the hard-to-record pediatric population, is developed first. The algorithm, which is equally applicable to pediatric and adult subjects, is implemented to facilitate user inspection and intervention. Objective features describing the fold dynamics, which are extracted from the edge displacement waveform, are proposed and analyzed on a diverse dataset of healthy males, females and children. The sampling and quantization noise present in the recordings is analyzed, and methods to mitigate it are investigated. A customized Kalman smoothing and spline interpolation on the displacement waveform is found to improve the feature estimation stability. The relationship between frame rate, spatial resolution and vibration for efficient capturing of information is derived. Finally, to address the inability to obtain physical measurements, a structured light projection calibrated with respect to the endoscope is prototyped.
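
    The figures quoted in this abstract (100 - 350 cycles per second, up to 10,000 frames per second) imply how many frames are recorded per vibration cycle. The sketch below only works through that arithmetic; it is not the relationship derived in the thesis, and the name `frames_per_cycle` is hypothetical.

```python
def frames_per_cycle(frame_rate_fps: float, f0_hz: float) -> float:
    """Number of recorded frames that fall within one vibration cycle."""
    if frame_rate_fps <= 0 or f0_hz <= 0:
        raise ValueError("frame rate and frequency must be positive")
    return frame_rate_fps / f0_hz


# Values quoted in the abstract: up to 10,000 fps, folds vibrating at 100-350 Hz.
HSDLI_FPS = 10_000
for f0 in (100, 350):
    print(f"{f0} Hz: {frames_per_cycle(HSDLI_FPS, f0):.1f} frames per cycle")
```

    At 10,000 frames per second this gives 100 frames per cycle at 100 Hz and about 28.6 frames per cycle at 350 Hz.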

    Vocal Fold Analysis From High Speed Videoendoscopic Data

    High speed videoendoscopy (HSV) of the larynx far surpasses the limits of videostroboscopy in evaluating the vocal fold vibratory behavior by providing a much higher frame rate. HSV enables the visualization of the vocal fold vibratory pattern within an actual glottic cycle. This very detailed information on vocal fold vibratory characteristics could prove valuable for the assessment of vocal fold vibratory function in disordered voices and of the effects of behavioral, medical and surgical treatment procedures. In this work, we aim to address the problem of classifying voice disorders of varying etiology by following four steps, described briefly below. Our methodology starts with glottis segmentation. Given HSV data, the contour of the glottal opening area in each frame should be acquired. These contours record the vibration track of the vocal fold. After this, we obtain a reliable glottal axis, which is necessary for deriving certain vibratory features. The third step is feature extraction on the HSV data. In the last step, we complete the classification based on the features obtained in step 3. In this study, we first propose a novel glottis segmentation method based on simplified dynamic programming, which proves to be efficient and accurate. In addition, we introduce a new approach for calculating the glottal axis. By comparing the proposed glottal axis determination method (modified linear regression) against state-of-the-art techniques, we demonstrate that our technique is more reliable. After that, the focus shifts to feature extraction and classification schemes. Eighteen different features are extracted and their discriminative power is evaluated based on principal component analysis. A support vector machine and a neural network are implemented to classify three different types of vocal folds (normal vocal fold, unilateral vocal fold polyp, and unilateral vocal fold paralysis). The result demonstrates that the classification rates of four different tasks are all above 80%.
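
    The glottal axis step can be illustrated with a plain least-squares line fit through the pixels of a segmented glottis. This is a sketch under stated assumptions: it uses ordinary linear regression via `numpy.polyfit`, not the modified linear regression proposed in the work, and the function name `glottal_axis` and the synthetic mask are hypothetical.

```python
import numpy as np


def glottal_axis(mask):
    """Fit x = slope * y + intercept through the glottal pixels of a binary mask.

    The axis is parameterised by the row index y because the glottis is
    assumed to run roughly top-to-bottom in the image.
    """
    ys, xs = np.nonzero(mask)
    if np.unique(ys).size < 2:
        raise ValueError("need glottal pixels on at least two image rows")
    slope, intercept = np.polyfit(ys, xs, deg=1)
    return slope, intercept


# Synthetic tilted glottis (made-up data): three pixels per row, shifted
# one column to the right on each successive row.
mask = np.zeros((10, 12), dtype=bool)
for y in range(2, 8):
    mask[y, y:y + 3] = True

slope, intercept = glottal_axis(mask)
print(f"axis: x = {slope:.2f} * y + {intercept:.2f}")
```

    In this toy mask the row centers lie exactly on x = y + 1, so the fit recovers a slope and intercept of 1 up to floating-point error.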

    Automated tracking of quantitative parameters from single line scanning of vocal folds: A case study of the 'messa di voce' exercise

    This article presents a novel application of 'single line scanning' of the vocal fold vibrations (kymography) in singing pedagogy, particularly in a specific technical voice exercise: the 'messa di voce'. It aims at giving the singer relevant and valid short-term feedback. A user-friendly automatic analysis program makes possible a precise, immediate quantification of the essential physiological parameters characterizing the changes in glottal impedance, concomitant with the progressive increase and decrease of the lung pressure. The data provided by the program show a strong correlation with manual measurements. Additional measurements such as subglottic pressure and flow glottography by inverse filtering can be meaningfully correlated with the data obtained from the kymographic images.
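
    The idea of single line scanning can be sketched as follows: one fixed image row is taken from every frame and the rows are stacked over time, after which a per-line measurement such as the glottal width can be read off. The function names, the threshold, and the synthetic frames are hypothetical, and the sketch assumes the glottal gap appears darker than the surrounding tissue. It is not the analysis program described in the article.

```python
import numpy as np


def kymogram(frames, row):
    """Stack one fixed image row from every frame; result shape (n_frames, width)."""
    frames = np.asarray(frames)
    return frames[:, row, :]


def glottal_width(kymo, threshold):
    """Count the pixels darker than `threshold` in each kymogram line."""
    return (kymo < threshold).sum(axis=1)


# Synthetic recording (made-up data): bright tissue with a dark central gap
# whose width opens and closes over time.
true_widths = [0, 2, 4, 2, 0, 2, 4, 2]
frames = np.full((len(true_widths), 5, 11), 200, dtype=np.uint8)
for t, w in enumerate(true_widths):
    start = 5 - w // 2
    frames[t, :, start:start + w] = 0

kymo = kymogram(frames, row=2)
widths = glottal_width(kymo, threshold=50)
print(widths.tolist())
```

    Each entry of `widths` is the gap width in pixels for one frame, so the oscillation of the synthetic gap shows up directly as a time series.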