10 research outputs found

    Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network

    The objective investigation of the dynamic properties of vocal fold vibrations demands the recording and quantitative analysis of laryngeal high-speed video (HSV). Quantifying vocal fold vibration patterns requires, as a first step, segmentation of the glottal area within each video frame, from which the vibrating edges of the vocal folds are usually derived. Consequently, the outcome of any further vibration analysis depends on the quality of this initial segmentation. In this work we propose, for the first time, a procedure to segment fully automatically not only the time-varying glottal area but also the vocal fold tissue directly from laryngeal HSV using a deep Convolutional Neural Network (CNN) approach. Eighteen different CNN configurations were trained and evaluated on a total of 13,000 HSV frames obtained from 56 healthy and 74 pathologic subjects. The segmentation quality of the best-performing CNN model, which uses Long Short-Term Memory (LSTM) cells to also take the temporal context into account, was investigated in depth on 15 test video sequences of 100 consecutive images each. The Dice Coefficient (DC) and the precision of four anatomical landmark positions served as performance measures. Over all test data, a mean DC of 0.85 was obtained for the glottis, and 0.91 and 0.90 for the right and left vocal fold (VF), respectively. The grand average precision of the identified landmarks amounts to 2.2 pixels and is in the same range as comparable manual expert segmentations, which can be regarded as the gold standard. The proposed method requires no user interaction and overcomes the limitations of current semiautomatic or computationally expensive approaches. Thus, it also allows the analysis of long HSV sequences and holds the promise of facilitating the objective analysis of vocal fold vibrations in clinical routine. The dataset used here, including the ground truth, will be provided freely to all scientific groups to allow quantitative benchmarking of segmentation approaches in the future.
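
    The Dice Coefficient used as a performance measure in the abstract above compares a predicted segmentation mask with a ground-truth mask. The following is a minimal sketch of the standard definition, DC = 2|A ∩ B| / (|A| + |B|), and not the authors' evaluation code. The function name `dice_coefficient`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice coefficient of two equally sized binary masks (nested lists of 0/1)."""
    # Flatten both masks to one boolean per pixel.
    flat_pred = [bool(v) for row in pred for v in row]
    flat_truth = [bool(v) for row in truth for v in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")

    # Pixels labelled foreground in both masks.
    overlap = sum(p and t for p, t in zip(flat_pred, flat_truth))
    total = sum(flat_pred) + sum(flat_truth)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * overlap / total


# Toy 2x3 masks (hypothetical data, not taken from the paper).
pred_mask = [[0, 1, 1],
             [0, 1, 0]]
truth_mask = [[0, 1, 0],
              [0, 1, 1]]
score = dice_coefficient(pred_mask, truth_mask)
print(f"Dice coefficient: {score:.3f}")
```

    In the toy example each mask has three foreground pixels and two of them overlap, so the score is 2·2 / (3 + 3). A value of 1.0 means the two masks are identical, and 0.0 means they share no foreground pixel.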

    Glottal opening and closing events investigated by electroglottography and super-high-speed video recordings

    Previous research has suggested that the peaks in the first derivative (dEGG) of the electroglottographic (EGG) signal are good approximate indicators of the events of glottal opening and closing. These findings were based on high-speed video (HSV) recordings with frame rates 10 times lower than the sampling frequencies of the corresponding EGG data. The present study attempts to corroborate these previous findings, utilizing super-HSV recordings. The HSV and EGG recordings (sampled at 27 and 44 kHz, respectively) of an excised canine larynx phonation were synchronized by an external TTL signal to within 0.037 ms. Data were analyzed by means of glottovibrograms, digital kymograms, the glottal area waveform and the vocal fold contact length (VFCL), a new parameter representing the time-varying degree of 'zippering' closure along the anterior-posterior (A-P) glottal axis. The temporal offsets between glottal events (depicted in the HSV recordings) and dEGG peaks in the opening and closing phase of glottal vibration ranged from 0.02 to 0.61 ms, amounting to 0.24-10.88% of the respective glottal cycle durations. All dEGG double peaks coincided with vibratory A-P phase differences. In two out of the three analyzed video sequences, peaks in the first derivative of the VFCL coincided with dEGG peaks, again co-occurring with A-P phase differences. The findings suggest that dEGG peaks do not always coincide with the events of glottal closure and initial opening. Vocal fold contacting and de-contacting do not occur at infinitesimally small instants of time, but extend over a certain interval, particularly under the influence of A-P phase differences.
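
    The timing figures in the abstract above can be put in context by converting the two sampling rates into sample periods. The sketch below treats the quoted rates of 27 and 44 kHz as exact, which is an assumption since the abstract may give rounded values. The helper name `sample_period_ms` is hypothetical.

```python
def sample_period_ms(rate_khz):
    """Duration of one sample (or one video frame) in milliseconds."""
    # A rate in kHz is a count of samples per millisecond.
    return 1.0 / rate_khz


# Rates as quoted in the abstract, assumed exact here.
hsv_period_ms = sample_period_ms(27.0)  # one HSV frame
egg_period_ms = sample_period_ms(44.0)  # one EGG sample

print(f"HSV frame period: {hsv_period_ms:.3f} ms")
print(f"EGG sample period: {egg_period_ms:.3f} ms")
```

    Under this assumption one HSV frame lasts about 0.037 ms, which is numerically equal to the quoted synchronization tolerance. Whether the authors derived the tolerance this way is not stated in the abstract.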

    Voice handicap of laryngectomees with tracheoesophageal speech

    The evaluation of diagnostics and therapies includes more and more subjective, i.e. emotional and social aspects. Focussing on the handicap experienced by dysphonic patients, the Voice Handicap Index (VHI) has previously been found to be of significant clinical and scientific value for different voices. In this study the VHI questionnaire was applied to demonstrate the voice handicap of 20 male laryngectomees using tracheoesophageal voice (Provox®), aged 65.5 ± 8.7 years. Their VHI was 45.5 ± 24.1, which was significantly higher than the score of patients with functional voice disorders, but differed only slightly from patients with organic laryngeal dysphonia. Focussing on individual data, VHI scores ranged from values similar to persons without voice disorder to maximum handicap of 101. Comparing the VHI scores with the laryngectomees’ gradual self-perception of voice disorder severity, no consistent relationship was found. Considering the large interindividual differences, the VHI may serve as a valuable instrument for the assessment of individual interventional needs rather than for the identification of a general laryngectomees’ handicap.

    Assessment of Electrode Displacement and Deformation with Respect to Pre-Operative Planning in Deep Brain Stimulation

    The post-operative validation of deep brain stimulation (DBS) electrode displacement and deformation is an important task towards improved DBS targeting. In this paper, a method is proposed to align models of DBS electrodes, automatically extracted from post-operative CT imaging, in a common coordinate system, using the planning data as reference. This enables the assessment of electrode displacement and deformation over the whole length of the trajectory with respect to the pre-operative planning. Accordingly, it enables the estimation of plan deviations in the surgical process as well as cross-patient statistics on electrode deformation, e.g. the bending induced by brain shift.

    The Pitch Rise Paradigm: A New Task for Real-Time Endoscopy of Non-Stationary Phonation

    As standard stroboscopy is restricted to the recording of periodic vocal fold vibrations, observations of non-stationary laryngeal mechanisms demand real-time recording systems, the most advanced being the high-speed video technique. It allows the registration of laryngeal parameters during a variation of the fundamental frequency. The aim of this study was to compare amplitude and frequency parameters of vocal fold vibration during stationary and non-stationary phonation, i.e. a monotonous pitch rise. Twenty-nine young female adults with no incidence of voice disorders were examined while performing two different phonation tasks: sustained phonation with a constant frequency and a monotonous pitch rise. Endoscopic recordings and the acoustic signals were acquired simultaneously. Both acoustic and laryngeal parameters were derived for short time intervals of 17.8 ms for the constant pitch and pitch rise conditions. Instantaneous frequency, sound pressure level, vibratory amplitudes of the vocal folds and the type of glottal closure were compared. At the beginning of the pitch rise, the acoustic and laryngeal parameters were similar to the parameters that occurred within the sustained phonation conditions. In contrast, the laryngeal parameters at the middle and at the end of the pitch rise differed substantially from those during sustained phonation. For the first time, quantitative measures of the growing glottal chink and the vibration amplitude decrease during pitch increase could be taken. In general, the image evaluation of the pitch rise paradigm can be subdivided into the starting, the rising and the final phase. As each phase can be considered as quasi-stationary, existing software modules are capable of analysing the process by treating each phase separately. Hence, the pitch rise condition may be suitable for clinical examination to detect signs of voice disturbances that cannot be visualized during sustained phonation.

    Correlation between Psychometric Tests and Mismatch Negativity in Preschool Children

    The objective was to determine whether mismatch negativity (MMN) is suitable to supplement subjective psychometric subtests of central hearing. We assessed 13 healthy children and 32 children with central auditory processing disorder (CAPD). Three different types of sound deviants were presented in a multi-deviant MMN design. At group level, the incidence of MMN was always higher in clinically diagnosed controls. Children with better results in the subtest Auditory Memory Span had a higher incidence of MMN. The controls also had peak latencies that occurred significantly earlier in frontal, central and temporal electrode sites. The area under the curve (AUC) displayed an asymmetric distribution in CAPD children, who tended to have a left-hemispheric dominance. AUC, peak latency, and the incidence of MMN reflected the discriminative ability of CAPD children. Hence, these characteristics could be used for investigating children with deficits in central hearing and can supplement psychometric tests