581 research outputs found

    Examining the neoglottal vibratory pattern of Cantonese tracheoesophageal speakers : a preliminary aerodynamic study using inverse-filtering

    The present study examined the neoglottal vibratory pattern of Cantonese tracheoesophageal (TE) speakers by inverse-filtering the airflow signals obtained from eight superior TE speakers during phonation. The syllable /papapa/ was used for obtaining airflow signals, and the acoustic signals of the vowels /i, æ, a, ɔ, u/ were also obtained. Aerodynamic parameters obtained were compared between TE and laryngeal speakers. Results revealed that TE speakers exhibited comparable open quotient and airflow volume values but significantly smaller speed quotient values than laryngeal speakers. The marked difference in inverse-filtered airflow signals between TE and laryngeal speech of Cantonese is believed to be related to the use of different sounding mechanisms between the two speaking methods, and the unique vibratory nature of the neoglottis in TE speech.

    Alternating minimisation for glottal inverse filtering

    A new method is proposed for solving the glottal inverse filtering (GIF) problem. The goal of GIF is to separate an acoustical speech signal into two parts: the glottal airflow excitation and the vocal tract filter. To recover such information one has to deal with a blind deconvolution problem. This ill-posed inverse problem is solved under a deterministic setting, considering unknowns on both sides of the underlying operator equation. A stable reconstruction is obtained using a double regularization strategy, alternating between fixing either the glottal source signal or the vocal tract filter. This not only splits the nonlinear and nonconvex problem into two linear and convex problems, but also allows the use of the best parameters and constraints to recover each variable at a time. This new technique, called alternating minimization glottal inverse filtering (AM-GIF), is compared with two other approaches: Markov chain Monte Carlo glottal inverse filtering (MCMC-GIF), and iterative adaptive inverse filtering (IAIF), using synthetic speech signals. The recent MCMC-GIF has good reconstruction quality but high computational cost. The state-of-the-art IAIF method is computationally fast but its accuracy deteriorates, particularly for speech signals of high fundamental frequency (F0). The results show the competitive performance of the new method: With high F0, the reconstruction quality is better than that of IAIF and close to MCMC-GIF while reducing the computational complexity by two orders of magnitude.
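The alternating idea in the abstract above can be illustrated with a toy blind deconvolution. This is a minimal sketch and not the AM-GIF algorithm itself: it assumes a short FIR filter in place of a vocal tract model, plain Tikhonov (ridge) penalties for both unknowns, and hypothetical names (`conv_matrix`, `ridge_solve`, `alternating_deconvolution`) and parameter values chosen only for illustration. With one unknown held fixed, the other is found by solving a linear, convex least-squares problem.

```python
import numpy as np

def conv_matrix(kernel, n_cols):
    """Return C such that C @ x == np.convolve(kernel, x) when len(x) == n_cols."""
    n_rows = len(kernel) + n_cols - 1
    C = np.zeros((n_rows, n_cols))
    for j in range(n_cols):
        C[j:j + len(kernel), j] = kernel
    return C

def ridge_solve(A, b, lam):
    """Solve min_x ||A x - b||^2 + lam * ||x||^2 via the normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

def alternating_deconvolution(s, n_source, n_filter,
                              lam_g=1e-3, lam_h=1e-3, n_iter=20):
    """Alternate between the source g and the filter h so that h * g ~ s.

    Assumes len(s) == n_source + n_filter - 1 (full linear convolution).
    Returns g, h and the regularized cost after each iteration.
    """
    h = np.zeros(n_filter)
    h[0] = 1.0                      # start from an identity (delta) filter
    g = np.zeros(n_source)
    costs = []
    for _ in range(n_iter):
        # Filter fixed: linear least-squares problem in the source.
        g = ridge_solve(conv_matrix(h, n_source), s, lam_g)
        # Source fixed: linear least-squares problem in the filter.
        h = ridge_solve(conv_matrix(g, n_filter), s, lam_h)
        r = np.convolve(h, g) - s
        costs.append(r @ r + lam_g * (g @ g) + lam_h * (h @ h))
    return g, h, costs
```

Because each half-step exactly minimizes the same regularized cost over one variable, the recorded cost cannot increase from one iteration to the next. The sketch does not resolve the scale and shift ambiguities inherent to blind deconvolution.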

    COMPARING ACOUSTIC GLOTTAL FEATURE EXTRACTION METHODS WITH SIMULTANEOUSLY RECORDED HIGH-SPEED VIDEO FEATURES FOR CLINICALLY OBTAINED DATA

    Accurate methods for glottal feature extraction include the use of high-speed video imaging (HSVI). There have been previous attempts to extract these features from the acoustic recording. However, none of these methods compares its results with an objective method such as HSVI. This thesis tests these acoustic methods on a large, diverse population of 46 subjects. Two previously studied acoustic methods, as well as one introduced in this thesis, were compared against two video methods, area and displacement, for open quotient (OQ) estimation. The area comparison proved to be somewhat ambiguous and challenging due to thresholding effects. The displacement comparison, which is based on glottal edge tracking, proved to be a more robust comparison method than the area. The first acoustic method's OQ estimate had a relatively small average error of 8.90% and the second method had a relatively large average error of -59.05% compared to the displacement OQ. The newly proposed method had a relatively small error of -13.75% when compared to the displacement OQ. Although the acoustic methods showed relatively high error, they had some success and may be used to augment the features collected by HSVI for more accurate glottal feature estimation.
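To make the open quotient and the thresholding effect mentioned above concrete, here is a minimal sketch. It assumes OQ is the fraction of one glottal cycle during which the waveform exceeds a threshold, and that error is reported as a signed percentage relative to the reference. The function names, the synthetic half-sine cycle and the sign convention are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def open_quotient(cycle, threshold_fraction=0.0):
    """Fraction of samples in one cycle that lie above a relative threshold.

    threshold_fraction is measured from the cycle minimum (0.0) to its
    maximum (1.0); the choice of threshold is an assumption of this sketch.
    """
    cycle = np.asarray(cycle, dtype=float)
    lo, hi = cycle.min(), cycle.max()
    threshold = lo + threshold_fraction * (hi - lo)
    return np.count_nonzero(cycle > threshold) / cycle.size

def percent_error(estimate, reference):
    """Signed error of an estimate relative to a reference, in percent."""
    return 100.0 * (estimate - reference) / reference

# Synthetic cycle: 60 samples of a half-sine "open" phase, then 40 closed samples.
open_phase = np.sin(np.pi * (np.arange(60) + 0.5) / 60)
cycle = np.concatenate([open_phase, np.zeros(40)])

oq_low = open_quotient(cycle, 0.0)    # 0.6: every open-phase sample counts
oq_half = open_quotient(cycle, 0.5)   # 0.4: a higher threshold shrinks the OQ
```

The same cycle gives different OQ values under different thresholds, which is one way an area-based comparison can become ambiguous.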

    A comparison of two methods of formant frequency estimation for high-pitched voices

    This study sought to test the accuracy of two methods of formant frequency estimation: artificial laryngeal stimulation via neck placement and via oral tube insertion. Twenty males between the ages of 18 and 45 performed the following three tasks: (1) four seconds of sustained vowel, (2) two seconds of sustained vowel followed by two seconds of artificial laryngeal stimulation via neck placement while ceasing vocal fold vibration and holding structures of the vocal fold filter in a fixed position, and (3) four seconds of sustained vowel, the last two of which were accompanied by artificial laryngeal stimulation via an oral insertion. These tasks were performed on the vowels /a/ and /i/. Four formant frequencies were measured for each task at second one and second three. These measures were compared across second one and second three, as well as across all three tasks. Group means as well as individual subject analyses were compared.

    Glottal flow characteristics in vowels produced by speakers with heart failure

    Heart failure (HF) is one of the most life-threatening diseases globally. HF is an under-diagnosed condition, and more screening tools are needed to detect it. A few recent studies have suggested that HF also affects the functioning of the speech production mechanism by causing generation of edema in the vocal folds and by impairing the lung function. It has not yet been studied whether these possible effects of HF on the speech production mechanism are large enough to cause acoustically measurable differences to distinguish speech produced in HF from that produced by healthy speakers. Therefore, the goal of the present study was to compare speech production between HF patients and healthy controls by focusing on the excitation signal generated at the level of the vocal folds, the glottal flow. The glottal flow was computed from speech using the quasi-closed phase glottal inverse filtering method and the estimated flow was parameterized with 12 glottal parameters. The sound pressure level (SPL) was measured from speech as an additional parameter. The statistical analyses conducted on the parameters indicated that most of the glottal parameters and SPL were significantly different between the HF patients and healthy controls. The results showed that the HF patients generally produced a more rounded glottal pulse and a lower SPL compared to the healthy controls, indicating incomplete glottal closure and inappropriate leakage of air through the glottis. The results observed in this preliminary study indicate that glottal features are capable of distinguishing speakers with HF from healthy controls. Therefore, the study suggests that glottal features constitute a potential feature extraction approach which should be taken into account in future large-scale investigations in studying the automatic detection of HF from speech.

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.
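As an illustration of how glottal-synchronous frames differ from fixed-length frames, the sketch below cuts a signal into frames that each span two consecutive glottal cycles, centred on a GCI. It assumes the GCI sample indices are already available (for example from a detector such as those named above, which is not implemented here). The function name and the two-cycle frame convention are assumptions made for illustration.

```python
import numpy as np

def glottal_synchronous_frames(signal, gcis):
    """Cut one frame per interior GCI, running from the previous GCI to the next.

    signal: 1-D array of samples.
    gcis:   increasing sample indices of glottal closure instants (assumed given).
    Frame lengths follow the local pitch period instead of a fixed size.
    """
    signal = np.asarray(signal)
    gcis = np.asarray(gcis, dtype=int)
    frames = []
    for prev_gci, next_gci in zip(gcis[:-2], gcis[2:]):
        frames.append(signal[prev_gci:next_gci])
    return frames
```

With GCIs at samples 10, 30, 55 and 80, the function returns two frames covering samples 10 to 54 and 30 to 79, so each frame length tracks the local cycle durations.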

    Parameterization of a computational physical model for glottal flow using inverse filtering and high-speed videoendoscopy

    High-speed videoendoscopy, glottal inverse filtering, and physical modeling can be used to obtain complementary information about speech production. In this study, the three methodologies are combined to pursue a better understanding of the relationship between the glottal air flow and glottal area. Simultaneously acquired high-speed video and glottal inverse filtering data from three male and three female speakers were used. Significant correlations were found between the quasi-open and quasi-speed quotients of the glottal area (extracted from the high-speed videos) and glottal flow (estimated using glottal inverse filtering), but only the quasi-open quotient relationship could be represented as a linear model. A simple physical glottal flow model with three different glottal geometries was optimized to match the data. The results indicate that glottal flow skewing can be modeled using an inertial vocal/subglottal tract load and that estimated inertia within the glottis is sensitive to the quality of the data. Parameter optimisation also appears to favour combining the simplest glottal geometry with viscous losses and the more complex glottal geometries with entrance/exit effects in the glottis.