Examining the neoglottal vibratory pattern of Cantonese tracheoesophageal speakers : a preliminary aerodynamic study using inverse-filtering
The present study examined the neoglottal vibratory pattern of Cantonese tracheoesophageal (TE) speakers by inverse-filtering the airflow signals obtained from eight superior TE speakers during phonation. The syllable /papapa/ was used for obtaining airflow signals, and the acoustic signals of the vowels /i, æ, a, ɔ, u/ were also obtained. Aerodynamic parameters obtained were compared between TE and laryngeal speakers. Results revealed that TE speakers exhibited comparable open quotient and airflow volume values but significantly smaller speed quotient values than laryngeal speakers. The marked difference in inverse-filtered airflow signals between TE and laryngeal speech of Cantonese is believed to be related to the use of different sounding mechanisms between the two speaking methods, and the unique vibratory nature of the neoglottis in TE speech. Published or final version. Degree: Bachelor of Science in Speech and Hearing Science (Speech and Hearing Sciences).
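For readers unfamiliar with the two quotients compared above, the following is a minimal sketch of how they are commonly defined on a single cycle of an inverse-filtered flow pulse: the open quotient as the open-phase duration divided by the period, and the speed quotient as the opening-phase duration divided by the closing-phase duration. It is not the study's measurement procedure; the function name and the `threshold_ratio` parameter are assumptions made for illustration.

```python
def open_and_speed_quotient(cycle, threshold_ratio=0.1):
    """Estimate open quotient (OQ) and speed quotient (SQ) from one flow cycle.

    cycle: list of flow samples covering exactly one fundamental period.
    threshold_ratio: hypothetical fraction of the trough-to-peak range above
    which the (neo)glottis is treated as open.
    """
    low, high = min(cycle), max(cycle)
    if high == low:
        raise ValueError("cycle is flat; no open phase can be found")
    level = low + threshold_ratio * (high - low)

    # Samples above the threshold are treated as the open phase.
    open_idx = [i for i, v in enumerate(cycle) if v > level]
    start, end = open_idx[0], open_idx[-1]
    peak = cycle.index(high)  # first sample at peak flow

    opening = peak - start    # samples from opening onset to peak flow
    closing = end - peak      # samples from peak flow to closure
    oq = (end - start + 1) / len(cycle)
    sq = opening / closing if closing > 0 else float("inf")
    return oq, sq


# A skewed triangular pulse: slow opening, fast closing.
pulse = [0, 0, 1, 2, 3, 4, 2, 0, 0, 0]
oq, sq = open_and_speed_quotient(pulse)
```

With this definition, a smaller speed quotient corresponds to a pulse whose opening phase is short relative to its closing phase.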
A novel framework for high-quality voice source analysis and synthesis
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The analysis, parameterization and modeling of voice source estimates obtained via inverse filtering of recorded speech are some of the most challenging areas of speech processing, because humans produce a wide range of voice source realizations and because voice source estimates commonly contain artifacts due to the non-linear, time-varying source-filter coupling. Currently, the most widely adopted representation of the voice source signal is the Liljencrants-Fant (LF) model, which was developed in late 1985. Due to its overly simplistic interpretation of voice source dynamics, the LF model cannot represent the fine temporal structure of glottal flow derivative realizations, nor can it carry sufficient spectral richness to facilitate truly natural-sounding speech synthesis. In this thesis we have introduced Characteristic Glottal Pulse Waveform Parameterization and Modeling (CGPWPM), which constitutes an entirely novel framework for voice source analysis, parameterization and reconstruction. In a comparative evaluation of CGPWPM and the LF model we have demonstrated that the proposed method is able to preserve higher levels of speaker-dependent information from the voice source estimates and realize more natural-sounding speech synthesis. In general, we have shown that CGPWPM-based speech synthesis rates highly on the scale of absolute perceptual acceptability and that speech signals are faithfully reconstructed on a consistent basis across speakers and genders. We have applied CGPWPM to voice quality profiling and to a text-independent voice quality conversion method. The proposed voice conversion method is able to achieve the desired perceptual effects, and the modified speech remained as natural-sounding and intelligible as natural speech. In this thesis, we have also developed an optimal wavelet thresholding strategy for voice source signals which is able to suppress aspiration noise and still retain both the slow and the rapid variations in the voice source estimate.
Alternating minimisation for glottal inverse filtering
A new method is proposed for solving the glottal inverse filtering (GIF) problem. The goal of GIF is to separate an acoustical speech signal into two parts: the glottal airflow excitation and the vocal tract filter. To recover such information one has to deal with a blind deconvolution problem. This ill-posed inverse problem is solved under a deterministic setting, considering unknowns on both sides of the underlying operator equation. A stable reconstruction is obtained using a double regularization strategy, alternating between fixing either the glottal source signal or the vocal tract filter. This enables not only splitting the nonlinear and nonconvex problem into two linear and convex problems, but also allows the use of the best parameters and constraints to recover each variable at a time. This new technique, called alternating minimization glottal inverse filtering (AM-GIF), is compared with two other approaches: Markov chain Monte Carlo glottal inverse filtering (MCMC-GIF), and iterative adaptive inverse filtering (IAIF), using synthetic speech signals. The recent MCMC-GIF has good reconstruction quality but high computational cost. The state-of-the-art IAIF method is computationally fast but its accuracy deteriorates, particularly for speech signals of high fundamental frequency (F0). The results show the competitive performance of the new method: With high F0, the reconstruction quality is better than that of IAIF and close to MCMC-GIF while reducing the computational complexity by two orders of magnitude. Peer reviewed.
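The alternating idea described in this abstract can be illustrated on a toy blind deconvolution problem. The sketch below is not AM-GIF: it assumes a plain finite-length convolution model, uses simple Tikhonov (ridge) penalties for both unknowns, and all names and regularization weights (`lam_g`, `lam_h`) are hypothetical. It only shows how fixing one factor turns the bilinear problem into a linear least-squares problem for the other.

```python
def convolve(g, h):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(g) + len(h) - 1)
    for i, gi in enumerate(g):
        for j, hj in enumerate(h):
            out[i + j] += gi * hj
    return out


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_deconvolve(y, known, n_unknown, lam):
    """Minimise ||y - known * x||^2 + lam * ||x||^2 over x of length n_unknown."""
    # Column k of the convolution matrix is `known` delayed by k samples.
    cols = [[0.0] * k + list(known) + [0.0] * (n_unknown - 1 - k)
            for k in range(n_unknown)]
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    normal = [[dot(ci, cj) + (lam if i == j else 0.0)
               for j, cj in enumerate(cols)] for i, ci in enumerate(cols)]
    rhs = [dot(ci, y) for ci in cols]
    return solve(normal, rhs)


def objective(y, g, h, lam_g, lam_h):
    """Regularised data-fit cost shared by both alternating steps."""
    resid = [a - b for a, b in zip(y, convolve(g, h))]
    return (sum(v * v for v in resid)
            + lam_g * sum(v * v for v in g)
            + lam_h * sum(v * v for v in h))


def alternating_minimisation(y, n_g, n_h, lam_g=1e-3, lam_h=1e-3, iters=10):
    """Alternate between solving for the source g and the filter h."""
    h = [1.0] + [0.0] * (n_h - 1)  # start from an identity filter
    g = [0.0] * n_g
    history = []
    for _ in range(iters):
        g = ridge_deconvolve(y, h, n_g, lam_g)  # filter fixed, source free
        h = ridge_deconvolve(y, g, n_h, lam_h)  # source fixed, filter free
        history.append(objective(y, g, h, lam_g, lam_h))
    return g, h, history


y = convolve([0.0, 1.0, 2.0, 1.0, 0.0, 0.0], [1.0, 0.5, 0.25])
g_est, h_est, history = alternating_minimisation(y, n_g=6, n_h=3)
```

Because each step exactly minimises the shared cost over one block of variables, the recorded cost never increases. In this toy setting, however, the pair it converges to need not equal the true source and filter, which reflects the ill-posedness the abstract mentions and the need for problem-specific constraints.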
COMPARING ACOUSTIC GLOTTAL FEATURE EXTRACTION METHODS WITH SIMULTANEOUSLY RECORDED HIGH-SPEED VIDEO FEATURES FOR CLINICALLY OBTAINED DATA
Accurate methods for glottal feature extraction include high-speed video imaging (HSVI). Previous attempts have been made to extract these features from acoustic recordings, but none compared their results against an objective method such as HSVI. This thesis tests these acoustic methods on a large, diverse population of 46 subjects. Two previously studied acoustic methods, as well as one introduced in this thesis, were compared against two video methods, area and displacement, for open quotient (OQ) estimation. The area comparison proved to be somewhat ambiguous and challenging due to thresholding effects. The displacement comparison, which is based on glottal edge tracking, proved to be a more robust comparison method than the area. Compared to the displacement OQ, the first acoustic method's OQ estimate had a relatively small average error of 8.90% and the second method had a relatively large average error of -59.05%. The newly proposed method had a relatively small error of -13.75% when compared to the displacement OQ. Although the acoustic methods showed relatively high error, they had some success and may be utilized to augment the features collected by HSVI for a more accurate glottal feature estimation.
A comparison of two methods of formant frequency estimation for high-pitched voices
This study sought to test the accuracy of two methods of formant frequency estimation: artificial laryngeal stimulation via neck placement and via oral tube insertion. Twenty males between the ages of 18 and 45 performed the following three tasks: (1) four seconds of sustained vowel, (2) two seconds of sustained vowel followed by two seconds of artificial laryngeal stimulation via neck placement while ceasing vocal fold vibration and holding structures of the vocal fold filter in a fixed position, and (3) four seconds of sustained vowel, the last two of which were accompanied by artificial laryngeal stimulation via an oral insertion. These tasks were performed on the vowels /a/ and /i/. Four formant frequencies were measured for each task at second one and second three. These measures were compared across second one and second three, as well as across all three tasks. Group means and individual subject analyses were compared.
Glottal flow characteristics in vowels produced by speakers with heart failure
Heart failure (HF) is one of the most life-threatening diseases globally. HF is an under-diagnosed condition, and more screening tools are needed to detect it. A few recent studies have suggested that HF also affects the functioning of the speech production mechanism by causing edema in the vocal folds and by impairing lung function. It has not yet been studied whether these possible effects of HF on the speech production mechanism are large enough to cause acoustically measurable differences that distinguish speech produced in HF from that produced by healthy speakers. Therefore, the goal of the present study was to compare speech production between HF patients and healthy controls by focusing on the excitation signal generated at the level of the vocal folds, the glottal flow. The glottal flow was computed from speech using the quasi-closed phase glottal inverse filtering method and the estimated flow was parameterized with 12 glottal parameters. The sound pressure level (SPL) was measured from speech as an additional parameter. The statistical analyses conducted on the parameters indicated that most of the glottal parameters and SPL were significantly different between the HF patients and healthy controls. The results showed that the HF patients generally produced a more rounded glottal pulse and a lower SPL compared to the healthy controls, indicating incomplete glottal closure and inappropriate leakage of air through the glottis. The results observed in this preliminary study indicate that glottal features are capable of distinguishing speakers with HF from healthy controls. Therefore, the study suggests that glottal features constitute a potential feature extraction approach which should be taken into account in future large-scale investigations of the automatic detection of HF from speech. Peer reviewed.
Glottal-synchronous speech processing
Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity
of voiced speech is exploited. Traditionally, speech processing involves segmenting
and processing short speech frames of predefined length; this may fail to exploit the inherent
periodic structure of voiced speech which glottal-synchronous speech frames have
the potential to harness. Glottal-synchronous frames are often derived from the glottal
closure instants (GCIs) and glottal opening instants (GOIs).
The SIGMA algorithm was developed for the detection of GCIs and GOIs from
the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and
GOI detection from speech signals, the YAGA algorithm provides a measured accuracy
of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to
reverberation than single-channel algorithms.
The GCIs are applied to real-world applications including speech dereverberation,
where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance
of voicing detection in glottal-synchronous algorithms is demonstrated by subjective
testing. The GCIs are further exploited in a new area of data-driven speech modelling,
providing new insights into speech production and a set of tools to aid deployment into
real-world applications. The technique is shown to be applicable in areas of speech coding,
identification and artificial bandwidth extension of telephone speech.
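As a rough illustration of how glottal-synchronous frames can be derived from GCIs, the sketch below cuts one frame per interior GCI, running from the previous GCI to the next one, so that each frame covers two consecutive glottal cycles. This two-cycle framing is an assumption made for the example and is not taken from the thesis; the function name and the sample-index representation of GCIs are likewise hypothetical.

```python
def glottal_synchronous_frames(signal, gcis):
    """Cut variable-length frames whose boundaries follow the GCIs.

    signal: list of speech samples.
    gcis: increasing sample indices of glottal closure instants.
    Returns one frame per interior GCI, running from the previous GCI
    (inclusive) to the next GCI (exclusive).
    """
    frames = []
    for prev_gci, next_gci in zip(gcis, gcis[2:]):
        frames.append(signal[prev_gci:next_gci])
    return frames


# Toy example: sample values equal their indices, so frame content is easy to read.
samples = list(range(20))
frames = glottal_synchronous_frames(samples, [2, 7, 11, 16])
```

Unlike fixed-length framing, the frame length here follows the local pitch period, so each frame contains a whole number of glottal cycles.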
Parameterization of a computational physical model for glottal flow using inverse filtering and high-speed videoendoscopy
High-speed videoendoscopy, glottal inverse filtering, and physical modeling can be used to obtain complementary information about speech production. In this study, the three methodologies are combined to pursue a better understanding of the relationship between the glottal air flow and glottal area. Simultaneously acquired high-speed video and glottal inverse filtering data from three male and three female speakers were used. Significant correlations were found between the quasi-open and quasi-speed quotients of the glottal area (extracted from the high-speed videos) and glottal flow (estimated using glottal inverse filtering), but only the quasi-open quotient relationship could be represented as a linear model. A simple physical glottal flow model with three different glottal geometries was optimized to match the data. The results indicate that glottal flow skewing can be modeled using an inertial vocal/subglottal tract load and that estimated inertia within the glottis is sensitive to the quality of the data. Parameter optimisation also appears to favour combining the simplest glottal geometry with viscous losses and the more complex glottal geometries with entrance/exit effects in the glottis. Peer reviewed.