Analysis and Detection of Pathological Voice using Glottal Source Features
Automatic detection of voice pathology enables objective assessment and earlier diagnostic intervention. This study provides a systematic analysis of glottal source features and investigates their effectiveness in voice pathology detection. Glottal source features are extracted from glottal flows estimated with the quasi-closed phase (QCP) glottal inverse filtering method, from approximate glottal source signals computed with the zero frequency filtering (ZFF) method, and from acoustic voice signals directly. In addition, we propose to derive mel-frequency cepstral coefficients (MFCCs) from the glottal source waveforms computed by QCP and ZFF to effectively capture variations in the glottal source spectra of pathological voice. Experiments were carried out using two databases: the Hospital Universitario Principe de Asturias (HUPA) database and the Saarbrucken Voice Disorders (SVD) database. Analysis of the features revealed that the glottal source contains information that discriminates between normal and pathological voice. Pathology detection experiments were carried out using a support vector machine (SVM). The detection experiments showed that the performance achieved with the studied glottal source features is comparable to, or better than, that of conventional MFCC and perceptual linear prediction (PLP) features. The best detection performance was achieved when the glottal source features were combined with the conventional MFCC and PLP features, which indicates the complementary nature of the features.
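Deriving MFCCs from a glottal waveform follows the standard cepstral pipeline: window, power spectrum, mel filterbank, log, DCT. A minimal single-frame sketch in NumPy; the sample rate, FFT length, and filter/coefficient counts are illustrative defaults, not the values used in the study:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    """Sketch of single-frame MFCC extraction for a (glottal) waveform."""
    # Window the frame and take its power spectrum
    frame = signal[:n_fft] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frame)) ** 2

    # Triangular mel filterbank, equally spaced on the mel scale
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * inv_mel(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    log_energy = np.log(fbank @ power + 1e-10)

    # DCT-II decorrelates the log filterbank energies into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return dct @ log_energy

coeffs = mfcc(np.sin(0.1 * np.arange(512)))
```

The same routine applies whether the input frame is the acoustic signal or a QCP/ZFF-estimated glottal waveform; only the input differs.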
Tools for voice source analysis: the updated Aalto Aparat and a database of continuous speech with a simultaneous electroglottographic signal
This thesis presents two tools for voice source analysis: an updated version of the Aalto Aparat inverse filtering programme, and a database of continuous Finnish speech with simultaneous electroglottography (EGG). A new glottal inverse filtering method, quasi-closed phase (QCP) glottal inverse filtering, has been implemented in Aalto Aparat, and the usability of the programme has been improved: the results of the computations can now be transferred to other analysis programmes more efficiently. In addition, a comprehensive manual for Aparat has been compiled. The database of continuous speech and EGG contains 20 recitations of a Finnish text by 10 male and 10 female native Finnish speakers. The recitations were recorded with a headset condenser microphone and EGG electrodes. The recording sessions were performed in an anechoic chamber, and the full database contains almost an hour of material. The data can be used, e.g., when evaluating new glottal inverse filtering (GIF) methods.
GLOTTAL EXCITATION EXTRACTION OF VOICED SPEECH - JOINTLY PARAMETRIC AND NONPARAMETRIC APPROACHES
The goal of this dissertation is to develop methods to recover glottal flow pulses, which contain biometric information about the speaker. The excitation estimated from an observed speech utterance is modeled as the source of an inverse problem. Windowed linear prediction analysis and inverse filtering are first used to deconvolve the speech signal and obtain a rough estimate of the glottal flow pulses. Linear prediction and its inverse filtering can largely eliminate the vocal-tract response, which is usually modeled as an infinite impulse response filter. Vocal-tract components that remain in the estimate after inverse filtering are then removed by maximum-phase and minimum-phase decomposition, implemented by applying the complex cepstrum to the initial estimate of the glottal pulses. The additive and residual errors from inverse filtering are suppressed using higher-order statistics, which are the basis for computing the cepstrum representations. Features provided directly by the glottal source's cepstrum representation, together with fitting parameters for the estimated pulses, form feature patterns that were applied to a minimum-distance classifier to realize a speaker identification system with a very limited number of subjects.
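The first stage described above, windowed linear prediction followed by inverse filtering, can be sketched as follows. The autocorrelation method and the prediction order are standard textbook choices here, not necessarily those of the dissertation:

```python
import numpy as np

def lp_inverse_filter(speech, order=12):
    """Autocorrelation-method linear prediction followed by inverse filtering.

    Returns the LP residual, a rough estimate of the glottal excitation
    (a full glottal flow estimate would additionally integrate the residual
    to compensate for lip radiation).
    """
    # Windowed analysis for the autocorrelation sequence r[0..order]
    x = speech * np.hamming(len(speech))
    r = np.array([x[:len(x) - k] @ x[k:] for k in range(order + 1)])
    # Solve the normal equations R a = r for the predictor coefficients
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    # Inverse filter A(z) = 1 - sum_k a_k z^-k removes the vocal-tract resonances
    residual = np.convolve(speech, np.concatenate(([1.0], -a)))[:len(speech)]
    return residual
```

Applied to a resonant voiced segment, the residual is far "whiter" than the input, which is what makes the subsequent cepstral decomposition tractable.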
Parameterization of a computational physical model for glottal flow using inverse filtering and high-speed videoendoscopy
High-speed videoendoscopy, glottal inverse filtering, and physical modeling can be used to obtain complementary information about speech production. In this study, the three methodologies are combined to pursue a better understanding of the relationship between the glottal air flow and the glottal area. Simultaneously acquired high-speed video and glottal inverse filtering data from three male and three female speakers were used. Significant correlations were found between the quasi-open and quasi-speed quotients of the glottal area (extracted from the high-speed videos) and the glottal flow (estimated using glottal inverse filtering), but only the quasi-open quotient relationship could be represented as a linear model. A simple physical glottal flow model with three different glottal geometries was optimized to match the data. The results indicate that glottal flow skewing can be modeled using an inertial vocal/subglottal tract load and that the estimated inertia within the glottis is sensitive to the quality of the data. Parameter optimization also appears to favour combining the simplest glottal geometry with viscous losses and the more complex glottal geometries with entrance/exit effects in the glottis. Peer reviewed
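The correlation and linear-model analysis of area-based versus flow-based quasi-open quotients can be illustrated on synthetic paired data. The values below are invented for illustration, not taken from the study:

```python
import numpy as np

# Hypothetical paired measurements: quasi-open quotient (QOQ) of the glottal
# area (from high-speed video) and of the glottal flow (from inverse filtering)
qoq_area = np.array([0.52, 0.58, 0.61, 0.66, 0.70, 0.75, 0.80])
qoq_flow = np.array([0.60, 0.63, 0.68, 0.70, 0.76, 0.78, 0.85])

# Pearson correlation between the two quotient series
r = np.corrcoef(qoq_area, qoq_flow)[0, 1]

# Least-squares linear model: qoq_flow ~ slope * qoq_area + intercept
slope, intercept = np.polyfit(qoq_area, qoq_flow, 1)
print(f"r = {r:.3f}, flow QOQ = {slope:.2f} * area QOQ + {intercept:.2f}")
```

A quotient pair that correlates but cannot be fit this way (as reported for the quasi-speed quotient) would show a high `r` only under a nonlinear model.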
Time-Varying Modeling of Glottal Source and Vocal Tract and Sequential Bayesian Estimation of Model Parameters for Speech Synthesis
Speech is generated by articulators acting on a phonatory source. Identifying this phonatory source and the articulatory geometry are individually challenging, ill-posed problems, called speech separation and articulatory inversion, respectively. A trade-off exists between the decomposition and the recovered articulatory geometry because multiple articulatory configurations can map to the same produced speech. Moreover, if measurements are obtained only from a microphone, they provide no invasive insight, adding a further challenge to an already difficult problem. A joint, non-invasive estimation strategy that couples articulatory and phonatory knowledge would lead to better articulatory speech synthesis. In this thesis, a joint estimation strategy for speech separation and articulatory geometry recovery is studied. Unlike previous periodic/aperiodic decomposition methods that use stationary speech models within a frame, the proposed model presents a non-stationary speech decomposition method. A parametric glottal source model and an articulatory vocal tract response are represented in a dynamic state-space formulation. The unknown parameters of the speech generation components are estimated using sequential Monte Carlo methods under some specific assumptions. The proposed approach is compared with other glottal inverse filtering methods, including iterative adaptive inverse filtering, state-space inverse filtering, and the quasi-closed phase method. Masters Thesis, Electrical Engineering, 201
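The sequential Monte Carlo estimation can be illustrated with a toy bootstrap particle filter tracking a single slowly varying state. The random-walk process model and Gaussian observation likelihood are illustrative assumptions, far simpler than the thesis's glottal/vocal-tract state space:

```python
import numpy as np

def bootstrap_particle_filter(observations, n_particles=500,
                              process_std=0.05, obs_std=0.2, seed=0):
    """Bootstrap particle filter for a drifting scalar state.

    Toy stand-in for sequential Monte Carlo estimation of time-varying
    speech-production parameters: the state x_t follows a random walk and
    is observed in Gaussian noise, y_t = x_t + v_t.
    """
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)
    estimates = []
    for y in observations:
        # Propagate particles through the random-walk process model
        particles = particles + rng.normal(0.0, process_std, n_particles)
        # Weight each particle by the Gaussian observation likelihood
        w = np.exp(-0.5 * ((y - particles) / obs_std) ** 2)
        w /= w.sum()
        estimates.append(w @ particles)
        # Multinomial resampling to avoid weight degeneracy
        particles = rng.choice(particles, n_particles, p=w)
    return np.array(estimates)
```

The filtered estimate should track the underlying drift with less error than the raw noisy observations, which is the property that makes this family of methods attractive for time-varying model parameters.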
COMPARING ACOUSTIC GLOTTAL FEATURE EXTRACTION METHODS WITH SIMULTANEOUSLY RECORDED HIGH-SPEED VIDEO FEATURES FOR CLINICALLY OBTAINED DATA
Accurate methods for glottal feature extraction include high-speed video imaging (HSVI). There have been previous attempts to extract these features from the acoustic recording; however, none of these methods compare their results with an objective method such as HSVI. This thesis tests these acoustic methods against a large, diverse population of 46 subjects. Two previously studied acoustic methods, as well as one introduced in this thesis, were compared against two video methods, area and displacement, for open quotient (OQ) estimation. The area comparison proved somewhat ambiguous and challenging due to thresholding effects. The displacement comparison, which is based on glottal edge tracking, proved to be a more robust comparison method than the area. The first acoustic method's OQ estimate had a relatively small average error of 8.90%, and the second method had a relatively large average error of -59.05% compared to the displacement OQ. The newly proposed method had a relatively small error of -13.75% when compared to the displacement OQ. Although the acoustic methods showed relatively high error, they had some success, and they may be utilized to augment the features collected by HSVI for more accurate glottal feature estimation.
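The thresholding effects reported for the area-based comparison can be demonstrated on a toy glottal cycle: the OQ estimate shifts with the choice of threshold. The half-sine pulse shape and the threshold ratios below are illustrative assumptions:

```python
import numpy as np

def open_quotient(area, threshold_ratio=0.1):
    """Open quotient of one glottal cycle: the fraction of samples where
    the glottal area exceeds threshold_ratio * peak area."""
    thresh = threshold_ratio * area.max()
    return np.mean(area > thresh)

# One synthetic glottal cycle: an open phase shaped as a half-sine pulse,
# followed by a closed phase of zero area
t = np.linspace(0.0, np.pi, 60)
cycle = np.concatenate([np.sin(t), np.zeros(40)])

# A low and a high threshold yield different OQ estimates for the same
# cycle, which is one source of the ambiguity in area-based OQ
print(open_quotient(cycle, 0.05), open_quotient(cycle, 0.30))
```

Displacement-based edge tracking sidesteps part of this ambiguity because opening and closing instants are located on the tracked edge rather than on an amplitude threshold.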