Analysis and Detection of Pathological Voice using Glottal Source Features
Automatic detection of voice pathology enables objective assessment and earlier diagnostic intervention. This study provides a systematic analysis of glottal source features and investigates their effectiveness in voice pathology detection. Glottal source features are extracted from glottal flows estimated with the quasi-closed phase (QCP) glottal inverse filtering method, from approximate glottal source signals computed with the zero frequency filtering (ZFF) method, and from acoustic voice signals directly. In addition, we propose to derive mel-frequency cepstral coefficients (MFCCs) from the glottal source waveforms computed by QCP and ZFF to effectively capture variations in the glottal source spectra of pathological voice. Experiments were carried out using two databases: the Hospital Universitario Principe de Asturias (HUPA) database and the Saarbrucken Voice Disorders (SVD) database. Analysis of the features revealed that the glottal source contains information that discriminates between normal and pathological voice. Pathology detection experiments were carried out using a support vector machine (SVM). The detection experiments showed that the performance achieved with the studied glottal source features is comparable to, or better than, that of conventional MFCC and perceptual linear prediction (PLP) features. The best detection performance was achieved when the glottal source features were combined with the conventional MFCC and PLP features, which indicates the complementary nature of the features.
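Deriving MFCCs from a glottal waveform follows the standard cepstral pipeline: window, power spectrum, mel filterbank, log, DCT. A minimal single-frame sketch in NumPy; the sample rate, FFT length, and filter/coefficient counts are illustrative defaults, not the values used in the study:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    """Sketch of single-frame MFCC extraction for a (glottal) waveform."""
    # Window the frame and take its power spectrum
    frame = signal[:n_fft] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frame)) ** 2

    # Triangular mel filterbank, equally spaced on the mel scale
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * inv_mel(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    log_energy = np.log(fbank @ power + 1e-10)

    # DCT-II decorrelates the log filterbank energies into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return dct @ log_energy

coeffs = mfcc(np.sin(0.1 * np.arange(512)))
```

The same routine applies whether the input frame is the acoustic signal or a QCP/ZFF-estimated glottal waveform; only the input differs.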
Tools for voice source analysis: the updated Aalto Aparat and a database of continuous speech with a simultaneous electroglottographic signal
This thesis presents two tools for voice source analysis: an updated version of the Aalto Aparat inverse filtering programme, and a database of continuous Finnish speech with simultaneous electroglottography (EGG). A new glottal inverse filtering method, quasi-closed phase (QCP) glottal inverse filtering, has been implemented in Aalto Aparat, and the usability of the programme has been improved: the results of the computations can now be transferred to other analysis programmes more efficiently. In addition, a comprehensive manual for Aparat has been compiled. The database of continuous speech and EGG contains 20 recitations of a Finnish text by 10 male and 10 female native Finnish speakers. The recitations were recorded with a headset condenser microphone and EGG electrodes. The recording sessions were performed in an anechoic chamber, and the full database contains almost an hour of material. The data can be used, e.g., when evaluating new glottal inverse filtering (GIF) methods.
GLOTTAL EXCITATION EXTRACTION OF VOICED SPEECH - JOINTLY PARAMETRIC AND NONPARAMETRIC APPROACHES
The goal of this dissertation is to develop methods to recover glottal flow pulses, which contain biometric information about the speaker. The excitation estimated from an observed speech utterance is modeled as the source of an inverse problem. Windowed linear prediction analysis and inverse filtering are first used to deconvolve the speech signal and obtain a rough estimate of the glottal flow pulses. Linear prediction and its inverse filtering can largely eliminate the vocal-tract response, which is usually modeled as an infinite impulse response filter. Vocal-tract components that remain in the estimate after inverse filtering are then removed by maximum-phase and minimum-phase decomposition, implemented by applying the complex cepstrum to the initial estimate of the glottal pulses. The additive and residual errors from inverse filtering are suppressed using higher-order statistics, which are the basis for computing the cepstrum representations. Features provided directly by the glottal source's cepstrum representation, together with fitting parameters for the estimated pulses, form feature patterns that were applied to a minimum-distance classifier to realize a speaker identification system with a very limited number of subjects.
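The first stage described above, windowed linear prediction followed by inverse filtering, can be sketched as follows. The autocorrelation method and the prediction order are standard textbook choices here, not necessarily those of the dissertation:

```python
import numpy as np

def lp_inverse_filter(speech, order=12):
    """Autocorrelation-method linear prediction followed by inverse filtering.

    Returns the LP residual, a rough estimate of the glottal excitation
    (a full glottal flow estimate would additionally integrate the residual
    to compensate for lip radiation).
    """
    # Windowed analysis for the autocorrelation sequence r[0..order]
    x = speech * np.hamming(len(speech))
    r = np.array([x[:len(x) - k] @ x[k:] for k in range(order + 1)])
    # Solve the normal equations R a = r for the predictor coefficients
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    # Inverse filter A(z) = 1 - sum_k a_k z^-k removes the vocal-tract resonances
    residual = np.convolve(speech, np.concatenate(([1.0], -a)))[:len(speech)]
    return residual
```

Applied to a resonant voiced segment, the residual is far "whiter" than the input, which is what makes the subsequent cepstral decomposition tractable.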
Parameterization of a computational physical model for glottal flow using inverse filtering and high-speed videoendoscopy
High-speed videoendoscopy, glottal inverse filtering, and physical modeling can be used to obtain complementary information about speech production. In this study, the three methodologies are combined to pursue a better understanding of the relationship between the glottal air flow and the glottal area. Simultaneously acquired high-speed video and glottal inverse filtering data from three male and three female speakers were used. Significant correlations were found between the quasi-open and quasi-speed quotients of the glottal area (extracted from the high-speed videos) and the glottal flow (estimated using glottal inverse filtering), but only the quasi-open quotient relationship could be represented as a linear model. A simple physical glottal flow model with three different glottal geometries was optimized to match the data. The results indicate that glottal flow skewing can be modeled using an inertial vocal/subglottal tract load and that the estimated inertia within the glottis is sensitive to the quality of the data. Parameter optimization also appears to favour combining the simplest glottal geometry with viscous losses and the more complex glottal geometries with entrance/exit effects in the glottis. Peer reviewed
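The correlation and linear-model analysis of area-based versus flow-based quasi-open quotients can be illustrated on synthetic paired data. The values below are invented for illustration, not taken from the study:

```python
import numpy as np

# Hypothetical paired measurements: quasi-open quotient (QOQ) of the glottal
# area (from high-speed video) and of the glottal flow (from inverse filtering)
qoq_area = np.array([0.52, 0.58, 0.61, 0.66, 0.70, 0.75, 0.80])
qoq_flow = np.array([0.60, 0.63, 0.68, 0.70, 0.76, 0.78, 0.85])

# Pearson correlation between the two quotient series
r = np.corrcoef(qoq_area, qoq_flow)[0, 1]

# Least-squares linear model: qoq_flow ~ slope * qoq_area + intercept
slope, intercept = np.polyfit(qoq_area, qoq_flow, 1)
print(f"r = {r:.3f}, flow QOQ = {slope:.2f} * area QOQ + {intercept:.2f}")
```

A quotient pair that correlates but cannot be fit this way (as reported for the quasi-speed quotient) would show a high `r` only under a nonlinear model.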
Time-Varying Modeling of Glottal Source and Vocal Tract and Sequential Bayesian Estimation of Model Parameters for Speech Synthesis
Speech is generated by articulators acting on a phonatory source. Identifying this phonatory source and the articulatory geometry are individually challenging, ill-posed problems, called speech separation and articulatory inversion, respectively. A trade-off exists between the decomposition and the recovered articulatory geometry because multiple articulatory configurations can map to the same produced speech. Moreover, if measurements are obtained only from a microphone, they provide no invasive insight, adding a further challenge to an already difficult problem. A joint, non-invasive estimation strategy that couples articulatory and phonatory knowledge would lead to better articulatory speech synthesis. In this thesis, a joint estimation strategy for speech separation and articulatory geometry recovery is studied. Unlike previous periodic/aperiodic decomposition methods that use stationary speech models within a frame, the proposed model presents a non-stationary speech decomposition method. A parametric glottal source model and an articulatory vocal tract response are represented in a dynamic state-space formulation. The unknown parameters of the speech generation components are estimated using sequential Monte Carlo methods under some specific assumptions. The proposed approach is compared with other glottal inverse filtering methods, including iterative adaptive inverse filtering, state-space inverse filtering, and the quasi-closed phase method. Masters Thesis, Electrical Engineering, 201
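The sequential Monte Carlo estimation can be illustrated with a toy bootstrap particle filter tracking a single slowly varying state. The random-walk process model and Gaussian observation likelihood are illustrative assumptions, far simpler than the thesis's glottal/vocal-tract state space:

```python
import numpy as np

def bootstrap_particle_filter(observations, n_particles=500,
                              process_std=0.05, obs_std=0.2, seed=0):
    """Bootstrap particle filter for a drifting scalar state.

    Toy stand-in for sequential Monte Carlo estimation of time-varying
    speech-production parameters: the state x_t follows a random walk and
    is observed in Gaussian noise, y_t = x_t + v_t.
    """
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)
    estimates = []
    for y in observations:
        # Propagate particles through the random-walk process model
        particles = particles + rng.normal(0.0, process_std, n_particles)
        # Weight each particle by the Gaussian observation likelihood
        w = np.exp(-0.5 * ((y - particles) / obs_std) ** 2)
        w /= w.sum()
        estimates.append(w @ particles)
        # Multinomial resampling to avoid weight degeneracy
        particles = rng.choice(particles, n_particles, p=w)
    return np.array(estimates)
```

The filtered estimate should track the underlying drift with less error than the raw noisy observations, which is the property that makes this family of methods attractive for time-varying model parameters.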
COMPARING ACOUSTIC GLOTTAL FEATURE EXTRACTION METHODS WITH SIMULTANEOUSLY RECORDED HIGH-SPEED VIDEO FEATURES FOR CLINICALLY OBTAINED DATA
Accurate methods for glottal feature extraction include high-speed video imaging (HSVI). There have been previous attempts to extract these features from the acoustic recording; however, none of these methods compare their results with an objective method such as HSVI. This thesis tests these acoustic methods against a large, diverse population of 46 subjects. Two previously studied acoustic methods, as well as one introduced in this thesis, were compared against two video methods, area and displacement, for open quotient (OQ) estimation. The area comparison proved somewhat ambiguous and challenging due to thresholding effects. The displacement comparison, which is based on glottal edge tracking, proved to be a more robust comparison method than the area. The first acoustic method's OQ estimate had a relatively small average error of 8.90%, and the second method had a relatively large average error of -59.05% compared to the displacement OQ. The newly proposed method had a relatively small error of -13.75% when compared to the displacement OQ. Although the acoustic methods showed relatively high error, they had some success, and they may be utilized to augment the features collected by HSVI for more accurate glottal feature estimation.
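The thresholding effects reported for the area-based comparison can be demonstrated on a toy glottal cycle: the OQ estimate shifts with the choice of threshold. The half-sine pulse shape and the threshold ratios below are illustrative assumptions:

```python
import numpy as np

def open_quotient(area, threshold_ratio=0.1):
    """Open quotient of one glottal cycle: the fraction of samples where
    the glottal area exceeds threshold_ratio * peak area."""
    thresh = threshold_ratio * area.max()
    return np.mean(area > thresh)

# One synthetic glottal cycle: an open phase shaped as a half-sine pulse,
# followed by a closed phase of zero area
t = np.linspace(0.0, np.pi, 60)
cycle = np.concatenate([np.sin(t), np.zeros(40)])

# A low and a high threshold yield different OQ estimates for the same
# cycle, which is one source of the ambiguity in area-based OQ
print(open_quotient(cycle, 0.05), open_quotient(cycle, 0.30))
```

Displacement-based edge tracking sidesteps part of this ambiguity because opening and closing instants are located on the tracked edge rather than on an amplitude threshold.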