11,119 research outputs found

Automatic age detection in normal and pathological voice

Author: Castellanos Domínguez Germán
Godino Llorente Juan Ignacio
Gómez García Jorge Andrés
Moro Velázquez Laureano
Publication venue: E.T.S.I y Sistemas de Telecomunicación (UPM)
Publication date: 01/01/2015

Systems that automatically detect voice pathologies are usually trained with recordings from populations of all ages. However, such an approach might be inadequate because of the acoustic variations in the voice caused by the natural aging process. On top of that, elder voices present some perturbations in quality similar to those related to voice disorders, which makes the detection of pathologies more troublesome. With this in mind, the study of methodologies that automatically incorporate information about speakers’ age, aiming to simplify the detection of voice disorders, is of interest. In this respect, the present paper introduces an age detector trained with normal and pathological voices, constituting a first step towards the study of age-dependent pathology detectors. The proposed system employs sustained vowels from the Saarbrucken database, from which two age groups are examined: adults and elders. Mel frequency cepstral coefficients are used for characterization and Gaussian mixture models for classification. In addition, fusion of vowels at score level is considered to improve detection performance. Results suggest that age might be effectively recognized using normal and pathological voices when using sustained vowels as acoustical material, opening up possibilities for the design of automatic age-dependent voice pathology detection systems.
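
The abstract above mentions fusing vowels at score level without saying which fusion rule is used. As a minimal sketch, assuming each vowel yields one log-likelihood ratio (elder model minus adult model) and that fusion is a plain average, the decision step might look like this. The function names, the example scores, and the zero threshold are hypothetical and are not taken from the paper.

```python
from statistics import mean


def fuse_scores(vowel_scores):
    """Average per-vowel log-likelihood ratios into one fused score.

    Assumption: simple mean fusion; the paper may use a different rule.
    """
    if not vowel_scores:
        raise ValueError("need at least one vowel score")
    return mean(vowel_scores.values())


def classify_age(vowel_scores, threshold=0.0):
    """Label the speaker 'elder' if the fused score exceeds the threshold."""
    return "elder" if fuse_scores(vowel_scores) > threshold else "adult"


# Hypothetical scores: log p(x | elder GMM) - log p(x | adult GMM) per vowel.
scores = {"a": 0.8, "i": -0.2, "u": 0.6}
fused = fuse_scores(scores)
label = classify_age(scores)
```

Averaging lets a vowel with a weak or contradictory score be outvoted by the others, which is one plausible reason fusion could help.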

Archivo Digital UPM

Automatic Detection of Laryngeal Pathology on Sustained Vowels Using Short-Term Cepstral Parameters: Analysis of Performance and Theoretical Justification

Author: B. Boyanov
J.G. Proakis
J.I. Godino-Llorente
J.R. Deller
L. Rabiner
P.J. Murphy
R.O. Duda
S. Haykin
S.E. Bou-Ghazale
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

The majority of speech signal analysis procedures for automatic detection of laryngeal pathologies rely on parameters extracted from time-domain processing. Moreover, calculation of these parameters often requires prior pitch period estimation; therefore, their validity heavily depends on the robustness of pitch detection. Within this paper, an alternative approach based on cepstral-domain processing is presented, which has the advantage of not requiring pitch estimation, thus providing a gain in both simplicity and robustness. While the proposed scheme is similar to solutions based on Mel-frequency cepstral parameters, already present in the literature, it has an easier physical interpretation while achieving similar performance standards.
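
To illustrate why cepstral-domain processing needs no pitch estimate, here is a minimal sketch of a generic real cepstrum of one frame: the inverse transform of the log magnitude spectrum. It is not the paper's specific parameter set. The naive O(n^2) transform and the small `floor` constant guarding against log(0) are choices made for this example only.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform, adequate for short frames."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    n = len(frame)
    # The floor keeps the logarithm finite for empty spectral bins.
    log_mag = [math.log(abs(c) + floor) for c in dft(frame)]
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


# A unit impulse has a flat spectrum, so its cepstrum is (numerically) zero.
impulse_cepstrum = real_cepstrum([1.0] + [0.0] * 7)
```

The whole computation works on the frame directly; no pitch period is needed at any step.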

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Estimation of Severity of Speech Disability through Speech Envelope

Author: Gudi Anandthirtha B.
Nagaraj H. C.
Shreedhar H. K.
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 21/07/2011

In this paper, envelope detection of speech is discussed as a way to distinguish the pathological cases of speech-disabled children. Speech signal samples of children aged between five and eight years are considered for the present study. These speech signals are digitized and used to determine the speech envelope. The envelope is subjected to ratio mean analysis to estimate the disability. This analysis is conducted on ten speech signal samples that relate to both place of articulation and manner of articulation. The overall speech disability of a pathological subject is estimated based on the results of the above analysis. Comment: 8 pages, 4 Figures, Signal & Image Processing Journal AIRC
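
The abstract does not say how the envelope is computed. One common approach, shown here only as an assumed sketch, is full-wave rectification followed by a moving average. The ratio mean analysis step is not reproduced because the abstract gives no definition of it. The function name and window length are hypothetical.

```python
def envelope(samples, window):
    """Full-wave rectify the signal, then smooth with a trailing moving average.

    Assumption: a generic envelope detector, not necessarily the paper's.
    """
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    rectified = [abs(s) for s in samples]
    smoothed = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            # Drop the sample that just left the averaging window.
            running -= rectified[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


# An alternating signal of constant amplitude has a flat envelope.
flat = envelope([1.0, -1.0, 1.0, -1.0], window=2)
```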

arXiv.org e-Print Archive

Crossref

Use of Mel Frequency Cepstral Coefficients for Automatic Pathology Detection on Sustained Vowel Phonations: Mathematical and Statistical Justification

Author: Fraile Muñoz Rubén
Godino Llorente Juan Ignacio
Gómez Vilda Pedro
Osma Ruiz Víctor
Sáenz Lechón Nicolas
Publication venue: E.U.I.T. Telecomunicación (UPM)
Publication date: 01/01/2008

This paper presents a justification for the use of MFCC parameters in automatic pathology detection on speech. While such an application has produced good results up to now, only partial explanations for this good performance had been given before. The explanation presented here consists of an interpretation of the mathematical transformations involved in MFCC calculation and a statistical analysis that confirms the conclusions drawn from the theoretical reasoning.
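
One of the transformations involved in MFCC calculation is the warping of the frequency axis onto the mel scale. The sketch below uses the widely quoted 2595·log10(1 + f/700) variant; whether the paper uses this exact formula is an assumption. The helper `mel_band_edges` is a hypothetical name for placing filter-bank edges at equal mel spacing.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (common 2595*log10 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_low, f_high, n_bands):
    """Return n_bands + 2 edge frequencies in Hz, equally spaced in mels."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


edges = mel_band_edges(0.0, 4000.0, n_bands=10)
```

Because the spacing is uniform in mels, the bands get wider in Hz as frequency rises, so low frequencies are resolved more finely than high ones.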

Archivo Digital UPM

Glottal-Source Spectral Biometry for Voice Characterization

Author: Fernández-Baillo Gallego de la Sacristana Roberto
Gómez Vilda Pedro
Martínez Olalla Rafael
Mazaira Fernández Luis Miguel
Muñoz Cristina
Rodellar Biarge M. Victoria
Álvarez Marquina Agustin
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2008

The biometric signature derived from the estimation of the power spectral density singularities of a speaker’s glottal source is described in the present work. This consists of the collection of peak-trough profiles found in the spectral density, as related to the biomechanics of the vocal folds. Samples of parameter estimations from a set of 100 normophonic (pathology-free) speakers are produced. Mapping the set of speakers’ samples to a manifold defined by Principal Component Analysis and clustering them by k-means in terms of the most relevant principal components shows the separation of speakers by gender. This means that the proposed signature conveys relevant speaker metainformation, which may be useful in security and forensic applications for which contextual side information is considered relevant.
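
The clustering step described above can be sketched with a plain k-means loop. This example assumes the samples have already been projected onto two principal components, so the PCA step is omitted, and it takes the initial centroids as an argument to stay deterministic. The data points are invented for illustration and do not come from the paper.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_index(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, centroids, iterations=20):
    """Plain k-means; returns the final centroids and one label per point."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Recompute each centroid as the mean of its members; keep it if empty.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels


# Hypothetical 2-D projections forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```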

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Archivo Digital UPM

Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

Author: Franco Horacio
Mitra Vikramjit
Sivaraman Ganesh
Yılmaz Emre
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

Rapid population aging has stimulated the development of assistive devices that provide personalized medical support to those in need who suffer from conditions of various etiologies. One prominent clinical application is a computer-assisted speech training system that enables personalized speech therapy for patients impaired by communicative disorders in the patient's home environment. Such a system relies on robust automatic speech recognition (ASR) technology to be able to provide accurate articulation feedback. With the long-term aim of developing off-the-shelf ASR systems that can be incorporated in a clinical context without prior speaker information, we compare the ASR performance of speaker-independent bottleneck and articulatory features on dysarthric speech used in conjunction with dedicated neural network-based acoustic models that have been shown to be robust against spectrotemporal deviations. We report ASR performance of these systems on two dysarthric speech datasets of different characteristics to quantify the achieved performance gains. Despite the remaining performance gap between dysarthric and normal speech, significant improvements have been reported on both datasets using speaker-independent ASR architectures. Comment: to appear in Computer Speech & Language - https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial text overlap with arXiv:1807.1094

arXiv.org e-Print Archive

Radboud Repository

ScholarBank@NUS

Development of the Arabic Voice Pathology Database and Its Evaluation by Using Speech Features and Machine Learning Algorithms

Author: Al-nasheri Ahmed
Ali Zulfiqar
Alsulaiman Mansour
Farahat Mohamed
Malki Khalid H
Mesallam Tamer A
Muhammad Ghulam
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2017

A voice disorder database is an essential element for research on automatic voice disorder detection and classification. Ethnicity affects the voice characteristics of a person, so it is necessary to develop a database by collecting voice samples of the targeted ethnic group. This will enhance the chances of arriving at a global solution for the accurate and reliable diagnosis of voice disorders by understanding the characteristics of a local group. Motivated by this idea, an Arabic voice pathology database (AVPD) is designed and developed in this study by recording three vowels, running speech, and isolated words. For each recorded sample, the perceptual severity is also provided, which is a unique aspect of the AVPD. During the development of the AVPD, the shortcomings of different voice disorder databases were identified so that they could be avoided in the AVPD. In addition, the AVPD is evaluated by using six different types of speech features and four types of machine learning algorithms. The results of detection and classification of voice disorders obtained with the sustained vowel and the running speech are also compared with the results of an English-language disorder database, the Massachusetts Eye and Ear Infirmary (MEEI) database.

University of Essex Research Repository

Crossref

Directory of Open Access Journals

Ulster University's Research Portal

Assessment of severe apnoea through voice analysis, automatic speech, and speaker recognition techniques

Author: Alcázar Ramírez José
Blanco José Luis
Fernández Pozo Rubén
Hernández Gómez Luis
López Gonzalo Eduardo
Toledano Doroteo T.
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2009

The electronic version of this article is the complete one and can be found online at: http://asp.eurasipjournals.com/content/2009/1/982531 This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry. The activities described in this paper were funded by the Spanish Ministry of Science and Technology as part of the TEC2006-13170-C02-02 Project
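
GMM-based detection of the kind described above usually scores an utterance under one model per class and picks the class with the higher log-likelihood. The sketch below shows that decision for scalar features and one-dimensional mixtures. All model parameters and observations are hypothetical, and real systems work on multidimensional spectral features, with log-sum-exp arithmetic to avoid underflow.

```python
import math


def gmm_log_likelihood(observations, weights, means, variances):
    """Total log-likelihood of scalar observations under a 1-D Gaussian mixture."""
    total = 0.0
    for x in observations:
        density = sum(
            w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(density)
    return total


def classify(observations, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(observations, *models[name]))


# Hypothetical class models: (weights, means, variances).
models = {
    "control": ([1.0], [0.0], [1.0]),
    "apnoea": ([0.5, 0.5], [3.0, 4.0], [1.0, 1.0]),
}
decision = classify([2.8, 3.1, 2.5], models)
```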

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

Fondo Bibliográfico Digital Institucional

Biblos-e Archivo