Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy
The proceedings of the MAVEBA Workshop, which is held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images in support of clinical diagnosis and the classification of vocal pathologies. The Workshop is sponsored by Ente Cassa Risparmio di Firenze, COST Action 2103, the Biomedical Signal Processing and Control Journal (Elsevier Eds.), and the IEEE Biomedical Engineering Soc. Special issues of international journals collecting selected papers from the conference have been, and will be, published.
Analysis and Detection of Pathological Voice using Glottal Source Features
Automatic detection of voice pathology enables objective assessment and
earlier diagnostic intervention. This study provides a systematic
analysis of glottal source features and investigates their effectiveness in
voice pathology detection. Glottal source features are extracted using glottal
flows estimated with the quasi-closed phase (QCP) glottal inverse filtering
method, using approximate glottal source signals computed with the zero
frequency filtering (ZFF) method, and using acoustic voice signals directly. In
addition, we propose to derive mel-frequency cepstral coefficients (MFCCs) from
the glottal source waveforms computed by QCP and ZFF to effectively capture the
variations in glottal source spectra of pathological voice. Experiments were
carried out using two databases, the Hospital Universitario Principe de
Asturias (HUPA) database and the Saarbrucken Voice Disorders (SVD) database.
Analysis of features revealed that the glottal source contains information that
discriminates between normal and pathological voice. Pathology detection experiments
were carried out using a support vector machine (SVM). The detection
experiments showed that the performance achieved with the studied
glottal source features is comparable to or better than that of conventional MFCCs
and perceptual linear prediction (PLP) features. The best detection performance
was achieved when the glottal source features were combined with the
conventional MFCCs and PLP features, which indicates the complementary nature
of the features.
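The abstract above does not say how the glottal source features were combined with the MFCC and PLP features. The following is a minimal sketch of one common option, assuming feature-level fusion: the per-utterance vectors are concatenated and each column is standardized before being passed to a classifier such as an SVM. The function names and the toy values are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def combine_features(*feature_sets):
    """Concatenate per-utterance feature vectors from several feature sets.

    Each feature set is a list with one vector per utterance, in the same
    utterance order. Concatenation is an assumed fusion scheme.
    """
    n = len(feature_sets[0])
    if any(len(fs) != n for fs in feature_sets):
        raise ValueError("every feature set needs one vector per utterance")
    return [sum((list(fs[i]) for fs in feature_sets), []) for i in range(n)]


def zscore(rows):
    """Standardize each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns keep a divisor of 1
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


# Toy example: two utterances, two glottal features and one MFCC-like feature.
glottal = [[1.0, 2.0], [3.0, 4.0]]
mfcc = [[10.0], [20.0]]
fused = zscore(combine_features(glottal, mfcc))
```

Standardizing after concatenation keeps features with large numeric ranges from dominating a margin-based classifier; in practice the column means and deviations would be estimated on the training split only.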
Identification of voice pathologies in an elderly population
Ageing is associated with an increased risk of developing diseases, including a greater
predisposition to diseases such as sepsis. In addition, with ageing, human voices undergo a
natural degradation gauged by alterations in hoarseness, breathiness, articulatory ability,
and speaking rate. Perceptual evaluation is currently widely used to assess speech and
voice impairments despite its high subjectivity.
This dissertation proposes a new method for detecting and identifying voice pathologies
by exploring acoustic parameters of continuous speech signals in the elderly population.
Additionally, a study of the influence of gender and age on the performance of voice
pathology detection systems is conducted.
The study included 44 subjects older than 60 years with the pathologies Dysphonia,
Functional Dysphonia, and Spasmodic Dysphonia. From the resulting dataset,
two gender-dependent subsets were created, one with only female samples
and the other with only male samples. The system developed used three feature selection
methods and five Machine Learning algorithms to classify the voice signal according to
the presence of pathology.
Binary classification, which consisted of voice pathology detection, reached an
accuracy of 85.1% ± 5.1% for the dataset without gender division, 83.7% ± 7.0% for the
male dataset, and 87.4% ± 4.2% for the female dataset. Multiclass classification,
which consisted of classifying the different pathologies, reached an accuracy of
69.0% ± 5.1% for the dataset without gender division, 63.7% ± 5.4% for the male dataset,
and 80.6% ± 8.1% for the female dataset.
The results revealed that features that describe fluency are important and
discriminating in these types of systems. Also, Random Forest proved to be the most
effective Machine Learning algorithm for both binary and multiclass classification.
The proposed model is promising for detecting pathological voices and
identifying the underlying pathology in an elderly population, with an increase in its
performance when a gender division is performed.
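The accuracies above are reported as a mean with a spread. The abstract does not state how the spread was obtained, so the sketch below assumes it is the sample standard deviation over repeated evaluation folds. The function names and the fold values are hypothetical.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize(fold_accuracies):
    """Format per-fold accuracies as 'mean% ± std%' (sample standard deviation)."""
    m = mean(fold_accuracies)
    s = stdev(fold_accuracies)  # needs at least two folds
    return f"{100 * m:.1f}% ± {100 * s:.1f}%"


# Hypothetical per-fold accuracies, for illustration only.
print(summarize([0.80, 0.90, 0.85]))
```

With only 44 subjects, each fold holds few samples, so the spread across folds is worth reporting alongside the mean, as the abstract does.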
Optimizing laryngeal pathology detection by using combined cepstral features
Several diseases, organic or neurological, affect the quality of the human voice. Acoustic analysis of voice features can be used as a complementary and noninvasive tool for the diagnosis of laryngeal pathologies. The reliability and effectiveness of the discrimination process depend on appropriate acoustic feature extraction. This work presents a parametric method based on cepstral features to discriminate the pathological voices of speakers affected by vocal fold edema and paralysis from healthy voices. Cepstral, weighted cepstral, delta cepstral, and weighted delta cepstral coefficients are obtained from speech signals. In the classification process, vector quantization is carried out individually for each feature, associated with a distortion measurement. The goal is to evaluate the performance of a classifier based on the individual and combined cepstral features. The combination strategies applied are the average, the product, and the weighted average, yielding a multiple classifier that is more efficient than each individual technique. To assess the accuracy of the system, 153 speech files of the sustained vowel /ah/ (53 healthy, 44 vocal fold edema, and 56 paralysis) from the Disordered Voice Database of the Massachusetts Eye and Ear Infirmary (MEEI) are used. Results show that the employed parameters are complementary and can be used to detect vocal disorders caused by the presence of vocal fold pathologies.
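The three combination strategies named above (average, product, weighted average) can be sketched as follows. This is an illustration under assumptions: each individual classifier is taken to output one score per class where higher means more likely, and the mapping from vector quantization distortion to such a score is not specified in the abstract. The function names, weights, and scores are hypothetical.

```python
from math import prod


def combine_scores(scores, rule="average", weights=None):
    """Fuse per-class scores from several classifiers into one score per class.

    scores: one list per classifier, each holding one score per class.
    rule:   "average", "product", or "weighted" (weighted average).
    """
    per_class = list(zip(*scores))
    if rule == "average":
        return [sum(c) / len(c) for c in per_class]
    if rule == "product":
        return [prod(c) for c in per_class]
    if rule == "weighted":
        if weights is None or len(weights) != len(scores):
            raise ValueError("need one weight per classifier")
        total = sum(weights)
        return [sum(w * s for w, s in zip(weights, c)) / total for c in per_class]
    raise ValueError(f"unknown rule: {rule}")


def decide(combined, labels):
    """Return the label with the highest fused score."""
    return labels[max(range(len(combined)), key=combined.__getitem__)]


# Two hypothetical classifiers scoring the classes (healthy, pathological).
scores = [[0.6, 0.4], [0.3, 0.7]]
print(decide(combine_scores(scores, "average"), ["healthy", "pathological"]))
```

The product rule penalizes a class that any single classifier scores very low, while the weighted average lets a more reliable feature dominate the decision.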
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.
Models and Analysis of Vocal Emissions for Biomedical Applications
The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) was founded in 1999 out of a strongly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown and spread to other areas of research such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy.
Semi-supervised learning with generative models for pathological speech classification
Recent work in pathological speech classification has employed supervised learning algorithms such as neural networks and support vector machines to classify speech as healthy or pathological. A challenge in applying such machine learning techniques to pathological speech classification is the labelled data shortage problem. While labelled data are expensive and scarce, unlabelled data are inexpensive and plentiful. Labelled data acquisition often entails significant human effort and time-consuming experimental design. Further, for medical applications, privacy and ethical issues must be addressed where patient data is collected.
In this thesis, we investigate a semi-supervised learning (SSL) approach that employs a generative model to incorporate both labelled and unlabelled data into the training process. The generative models explored are a generative adversarial network (GAN) and a variational autoencoder (VAE). To employ a GAN, we modify its traditional discriminator so that it not only differentiates between real and fake speech samples but also classifies the given sample as healthy or pathological. To employ a VAE, we first pre-train the VAE with unlabelled data and subsequently incorporate the pre-trained encoder into a classifier to be trained on labelled data.
We test our approach using three commonly used pathological speech datasets: the Spanish Parkinson’s Diseases Dataset (SPDD), the Saarbrucken Voice Database (SVD) and the Arabic Voice Pathology Database (AVPD). We compare the performance of the GAN and VAE-based approaches trained on both labelled and unlabelled data with a traditional supervised approach based on a convolutional neural network (CNN) trained only on labelled data.
We observe that our SSL-based approach leads to an accuracy gain over a baseline CNN trained only on labelled pathological speech data. This promising result shows that our approach has the potential to alleviate the labelled data shortage problem in pathological speech classification and in other medical applications where labelled data acquisition is challenging.
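The abstract describes a discriminator that both separates real from fake samples and classifies real samples as healthy or pathological, without giving its exact form. One widely used construction, assumed here purely for illustration, gives the discriminator one output per class plus an extra "fake" output, so that labelled samples contribute a class loss and unlabelled or generated samples contribute a real/fake loss. The output layout and all names below are hypothetical.

```python
from math import exp, log

CLASSES = ("healthy", "pathological", "fake")  # assumed K+1 output layout
EPS = 1e-12  # keeps log() finite when a probability saturates


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_loss(logits, label):
    """Cross-entropy for a labelled real sample ('healthy' or 'pathological')."""
    return -log(max(softmax(logits)[CLASSES.index(label)], EPS))


def unsupervised_loss(logits, is_real):
    """Real/fake loss, usable on unlabelled real samples and generated ones."""
    p_fake = softmax(logits)[CLASSES.index("fake")]
    return -log(max(1.0 - p_fake, EPS)) if is_real else -log(max(p_fake, EPS))


# Discriminator logits for one sample, in the order given by CLASSES.
logits = [2.0, 0.5, -1.0]
print(supervised_loss(logits, "healthy"), unsupervised_loss(logits, True))
```

Under this assumption the unlabelled data only ever enters through `unsupervised_loss`, which is what lets the discriminator learn from speech recordings that have no diagnosis attached.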
Models and Analysis of Vocal Emissions for Biomedical Applications
The proceedings of the MAVEBA Workshop, which is held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images in support of clinical diagnosis and the classification of vocal pathologies.
Models and Analysis of Vocal Emissions for Biomedical Applications
The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) was founded in 1999 out of a strongly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the newborn to the adult and elderly. Over the years the initial topics have grown and spread to other fields of research such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis.
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 4th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2005, held 29-31 October 2005, Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.