592 research outputs found

    Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

    The proceedings of the MAVEBA Workshop, held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images in support of clinical diagnosis and classification of vocal pathologies. The Workshop is sponsored by Ente Cassa Risparmio di Firenze, COST Action 2103, the Biomedical Signal Processing and Control journal (Elsevier), and the IEEE Biomedical Engineering Soc. Special issues of international journals collecting selected papers from the conference have been, and will be, published

    Analysis and Detection of Pathological Voice using Glottal Source Features

    Automatic detection of voice pathology enables objective assessment and earlier intervention in diagnosis. This study provides a systematic analysis of glottal source features and investigates their effectiveness in voice pathology detection. Glottal source features are extracted in three ways: from glottal flows estimated with the quasi-closed phase (QCP) glottal inverse filtering method, from approximate glottal source signals computed with the zero frequency filtering (ZFF) method, and from the acoustic voice signals directly. In addition, we propose to derive mel-frequency cepstral coefficients (MFCCs) from the glottal source waveforms computed by QCP and ZFF to effectively capture the variations in the glottal source spectra of pathological voice. Experiments were carried out using two databases, the Hospital Universitario Principe de Asturias (HUPA) database and the Saarbrucken Voice Disorders (SVD) database. Analysis of the features revealed that the glottal source contains information that discriminates normal and pathological voice. Pathology detection experiments were carried out using a support vector machine (SVM). The detection experiments showed that the performance achieved with the studied glottal source features is comparable to or better than that of conventional MFCCs and perceptual linear prediction (PLP) features. The best detection performance was achieved when the glottal source features were combined with the conventional MFCCs and PLP features, which indicates the complementary nature of the features

    Identification of voice pathologies in an elderly population

    Ageing is associated with an increased risk of disease, including a greater predisposition to conditions such as sepsis. With ageing, the human voice also undergoes a natural degradation, reflected in changes in hoarseness, breathiness, articulatory ability, and speaking rate. Perceptual evaluation is currently widely used to assess speech and voice impairments despite its high subjectivity. This dissertation proposes a new method for detecting and identifying voice pathologies by exploring acoustic parameters of continuous speech signals in the elderly population. Additionally, it studies the influence of gender and age on the performance of voice pathology detection systems. The study included 44 subjects older than 60 years with the pathologies Dysphonia, Functional Dysphonia, and Spasmodic Dysphonia. From the resulting dataset, two gender-dependent subsets were created, one with only female samples and the other with only male samples. The system used three feature selection methods and five Machine Learning algorithms to classify the voice signal according to the presence of pathology. Binary classification (voice pathology detection) reached an accuracy of 85.1% ± 5.1% for the dataset without gender division, 83.7% ± 7.0% for the male dataset, and 87.4% ± 4.2% for the female dataset. Multiclass classification (distinguishing the different pathologies) reached an accuracy of 69.0% ± 5.1% for the dataset without gender division, 63.7% ± 5.4% for the male dataset, and 80.6% ± 8.1% for the female dataset. The results revealed that features describing fluency are important and discriminating in these types of systems. Random Forest proved to be the most effective Machine Learning algorithm for both binary and multiclass classification. The proposed model is promising for detecting pathological voices and identifying the underlying pathology in an elderly population, and its performance increases when a gender division is performed

    Optimizing laryngeal pathology detection by using combined cepstral features

    Several diseases, organic or neurological, affect the quality of the human voice. Acoustic analysis of voice features can be used as a complementary and noninvasive tool for the diagnosis of laryngeal pathologies. The reliability and effectiveness of the discrimination process depend on appropriate acoustic feature extraction. This work presents a parametric method based on cepstral features to discriminate the pathological voices of speakers affected by vocal fold edema and paralysis from healthy voices. Cepstral, weighted cepstral, delta cepstral, and weighted delta cepstral coefficients are obtained from speech signals. In the classification process, Vector Quantization is carried out individually for each feature, associated with a distortion measurement. The goal is to evaluate the performance of a classifier based on the individual and combined cepstral features. The average, the product, and the weighted average are the combination strategies applied, yielding a multiple classifier that is more efficient than each individual technique. To assess the accuracy of the system, 153 speech files of the sustained vowel /ah/ (53 healthy, 44 vocal fold edema, and 56 paralysis) from the Disordered Voice Database of the Massachusetts Eye and Ear Infirmary (MEEI) are used. Results show that the employed parameters are complementary and can be used to detect vocal disorders caused by vocal fold pathologies
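
    The three combination strategies named in this abstract (average, product, weighted average) can be illustrated with a minimal sketch. It assumes each cepstral feature set yields one distortion value per class and that the class with the smallest combined distortion wins. The function names, the example distortions, and the weights are hypothetical and are not taken from the paper.

```python
from math import prod


def combine_scores(scores, rule="average", weights=None):
    """Combine the per-feature distortions of one class into a single score."""
    if rule == "average":
        return sum(scores) / len(scores)
    if rule == "product":
        return prod(scores)
    if rule == "weighted":
        if weights is None or len(weights) != len(scores):
            raise ValueError("weighted rule needs one weight per score")
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    raise ValueError(f"unknown rule: {rule}")


def classify(distortions, rule="average", weights=None):
    """Return the class label whose combined distortion is smallest.

    distortions maps a class label to a list of distortions,
    one per feature set (assumption: lower distortion = better match).
    """
    combined = {
        label: combine_scores(values, rule, weights)
        for label, values in distortions.items()
    }
    return min(combined, key=combined.get)


# Hypothetical distortions for one utterance, four feature sets per class.
example = {
    "healthy": [0.2, 0.4, 0.3, 0.5],
    "edema": [0.6, 0.3, 0.7, 0.4],
    "paralysis": [0.9, 0.8, 0.6, 0.7],
}

if __name__ == "__main__":
    for rule in ("average", "product"):
        print(rule, classify(example, rule))
```

    In this toy input the second feature set alone favours "edema", while the average and product rules both favour "healthy", which is the kind of disagreement a combined classifier is meant to resolve.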

    Models and analysis of vocal emissions for biomedical applications

    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from a strongly felt need to share know-how, objectives, and results between areas that until then had seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown and spread to other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy

    Semi-supervised learning with generative models for pathological speech classification

    Recent work in pathological speech classification has employed supervised learning algorithms such as neural networks and support vector machines to classify speech as healthy or pathological. A challenge in applying such machine learning techniques to pathological speech classification is the labelled data shortage problem. While labelled data are expensive and scarce, unlabelled data are inexpensive and plentiful. Labelled data acquisition often entails significant human effort and time-consuming experimental design. Further, for medical applications, privacy and ethical issues must be addressed where patient data is collected. In this thesis, we investigate a semi-supervised learning (SSL) approach that employs a generative model to incorporate both labelled and unlabelled data into the training process. Generative models explored include both a generative adversarial network (GAN) and a variational autoencoder (VAE). To employ a GAN, we modify its traditional discriminator to not only differentiate between real and fake speech samples but to also classify the given sample as healthy or pathological. To employ a VAE, we first pre-train the VAE with unlabelled data and subsequently, incorporate the pre-trained encoder into a classifier to be trained on labelled data. We test our approach using three commonly used pathological speech datasets: the Spanish Parkinson’s Diseases Dataset (SPDD), the Saarbrucken Voice Database (SVD) and the Arabic Voice Pathology Database (AVPD). We compare the performance of the GAN and VAE-based approaches trained on both labelled and unlabelled data with a traditional supervised approach based on a convolutional neural network (CNN) trained only on labelled data. We observe that our SSL-based approach leads to an accuracy gain compared to a baseline CNN trained only on labelled pathological speech data. 
This promising result shows that our approach has the potential to alleviate the labelled data shortage problem in pathological speech classification and other medical applications where labelled data acquisition is challenging
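
    The modified GAN discriminator described in this abstract can be sketched with a three-way output (healthy, pathological, fake), which is one common way to build a semi-supervised GAN. Whether the thesis uses exactly this formulation is an assumption; the class ordering, function names, and loss terms below are illustrative only, and the logits stand in for the output of a real network.

```python
import math

# Assumed output ordering of the discriminator head (hypothetical).
CLASSES = ("healthy", "pathological", "fake")


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_outputs(logits):
    """Split three-way probabilities into a real/fake and a class decision."""
    p_healthy, p_pathological, p_fake = softmax(logits)
    p_real = 1.0 - p_fake
    return {
        "p_real": p_real,
        # Class posterior conditioned on the sample being real.
        "p_healthy_given_real": p_healthy / (p_healthy + p_pathological),
        "p_pathological_given_real": p_pathological / (p_healthy + p_pathological),
    }


def supervised_loss(logits, label):
    """Cross-entropy for a labelled real sample."""
    return -math.log(softmax(logits)[CLASSES.index(label)])


def unsupervised_loss_real(logits):
    """Unlabelled real sample: penalise probability mass on 'fake'."""
    return -math.log(1.0 - softmax(logits)[CLASSES.index("fake")])


def unsupervised_loss_fake(logits):
    """Generated sample: penalise probability mass on the real classes."""
    return -math.log(softmax(logits)[CLASSES.index("fake")])


if __name__ == "__main__":
    print(discriminator_outputs([2.0, 0.5, -1.0]))
```

    Under this sketch, labelled samples contribute the supervised term and unlabelled or generated samples contribute only the real/fake terms, which is how unlabelled speech can enter training.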

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The proceedings of the MAVEBA Workshop, held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images in support of clinical diagnosis and classification of vocal pathologies

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from a strongly felt need to share know-how, objectives, and results between areas that until then had seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the newborn to the adult and elderly. Over the years the initial topics have grown and spread to other fields of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis

    Models and analysis of vocal emissions for biomedical applications

    This book of Proceedings collects the papers presented at the 4th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2005, held 29-31 October 2005, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies