7 research outputs found

    Application of Automatic Speaker Recognition techniques to pathological voice assessment (dysphonia)

    This paper investigates the adaptation of Automatic Speaker Recognition (ASR) techniques to pathological voice assessment (dysphonic voices). The aim of this study is to provide a novel method for tracking the evolution of a patient's pathology: easy to use, fast, non-invasive for the patient, and affordable for clinicians. This method is complementary to existing ones - perceptual judgment and the usual objective measurements (jitter, airflows, etc.) - which remain time- and labor-intensive. The system designed for this particular task relies on the GMM-based approach, the state of the art in speaker recognition, and is derived from the open-source ASR tools (LIA_SpkDet and ALIZE) of the LIA lab. Experiments conducted on a dysphonic corpus provide promising results, underlining the interest of such an approach and opening further research directions.
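The core scoring idea behind a GMM-based assessment system can be sketched briefly. This is a minimal illustration using scikit-learn rather than the paper's actual LIA_SpkDet/ALIZE tools, and the toy 2-D "feature" clouds stand in for real MFCC frames (all data here is assumed): one GMM is trained on normal voices, one on dysphonic voices, and a test recording is scored by the average log-likelihood ratio between the two models.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy 2-D feature clouds standing in for MFCC frames (assumption).
normal_feats = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
dysphonic_feats = rng.normal(loc=2.0, scale=1.5, size=(500, 2))

# One GMM per population, as in the GMM-based approach described above.
gmm_normal = GaussianMixture(n_components=4, random_state=0).fit(normal_feats)
gmm_dysph = GaussianMixture(n_components=4, random_state=0).fit(dysphonic_feats)

def llr_score(frames):
    """Mean per-frame log-likelihood ratio: positive means the frames
    fit the dysphonic model better than the normal one."""
    return gmm_dysph.score(frames) - gmm_normal.score(frames)

# Frames drawn from the dysphonic cloud should score positive.
test_frames = rng.normal(loc=2.0, scale=1.5, size=(200, 2))
score = llr_score(test_frames)
```

In a real system the score would be compared against a calibrated threshold, or tracked over successive sessions to follow the evolution of the pathology.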

    Severe apnoea detection using speaker recognition techniques

    Proceedings of the International Conference on Bio-inspired Systems and Signal Processing (BIOSIGNALS 2009). The aim of this paper is to study new possibilities of using Automatic Speaker Recognition (ASR) techniques for the detection of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases can be very useful for prioritizing their treatment, optimizing the expensive and time-consuming tests of current diagnosis methods based on a full overnight sleep study in a hospital. This work is part of an ongoing collaborative project between the medical and signal processing communities to promote new research efforts on automatic OSA diagnosis through speech processing technologies, applied to a carefully designed speech database of healthy subjects and apnoea patients. In this contribution we present and discuss several approaches to applying generative Gaussian Mixture Models (GMMs), generally used in ASR systems, to model specific acoustic properties of continuous speech signals in different linguistic contexts that reflect discriminative physiological characteristics found in OSA patients. Finally, experimental results on the discriminative power of speaker recognition techniques adapted to severe apnoea detection are presented. These results show a correct classification rate of 81.25%, a promising result underlining the interest of this research framework and opening further perspectives for improvement using more specific speech recognition technologies. The activities described in this paper were funded by the Spanish Ministry of Science and Technology as part of the TEC2006-13170-C02-01 project.
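The per-patient decision rule and the reported correct classification rate can be sketched as follows. This is an assumed setup (toy feature data, scikit-learn in place of the authors' own tools): each speaker's frames are scored under an apnoea GMM and a control GMM, the class with the higher average log-likelihood wins, and the classification rate is the fraction of speakers assigned their true label.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Hypothetical 3-D acoustic features for the two training populations.
apnoea_train = rng.normal(1.5, 1.0, size=(400, 3))
control_train = rng.normal(-1.5, 1.0, size=(400, 3))

gmm_apnoea = GaussianMixture(n_components=2, random_state=0).fit(apnoea_train)
gmm_control = GaussianMixture(n_components=2, random_state=0).fit(control_train)

def classify(speaker_frames):
    """Assign the class whose GMM gives the higher mean log-likelihood."""
    if gmm_apnoea.score(speaker_frames) > gmm_control.score(speaker_frames):
        return "apnoea"
    return "control"

# Overall correct-classification rate over a toy set of 8 speakers.
speakers = [(rng.normal(m, 1.0, size=(100, 3)), lab)
            for m, lab in [(1.5, "apnoea"), (-1.5, "control")] * 4]
rate = float(np.mean([classify(frames) == lab for frames, lab in speakers]))
```

The paper's 81.25% figure is the same quantity computed on its real apnoea corpus rather than on synthetic data like this.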

    Employing Emotion Cues to Verify Speakers in Emotional Talking Environments

    People usually talk neutrally in environments free of abnormal talking conditions such as stress and emotion. Emotional conditions such as happiness, anger, and sadness can affect a person's speaking tone, and such emotions are directly influenced by the patient's health status. In neutral talking environments speakers can be verified easily; in emotional talking environments they cannot, so speaker verification systems do not perform as well there as in neutral environments. In this work, a two-stage approach is employed and evaluated to improve speaker verification performance in emotional talking environments. The approach exploits speaker emotion cues (a text-independent, emotion-dependent speaker verification problem) using both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) as classifiers. It comprises two cascaded stages that combine and integrate an emotion recognizer and a speaker recognizer into one recognizer. The architecture has been tested on two separate emotional speech databases: our collected database and the Emotional Prosody Speech and Transcripts database. The results show that the proposed approach yields a significant improvement over previous studies and over other approaches, such as an emotion-independent speaker verification approach and an emotion-dependent speaker verification approach based entirely on HMMs. Comment: Journal of Intelligent Systems, Special Issue on Intelligent Healthcare Systems, De Gruyter, 201
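The two-stage cascade described above can be sketched as a small pipeline. This is a structural illustration only: all function and model names are hypothetical stand-ins, and the toy scoring lambdas take the place of the paper's trained HMM/SPHMM classifiers. Stage 1 recognizes the speaker's emotion; stage 2 then verifies the claimed identity against that speaker's emotion-dependent model.

```python
def cascade_verify(features, emotion_recognizer, speaker_models,
                   claimed_id, threshold):
    """Two cascaded stages: first recognize the emotion, then score the
    claimed speaker's emotion-dependent model against a threshold."""
    emotion = emotion_recognizer(features)          # stage 1
    score = speaker_models[claimed_id][emotion](features)  # stage 2
    return score >= threshold, emotion

# Toy stand-ins for the trained models (assumptions, not the paper's models).
recognizer = lambda f: "angry" if sum(f) > 0 else "neutral"
models = {"spk1": {"angry": lambda f: 0.9, "neutral": lambda f: 0.4}}

accepted, emotion = cascade_verify([1.0, 2.0], recognizer, models,
                                   claimed_id="spk1", threshold=0.5)
```

The design point is that the verification score in stage 2 is conditioned on the emotion decided in stage 1, rather than using a single emotion-independent speaker model.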

    Design of a multimodal database for research on automatic detection of severe apnoea cases

    The aim of this paper is to present the design of a multimodal database suitable for research on new possibilities for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases can be very useful for prioritizing their treatment, optimizing the expensive and time-consuming tests of current diagnosis methods based on a full overnight sleep study in a hospital. This work is part of an ongoing collaborative project between medical and signal processing groups towards the design of a multimodal database as an innovative resource to promote new research efforts on automatic OSA diagnosis through speech and image processing technologies. In this contribution we present the multimodal design criteria derived from the analysis of specific voice properties related to OSA physiological effects, as well as from the morphological facial characteristics of apnoea patients. Details on the database structure and data collection methodology are also given, as the database is intended to be an open resource to promote further research in this field. Finally, preliminary experimental results on automatic OSA voice assessment are presented for the speech data collected in our OSA multimodal database. Standard GMM speaker recognition techniques obtain an overall correct classification rate of 82%. This is an initial promising result underlining the interest of this research framework and opening further perspectives for improvement using more specific speech and image recognition technologies.
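One way to picture the multimodal design criteria is as a per-subject record linking the clinical label with the speech and image material. This is purely illustrative: the field names below are hypothetical and do not reproduce the paper's actual schema, though the AHI (apnoea-hypopnoea index) is the standard clinical measure behind the severe/control labelling.

```python
from dataclasses import dataclass, field

@dataclass
class OSARecord:
    """Hypothetical layout for one multimodal database entry."""
    subject_id: str
    ahi: float                      # apnoea-hypopnoea index from the sleep study
    speech_files: list = field(default_factory=list)  # designed linguistic contexts
    face_image: str = ""            # frontal image for morphological analysis
    severe: bool = False            # label: severe OSA vs control

rec = OSARecord("S001", ahi=35.2,
                speech_files=["s001_sentence1.wav", "s001_sustained_a.wav"],
                face_image="s001_frontal.png", severe=True)
```

Keeping the clinical measurement alongside the recordings is what makes the resource usable for both the speech-based and image-based diagnosis experiments the project describes.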

    Corpus de parole pathologique, état d'avancement et enjeux méthodologiques (Pathological speech corpus: progress report and methodological issues)

    Authorization No. 3015: TIPA is the journal of the Laboratoire Parole et Langage. For some fifteen years, the study of voice and speech disorders has moved beyond the strict frame of clinical research and now interests language-science laboratories. By observing dysfunction, humanities researchers confront results established on corpora of "normal" speech with pathological speaking situations; indeed, dysfunction helps us understand function. These situations enrich the exchange of knowledge between language scientists, clinicians, and researchers from information and communication technologies. At present, studies of voice and speech disorders suffer badly from scattered and heterogeneous data. Analyses often cover only a few speakers recorded for the one-off needs of a study, which considerably weakens the scope of the results and makes the conclusions hard to generalize. Recording and storage are often carried out by staff untrained in certain technical aspects of data capture and formatting, which can make distribution impossible. Added to this is the almost systematic loss of metadata, which often explains the difficulty of producing clear results, since the homogeneity of the tested populations becomes completely opaque. Our project is part of a broader aim to describe and evaluate voice and speech disorders in a federative, multidisciplinary perspective, focusing first on providing organized data sets, shared analysis methods, and common tools.
To this end, a consensus is needed in order to propose recommendations and an operating mode ensuring effective data sharing. This involves drafting a general protocol, conventions, and analysis guidelines. The principle is not to impose a single way of working but to offer a framework ensuring the compatibility of the collected pathological speech data and of the associated metadata and annotations. The second step is to set up and develop a system for querying, extracting, and classifying pathological speech data. This involves designing a database organization that links clinical information with sound and physiological recordings, in a multicenter perspective able to integrate the different information maintained by different research teams. This database should centralize and redistribute information from the various research laboratories and clinical centers involved in the study of voice and speech disorders. Consultation of the database should be made public over the Internet, with several levels of accessibility, access being restricted according to arrangements to be defined between the project partners and the degree of openness planned later. It also seems important to provide a set of tools for analyzing this type of corpus. While some tools exist in the form of computerized perception tests or "classical" signal analysis software, we find it worthwhile to introduce processing systems drawn from automatic speech and speaker recognition, so as to be able to evaluate large volumes of data and obtain statistically substantial models and results.
Finally, within this project it seems worthwhile to propose a subset of data representative of spoken-communication disorders, with a pedagogical aim: providing teaching material both for clinical curricula, such as speech-therapy schools, and for language-science programs with "disorders" tracks. Our objective is thus to produce the first substantial corpus of pathological speech (dysphonias and dysarthrias) for French, together with shared tools adapted to this type of data. This would widen the scientific scope of studies of voice and speech disorders. The social dimension of this type of project should also be stressed: it touches on health. Our project would contribute strongly to alleviating patients' communication difficulties, a public-health matter. Better knowledge and evaluation of voice and speech disorders would clearly have a direct impact on the care of the people affected, who often suffer social isolation linked to the deterioration of their ability to communicate with those around them. Finally, the availability of such a corpus is of great interest to laboratories working in information and communication technologies: some teams working, for example, on automatic speech or speaker recognition completely lack organized data for testing their systems in atypical situations, or for adapting their methods to disordered speech in order to provide automatic classification systems dedicated to voice-quality assessment; the final goal of this work being to aid the diagnosis and follow-up of such disorders.
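The database organization the project calls for, clinical information linked to recordings, multicenter provenance, and graded access levels, can be sketched relationally. This is a minimal illustration with hypothetical table and column names, using SQLite only as a stand-in; the project's actual schema is not specified here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE patient (
    id INTEGER PRIMARY KEY,
    pathology TEXT,         -- e.g. dysphonia, dysarthria
    centre TEXT             -- originating laboratory or clinical centre
);
CREATE TABLE recording (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patient(id),
    task TEXT,              -- e.g. sustained vowel, read passage
    path TEXT,
    access_level INTEGER    -- graded sharing between project partners
);
""")
con.execute("INSERT INTO patient VALUES (1, 'dysphonia', 'LPL')")
con.execute("INSERT INTO recording VALUES (1, 1, 'sustained vowel', 'p1_a.wav', 2)")

# A query joining clinical metadata with the associated recordings.
rows = con.execute(
    "SELECT p.pathology, r.task FROM recording r "
    "JOIN patient p ON r.patient_id = p.id"
).fetchall()
```

Keeping metadata in the same store as the recordings is precisely what prevents the "almost systematic loss of metadata" the text identifies as a weakness of current practice.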

    Voz e emoção em português europeu (Voice and emotion in European Portuguese)

    Doctorate in Health Sciences and Technologies. This research is a first description of voice and emotion for European Portuguese. Based on studies of several languages (Finnish, English, German), we studied voice-related parameters that vary according to the emotion expressed. The analysed parameters relate to fundamental frequency (F0), perturbation (jitter), amplitude (shimmer), and noise (Harmonics-to-Noise Ratio, HNR). This is an all-embracing study that approaches voice and its relation to and variation with emotion from three sources: psychogenic (emotionally driven) voice pathology; emotion produced by actors; and spontaneous emotion. As pioneering work in this area, it obtains values for all three types of production. We highlight that our work analyses voice only, without recourse to facial expression or body posture. To allow comparative studies across the data collected for each corpus (pathology, acted emotion, and spontaneous emotion), we used the same analysis methods throughout (Praat, SFS, SPSS, the Hoarseness Diagram for pathological voice analysis, and the Feeltrace system for spontaneous emotions). Studies and analyses of acted emotion are complemented by perceptual tests with native speakers of American English and of European Portuguese. These tests, together with the spontaneous emotion analysis, allowed the extraction of data specific to Portuguese. Although both the expression and the perception of emotion have many universal characteristics, Portuguese proved to have some particularities. Values obtained for neutral expression, sadness, and joy are all very close, contrary to what happens in other languages. Moreover, these three emotions (from distinct families) are the ones that cause the most difficulty (for both informant groups) in the distinction perceptual tests. This may be the main particularity of emotion expression in European Portuguese, possibly linked to cultural factors. The research also shows that acted emotion is close to spontaneous emotion, although some parameters present different values because the actor tends to exaggerate the emotion somewhat. This work led to the creation of original corpora that will be an important resource for future analyses in an area still in deficit, in terms of scientific research, in Portugal. Both the corpora and the results obtained may prove useful in areas such as Speech Science, Robotics, and Education.
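The perturbation measures named in this abstract have simple frame-to-frame definitions, which a short sketch can make concrete (toy period and amplitude values; this is not the thesis's Praat analysis). Local jitter is the mean absolute difference between consecutive pitch periods divided by the mean period, and local shimmer is the analogous ratio over cycle peak amplitudes.

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference of consecutive glottal periods,
    normalized by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """Same frame-to-frame measure applied to cycle peak amplitudes."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# Toy values for a fairly steady voice (seconds per cycle, peak amplitude).
periods = [0.0100, 0.0102, 0.0099, 0.0101]
amps = [0.80, 0.78, 0.81, 0.79]

j = local_jitter(periods)    # small ratio: little cycle-to-cycle variation
s = local_shimmer(amps)
```

In practice both measures are extracted from an F0 track over many more cycles, and they rise markedly in pathological or strongly emotional voices, which is why they serve as discriminating parameters in this kind of study.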