60 research outputs found

    Application of Automatic Speaker Recognition techniques to pathological voice assessment (dysphonia)

    This paper investigates the adaptation of Automatic Speaker Recognition (ASR) techniques to the assessment of pathological (dysphonic) voices. The aim of this study is to provide a novel method suitable for tracking the evolution of a patient's pathology: easy to use, fast, non-invasive for the patient, and affordable for clinicians. This method would complement the existing ones, namely perceptual judgment and the usual objective measurements (jitter, airflows...), which remain costly in time and human resources. The system designed for this task relies on the GMM-based approach, which is the state of the art for speaker recognition. It is derived from the open source ASR tools (LIA_SpkDet and ALIZE) of the LIA lab. Experiments conducted on a dysphonic corpus provide promising results, underlining the interest of such an approach and opening avenues for further research

    Validity and reliability of the 2nd European Portuguese version of the “Consensus Auditory-Perceptual Evaluation of Voice” (II EP CAPE-V)

    A thesis submitted in partial fulfillment of the requirement for the degree of Master in Science at the Health Science School of Polytechnic Institute of Setúbal. Introduction: Auditory-perceptual evaluation of voice is part of a multidimensional voice evaluation, and is claimed to be the “golden standard”. The “Consensus Auditory-Perceptual Evaluation of Voice” (CAPE-V) has been demonstrated to be a valid and reliable instrument for voice evaluation when applied in both clinical and scientific research fields. The CAPE-V was first translated into European Portuguese (EP) (Jesus et al., 2009); however, that version revealed some validity and reliability problems. The purpose of this study was to assure a valid and reliable EP version of CAPE-V. This resulted in the 2nd EP version of CAPE-V (II EP CAPE-V), with permission granted by ASHA. Method: This was a transversal, observational, descriptive, and comparative study. Fourteen speech-language pathologists (SLPs) who were voice experts (>5 years of clinical practice) rated a total of 26 voice samples produced by 10 males (mean age=45) and 10 females (mean age=43) classified into two groups: a control group (n=10) and a dysphonic group (n=10), with subjects matched for age and gender. All voice samples were rated in one session with the II EP CAPE-V, and in a second session one week later with GRBAS. Content validity was supported by 6 new sentences conceptualized and adapted to the EP linguistic and cultural context according to the rationale outlined in the original CAPE-V protocol. For construct validity analysis, an independent samples t-test (α=.05) was performed for all vocal parameters. Concurrent validity was estimated with the multi-serial correlation coefficient between II EP CAPE-V and GRBAS parameters (r>.70). Reliability was assessed for all vocal parameters. Inter-rater reliability was determined by ICC, and intra-rater reliability by Pearson’s correlation coefficient (r>.70). 
Results/conclusion: Content validity was assured by an EP linguistic expert, who reviewed the six new sentences. Construct validity was obtained for all vocal parameters (p.89) for overall severity/grade, roughness, and breathiness parameters. High inter-rater reliability (ICC>.84) was obtained for all parameters. Intra-rater reliability was high (r>.87) for overall severity, breathiness, and pitch; good (r=.73) for strain; and moderate (r>.69) for roughness and loudness parameters. The II EP CAPE-V is a valid and reliable instrument for auditory-perceptual evaluation, with all psychometric characteristics established

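    Análisis de métodos de parametrización y clasificación para la simulación de un sistema de evaluación perceptual del grado de afección en voces patológicas [Analysis of parameterization and classification methods for simulating a system for perceptual evaluation of the degree of impairment in pathological voices]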

    Voice quality evaluation procedures based on subjective assessment through an expert's auditory perception are widespread. Among them, the GRBAS protocol is the most commonly used in clinical routine. However, this kind of estimation has several problems, the first being that properly trained professionals are required to perform it. Another drawback is that, because the assessment is subjective, many significant circumstances influence the evaluator's final decision, and in many cases there is inter-rater and intra-rater variability in the judgments. For these reasons, objective parameters are needed that allow voice quality to be assessed and various pathologies to be detected. This work aims to compare the effectiveness of several techniques for computing representative voice parameters for use in the automatic classification of perceptual scales. The parameters analyzed include Mel-Frequency Cepstral Coefficients (MFCC), complexity measures, and noise measures. In addition, a new set of features extracted from the Modulation Spectrum (Espectro de Modulación, EM), called Modulation Spectrum Centroids (Centroides del Espectro de Modulación, CEM), is introduced. Specifically, the automatic detection of two of the five traits that make up the GRBAS scale, G and R, is analyzed. Throughout this document it is shown that the CEM features yield results similar to those of previously used techniques and, in some cases, increase classification effectiveness when combined with other parameters

    Acoustic measurement of overall voice quality in sustained vowels and continuous speech

    Measurement of dysphonia severity involves auditory-perceptual evaluations and acoustic analyses of sound waves. A meta-analysis of proportional associations between these two methods showed that many popular perturbation metrics, noise-to-harmonics ratios, and other ratios do not yield reasonable results. However, this meta-analysis demonstrated that the validity of specific autocorrelation- and cepstrum-based measures was much more convincing, and identified ‘smoothed cepstral peak prominence’ as the most promising metric of dysphonia severity. Original research confirmed this inferiority of perturbation measures and superiority of cepstral indices in dysphonia measurement of laryngeal-vocal and tracheoesophageal voice samples. However, to be truly representative of daily voice use patterns, measurement of overall voice quality is ideally founded on the analysis of both sustained vowels and continuous speech. A customized method for including both sample types and calculating the multivariate Acoustic Voice Quality Index (i.e., AVQI) was constructed for this purpose. The original study of the AVQI revealed acceptable results in terms of initial concurrent validity, diagnostic precision, internal and external cross-validity, and responsiveness to change. It was thus concluded that the AVQI can track changes in dysphonia severity across the voice therapy process. There are many freely and commercially available computer programs and systems for acoustic metrics of dysphonia severity. We investigated agreements and differences between two commonly available programs (i.e., Praat and Multi-Dimensional Voice Program) and systems. The results indicated that clinicians should not compare frequency perturbation data across systems and programs, nor amplitude perturbation data across systems. Finally, acoustic information can also be utilized as a biofeedback modality during voice exercises. 
Based on a systematic literature review, it was cautiously concluded that acoustic biofeedback can be a valuable tool in the treatment of phonatory disorders. When applied with caution, acoustic algorithms (particularly cepstrum-based measures and AVQI) merit a special role in the assessment and/or treatment of dysphonia severity

    Breathing, swallowing and voice in laryngeal disorders

    The aim of this work was to examine how breathing, swallowing and voicing are affected in different laryngeal disorders. For this purpose, we examined four different patient groups: patients who had undergone total laryngectomy, anterior cervical decompression (ACD), or injection laryngoplasty with autologous fascia (ILAF), and patients with dyspnea during exercise. We studied the problems and benefits related to the automatic speech valve used for the rehabilitation of speech in laryngectomized patients. The device was given to 14 total laryngectomized patients who used the traditional valve especially well. The usefulness of voice and intelligibility of speech were assessed by speech pathologists. The results demonstrated better performance with the traditional valve in both dimensions. Most of the patients considered the automatic valve a helpful additional device, but because of heavier breathing and the greater work needed for speech production, it was not suitable as a sole device in speech rehabilitation. Dysphonia and dysphagia are known complications of ACD. These symptoms are caused by the stretching of tissue needed during the surgery, but their extent and the recovery from them were not well known before our study. We studied two patient groups: an early group with 50 patients who were examined immediately before and after the surgery, and a late group with 64 patients who were examined 3-9 months postoperatively. Altogether, 60% reported dysphonia and 69% dysphagia immediately after the operation. Even though dysphagia and dysphonia often appeared after surgery, permanent problems seldom occurred. Six (12%) cases of transient and two (3%) of permanent vocal cord paresis were detected. In our third study, the long-term results of ILAF in 43 patients with unilateral vocal cord paralysis were examined. The mean follow-up was 5.8 years (range 3-10). 
Perceptual evaluation demonstrated improved results for voice quality, and videostroboscopy revealed complete or partial glottal closure in 83% of the patients. Fascia proved to be a stable injection material with good vocal results. In our final study we developed a new diagnostic method for exertional laryngeal dyspnea by combining a cardiovascular exercise test with simultaneous fiberoptic observation of the larynx. With this method, it is possible to visualize paradoxical closure of the vocal cords during inspiration, which is a diagnostic criterion for vocal cord dysfunction (VCD). We examined 30 patients referred to our hospital because of suspicion of exercise-induced vocal cord dysfunction (EIVCD). Twenty-seven out of thirty patients were able to perform the test. Dyspnea was induced in 15 patients, and of them five had EIVCD and four had a high suspicion of EIVCD. With our test it is possible to set an accurate diagnosis for exertional laryngeal dyspnea. Moreover, the often seen unnecessary use of asthma drugs among these patients can be avoided.
The larynx plays a central role in undisturbed breathing, voice production and swallowing. This doctoral thesis on laryngeal disorders examined these functions in four different patient groups. In laryngectomy, the voice-producing structures are removed and the patient is given a tracheostoma. After surgery, voice is produced with a voice prosthesis placed between the trachea and the esophagus. A traditional voice prosthesis requires the stoma to be occluded with a finger in order to work. An automatic speech valve, which allows speech without finger occlusion, has been developed as an aid to voice production. Until now there has been insufficient information about the usefulness of the device in speech rehabilitation. In our study, the automatic speech valve was given to 14 patients who used the traditional speech valve without problems. 
The automatic speech valve proved to be a useful aid to speech production, but even these selected patients reported that breathing and speaking were more strenuous with the automatic valve than with the traditional one. Cervical spine surgery is a common procedure for treating disc herniation and vertebral displacement. Injury to the recurrent laryngeal nerve is a known complication of cervical spine surgery and causes hoarseness and swallowing difficulty. Precise data on its incidence, the permanence of the injury, or the harm it causes to the patient have not previously been available. Almost 70% of patients had voice and swallowing difficulties immediately after surgery, but nearly all patients recovered during the 3-month follow-up. Transient vocal cord paralysis was found in 12% and permanent paralysis in 3% of patients. Based on these results, patients can be informed more precisely than before about the voice and swallowing difficulties to expect and about recovery from them. Vocal cord paralysis arises when the recurrent laryngeal nerve is damaged, for example as a surgical complication, and some patients need voice surgery to correct severe hoarseness. In the operation, the paralyzed vocal cord is usually medialized, that is, brought to the midline, so that the mobile vocal cord regains contact with it and a resonant voice returns. In fascia injection, fascia harvested from the fascia lata of the thigh is minced into a paste, which is injected under microscopic observation into the paralyzed vocal cord muscle to medialize it. In our study, complete or partial closure of the vocal cords was observed in 83% of patients, and 56% of them reported a normal or nearly normal voice 3-10 years after the fascia injection. The results showed fascia injection to be a safe, useful and durable surgical treatment for mild and moderate vocal cord paralysis. 
Vocal cord dysfunction refers to a laryngeal dysfunction in which the vocal cords paradoxically move toward each other during inspiration. This causes difficulty inhaling and wheezing. Patients with symptoms during exertion can be examined with the method we developed, exercise laryngoscopy. It combines ergometer exercise with observation of the larynx through the nose using a flexible endoscope. Dyspnea occurred during the test in fifteen (56%) patients; of these, five (33%) were found to have vocal cord dysfunction and four (27%) were strongly suspected of having it. With the test, exertional dyspnea symptoms can be diagnosed more accurately than before. Vocal cord dysfunction is easily confused with exercise-induced asthma, so improved diagnostics help avoid unnecessary asthma medication

    Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

    The proceedings of the MAVEBA Workshop, which is held on a biannual basis, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and classification of vocal pathologies. The Workshop is sponsored by: Ente Cassa Risparmio di Firenze, COST Action 2103, Biomedical Signal Processing and Control Journal (Elsevier Eds.), IEEE Biomedical Engineering Soc. Special Issues of International Journals have been, and will be, published, collecting selected papers from the conference

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The proceedings of the MAVEBA Workshop, which is held on a biannual basis, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and classification of vocal pathologies