487 research outputs found

    Formant analysis in dysphonic patients and automatic Arabic digit speech recognition

    Abstract
    Background and objective: There has been growing interest in the objective assessment of speech in dysphonic patients for the classification of the type and severity of voice pathologies using automatic speech recognition (ASR). The aim of this work was to study the accuracy of a conventional ASR system (with a Mel-frequency cepstral coefficient (MFCC) based front end and a hidden Markov model (HMM) based back end) in recognizing the speech characteristics of people with pathological voice.
    Materials and methods: The speech samples of 62 dysphonic patients with six different types of voice disorders and 50 normal subjects were analyzed. Spoken Arabic digits were taken as input. The distribution of the first four formants of the vowel /a/ was extracted to examine deviation of the formants from normal.
    Results: 100% recognition accuracy was obtained for Arabic digits spoken by normal speakers. However, there was a significant loss of accuracy in the classifications when the digits were spoken by voice-disordered subjects. Moreover, no significant improvement in ASR performance was achieved after assessing a subset of the individuals with disordered voices who underwent treatment.
    Conclusion: The results of this study revealed that the current ASR technique is not a reliable tool for recognizing the speech of dysphonic patients.
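    As a rough illustration of the MFCC front end and HMM back end described above, the sketch below trains one Gaussian HMM per digit class on MFCC sequences and labels a test utterance with the model giving the highest log-likelihood. The file paths, sample rate, and number of HMM states are hypothetical assumptions, not the configuration used in the paper.

```python
# Minimal MFCC front end + HMM back end for isolated-digit recognition (sketch).
# Assumes librosa and hmmlearn are available; paths and settings are illustrative.
import numpy as np
import librosa
from hmmlearn import hmm

def mfcc_features(wav_path, n_mfcc=13):
    """Load audio and return a (frames x n_mfcc) MFCC matrix."""
    y, sr = librosa.load(wav_path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_digit_models(train_files, n_states=5):
    """train_files: dict mapping digit label -> list of wav paths (hypothetical)."""
    models = {}
    for digit, paths in train_files.items():
        feats = [mfcc_features(p) for p in paths]
        X = np.vstack(feats)                       # all frames stacked
        lengths = [f.shape[0] for f in feats]      # per-utterance frame counts
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[digit] = m
    return models

def recognize(wav_path, models):
    """Return the digit whose HMM gives the highest log-likelihood."""
    X = mfcc_features(wav_path)
    return max(models, key=lambda d: models[d].score(X))
```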

    An Investigation of Multidimensional Voice Program Parameters in Three Different Databases for Voice Pathology Detection and Classification

    Background and Objective: Automatic voice-pathology detection and classification systems may help clinicians to detect the existence of any voice pathologies and the type of pathology from which patients suffer in the early stages. The main aim of this paper is to investigate Multidimensional Voice Program (MDVP) parameters to automatically detect and classify voice pathologies in multiple databases, and then to find out which parameters performed well in these two processes.
    Materials and Methods: Samples of the sustained vowel /a/ of normal and pathological voices were extracted from three different databases, which have three voice pathologies in common. The selected databases in this study represent three distinct languages: (1) the Arabic voice pathology database; (2) the Massachusetts Eye and Ear Infirmary database (English database); and (3) the Saarbruecken Voice Database (German database). A computerized speech lab program was used to extract MDVP parameters as features, and an acoustical analysis was performed. The Fisher discrimination ratio was applied to rank the parameters. A t test was performed to highlight any significant differences in the means of the normal and pathological samples.
    Results: The experimental results demonstrate a clear difference in the performance of the MDVP parameters across these databases. The highly ranked parameters also differed from one database to another. The best accuracies were obtained by using the three highest-ranked MDVP parameters arranged according to the Fisher discrimination ratio: these accuracies were 99.68%, 88.21%, and 72.53% for the Saarbruecken Voice Database, the Massachusetts Eye and Ear Infirmary database, and the Arabic voice pathology database, respectively.
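    For the ranking step described above, a two-class Fisher discrimination ratio and an independent-samples t test can be computed per parameter as sketched below. The parameter names are placeholders, not the actual MDVP labels used in the paper.

```python
# Rank acoustic parameters by the two-class Fisher discrimination ratio
# and test group differences with a t test (illustrative sketch).
import numpy as np
from scipy import stats

def fisher_ratio(x_normal, x_pathological):
    """FDR = (mu1 - mu2)^2 / (var1 + var2) for a single parameter."""
    m1, m2 = np.mean(x_normal), np.mean(x_pathological)
    v1, v2 = np.var(x_normal), np.var(x_pathological)
    return (m1 - m2) ** 2 / (v1 + v2)

def rank_parameters(normal, pathological, names):
    """normal, pathological: (samples x parameters) arrays; names: parameter labels."""
    scores = []
    for j, name in enumerate(names):
        fdr = fisher_ratio(normal[:, j], pathological[:, j])
        t, p = stats.ttest_ind(normal[:, j], pathological[:, j], equal_var=False)
        scores.append((name, fdr, p))
    # Highest Fisher ratio first; the top-ranked parameters feed the classifier.
    return sorted(scores, key=lambda s: s[1], reverse=True)
```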

    Investigation of Voice Pathology Detection and Classification on Different Frequency Regions Using Correlation Functions.

    Automatic voice pathology detection and classification systems effectively contribute to the assessment of voice disorders, helping clinicians to detect the existence of any voice pathologies and the type of pathology from which patients suffer in the early stages. This work concentrates on developing an accurate and robust feature extraction for detecting and classifying voice pathologies by investigating different frequency bands using correlation functions. In this paper, we extracted maximum peak values and their corresponding lag values from each frame of a voiced signal by using correlation functions as features to detect and classify pathological samples. These features were investigated in different frequency bands to see the contribution of each band to the detection and classification processes. Various samples of the sustained vowel /a/ of normal and pathological voices were extracted from three different databases: English, German, and Arabic. A support vector machine was used as a classifier. We also performed a t test to investigate significant differences in the means of normal and pathological samples. The best achieved accuracies in both detection and classification varied depending on the band, the correlation function, and the database. The most contributive bands in both detection and classification were between 1000 and 8000 Hz. In detection, the highest accuracies obtained using cross-correlation were 99.809%, 90.979%, and 91.168% in the Massachusetts Eye and Ear Infirmary database, the Saarbruecken Voice Database, and the Arabic Voice Pathology Database, respectively. In classification, the highest accuracies obtained using cross-correlation were 99.255%, 98.941%, and 95.188% in the three databases, respectively.
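    One way to realize the peak/lag correlation features described above is sketched below: each signal is band-limited, the autocorrelation of every frame is computed, and the maximum peak value and its lag are kept as features for an SVM. The frame length, band edges, filter order, and classifier settings are assumptions, not the paper's exact setup.

```python
# Peak/lag features from frame-wise correlation within a frequency band (sketch).
import numpy as np
from scipy.signal import butter, sosfiltfilt
from sklearn.svm import SVC

def band_filter(x, sr, lo, hi):
    hi = min(hi, 0.49 * sr)  # keep the upper edge strictly below Nyquist
    sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, x)

def correlation_features(x, sr, lo=1000, hi=8000, frame_len=400, hop=200):
    """Max autocorrelation peak and its lag for each frame, summarized per utterance."""
    xb = band_filter(x, sr, lo, hi)
    feats = []
    for start in range(0, len(xb) - frame_len, hop):
        frame = xb[start:start + frame_len]
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]  # lags 0..N-1
        lag = int(np.argmax(ac[1:]) + 1)          # skip the trivial zero-lag peak
        feats.append([ac[lag], lag])
    return np.mean(feats, axis=0)

# clf = SVC(kernel="rbf").fit(train_features, train_labels)  # detection or classification
```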

    Exploring differences between phonetic classes in Sleep Apnoea Syndrome Patients using automatic speech processing techniques

    This work is part of an on-going collaborative project between the medical and signal processing communities to promote new research efforts on automatic OSA (Obstructive Sleep Apnoea) diagnosis. In this paper, we explore the differences noted in phonetic classes (interphoneme) across groups (control/apnoea) and analyze their utility for OSA detection.

    Automated voice pathology discrimination from audio recordings benefits from phonetic analysis of continuous speech

    In this paper we evaluate the hypothesis that automated methods for diagnosis of voice disorders from speech recordings would benefit from contextual information found in continuous speech. Rather than basing a diagnosis on how disorders affect the average acoustic properties of the speech signal, the idea is to exploit the possibility that different disorders will cause different acoustic changes within different phonetic contexts. Any differences in the pattern of effects across contexts would then provide additional information for discrimination of pathologies. We evaluate this approach using two complementary studies: the first uses a short phrase which is automatically annotated using a phonetic transcription, the second uses a long reading passage which is automatically annotated from text. The first study uses a single sentence recorded from 597 speakers in the Saarbruecken Voice Database to discriminate structural from neurogenic disorders. The results show that discrimination performance for these broad pathology classes improves from 59% to 67% unweighted average recall when classifiers are trained for each phone label and the results fused. Although the phonetic contexts improved discrimination, the overall sensitivity and specificity of the method seems insufficient for clinical application. We hypothesise that this is because of the limited contexts in the speech audio and the heterogeneous nature of the disorders. In the second study we address these issues by processing recordings of a long reading passage obtained from clinical recordings of 60 speakers with either Spasmodic Dysphonia or Vocal Fold Paralysis. We show that discrimination performance increases from 80% to 87% unweighted average recall if classifiers are trained for each phone-labelled region and predictions fused. We also show that the sensitivity and specificity of a diagnostic test with this performance is similar to other diagnostic procedures in clinical use. In conclusion, the studies confirm that the exploitation of contextual differences in the way disorders affect speech improves automated diagnostic performance, and that automated methods for phonetic annotation of reading passages are robust enough to extract useful diagnostic information.
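    The per-phone training and fusion scheme can be approximated as sketched below: one classifier is trained per phone label on features pooled from that phone's regions, per-phone class probabilities are averaged at test time, and performance is reported as unweighted average recall. The feature representation and the choice of classifier are placeholders, not the configuration used in the paper.

```python
# Train one classifier per phone label and fuse their predictions (sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

def train_per_phone(train_data):
    """train_data: dict phone -> (X, y), features pooled from that phone's regions."""
    models = {}
    for phone, (X, y) in train_data.items():
        models[phone] = LogisticRegression(max_iter=1000).fit(X, y)
    return models

def fused_prediction(models, speaker_segments):
    """speaker_segments: dict phone -> feature matrix for one speaker.
    Average class posteriors over all available phone-specific classifiers.
    Assumes every classifier was fitted on the same label set (same classes_ order)."""
    probs = []
    for phone, X in speaker_segments.items():
        if phone in models:
            probs.append(models[phone].predict_proba(X).mean(axis=0))
    return int(np.argmax(np.mean(probs, axis=0)))

# Unweighted average recall over all test speakers:
# uar = recall_score(y_true, y_pred, average="macro")
```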

    Acoustic, myoelectric, and aerodynamic parameters of euphonic and dysphonic voices: a systematic review of clinical studies

    At present, there is no clinical consensus on the concept of normal and dysphonic voices. For many years, the establishment of a consensus on the terminology related to normal and pathological voices has been studied, in order to facilitate communication between professionals in the field of voice. Aim: to systematically review the literature in order to compare and characterize more precisely the measurable, objective acoustic, aerodynamic, and surface electromyographic parameters of normal and dysphonic voices. Methods: The PRISMA 2020 methodology was used as the review protocol, together with the PICO procedure, to answer the research question across six databases. Results: In total, 467 articles were found. After duplicate records were removed from the selection, the inclusion and exclusion criteria were applied and 19 articles were eligible. A qualitative synthesis of the included studies is presented in terms of their methodology and results. Conclusions: Studying the acoustic, aerodynamic, and electromyographic parameters with more precision, in both normal and dysphonic voices, will allow health professionals working in the field of voice (speech therapy, otorhinolaryngology, phoniatrics, etc.) to establish a detailed diagnostic consensus on vocal pathology, enhancing communication and the generalization of results worldwide.

    Multiple voice disorders in the same individual: Investigating handcrafted features, multi-label classification algorithms, and base-learners

    Non-invasive acoustic analyses of voice disorders have been at the forefront of current biomedical research. Usual strategies, essentially based on machine learning (ML) algorithms, commonly classify a subject as being either healthy or pathologically affected. Nevertheless, the latter state is not always the result of a single laryngeal issue, i.e., multiple disorders might coexist, demanding multi-label classification procedures for effective diagnoses. Consequently, the objective of this paper is to investigate the application of five multi-label classification methods based on problem transformation, i.e., Label Powerset, Binary Relevance, Nested Stacking, Classifier Chains, and Dependent Binary Relevance, with Random Forest (RF) and Support Vector Machine (SVM) as base-learners, in addition to a Deep Neural Network (DNN) from an algorithm-adaptation method, to detect multiple voice disorders, i.e., Dysphonia, Laryngitis, Reinke's Edema, Vox Senilis, and Central Laryngeal Motion Disorder. Receiving as input three handcrafted features, i.e., signal energy (SE), zero-crossing rates (ZCRs), and signal entropy (SH), which allow for interpretable descriptors in terms of speech analysis, production, and perception, we observed that the DNN-based approach powered with SE-based feature vectors presented the best F1-score among the tested methods, i.e., 0.943, averaged over all the balancing scenarios, on the Saarbrücken Voice Database (SVD) and considering a 20% balancing rate with the Synthetic Minority Over-sampling Technique (SMOTE). Finally, our finding that laryngitis accounted for most false negatives may explain why its detection is a serious issue in speech technology. The results we report provide an original contribution, allowing for the consistent detection of multiple speech pathologies and advancing the state of the art in the field of handcrafted acoustic-based non-invasive diagnosis of voice disorders.
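    A minimal sketch of the handcrafted features named above (frame energy, zero-crossing rate, and spectral entropy) and of a binary-relevance-style multi-label classifier is given below. It uses scikit-learn's MultiOutputClassifier as a stand-in for the problem-transformation methods, base-learners, and SMOTE settings actually evaluated in the paper.

```python
# Handcrafted frame features and a simple multi-label (binary relevance) classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

def frame_features(x, frame_len=400, hop=200):
    """Per-frame signal energy, zero-crossing rate, and spectral entropy, averaged."""
    rows = []
    for start in range(0, len(x) - frame_len, hop):
        f = x[start:start + frame_len]
        energy = float(np.sum(f ** 2))
        zcr = float(np.mean(np.abs(np.diff(np.sign(f))) > 0))
        p = np.abs(np.fft.rfft(f)) ** 2
        p = p / (np.sum(p) + 1e-12)
        entropy = float(-np.sum(p * np.log2(p + 1e-12)))
        rows.append([energy, zcr, entropy])
    return np.mean(rows, axis=0)

# y_train is a binary indicator matrix with one column per disorder label.
# clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=200))
# clf.fit(X_train, y_train); y_pred = clf.predict(X_test)
```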

    Towards vocal-behaviour and vocal-health assessment using distributions of acoustic parameters

    Voice disorders of varying severity affect the professional categories that use their voice in a sustained way for prolonged periods of time, the so-called occupational voice users. In-field voice monitoring is needed to investigate voice behaviour and vocal health status during everyday activities and to highlight work-related risk factors. The overall aim of this thesis is to contribute to the identification of tools, procedures and requirements related to voice acoustic analysis as an objective means to prevent voice disorders, but also to assess them and furnish proof of outcomes during voice therapy.
    The first part of this thesis includes studies on vocal-load-related parameters. Experiments were performed both in the field and in the laboratory. A one-school-year longitudinal study of teachers' voice use during working hours was performed in high school classrooms using a voice analyzer equipped with a contact sensor; further measurements took place in the semi-anechoic and reverberant rooms of the National Institute of Metrological Research (I.N.Ri.M.) in Torino (Italy) to investigate the effects of very low and excessive reverberation on speech intensity, using both microphones in air and contact sensors. Within this framework, the contributions of different devices to the sound pressure level (SPL) measurement uncertainty were also assessed with dedicated experiments. Teachers adjusted their voice significantly with noise and reverberation, both at the beginning and at the end of the school year. Moreover, teachers who worked in the worst acoustic conditions showed higher SPLs and a worse vocal health status at the end of the school year. The minimum value of speech SPL was found for teachers in classrooms with a reverberation time of about 0.8 s. Participants in the in-laboratory experiments significantly increased their speech intensity, by about 2.0 dB, in the semi-anechoic room compared with the reverberant room when describing a map. These results refer to speech monitoring performed with the voice analyzer, whose estimated uncertainty for SPL differences was about 1 dB.
    The second part of this thesis addressed vocal health and voice quality assessment using different speech materials and devices. Experiments were performed in clinics, in collaboration with the Department of Surgical Sciences of Università di Torino (Italy) and the Department of Clinical Science, Intervention and Technology of Karolinska Institutet in Stockholm (Sweden). Individual distributions of the Cepstral Peak Prominence Smoothed (CPPS) from voluntary patients and control subjects were investigated in sustained vowels, reading, free speech and vowels excerpted from continuous speech, acquired with microphones in air and contact sensors. The main influence quantities of the estimated cepstral parameters were also identified: the fundamental frequency of the vocalization and the broadband noise superimposed on the signal. In addition, the reliability of CPPS estimation with respect to the frequency content of the vocal spectrum was evaluated, which mainly depends on the bandwidth of the measuring chain used to acquire the vocal signal. For the speech materials acquired with the microphone in air, the 5th percentile was the statistic of the CPPS distribution that best discriminated healthy from unhealthy voices in sustained vowels, while the 95th percentile was the best in both the reading and free-speech tasks. The corresponding discrimination thresholds were 15 dB (95% Confidence Interval, CI, of 0.7 dB) and 18 dB (95% CI of 0.6 dB), respectively, where lower values indicate a high probability of an unhealthy voice. Preliminary outcomes on vowels excerpted from continuous speech indicated that a CPPS mean value lower than 14 dB points to pathological voices. CPPS distributions were also effective as proof of outcomes after interventions, e.g. voice therapy and phonosurgery. Concerning the speech materials acquired with the electret contact sensor, a reasonable discrimination power was only obtained for the sustained vowel, where a standard deviation of the CPPS distribution higher than 1.1 dB (95% CI of 0.2 dB) indicates a high probability of an unhealthy voice. Further results indicated that a reliable estimation of the CPPS parameters is obtained provided that the frequency content of the spectrum reaches at least 5 kHz: this outcome provides a guideline on the bandwidth of the measuring chain used to acquire the vocal signal.
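    The cepstral measure at the core of this work can be illustrated as below: a plain (unsmoothed) cepstral peak prominence is computed per frame as the height of the cepstral peak in the expected F0 quefrency range above a regression line fitted to the cepstrum, and distribution statistics (5th/95th percentiles, mean, standard deviation) are then taken over frames. The smoothing steps that turn CPP into CPPS, the regression range, and all numeric settings are simplified assumptions, not the thesis's exact procedure.

```python
# Frame-wise cepstral peak prominence (unsmoothed CPP) and distribution statistics (sketch).
import numpy as np

def cpp_frame(frame, sr, f0_min=60.0, f0_max=300.0):
    """CPP in dB: cepstral peak height above a regression baseline on the cepstrum."""
    windowed = frame * np.hanning(len(frame))
    log_spec = 20.0 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)
    cepstrum = np.real(np.fft.ifft(log_spec))        # real cepstrum of the dB spectrum
    q = np.arange(len(frame)) / sr                    # quefrency axis (seconds)
    lo, hi = int(sr / f0_max), int(sr / f0_min)       # search range for the F0 peak
    peak_idx = lo + int(np.argmax(cepstrum[lo:hi]))
    # Simplification: the regression baseline is fitted over the search range only.
    a, b = np.polyfit(q[lo:hi], cepstrum[lo:hi], 1)
    return cepstrum[peak_idx] - (a * q[peak_idx] + b)

def cpp_statistics(x, sr, frame_len=2048, hop=1024):
    """Percentiles and spread of the frame-wise CPP distribution for one recording."""
    vals = np.array([cpp_frame(x[s:s + frame_len], sr)
                     for s in range(0, len(x) - frame_len, hop)])
    return {"p5": np.percentile(vals, 5), "p95": np.percentile(vals, 95),
            "mean": vals.mean(), "std": vals.std()}
```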

    Reviewing the connection between speech and obstructive sleep apnea

    The electronic version of this article is the complete one and can be found online at: http://link.springer.com/article/10.1186/s12938-016-0138-5
    Background: Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The altered UA structure or function in OSA speakers has led to the hypothesis that automatic speech analysis can be used for OSA assessment. In this paper we critically review several approaches using speech analysis and machine learning techniques for OSA detection, and discuss the limitations that can arise when using machine learning techniques for diagnostic applications.
    Methods: A large speech database including 426 male Spanish speakers suspected of suffering from OSA and referred to a sleep disorders unit was used to study the clinical validity of several proposals that use machine learning techniques to predict the apnea–hypopnea index (AHI) or to classify individuals according to their OSA severity. The AHI describes the severity of a patient's condition. We first evaluate AHI prediction using state-of-the-art speaker recognition technologies: speech spectral information is modelled using supervector or i-vector techniques, and AHI is predicted through support vector regression (SVR). Using the same database we then critically review several OSA classification approaches previously proposed. The influence and possible interference of other clinical variables or characteristics available for our OSA population (age, height, weight, body mass index, and cervical perimeter) are also studied.
    Results: The poor results obtained when estimating AHI using supervectors or i-vectors followed by SVR contrast with the positive results reported by previous research. This prompted a careful review of these approaches, also testing some reported results over our database. Several methodological limitations and deficiencies were detected that may have led to overoptimistic results.
    Conclusion: The methodological deficiencies observed after critically reviewing previous research can be relevant examples of potential pitfalls when using machine learning techniques for diagnostic applications. We found two common limitations that can explain the likelihood of false discovery in previous research: (1) the use of prediction models derived from sources, such as speech, which are also correlated with other patient characteristics (age, height, sex,…) that act as confounding factors; and (2) overfitting of feature selection and validation methods when working with a high number of variables compared to the number of cases. We hope this study will not only be a useful example of relevant issues when using machine learning for medical diagnosis, but will also help in guiding further research on the connection between speech and OSA.
    The authors thank Sonia Martinez Diaz for her effort in collecting the OSA database that is used in this study. This research was partly supported by the Ministry of Economy and Competitiveness of Spain and the European Union (FEDER) under project "CMC-V2", TEC2012-37585-C02.
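    The AHI regression step that the review re-examines could be approximated as below with support vector regression on fixed-length per-speaker embeddings. The embeddings (supervectors or i-vectors) would come from a dedicated speaker-recognition toolkit and are treated here as a given matrix, and the cross-validated evaluation is only a sketch of how to limit the overfitting pitfalls the abstract warns about.

```python
# Support vector regression of AHI from per-speaker embeddings, evaluated with
# cross-validation to limit the overfitting risks discussed above (sketch).
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_predict

def evaluate_ahi_regression(embeddings, ahi, n_folds=5):
    """embeddings: (speakers x dims) supervectors/i-vectors; ahi: clinical index values."""
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
    pred = cross_val_predict(model, embeddings, ahi, cv=n_folds)
    corr = np.corrcoef(pred, ahi)[0, 1]      # agreement with the clinical AHI
    mae = np.mean(np.abs(pred - ahi))
    return corr, mae

# Note: any feature selection must be nested inside the folds, not run on the full
# dataset, to avoid the false-discovery pitfall described in the abstract.
```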