140 research outputs found

    GMM-based classifiers for the automatic detection of obstructive sleep apnea

    Get PDF
    The aim of automatic pathological voice detection systems is to serve as tools, to medical specialists, for a more objective, less invasive and improved diagnosis of diseases. In this respect, the gold standard for those system include the usage of a optimized representation of the spectral envelope, either based on cepstral coefficients from the mel-scaled Fourier spectral envelope (Mel-Frequency Cepstral Coefficients) or from an all-pole estimation (Linear Prediction Coding Cepstral Coefficients) forcharacterization, and Gaussian Mixture Models for posterior classification. However, the study of recently proposed GMM-based classifiers as well as Nuisance mitigation techniques, such as those employed in speaker recognition, has not been widely considered inpathology detection labours. The present work aims at testing whether or not the employment of such speaker recognition tools might contribute to improve system performance in pathology detection systems, specifically in the automatic detection of Obstructive Sleep Apnea. The testing procedure employs an Obstructive Sleep Apnea database, in conjunction with GMM-based classifiers looking for a better performance. The results show that an improved performance might be obtained by using such approach

    Effects of audio compression in automatic detection of voice pathologies

    Get PDF
    This paper investigates the performance of an automatic system for voice pathology detection when the voice samples have been compressed in MP3 format and different binary rates (160, 96, 64, 48, 24, and 8 kb/s). The detectors employ cepstral and noise measurements, along with their derivatives, to characterize the voice signals. The classification is performed using Gaussian mixtures models and support vector machines. The results between the different proposed detectors are compared by means of detector error tradeoff (DET) and receiver operating characteristic (ROC) curves, concluding that there are no significant differences in the performance of the detector when the binary rates of the compressed data are above 64 kb/s. This has useful applications in telemedicine, reducing the storage space of voice recordings or transmitting them over narrow-band communications channels

    Dysphonia Detection based on modulation spectral features and cepstral coefficients

    Get PDF
    In this paper, we combine modulation spectral features with mel-frequency cepstral coefficients for automatic detection of dysphonia. For classification purposes, dimensions of the original modulation spectra are reduced using higher order singular value decomposition (HOSVD). Most relevant features are selected based on their mutual information to discrimination between normophonic and dysphonic speakers made by experts. Features that highly correlate with voice alterations are associated then with a support vector machine (SVM) classifier to provide an automatic decision. Recognition experiments using two different databases suggest that the system provides complementary information to the standard mel-cepstral feature

    A Parametric Approach for Classification of Distortions in Pathological Voices

    Get PDF
    In biomedical acoustics, distortion in voice signals, commonly present during acquisition and transmission, adversely affects acoustic features extracted from pathological voice. Information on the type of distortion can help in compensating for its effects. This paper proposes a new approach to detecting four major types of commonly encountered distortion in remote analysis of pathological voice, namely background noise, reverberation, clipping and coding. In this approach, by applying factor analysis to Gaussian mixture model mean supervectors, distortions in variable-duration recordings are modeled by fixed-length, low-dimensional channel vectors. Then, linear discriminant analysis (LDA) is used to remove the remaining nuisance effects in the channel vectors. Finally, two different classifiers, namely support vector machines and probabilistic LDA classify the different types of distortion. Experimental results obtained using Parkinson's voices, as an example of pathological voice, show 11.4% relative improvement in performance over systems which directly use acoustic features for distortion classification

    A survey on perceived speaker traits: personality, likability, pathology, and the first challenge

    Get PDF
    The INTERSPEECH 2012 Speaker Trait Challenge aimed at a unified test-bed for perceived speaker traits – the first challenge of this kind: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In the present article, we give a brief overview of the state-of-the-art in these three fields of research and describe the three sub-challenges in terms of the challenge conditions, the baseline results provided by the organisers, and a new openSMILE feature set, which has been used for computing the baselines and which has been provided to the participants. Furthermore, we summarise the approaches and the results presented by the participants to show the various techniques that are currently applied to solve these classification tasks

    An intelligent healthcare system for detection and classification to discriminate vocal fold disorders

    Get PDF
    The growing population of senior citizens around the world will appear as a big challenge in the future and they will engage a significant portion of the healthcare facilities. Therefore, it is necessary to develop intelligent healthcare systems so that they can be deployed in smart homes and cities for remote diagnosis. To overcome the problem, an intelligent healthcare system is proposed in this study. The proposed intelligent system is based on the human auditory mechanism and capable of detection and classification of various types of the vocal fold disorders. In the proposed system, critical bandwidth phenomena by using the bandpass filters spaced over Bark scale is implemented to simulate the human auditory mechanism. Therefore, the system acts like an expert clinician who can evaluate the voice of a patient by auditory perception. The experimental results show that the proposed system can detect the pathology with an accuracy of 99.72%. Moreover, the classification accuracy for vocal fold polyp, keratosis, vocal fold paralysis, vocal fold nodules, and adductor spasmodic dysphonia is 97.54%, 99.08%, 96.75%, 98.65%, 95.83%, and 95.83%, respectively. In addition, an experiment for paralysis versus all other disorders is also conducted, and an accuracy of 99.13% is achieved. The results show that the proposed system is accurate and reliable in vocal fold disorder assessment and can be deployed successfully for remote diagnosis. Moreover, the performance of the proposed system is better as compared to existing disorder assessment systems

    Intra- and Inter-database Study for Arabic, English, and German Databases:Do Conventional Speech Features Detect Voice Pathology?

    Get PDF
    A large population around the world has voice complications. Various approaches for subjective and objective evaluations have been suggested in the literature. The subjective approach strongly depends on the experience and area of expertise of a clinician, and human error cannot be neglected. On the other hand, the objective or automatic approach is noninvasive. Automatic developed systems can provide complementary information that may be helpful for a clinician in the early screening of a voice disorder. At the same time, automatic systems can be deployed in remote areas where a general practitioner can use them and may refer the patient to a specialist to avoid complications that may be life threatening. Many automatic systems for disorder detection have been developed by applying different types of conventional speech features such as the linear prediction coefficients, linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients (MFCCs). This study aims to ascertain whether conventional speech features detect voice pathology reliably, and whether they can be correlated with voice quality. To investigate this, an automatic detection system based on MFCC was developed, and three different voice disorder databases were used in this study. The experimental results suggest that the accuracy of the MFCC-based system varies from database to database. The detection rate for the intra-database ranges from 72% to 95%, and that for the inter-database is from 47% to 82%. The results conclude that conventional speech features are not correlated with voice, and hence are not reliable in pathology detection

    Improved Algorithm for Pathological and Normal Voices Identification

    Get PDF
    There are a lot of papers on automatic classification between normal and pathological voices, but they have the lack in the degree of severity estimation of the identified voice disorders. Building a model of pathological and normal voices identification, that can also evaluate the degree of severity of the identified voice disorders among students. In the present work, we present an automatic classifier using acoustical measurements on registered sustained vowels /a/ and pattern recognition tools based on neural networks. The training set was done by classifying students’ recorded voices based on threshold from the literature. We retrieve the pitch, jitter, shimmer and harmonic-to-noise ratio values of the speech utterance /a/, which constitute the input vector of the neural network. The degree of severity is estimated to evaluate how the parameters are far from the standard values based on the percent of normal and pathological values. In this work, the base data used for testing the proposed algorithm of the neural network is formed by healthy and pathological voices from German database of voice disorders. The performance of the proposed algorithm is evaluated in a term of the accuracy (97.9%), sensitivity (1.6%), and specificity (95.1%). The classification rate is 90% for normal class and 95% for pathological class
    corecore