
    Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease

    There has been considerable recent research into the connection between Parkinson's disease (PD) and speech impairment, and a wide range of speech signal processing algorithms (dysphonia measures) aiming to predict PD symptom severity from speech signals have been introduced. In this paper, we test how accurately these novel algorithms can discriminate PD subjects from healthy controls. In total, we compute 132 dysphonia measures from sustained vowels. We then select four parsimonious subsets of these dysphonia measures using four feature selection algorithms, and map each feature subset to a binary classification response using two statistical classifiers: random forests and support vector machines. Using an existing database of 263 samples from 43 subjects, we demonstrate that these new dysphonia measures can outperform state-of-the-art results, reaching almost 99% overall classification accuracy using only ten dysphonia features. We find that some of the recently proposed dysphonia measures complement existing algorithms in maximizing the classifiers' ability to discriminate healthy controls from PD subjects. We see these results as an important step toward noninvasive diagnostic decision support in PD.
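    The pipeline this abstract describes (compute dysphonia features, select a small subset, classify with random forests or SVMs) can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: the random feature matrix stands in for the 132 dysphonia measures, and univariate mutual information stands in for the paper's four feature selection algorithms.

```python
# Minimal sketch of the abstract's pipeline: select ten informative
# dysphonia features, then fit random forest and SVM classifiers.
# X and y are random placeholders for the real measures and labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(263, 132))   # placeholder for 132 dysphonia measures
y = rng.integers(0, 2, size=263)  # placeholder labels: 0 = control, 1 = PD

for name, clf in [
    ("random forest", RandomForestClassifier(n_estimators=500, random_state=0)),
    ("SVM (RBF)", SVC(kernel="rbf", C=1.0)),
]:
    # Keep only the ten highest-scoring features, mirroring the paper's
    # finding that roughly ten dysphonia measures suffice.
    model = make_pipeline(
        StandardScaler(),
        SelectKBest(mutual_info_classif, k=10),
        clf,
    )
    acc = cross_val_score(model, X, y, cv=10).mean()
    print(f"{name}: {acc:.3f} mean CV accuracy")
```

    Note that with 263 samples drawn from only 43 subjects, a faithful evaluation would split cross-validation folds by subject rather than by sample, to avoid leakage between training and test sets.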

    Models and analysis of vocal emissions for biomedical applications

    This book of proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2003), held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    Perceptual and acoustic assessment of a child’s speech before and after laryngeal web surgery

    The aim of this paper was to highlight the importance of early diagnostics and surgery in patients with a laryngeal web in order to achieve normal breathing, as well as to stress the need for an interdisciplinary approach to observing voice quality and prosodic features at an early age. The subject was a 6.5-year-old girl who had previously been diagnosed with irregular breathing (R06). An endoscopic exam revealed a laryngeal web between the vocal folds, while the posterior intercartilaginous section of the glottis was normal. The child's speech was recorded in an acoustic studio both before the vocal-fold surgery and after it (six and twelve months later). Due to severe dysphonia, difficulties with breathing, and frequent noisy breathing (stridor), only the phonation of the vowel [a] and spontaneous speech were recorded before the surgery. In addition, there was intense glottic and supraglottic strain before the surgery, which in phonetics corresponds to laryngeal and supralaryngeal strain and pathologically creaky whispery phonation (according to the VPA protocol). This strain was visible in the area of the chest, neck, and head, as well as audible in the voice quality. Acoustic analysis showed that the average F0 for the vowel [a] was remarkably high (442 Hz), and pathological values were established for the following measures: local jitter (1.68%), local shimmer (0.7 dB), and the harmonics-to-noise ratio (17.6 dB). In contrast, six months after the surgery, the pitch for [a] was half the preoperative value (220.5 Hz, p < 0.001), and the local jitter for all vowels (0.30-0.47%) and the harmonics-to-noise ratio (22.46 dB, p = 0.05) were within the normal range. There was also significant improvement in the F0 values, the standard deviation of F0, and the minimum and maximum F0 values. The average and median F0 values in spontaneous speech were also lower postoperatively. The voice quality showed a more balanced timbre (LTASS), particularly after one year. Some other prosodic features also showed improvement.
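    The acoustic measures reported above (mean F0, local jitter, local shimmer in dB, harmonics-to-noise ratio) are standard Praat outputs and can be reproduced from a recorded sustained vowel with the praat-parselmouth Python binding. This is an illustrative sketch only: the file name and analysis parameters are assumptions, and the paper does not state which tool or settings it used.

```python
# Sketch: compute mean F0, local jitter, local shimmer (dB), and
# harmonics-to-noise ratio from a sustained vowel recording with
# Praat via parselmouth. File name and thresholds are illustrative.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("vowel_a.wav")  # hypothetical recording

pitch = snd.to_pitch(pitch_floor=75.0, pitch_ceiling=600.0)
mean_f0 = call(pitch, "Get mean", 0, 0, "Hertz")

# Glottal pulse marks, needed for period-based jitter/shimmer.
point_process = call(snd, "To PointProcess (periodic, cc)", 75.0, 600.0)
jitter_local = call(point_process, "Get jitter (local)",
                    0, 0, 0.0001, 0.02, 1.3) * 100  # as a percentage
shimmer_db = call([snd, point_process], "Get shimmer (local_dB)",
                  0, 0, 0.0001, 0.02, 1.3, 1.6)

harmonicity = snd.to_harmonicity_cc()
hnr_db = call(harmonicity, "Get mean", 0, 0)

print(f"F0 = {mean_f0:.1f} Hz, jitter = {jitter_local:.2f} %, "
      f"shimmer = {shimmer_db:.2f} dB, HNR = {hnr_db:.1f} dB")
```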

    REPA: Client Clustering without Training and Data Labels for Improved Federated Learning in Non-IID Settings

    Clustering clients into groups that exhibit relatively homogeneous data distributions is one of the major means of improving the performance of federated learning (FL) in non-independent and identically distributed (non-IID) data settings. Yet, the applicability of current state-of-the-art approaches remains limited, as these approaches cluster clients based on information, such as the evolution of local model parameters, that is only obtainable through actual on-client training. At the same time, there is a need to make FL models available to clients who cannot perform the training themselves, either because they lack the processing capabilities required for training or because they simply want to use the model without participating in the training. Furthermore, the existing alternative approaches that avoid training still require that individual clients have a sufficient amount of labeled data upon which the clustering is based, essentially assuming that each client is a data annotator. In this paper, we present REPA, an approach to client clustering in non-IID FL settings that requires neither training nor labeled data collection. REPA uses a novel supervised autoencoder-based method to create embeddings that profile a client's underlying data-generating processes without exposing the data to the server and without requiring local training. Our experimental analysis over three different datasets demonstrates that REPA delivers state-of-the-art model performance while expanding the applicability of cluster-based FL to previously uncovered use cases.
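    The server-side step the abstract implies (group clients by the similarity of their data-profile embeddings, then train one federated model per cluster) can be pictured with a simple clustering sketch. The embeddings below are random placeholders; REPA's supervised autoencoder, which produces such embeddings without local training, is not reproduced here.

```python
# Hedged sketch of cluster-based FL bookkeeping: given one fixed-length
# embedding per client, group clients with k-means and assign each
# cluster its own federated model. Embeddings are random placeholders.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
n_clients, embedding_dim = 100, 32
client_embeddings = rng.normal(size=(n_clients, embedding_dim))

n_clusters = 4  # illustrative; a real system would tune or infer this
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
assignments = kmeans.fit_predict(client_embeddings)

# Each cluster would then run its own FL rounds (e.g. FedAvg) over
# its member clients only.
for c in range(n_clusters):
    members = np.flatnonzero(assignments == c)
    print(f"cluster {c}: {len(members)} clients -> separate FL model")
```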

    A Robust Voice Pathology Detection System Based on the Combined BiLSTM–CNN Architecture

    Voice recognition systems have become increasingly important in recent years due to the growing need for more efficient and intuitive human-machine interfaces. Hybrid LSTM networks and deep learning have been very successful in improving speech detection systems. The aim of this paper is to develop a novel approach for the detection of voice pathologies using a hybrid deep learning model that combines the Bidirectional Long Short-Term Memory (BiLSTM) and Convolutional Neural Network (CNN) architectures. The proposed model uses a combination of temporal and spectral features extracted from speech signals to detect different types of voice pathologies. The performance of the proposed detection model is evaluated on a publicly available dataset of speech signals from individuals with various voice pathologies (the MEEI database). The experimental results show that the hybrid BiLSTM-CNN model outperforms several baseline classifiers, achieving an accuracy of 98.86%. The proposed model has the potential to assist health care professionals in the accurate diagnosis and treatment of voice pathologies and to improve the quality of life for affected individuals.
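    A generic hybrid of the two architectures named in the abstract (convolutional layers extracting local spectral patterns, feeding a bidirectional LSTM for temporal context) can be sketched in Keras as follows. The input shape, layer sizes, and binary output are illustrative assumptions, not the paper's reported configuration.

```python
# Generic CNN + BiLSTM sketch over spectral frames (time steps x bands).
# All hyperparameters are illustrative, not the paper's configuration.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_bilstm_cnn(time_steps=128, n_bands=64):
    inputs = layers.Input(shape=(time_steps, n_bands))
    # 1-D convolutions capture local spectral patterns per time window.
    x = layers.Conv1D(64, kernel_size=5, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.Conv1D(128, kernel_size=3, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(pool_size=2)(x)
    # A bidirectional LSTM models temporal dependencies across frames.
    x = layers.Bidirectional(layers.LSTM(64))(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # pathological vs. healthy
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_bilstm_cnn()
model.summary()
```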