806 research outputs found

    An ANN-based Method for Detecting Vocal Fold Pathology

    There are different algorithms for vocal fold pathology diagnosis. These algorithms usually have three stages: feature extraction, feature reduction and classification. While the third stage admits a choice among a variety of machine learning methods, the first and second stages play a critical role in the performance and accuracy of the classification system. In this paper we present an initial study of feature extraction and feature reduction for the task of vocal fold pathology diagnosis. A new type of feature vector, based on wavelet packet decomposition and Mel-Frequency Cepstral Coefficients (MFCCs), is proposed. Principal Component Analysis (PCA) is also used for feature reduction. An Artificial Neural Network is used as a classifier to evaluate the performance of our proposed method. Comment: 4 pages, 3 figures, published with the International Journal of Computer Applications (IJCA)
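    The three-stage pipeline this abstract describes (feature extraction, feature reduction, classification) can be sketched as below. This is a minimal illustration on synthetic signals, not the paper's method: simple FFT band energies stand in for the wavelet-packet/MFCC features, and all data and names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def band_energies(signal, n_bands=16):
    """Toy stand-in for the wavelet-packet/MFCC features in the paper:
    log energy in equal-width FFT bands."""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    bands = np.array_split(spec, n_bands)
    return np.log([b.sum() + 1e-12 for b in bands])

def make_sample(pathological):
    """Synthetic 'voice': healthy = clean harmonic, pathological = jittery/noisy."""
    t = np.linspace(0, 1, 2048)
    f0 = 120 + rng.normal(0, 15 if pathological else 1)   # pitch jitter
    x = np.sin(2 * np.pi * f0 * t)
    x += (0.5 if pathological else 0.05) * rng.normal(size=t.size)  # breathiness
    return band_energies(x)

# Stage 1: feature extraction over a synthetic corpus.
X = np.array([make_sample(p) for p in ([0] * 100 + [1] * 100)])
y = np.array([0] * 100 + [1] * 100)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0, stratify=y)

# Stage 2: feature reduction with PCA, as in the abstract.
pca = PCA(n_components=6).fit(Xtr)

# Stage 3: an ANN classifier evaluated on held-out data.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(pca.transform(Xtr), ytr)
acc = clf.score(pca.transform(Xte), yte)
```

On such clearly separated synthetic classes the held-out accuracy is high; real pathological-voice data is far harder, which is why the feature-design stages matter.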

    Feature selection in pathological voice classification using dynamics of component analysis

    This paper presents a methodology for reducing the training space based on analysis of the variation of the linear components of the acoustic features. The methodology is applied to the automatic detection of voice disorders by means of stochastic dynamic models. The acoustic features used to model the speech are MFCCs, HNR, GNE, NNE and the energy envelopes. Feature extraction is carried out by means of PCA, and classification is done using discrete and continuous HMMs. The results showed a direct relationship between the principal directions (feature weights) and classification performance. The dynamic feature analysis by means of PCA reduces the dimension of the original feature space while the topological complexity of the dynamic classifier remains unchanged. The experiments were carried out on the Kay Elemetrics (DB1) and UPM (DB2) databases. Results showed 91% accuracy with a 30% reduction in computational cost for DB1.
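    The core idea above, reducing the training space along principal directions while leaving the classifier's topology untouched, can be sketched as follows. This is a toy illustration on synthetic features with a simple Gaussian classifier standing in for the paper's HMMs; all dimensions and numbers are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Toy feature matrix: 200 utterances x 40 acoustic features (standing in for
# MFCC/HNR/GNE/NNE trajectories); only a few latent directions carry class info.
n, d, informative = 200, 40, 4
W = rng.normal(size=(informative, d))
y = rng.integers(0, 2, size=n)
latent = rng.normal(size=(n, informative)) + 1.5 * y[:, None]
X = latent @ W + 0.5 * rng.normal(size=(n, d))

# Reduce the training space along the principal directions,
# keeping enough components to explain 95% of the variance.
pca = PCA(n_components=0.95).fit(X)
Xr = pca.transform(X)

# Same classifier "topology" before and after reduction; only the
# input dimensionality (and hence the computational cost) changes.
full = cross_val_score(GaussianNB(), X, y, cv=5).mean()
reduced = cross_val_score(GaussianNB(), Xr, y, cv=5).mean()
```

The reduced space has far fewer dimensions than the original 40, mirroring the paper's finding that accuracy can be retained while cutting computational cost.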

    Analysis of the influence of signal compression techniques for voice disorder detection through filter-banked based features

    This paper compares the results of using compressed voice signals versus uncompressed voice signals to automatically detect voice abnormalities. The speech coding and compression techniques used in this study are the same as those used by default in fixed, mobile and IP telephony systems, and the characterization and classification techniques used are also among the most common for automatic detection of voice abnormalities. The results obtained indicate that it is possible to use compressed voice signals for automatic detection of vocal pathologies without compromising diagnostic accuracy, which would make the implementation of automatic remote diagnosis of vocal pathologies possible.
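    The effect of telephony coding on filter-bank features can be sketched as below. This is a rough stand-in for the paper's experiments: μ-law companding with 8-bit quantization (G.711-style) simulates the codec, FFT band energies simulate the filter-bank features, and the signal is synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)

def mu_law_roundtrip(x, mu=255, bits=8):
    """G.711-style mu-law companding + uniform quantization + expansion,
    a rough stand-in for the telephony codecs compared in the paper."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    q = np.round(y * (2 ** (bits - 1))) / (2 ** (bits - 1))
    return np.sign(q) * ((1 + mu) ** np.abs(q) - 1) / mu

def band_energies(x, n_bands=12):
    """Log energy per FFT band, standing in for filter-bank features."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    return np.log([b.sum() + 1e-12 for b in np.array_split(spec, n_bands)])

# Synthetic sustained vowel at ~150 Hz, normalized to full scale.
t = np.linspace(0, 1, 8000)
voice = 0.8 * np.sin(2 * np.pi * 150 * t) + 0.02 * rng.normal(size=t.size)
voice /= np.abs(voice).max()

f_clean = band_energies(voice)
f_comp = band_energies(mu_law_roundtrip(voice))
corr = np.corrcoef(f_clean, f_comp)[0, 1]
```

The high correlation between the two feature vectors illustrates why telephone-grade compression need not destroy the information the classifier relies on, consistent with the paper's conclusion.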

    A Survey on Signal Processing Based Pathological Voice Detection Techniques

    Voice disability is a barrier to effective communication. Around 1.2% of the world's population faces some form of voice disability. Surgical procedures, namely laryngoscopy, laryngeal electromyography, and stroboscopy, are used for voice disability diagnosis. Researchers and practitioners have been working to find alternatives to these procedures; voice-sample-based diagnosis is one of them. The major steps followed by these works are (a) extracting voice features from voice samples and (b) discriminating pathological voices from normal voices using a classifier algorithm. However, there is no consensus about which voice feature and classifier algorithm provide the best accuracy in screening for voice disability. Moreover, some of the works use multiple voice features and multiple classifiers to ensure high reliability. In this paper, we address these issues. The motivation of the work is the need for non-invasive signal processing techniques to detect voice disability in the general population. This paper presents a survey of voice disability detection methods. The paper contains two main parts. In the first part, we present background information including causes of voice disability, current procedures and practices, voice features, and classifiers. In the second part, we present a comprehensive survey of voice disability detection algorithms. The issues and challenges related to the selection of voice features and classifier algorithms are addressed at the end of the paper.

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science in which the pseudo-periodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech, which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the electroglottograph (EGG) signal, with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment in real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.
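    The basic idea of locating GCIs from an EGG signal can be sketched with the classic differentiated-EGG (DEGG) method, not SIGMA itself: glottal closures appear as strong negative peaks of the EGG derivative. The waveform below is a synthetic idealization, so the numbers are illustrative only.

```python
import numpy as np
from scipy.signal import find_peaks

fs = 16000          # sample rate (Hz)
f0 = 125.0          # fundamental frequency (Hz); period = 128 samples
t = np.arange(0, 0.2, 1 / fs)

# Toy EGG waveform: vocal-fold contact rises through the cycle and
# collapses abruptly once per period at the glottal closure instant.
egg = (t * f0) % 1.0

# DEGG method: differentiate the EGG and pick the strong negative peaks.
degg = np.diff(egg)
gci, _ = find_peaks(-degg, height=0.5)

# Consecutive GCIs are one pitch period apart, so they recover f0.
est_f0 = fs / np.median(np.diff(gci))
```

On this idealized waveform the estimated f0 matches the 125 Hz ground truth; real EGG signals need the pre-filtering and post-processing that algorithms such as SIGMA provide.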

    Exploring the impact of data poisoning attacks on machine learning model reliability

    Recent years have seen the widespread adoption of Artificial Intelligence techniques in several domains, including healthcare, justice, assisted driving and Natural Language Processing (NLP) based applications (e.g., fake news detection). These are just a few examples of domains that are particularly critical and sensitive to the reliability of the adopted machine learning systems. Several Artificial Intelligence approaches have therefore been adopted to realize easy and reliable solutions aimed at improving early diagnosis, personalized treatment, remote patient monitoring and better decision-making, with a consequent reduction of healthcare costs. Recent studies have shown that these techniques are vulnerable to attacks by adversaries at various phases of the machine learning pipeline. Poisoned data sets are among the most common attacks on the reliability of Artificial Intelligence approaches. Noise, for example, can have a significant impact on the overall performance of a machine learning model. This study discusses the impact of noise on classification algorithms. In detail, the reliability of several machine learning techniques in correctly distinguishing pathological from healthy voices was evaluated by analysing poisoned data. Voice samples selected from a database widely used in the research community, the Saarbruecken Voice Database, were processed and analysed to evaluate the resilience and classification accuracy of these techniques. All analyses are evaluated in terms of accuracy, specificity, sensitivity, F1-score and ROC area.
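    The kind of experiment described above can be sketched with a simple label-flipping poisoning attack on synthetic data (not the Saarbruecken corpus or the paper's exact protocol; the poisoning rate and classifier are assumptions).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Synthetic stand-in for the voice-feature dataset.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0, stratify=y)

def accuracy_after_flipping(poison_rate):
    """Flip a fraction of TRAINING labels (a simple label-flipping
    poisoning attack) and measure accuracy on clean held-out data."""
    y_poison = ytr.copy()
    idx = rng.choice(len(y_poison), int(poison_rate * len(y_poison)),
                     replace=False)
    y_poison[idx] = 1 - y_poison[idx]
    return SVC().fit(Xtr, y_poison).score(Xte, yte)

clean = accuracy_after_flipping(0.0)
poisoned = accuracy_after_flipping(0.4)
```

Comparing `clean` against `poisoned` shows the degradation in held-out accuracy that poisoned training data causes, which is the resilience question the paper studies across several classifiers and metrics.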

    Reviewing the connection between speech and obstructive sleep apnea

    The electronic version of this article is the complete one and can be found online at: http://link.springer.com/article/10.1186/s12938-016-0138-5
    Background: Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The altered UA structure or function in OSA speakers has motivated the hypothesis that automatic analysis of speech could support OSA assessment. In this paper we critically review several approaches using speech analysis and machine learning techniques for OSA detection, and discuss the limitations that can arise when using machine learning techniques for diagnostic applications. Methods: A large speech database including 426 male Spanish speakers suspected of suffering from OSA and referred to a sleep disorders unit was used to study the clinical validity of several proposals that use machine learning techniques to predict the apnea–hypopnea index (AHI), which describes the severity of a patient's condition, or to classify individuals according to their OSA severity. We first evaluate AHI prediction using state-of-the-art speaker recognition technologies: speech spectral information is modelled using supervector or i-vector techniques, and AHI is predicted through support vector regression (SVR). Using the same database we then critically review several OSA classification approaches previously proposed. The influence and possible interference of other clinical variables or characteristics available for our OSA population (age, height, weight, body mass index, and cervical perimeter) are also studied. Results: The poor results obtained when estimating AHI using supervectors or i-vectors followed by SVR contrast with the positive results reported by previous research. This fact prompted us to carefully review these approaches, also testing some reported results over our database. Several methodological limitations and deficiencies were detected that may have led to overoptimistic results.
Conclusion: The methodological deficiencies observed after critically reviewing previous research can serve as relevant examples of potential pitfalls when using machine learning techniques for diagnostic applications. We have found two common limitations that can explain the likelihood of false discovery in previous research: (1) the use of prediction models derived from sources, such as speech, which are also correlated with other patient characteristics (age, height, sex, …) that act as confounding factors; and (2) overfitting of feature selection and validation methods when working with a high number of variables compared to the number of cases. We hope this study will not only be a useful example of relevant issues when using machine learning for medical diagnosis, but will also help guide further research on the connection between speech and OSA. The authors thank Sonia Martinez Diaz for her effort in collecting the OSA database used in this study. This research was partly supported by the Ministry of Economy and Competitiveness of Spain and the European Union (FEDER) under project "CMC-V2", TEC2012-37585-C02.
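    The second pitfall the review identifies, overfitting when feature selection touches the validation data in high-dimensional, small-sample settings, can be demonstrated on pure-noise data. This is a generic illustration of the leakage mechanism, not a reproduction of any reviewed study; all sizes and models are assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)

# Many features, few cases, and labels UNRELATED to the features:
# any apparent predictive power is false discovery.
X = rng.normal(size=(60, 2000))
y = rng.integers(0, 2, size=60)

# WRONG: select the "best" features using ALL the data (labels included),
# then cross-validate; the selection has already seen the test folds.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000),
                        X_sel, y, cv=5).mean()

# RIGHT: selection happens inside each CV fold, on training data only.
pipe = make_pipeline(SelectKBest(f_classif, k=20),
                     LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=5).mean()
```

On random labels `honest` hovers around chance while `leaky` looks convincingly accurate, exactly the overoptimism the review flags in prior speech-OSA results.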