961 research outputs found

    Laying the Foundation for In-car Alcohol Detection by Speech

    Get PDF
    The fact that an increasing number of functions in the automobile are and will be controlled by speech of the driver rises the question whether this speech input may be used to detect a possible alcoholic intoxication of the driver. For that matter a large part of the new Alcohol Language Corpus (ALC) edited by the Bavarian Archive of Speech Signals (BAS) will be used for a broad statistical investigation of possible feature candidates for classification. In this contribution we present the motivation and the design of the ALC corpus as well as first results from fundamental frequency and rhythm analysis. Our analysis by comparing sober and alcoholized speech of the same individuals suggests that there are in fact promising features that can automatically be derived from the speech signal during the speech recognition process and will indicate intoxication for most speakers

    Speech processing with deep learning for voice-based respiratory diagnosis : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, New Zealand

    Get PDF
    Voice-based respiratory diagnosis research aims at automatically screening and diagnosing respiratory-related symptoms (e.g., smoking status, COVID-19 infection) from human-generated sounds (e.g., breath, cough, speech). It has the potential to be used as an objective, simple, reliable, and less time-consuming method than traditional biomedical diagnosis methods. In this thesis, we conduct one comprehensive literature review and propose three novel deep learning methods to enrich voice-based respiratory diagnosis research and improve its performance. Firstly, we conduct a comprehensive investigation of the effects of voice features on the detection of smoking status. Secondly, we propose a novel method that uses the combination of both high-level and low-level acoustic features along with deep neural networks for smoking status identification. Thirdly, we investigate various feature extraction/representation methods and propose a SincNet-based CNN method for feature representations to further improve the performance of smoking status identification. To the best of our knowledge, this is the first systemic study that applies speech processing with deep learning for voice-based smoking status identification. Moreover, we propose a novel transfer learning scheme and a task-driven feature representation method for diagnosing respiratory diseases (e.g., COVID-19) from human-generated sounds. We find those transfer learning methods using VGGish, wav2vec 2.0 and PASE+, and our proposed task-driven method Sinc-ResNet have achieved competitive performance compared with other work. The findings of this study provide a new perspective and insights for voice-based respiratory disease diagnosis. The experimental results demonstrate the effectiveness of our proposed methods and show that they have achieved better performances compared to other existing methods

    Alcohol Language Corpus

    Get PDF
    The Alcohol Language Corpus (ALC) is the first publicly available speech corpus comprising intoxicated and sober speech of 162 female and male German speakers. Recordings are done in the automotive environment to allow for the development of automatic alcohol detection and to ensure a consistent acoustic environment for the alcoholized and the sober recording. The recorded speech covers a variety of contents and speech styles. Breath and blood alcohol concentration measurements are provided for all speakers. A transcription according to SpeechDat/Verbmobil standards and disfluency tagging as well as an automatic phonetic segmentation are part of the corpus. An Emu version of ALC allows easy access to basic speech parameters as well as the us of R for statistical analysis of selected parts of ALC. ALC is available without restriction for scientific or commercial use at the Bavarian Archive for Speech Signals

    Alcohol Language Corpus

    Get PDF
    The Alcohol Language Corpus (ALC) is the first publicly available speech corpus comprising intoxicated and sober speech of 162 female and male German speakers. Recordings are done in the automotive environment to allow for the development of automatic alcohol detection and to ensure a consistent acoustic environment for the alcoholized and the sober recording. The recorded speech covers a variety of contents and speech styles. Breath and blood alcohol concentration measurements are provided for all speakers. A transcription according to SpeechDat/Verbmobil standards and disfluency tagging as well as an automatic phonetic segmentation are part of the corpus. An Emu version of ALC allows easy access to basic speech parameters as well as the us of R for statistical analysis of selected parts of ALC. ALC is available without restriction for scientific or commercial use at the Bavarian Archive for Speech Signals

    A Parametric Approach for Classification of Distortions in Pathological Voices

    Get PDF
    In biomedical acoustics, distortion in voice signals, commonly present during acquisition and transmission, adversely affects acoustic features extracted from pathological voice. Information on the type of distortion can help in compensating for its effects. This paper proposes a new approach to detecting four major types of commonly encountered distortion in remote analysis of pathological voice, namely background noise, reverberation, clipping and coding. In this approach, by applying factor analysis to Gaussian mixture model mean supervectors, distortions in variable-duration recordings are modeled by fixed-length, low-dimensional channel vectors. Then, linear discriminant analysis (LDA) is used to remove the remaining nuisance effects in the channel vectors. Finally, two different classifiers, namely support vector machines and probabilistic LDA classify the different types of distortion. Experimental results obtained using Parkinson's voices, as an example of pathological voice, show 11.4% relative improvement in performance over systems which directly use acoustic features for distortion classification

    About Voice: A Longitudinal Study of Speaker Recognition Dataset Dynamics

    Full text link
    Like face recognition, speaker recognition is widely used for voice-based biometric identification in a broad range of industries, including banking, education, recruitment, immigration, law enforcement, healthcare, and well-being. However, while dataset evaluations and audits have improved data practices in computer vision and face recognition, the data practices in speaker recognition have gone largely unquestioned. Our research aims to address this gap by exploring how dataset usage has evolved over time and what implications this has on bias and fairness in speaker recognition systems. Previous studies have demonstrated the presence of historical, representation, and measurement biases in popular speaker recognition benchmarks. In this paper, we present a longitudinal study of speaker recognition datasets used for training and evaluation from 2012 to 2021. We survey close to 700 papers to investigate community adoption of datasets and changes in usage over a crucial time period where speaker recognition approaches transitioned to the widespread adoption of deep neural networks. Our study identifies the most commonly used datasets in the field, examines their usage patterns, and assesses their attributes that affect bias, fairness, and other ethical concerns. Our findings suggest areas for further research on the ethics and fairness of speaker recognition technology.Comment: 14 pages (23 with References and Appendix

    Towards using Cough for Respiratory Disease Diagnosis by leveraging Artificial Intelligence: A Survey

    Full text link
    Cough acoustics contain multitudes of vital information about pathomorphological alterations in the respiratory system. Reliable and accurate detection of cough events by investigating the underlying cough latent features and disease diagnosis can play an indispensable role in revitalizing the healthcare practices. The recent application of Artificial Intelligence (AI) and advances of ubiquitous computing for respiratory disease prediction has created an auspicious trend and myriad of future possibilities in the medical domain. In particular, there is an expeditiously emerging trend of Machine learning (ML) and Deep Learning (DL)-based diagnostic algorithms exploiting cough signatures. The enormous body of literature on cough-based AI algorithms demonstrate that these models can play a significant role for detecting the onset of a specific respiratory disease. However, it is pertinent to collect the information from all relevant studies in an exhaustive manner for the medical experts and AI scientists to analyze the decisive role of AI/ML. This survey offers a comprehensive overview of the cough data-driven ML/DL detection and preliminary diagnosis frameworks, along with a detailed list of significant features. We investigate the mechanism that causes cough and the latent cough features of the respiratory modalities. We also analyze the customized cough monitoring application, and their AI-powered recognition algorithms. Challenges and prospective future research directions to develop practical, robust, and ubiquitous solutions are also discussed in detail.Comment: 30 pages, 12 figures, 9 table
    corecore