Laying the Foundation for In-car Alcohol Detection by Speech
The fact that an increasing number of functions in the automobile are and will be controlled by the driver's speech raises the question of whether this speech input can be used to detect possible alcoholic intoxication of the driver. To this end, a large part of the new Alcohol Language Corpus (ALC), edited by the Bavarian Archive of Speech Signals (BAS), will be used for a broad statistical investigation of possible feature candidates for classification. In this contribution we present the motivation and the design of the ALC corpus as well as first results from fundamental frequency and rhythm analysis. Our analysis, which compares sober and alcoholized speech of the same individuals, suggests that there are in fact promising features that can be derived automatically from the speech signal during the speech recognition process and that will indicate intoxication for most speakers.
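The abstract above reports a fundamental frequency (F0) analysis but does not say how F0 was extracted. As a rough illustration only, the sketch below estimates F0 for a single voiced frame with a plain autocorrelation peak search. The function names, the 60 to 400 Hz search range, and the synthetic test tone are assumptions made for this example, not details from the ALC work.

```python
import math

def estimate_f0(samples, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) of one frame from the strongest autocorrelation peak.

    Assumed search range: f0_min..f0_max. Returns None if no positive peak
    is found (for example, for a silent frame).
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]          # remove DC offset
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_score = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

def synth_tone(freq_hz, sample_rate, num_samples):
    """Hypothetical helper: a pure sine standing in for a voiced frame."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(num_samples)]

# Compare two frames, as one might compare sober and intoxicated recordings.
rate = 8000
f0_a = estimate_f0(synth_tone(120.0, rate, 400), rate)
f0_b = estimate_f0(synth_tone(200.0, rate, 400), rate)
print(f0_a, f0_b)
```

Because the lag is an integer number of samples, the estimate is quantized; a practical system would likely add peak interpolation and a voicing decision.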
Speech processing with deep learning for voice-based respiratory diagnosis : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, New Zealand
Voice-based respiratory diagnosis research aims at automatically screening and diagnosing respiratory-related symptoms (e.g., smoking status, COVID-19 infection) from human-generated sounds (e.g., breath, cough, speech). It has the potential to be used as an objective, simple, reliable, and less time-consuming method than traditional biomedical diagnosis methods. In this thesis, we conduct one comprehensive literature review and propose three novel deep learning methods to enrich voice-based respiratory diagnosis research and improve its performance.
Firstly, we conduct a comprehensive investigation of the effects of voice features on the detection of smoking status. Secondly, we propose a novel method that combines high-level and low-level acoustic features with deep neural networks for smoking status identification. Thirdly, we investigate various feature extraction/representation methods and propose a SincNet-based CNN method for feature representations to further improve the performance of smoking status identification. To the best of our knowledge, this is the first systematic study that applies speech processing with deep learning to voice-based smoking status identification.
Moreover, we propose a novel transfer learning scheme and a task-driven feature representation method for diagnosing respiratory diseases (e.g., COVID-19) from human-generated sounds. We find that the transfer learning methods using VGGish, wav2vec 2.0, and PASE+, as well as our proposed task-driven method Sinc-ResNet, achieve competitive performance compared with other work. The findings of this study provide a new perspective and insights for voice-based respiratory disease diagnosis.
The experimental results demonstrate the effectiveness of our proposed methods and show that they achieve better performance than other existing methods.
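The thesis abstract mentions SincNet-based feature representations. As background, a SincNet-style first layer parameterizes each convolution kernel as a band-pass filter defined by two cutoff frequencies, built as the difference of two windowed sinc low-pass filters. The sketch below constructs one such kernel with fixed cutoffs. In SincNet the cutoffs are learned, and the kernel size, the Hamming window, and the function name here are assumptions for illustration, not the thesis configuration.

```python
import math

def sinc_bandpass(f1_hz, f2_hz, sample_rate, kernel_size=101):
    """Return a Hamming-windowed band-pass kernel passing f1_hz..f2_hz.

    Ideal impulse response: g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n),
    with sinc(x) = sin(x)/x and f1, f2 normalized by the sample rate.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the filter is symmetric")
    f1 = f1_hz / sample_rate
    f2 = f2_hz / sample_rate
    half = kernel_size // 2
    taps = []
    for i in range(kernel_size):
        n = i - half
        if n == 0:
            ideal = 2.0 * (f2 - f1)          # limit of the sinc difference at n = 0
        else:
            ideal = (math.sin(2 * math.pi * f2 * n)
                     - math.sin(2 * math.pi * f1 * n)) / (math.pi * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        taps.append(ideal * window)
    return taps

# Example band (hypothetical values): 300 Hz to 3000 Hz at 16 kHz sampling.
kernel = sinc_bandpass(300.0, 3000.0, 16000)
print(len(kernel))
```

The appeal of this parameterization is that each filter is described by two numbers rather than by every tap, which is why it is often described as a compact, interpretable front end.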
Alcohol Language Corpus
The Alcohol Language Corpus (ALC) is the first publicly available speech corpus comprising intoxicated and sober speech of 162 female and male German speakers.
Recordings are made in an automotive environment to allow for the development of automatic alcohol detection and to ensure a consistent acoustic environment for the alcoholized and the sober recordings. The recorded speech covers a variety of contents and speech styles. Breath and blood alcohol concentration measurements are provided for all speakers. A transcription according to SpeechDat/Verbmobil standards, disfluency tagging, and an automatic phonetic segmentation are part of the corpus. An Emu version of ALC allows easy access to basic speech parameters as well as the use of R for statistical analysis of selected parts of ALC. ALC is available without restriction for scientific or commercial use at the Bavarian Archive for Speech Signals.
A Parametric Approach for Classification of Distortions in Pathological Voices
In biomedical acoustics, distortion in voice signals, commonly present during acquisition and transmission, adversely affects acoustic features extracted from pathological voice. Information on the type of distortion can help in compensating for its effects. This paper proposes a new approach to detecting four major types of commonly encountered distortion in remote analysis of pathological voice, namely background noise, reverberation, clipping, and coding. In this approach, by applying factor analysis to Gaussian mixture model mean supervectors, distortions in variable-duration recordings are modeled by fixed-length, low-dimensional channel vectors. Then, linear discriminant analysis (LDA) is used to remove the remaining nuisance effects in the channel vectors. Finally, two different classifiers, namely support vector machines and probabilistic LDA, classify the different types of distortion. Experimental results obtained using Parkinson's voices, as an example of pathological voice, show an 11.4% relative improvement in performance over systems that directly use acoustic features for distortion classification.
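The abstract reports a relative rather than an absolute improvement. As a reminder of the arithmetic, the sketch below computes a relative improvement for a higher-is-better metric. The accuracy values are hypothetical and chosen only to show the calculation; they are not taken from the paper, and the abstract does not state which metric was used.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative gain of `proposed` over `baseline`, in percent.

    Assumes a higher-is-better metric such as accuracy.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (proposed - baseline) / baseline * 100.0

# Hypothetical values for illustration only (not from the paper).
baseline_accuracy = 0.80
proposed_accuracy = 0.88
gain = relative_improvement(baseline_accuracy, proposed_accuracy)
print(f"absolute gain: {(proposed_accuracy - baseline_accuracy) * 100:.1f} points")
print(f"relative gain: {gain:.1f}%")
```

With these illustrative numbers the absolute gain is 8 percentage points while the relative gain is 10%, which shows why the two figures should not be read interchangeably.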
About Voice: A Longitudinal Study of Speaker Recognition Dataset Dynamics
Like face recognition, speaker recognition is widely used for voice-based
biometric identification in a broad range of industries, including banking,
education, recruitment, immigration, law enforcement, healthcare, and
well-being. However, while dataset evaluations and audits have improved data
practices in computer vision and face recognition, the data practices in
speaker recognition have gone largely unquestioned. Our research aims to
address this gap by exploring how dataset usage has evolved over time and what
implications this has on bias and fairness in speaker recognition systems.
Previous studies have demonstrated the presence of historical, representation,
and measurement biases in popular speaker recognition benchmarks. In this
paper, we present a longitudinal study of speaker recognition datasets used for
training and evaluation from 2012 to 2021. We survey close to 700 papers to
investigate community adoption of datasets and changes in usage over a crucial
time period where speaker recognition approaches transitioned to the widespread
adoption of deep neural networks. Our study identifies the most commonly used
datasets in the field, examines their usage patterns, and assesses their
attributes that affect bias, fairness, and other ethical concerns. Our findings
suggest areas for further research on the ethics and fairness of speaker
recognition technology. Comment: 14 pages (23 with References and Appendix)
Towards using Cough for Respiratory Disease Diagnosis by leveraging Artificial Intelligence: A Survey
Cough acoustics contain multitudes of vital information about
pathomorphological alterations in the respiratory system. Reliable and accurate
detection of cough events by investigating the underlying cough latent features
and disease diagnosis can play an indispensable role in revitalizing the
healthcare practices. The recent application of Artificial Intelligence (AI)
and advances in ubiquitous computing for respiratory disease prediction have
created an auspicious trend and myriad of future possibilities in the medical
domain. In particular, there is an expeditiously emerging trend of Machine
learning (ML) and Deep Learning (DL)-based diagnostic algorithms exploiting
cough signatures. The enormous body of literature on cough-based AI algorithms
demonstrates that these models can play a significant role in detecting the
onset of a specific respiratory disease. However, it is pertinent to collect
the information from all relevant studies in an exhaustive manner for the
medical experts and AI scientists to analyze the decisive role of AI/ML. This
survey offers a comprehensive overview of the cough data-driven ML/DL detection
and preliminary diagnosis frameworks, along with a detailed list of significant
features. We investigate the mechanism that causes cough and the latent cough
features of the respiratory modalities. We also analyze customized cough
monitoring applications and their AI-powered recognition algorithms. Challenges
and prospective future research directions to develop practical, robust, and
ubiquitous solutions are also discussed in detail. Comment: 30 pages, 12 figures, 9 tables