Towards using Cough for Respiratory Disease Diagnosis by leveraging Artificial Intelligence: A Survey
Cough acoustics carry a wealth of vital information about pathomorphological alterations in the respiratory system. Reliable and accurate detection of cough events, together with disease diagnosis based on the underlying latent cough features, can play an indispensable role in revitalizing healthcare practice. The recent application of Artificial Intelligence (AI) and advances in ubiquitous computing for respiratory disease prediction have created an auspicious trend and a myriad of future possibilities in the medical domain. In particular, there is a rapidly emerging trend of Machine Learning (ML)- and Deep Learning (DL)-based diagnostic algorithms that exploit cough signatures. The enormous body of literature on cough-based AI algorithms demonstrates that these models can play a significant role in detecting the onset of a specific respiratory disease. However, it is pertinent to collect the information from all relevant studies exhaustively so that medical experts and AI scientists can analyze the decisive role of AI/ML. This survey offers a comprehensive overview of cough data-driven ML/DL detection and preliminary diagnosis frameworks, along with a detailed list of significant features. We investigate the mechanism that causes cough and the latent cough features of the respiratory modalities. We also analyze customized cough monitoring applications and their AI-powered recognition algorithms. Challenges and prospective future research directions for developing practical, robust, and ubiquitous solutions are also discussed in detail.
Comment: 30 pages, 12 figures, 9 tables
Cough Monitoring Through Audio Analysis
The detection of cough events in audio recordings requires the analysis of a significant amount of data as cough is typically monitored continuously over several hours to capture naturally occurring cough events. The recorded data is mostly composed of undesired sound events such as silence, background noise, and speech. To reduce computational costs and to address the ethical concerns raised from the collection of audio data in public environments, the data requires pre-processing prior to any further analysis.
Current cough detection algorithms typically use pre-processing methods to remove undesired audio segments from the collected data but do not preserve the privacy of individuals being recorded while monitoring respiratory events. This study reveals the need for an automatic pre-processing method that removes sensitive data from the recording prior to any further analysis to ensure privacy preservation of individuals.
Specific characteristics of cough sounds can be used to discard sensitive data from audio recordings at a pre-processing stage, improving privacy preservation, and decreasing ethical concerns when dealing with cough monitoring through audio analysis.
We propose a pre-processing algorithm that increases privacy preservation and significantly decreases the amount of data to be analysed, by separating cough segments from other non-cough segments, including speech, in audio recordings. Our method verifies the presence of signal energy in both lower and higher frequency regions and discards segments whose energy concentrates only on one of them. The method is iteratively applied on the same data to increase the percentage of data reduction and privacy preservation.
We evaluated the performance of our algorithm using several hours of audio recordings with manually pre-annotated cough and speech events. Our results showed that 5 iterations of the proposed method can discard up to 88.94% of the speech content present in the recordings, allowing for strong privacy preservation while considerably reducing the amount of data to be further analysed by 91.79%.
The data reduction and privacy preservation achieved by the proposed pre-processing algorithm offer the possibility of using larger datasets captured in public environments, and would benefit all cough detection algorithms by preserving the privacy of subjects and bystander conversations recorded during cough monitoring.
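The two-band energy check described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 1500 Hz band split and the 10% energy-share threshold are assumptions chosen for the example, not published parameters.

```python
# Sketch of a two-band energy filter: segments whose spectral energy is
# concentrated in only one frequency band (as voiced speech tends to be)
# are discarded; segments with energy in both bands are kept as
# candidate coughs. Band edge and threshold are illustrative only.
import numpy as np

def band_energies(segment, sample_rate, split_hz=1500.0):
    """Return (low-band, high-band) spectral energy of a 1-D segment."""
    spectrum = np.abs(np.fft.rfft(segment)) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)
    low = spectrum[freqs < split_hz].sum()
    high = spectrum[freqs >= split_hz].sum()
    return low, high

def keep_segment(segment, sample_rate, ratio_threshold=0.1):
    """Keep a segment only if both bands carry a meaningful energy share."""
    low, high = band_energies(segment, sample_rate)
    total = low + high
    if total == 0.0:
        return False  # pure silence carries no information
    # The smaller band must still hold at least ratio_threshold of the energy.
    return min(low, high) / total >= ratio_threshold
```

Applied iteratively over fixed-length windows, such a filter drops narrowband speech-dominated segments while retaining broadband cough-like events for further analysis.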
FSD50K: an Open Dataset of Human-Labeled Sound Events
Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, which is based on a massive amount of audio tracks from YouTube videos and encompasses over 500 classes of everyday sounds. However, AudioSet is not an open dataset: its release consists of pre-computed audio features (instead of waveforms), which limits the adoption of some SER methods. Downloading the original audio tracks is also problematic, because the constituent YouTube videos gradually disappear and raise usage-rights issues, which casts doubt on the suitability of this resource for benchmarking systems. To provide an alternative benchmark dataset and thus foster SER research, we introduce FSD50K, an open dataset containing over 51k audio clips totalling over 100h of audio, manually labeled using 200 classes drawn from the AudioSet Ontology. The audio clips are licensed under Creative Commons licenses, making the dataset freely distributable (including waveforms). We provide a detailed description of the FSD50K creation process, tailored to the particularities of Freesound data, including challenges encountered and solutions adopted. We include a comprehensive dataset characterization along with a discussion of limitations and key factors to allow its audio-informed usage. Finally, we conduct sound event classification experiments to provide baseline systems as well as insight into the main factors to consider when splitting Freesound audio data for SER. Our goal is to develop a dataset to be widely adopted by the community as a new open benchmark for SER research.
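One factor commonly at stake when splitting Freesound-derived audio, of the kind the abstract above alludes to, is that several clips can be excerpted from the same original upload and are therefore acoustically correlated. As a hedged illustration (the clip and source ids below are invented, and this is not the FSD50K authors' splitting procedure), a group-aware split keeps all clips from one source on the same side of the train/test boundary:

```python
# Group-aware train/test split: clips sharing a source id (e.g. the
# same original Freesound upload) never straddle the split, avoiding
# leakage of correlated audio into the test set. All ids are made up.
from sklearn.model_selection import GroupShuffleSplit

clips = ["c1", "c2", "c3", "c4", "c5", "c6"]
labels = [0, 0, 1, 1, 0, 1]
sources = ["u1", "u1", "u2", "u2", "u3", "u3"]  # originating upload per clip

splitter = GroupShuffleSplit(n_splits=1, test_size=0.34, random_state=0)
train_idx, test_idx = next(splitter.split(clips, labels, groups=sources))

# No source id appears on both sides of the split.
train_sources = {sources[i] for i in train_idx}
test_sources = {sources[i] for i in test_idx}
assert train_sources.isdisjoint(test_sources)
```

A random per-clip split, by contrast, would routinely place near-duplicate audio in both partitions and inflate benchmark scores.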
A Study on Real-Time ADL Recognition from Everyday-Life Sounds
Services that adapt their control to a person's situation, based on human activities and emotional state, have attracted attention; to make such services useful, the low-level information obtained from sensor data, from which higher-level information is derived, is crucial. In this study, aiming to recognize Activities of Daily Living (ADL) and emotional state, we developed a system that recognizes everyday-life sounds and nonverbal sounds in real time while distinguishing them from speech and noise. We examined whether the recognition method used in prior work on real-time recognition of many kinds of nonverbal and everyday-life sounds could also be applied to the sounds targeted in this study. This revealed two problems: the prior method assumed that speech and nonverbal sounds do not co-occur, and it took no countermeasure against false detections caused by noise input; we further hypothesized that the method is not well suited to recognizing speech and nonverbal sounds. We therefore surveyed existing work on methods for discriminating speech and noise and on state-definition methods suited to nonverbal-sound recognition, and proposed a recognition method that also takes the requirements of real-time recognition into account. We then implemented real-time recognition using a speech recognition engine matched to the proposed method. To verify its recognition accuracy, we conducted three kinds of evaluations using audio from various speakers and environments. Classification among nonverbal sounds based on pseudo-phoneme-sequence definitions gave reasonable results, but once processing intended for real-time recognition from continuous audio was included, issues remained in the detection rate of nonverbal sounds and in false detections of nonverbal sounds caused by noise input. On the other hand, false detections of everyday-life and nonverbal sounds caused by speech were suppressed, and everyday-life sounds were recognized comparatively accurately except for one category. Moreover, even when a subject laughed while speaking, laughter was detected about 65% of the time under the same settings as real-time recognition, so we consider the method effective for real-time laughter detection from continuous audio. The University of Electro-Communications, 201
Audio and contact microphones for cough detection
In the framework of assessing pathology severity in chronic cough diseases, the medical literature underlines the lack of tools for automatic, objective, and reliable detection of cough events. This paper describes a system based on two microphones which we developed for this purpose. The proposed approach relies on a large variety of audio descriptors, an efficient feature selection algorithm based on their mutual information, and the use of artificial neural networks. First, the possible use of a contact microphone (placed on the patient's thorax or trachea) as a complement to the audio signal is investigated. This study underlines that the contact microphone suffers from reliability issues and conveys little new relevant information compared to the audio modality. Secondly, the proposed audio-only approach is compared to a commercially available system using four sensors, on a database containing different sound categories that are often misdetected as coughs, produced in various conditions. With an average sensitivity and specificity of 94.7% and 95% respectively, the proposed method achieves better cough detection performance than the commercial system.
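The mutual-information ranking idea behind feature selection of the kind described above can be sketched as follows. This is a generic illustration with synthetic data, not the authors' selection algorithm or their audio descriptors; the descriptor names and data are invented for the example.

```python
# Rank candidate descriptors by their mutual information with the
# cough / non-cough label; descriptors carrying no information about
# the label score near zero and can be dropped. Synthetic data only.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, size=n)  # 0 = non-cough, 1 = cough

# One descriptor correlated with the label, one pure noise (assumed).
informative = labels + 0.3 * rng.standard_normal(n)
irrelevant = rng.standard_normal(n)
features = np.column_stack([informative, irrelevant])

scores = mutual_info_classif(features, labels, random_state=0)
ranking = np.argsort(scores)[::-1]  # most informative descriptor first
```

In practice such a ranking is only a starting point: descriptors that are individually informative may be redundant with each other, which is why selection schemes also penalize mutual information between the chosen features themselves.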