Feature Trajectory Dynamic Time Warping for Clustering of Speech Segments
Dynamic time warping (DTW) can be used to compute the similarity between two
sequences of generally differing length. We propose a modification to DTW that
performs individual and independent pairwise alignment of feature trajectories.
The modified technique, termed feature trajectory dynamic time warping (FTDTW),
is applied as a similarity measure in the agglomerative hierarchical clustering
of speech segments. Experiments using MFCC and PLP parametrisations extracted
from TIMIT and from the Spoken Arabic Digit Dataset (SADD) show consistent and
statistically significant improvements in the quality of the resulting clusters
in terms of F-measure and normalised mutual information (NMI).
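As a rough illustration of the FTDTW idea, the sketch below (Python) computes DTW independently for each feature trajectory, averages the per-dimension distances, and feeds the resulting pairwise distance matrix to agglomerative clustering. The absolute-difference local cost, the unweighted averaging across dimensions and the toy data are assumptions for illustration, not necessarily the paper's exact formulation.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def dtw_1d(x, y):
    """Classic DTW distance between two 1-D feature trajectories."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])            # assumed local cost
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def ftdtw(A, B):
    """Align each feature trajectory independently and average the
    per-dimension DTW distances; A, B are (frames x features) arrays."""
    return np.mean([dtw_1d(A[:, k], B[:, k]) for k in range(A.shape[1])])

# Toy segments: random MFCC-like matrices of differing length.
rng = np.random.default_rng(0)
segments = [rng.normal(size=(rng.integers(40, 60), 13)) for _ in range(8)]

# Pairwise FTDTW distance matrix, then agglomerative clustering.
n = len(segments)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = ftdtw(segments[i], segments[j])
labels = fcluster(linkage(squareform(D), method="average"), t=3, criterion="maxclust")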
Deep Transfer Learning based COVID-19 Detection in Cough, Breath and Speech using Bottleneck Features
We present an experimental investigation into the automatic detection of
COVID-19 from coughs, breaths and speech, as this type of screening is
non-contact, does not require specialist medical expertise or laboratory
facilities and can easily be deployed on inexpensive consumer hardware.
Smartphone recordings of cough, breath and speech from subjects around the
globe are used for classification by seven standard machine learning
classifiers using leave-p-out cross-validation to provide a promising
baseline performance.
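A minimal sketch of such a baseline using scikit-learn's LeavePOut splitter, pooling the fold-wise predictions into a single AUC; the value of p, the random features and the three classifiers shown are placeholders rather than the paper's exact setup.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeavePOut
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Placeholder per-recording feature vectors and COVID-19 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 20))
y = np.tile([0, 1], 15)

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True),
    "KNN": KNeighborsClassifier(),
}
lpo = LeavePOut(p=2)  # the value of p is an assumption for illustration

for name, clf in classifiers.items():
    scores, truth = [], []
    for train_idx, test_idx in lpo.split(X):
        clf.fit(X[train_idx], y[train_idx])
        scores.extend(clf.predict_proba(X[test_idx])[:, 1])
        truth.extend(y[test_idx])
    print(f"{name}: pooled AUC = {roc_auc_score(truth, scores):.3f}")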
Then, a diverse dataset of 10.29 hours of cough, sneeze, speech and noise
audio recordings is used to pre-train CNN, LSTM and Resnet50 classifiers,
which are then fine-tuned to enhance performance even further.
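A sketch of this pre-train-then-fine-tune pattern for the Resnet50 case, assuming spectrogram-like inputs; the input shape, the four-class pre-training head, the freezing depth and the learning rate are illustrative assumptions, and the dataset names in the comments are hypothetical.

import tensorflow as tf

# Pre-training: classify {cough, sneeze, speech, noise} spectrograms (4 classes).
base = tf.keras.applications.ResNet50(weights=None, include_top=False,
                                      pooling="avg", input_shape=(128, 128, 3))
pretrain = tf.keras.Sequential([base, tf.keras.layers.Dense(4, activation="softmax")])
pretrain.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# pretrain.fit(aux_spectrograms, aux_labels, ...)   # hypothetical pre-training data

# Fine-tuning: swap the head for a binary COVID-19 output, freeze the earlier
# layers (the freezing depth is an assumption) and train at a lower learning rate.
for layer in base.layers[:-30]:
    layer.trainable = False
finetune = tf.keras.Sequential([base, tf.keras.layers.Dense(1, activation="sigmoid")])
finetune.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                 loss="binary_crossentropy", metrics=[tf.keras.metrics.AUC()])
# finetune.fit(covid_spectrograms, covid_labels, ...)  # hypothetical COVID-19 data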
We also extract bottleneck features from these pre-trained models by removing
the final two layers and use them as inputs to LR, SVM, MLP and KNN
classifiers to detect the COVID-19 signature.
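The sketch below shows the truncation step on a stand-in network: the final two layers are removed and the truncated model's activations serve as bottleneck features for a downstream classifier. The layer sizes and placeholder data are assumptions.

import numpy as np
import tensorflow as tf
from sklearn.svm import SVC

# Stand-in for one of the pre-trained networks (sizes are illustrative).
net = tf.keras.Sequential([
    tf.keras.Input(shape=(512,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),    # bottleneck layer
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),
])
# ... `net` would be pre-trained on the auxiliary audio task here ...

# Remove the final two layers; the truncated network's activations are the
# bottleneck features passed to the downstream classifiers.
extractor = tf.keras.Model(inputs=net.input, outputs=net.layers[-3].output)

X_train = np.random.randn(20, 512)        # placeholder inputs
y_train = np.tile([0, 1], 10)             # placeholder COVID-19 labels
feats = extractor.predict(X_train, verbose=0)
svm = SVC(probability=True).fit(feats, y_train)   # likewise LR, MLP and KNN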
The highest AUC of 0.98 was achieved using a transfer-learning-based Resnet50
architecture on coughs from the Coswara dataset.
The highest AUCs of 0.94 and 0.92 were achieved by an SVM run on the
bottleneck features extracted from the breath recordings in the Coswara
dataset and the speech recordings in the ComParE dataset, respectively.
We conclude that, among all vocal audio, coughs carry the strongest COVID-19
signature, followed by breath and speech, and that transfer learning improves
classifier performance, yielding higher AUC and lower variance across the
cross-validation folds.
Although these signatures are not perceivable by the human ear,
machine-learning-based COVID-19 detection is possible from vocal audio
recorded via smartphone.
TB or not TB? Acoustic cough analysis for tuberculosis classification
In this work, we explore recurrent neural network architectures for
tuberculosis (TB) cough classification. In contrast to previous unsuccessful
attempts to implement deep architectures in this domain, we show that a basic
bidirectional long short-term memory network (BiLSTM) can achieve improved
performance. In addition, we show that by performing greedy feature selection
in conjunction with a newly proposed attention-based architecture that learns
patient-invariant features, substantially better generalisation can be achieved
compared to a baseline and the other architectures considered. Furthermore, this
attention mechanism allows inspection of the temporal regions of the audio
signal considered important for classification. Finally,
we develop a neural style transfer technique to infer idealised inputs which
can subsequently be analysed. We find distinct differences between the
idealised power spectra of TB and non-TB coughs, which provide clues about the
origin of the features in the audio signal.
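As a rough sketch of the basic model described here, a BiLSTM over framed acoustic features of a cough recording might look as follows; the 13-dimensional MFCC input, the layer sizes and the data names are illustrative assumptions, and the paper's attention-based variant would additionally learn a weighting over frames rather than using only the final recurrent state.

import tensorflow as tf

# A basic bidirectional LSTM cough classifier; shapes are assumptions.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 13)),                 # (frames, features)
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # P(TB | cough)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
# model.fit(cough_sequences, tb_labels, ...)          # hypothetical data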