335 research outputs found

    Evaluating the impact of voice activity detection on speech emotion recognition for autistic children

    Get PDF
    Individuals with autism are known to face challenges with emotion regulation, and express their affective states in a variety of ways. With this in mind, an increasing amount of research on automatic affect recognition from speech and other modalities has recently been presented to assist and provide support, as well as to improve understanding of autistic individuals' behaviours. As well as the emotion expressed from the voice, for autistic children the dynamics of verbal speech can be inconsistent and vary greatly amongst individuals. The current contribution outlines a voice activity detection (VAD) system specifically adapted to autistic children's vocalisations. The presented VAD system is a recurrent neural network (RNN) with long short-term memory (LSTM) cells. It is trained on 130 acoustic Low-Level Descriptors (LLDs) extracted from more than 17 h of audio recordings, which were richly annotated by experts in terms of perceived emotion as well as occurrence and type of vocalisations. The data consist of 25 English-speaking autistic children undertaking a structured, partly robot-assisted emotion-training activity and was collected as part of the DE-ENIGMA project. The VAD system is further utilised as a preprocessing step for a continuous speech emotion recognition (SER) task aiming to minimise the effects of potential confounding information, such as noise, silence, or non-child vocalisation. Its impact on the SER performance is compared to the impact of other VAD systems, including a general VAD system trained from the same data set, an out-of-the-box Web Real-Time Communication (WebRTC) VAD system, as well as the expert annotations. Our experiments show that the child VAD system achieves a lower performance than our general VAD system, trained under identical conditions, as we obtain receiver operating characteristic area under the curve (ROC-AUC) metrics of 0.662 and 0.850, respectively. The SER results show varying performances across valence and arousal depending on the utilised VAD system with a maximum concordance correlation coefficient (CCC) of 0.263 and a minimum root mean square error (RMSE) of 0.107. Although the performance of the SER models is generally low, the child VAD system can lead to slightly improved results compared to other VAD systems and in particular the VAD-less baseline, supporting the hypothesised importance of child VAD systems in the discussed context

    Frontotemporal Dementia and Glucose Metabolism

    Get PDF
    Frontotemporal dementia (FTD), hallmarked by antero-temporal degeneration in the human brain, is the second most common early onset dementia. FTD is a diverse disease with three main clinical presentations, four different identified proteinopathies and many disease-associated genes. The exact pathophysiology of FTD remains to be elucidated. One common characteristic all forms of FTD share is the dysregulation of glucose metabolism in patients’ brains. The brain consumes around 20% of the body’s energy supply and predominantly utilizes glucose as a fuel. Glucose metabolism dysregulation could therefore be extremely detrimental for neuronal health. Research into the association between glucose metabolism and dementias has recently gained interest in Alzheimer’s disease. FTD also presents with glucose metabolism dysregulation, however, this remains largely an unexplored area. A better understanding of the link between FTD and glucose metabolism may yield further insight into FTD pathophysiology and aid the development of novel therapeutics. Here we review our current understanding of FTD and glucose metabolism in the brain and discuss the evidence of impaired glucose metabolism in FTD. Lastly, we review research potentially suggesting a causal relationship between FTD proteinopathies and impaired glucose metabolism in FTD

    Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama’s voice using GAN, WaveNet and low-quality found data

    Get PDF
    Thanks to the growing availability of spoofing databases and rapid advances in using them, systems for detecting voice spoofing attacks are becoming more and more capable, and error rates close to zero are being reached for the ASVspoof2015 database. However, speech synthesis and voice conversion paradigms that are not considered in the ASVspoof2015 database are appearing. Such examples include direct waveform modelling and generative adversarial networks. We also need to investigate the feasibility of training spoofing systems using only low-quality found data. For that purpose, we developed a generative adversarial network-based speech enhancement system that improves the quality of speech data found in publicly available sources. Using the enhanced data, we trained state-of-the-art text-to-speech and voice conversion models and evaluated them in terms of perceptual speech quality and speaker similarity. The results show that the enhancement models significantly improved the SNR of low-quality degraded data found in publicly available sources and that they significantly improved the perceptual cleanliness of the source speech without significantly degrading the naturalness of the voice. However, the results also show limitations when generating speech with the low-quality found data.Comment: conference manuscript submitted to Speaker Odyssey 201

    Data Augmentation for Deep-Learning-Based Electroencephalography

    Get PDF
    Background: Data augmentation (DA) has recently been demonstrated to achieve considerable performance gains for deep learning (DL)—increased accuracy and stability and reduced overfitting. Some electroencephalography (EEG) tasks suffer from low samples-to-features ratio, severely reducing DL effectiveness. DA with DL thus holds transformative promise for EEG processing, possibly like DL revolutionized computer vision, etc. New method: We review trends and approaches to DA for DL in EEG to address: Which DA approaches exist and are common for which EEG tasks? What input features are used? And, what kind of accuracy gain can be expected? Results: DA for DL on EEG begun 5 years ago and is steadily used more. We grouped DA techniques (noise addition, generative adversarial networks, sliding windows, sampling, Fourier transform, recombination of segmentation, and others) and EEG tasks (into seizure detection, sleep stages, motor imagery, mental workload, emotion recognition, motor tasks, and visual tasks). DA efficacy across techniques varied considerably. Noise addition and sliding windows provided the highest accuracy boost; mental workload most benefitted from DA. Sliding window, noise addition, and sampling methods most common for seizure detection, mental workload, and sleep stages, respectively. Comparing with existing methods: Percent of decoding accuracy explained by DA beyond unaugmented accuracy varied between 8% for recombination of segmentation and 36% for noise addition and from 14% for motor imagery to 56% for mental workload—29% on average. Conclusions: DA increasingly used and considerably improved DL decoding accuracy on EEG. Additional publications—if adhering to our reporting guidelines—will facilitate more detailed analysis

    Data Augmentation for Deep-Learning-Based Electroencephalography

    Get PDF
    Background: Data augmentation (DA) has recently been demonstrated to achieve considerable performance gains for deep learning (DL)—increased accuracy and stability and reduced overfitting. Some electroencephalography (EEG) tasks suffer from low samples-to-features ratio, severely reducing DL effectiveness. DA with DL thus holds transformative promise for EEG processing, possibly like DL revolutionized computer vision, etc. New method: We review trends and approaches to DA for DL in EEG to address: Which DA approaches exist and are common for which EEG tasks? What input features are used? And, what kind of accuracy gain can be expected? Results: DA for DL on EEG begun 5 years ago and is steadily used more. We grouped DA techniques (noise addition, generative adversarial networks, sliding windows, sampling, Fourier transform, recombination of segmentation, and others) and EEG tasks (into seizure detection, sleep stages, motor imagery, mental workload, emotion recognition, motor tasks, and visual tasks). DA efficacy across techniques varied considerably. Noise addition and sliding windows provided the highest accuracy boost; mental workload most benefitted from DA. Sliding window, noise addition, and sampling methods most common for seizure detection, mental workload, and sleep stages, respectively. Comparing with existing methods: Percent of decoding accuracy explained by DA beyond unaugmented accuracy varied between 8% for recombination of segmentation and 36% for noise addition and from 14% for motor imagery to 56% for mental workload—29% on average. Conclusions: DA increasingly used and considerably improved DL decoding accuracy on EEG. Additional publications—if adhering to our reporting guidelines—will facilitate more detailed analysis
    • …
    corecore