Speech-based recognition of self-reported and observed emotion in a dimensional space
The differences between self-reported and observed emotion have been investigated only marginally in the context of speech-based automatic emotion recognition. We address this issue by comparing self-reported emotion ratings with observed emotion ratings and examining how differences between these two types of ratings affect the development and performance of automatic emotion recognizers trained on them. A dimensional approach to emotion modeling is adopted: the ratings are based on continuous arousal and valence scales. We describe the TNO-Gaming Corpus, which contains spontaneous vocal and facial expressions elicited via a multiplayer videogame and includes emotion annotations obtained via self-report and via outside observers. Comparisons show that there are discrepancies between self-reported and observed emotion ratings, which are also reflected in the performance of the emotion recognizers developed. Using Support Vector Regression in combination with acoustic and textual features, we develop recognizers of arousal and valence that predict points in a 2-dimensional arousal-valence space. The results show that self-reported emotion is much harder to recognize than observed emotion, and that averaging ratings from multiple observers improves performance.
Client Selection for Federated Learning with Heterogeneous Resources in Mobile Edge
We envision a mobile edge computing (MEC) framework for machine learning (ML)
technologies, which leverages distributed client data and computation resources
for training high-performance ML models while preserving client privacy. Toward
this future goal, this work aims to extend Federated Learning (FL), a
decentralized learning framework that enables privacy-preserving training of
models, to work with heterogeneous clients in a practical cellular network. The
FL protocol iteratively asks random clients to download a trainable model from
a server, update it with their own data, and upload the updated model to the
server, while asking the server to aggregate multiple client updates to further
improve the model. While clients in this protocol do not have to disclose their
private data, the overall training process can become inefficient when some
clients have limited computational resources (i.e., require a longer update
time) or poor wireless channel conditions (longer upload time). Our new FL
protocol, which we refer to as FedCS, mitigates this problem and performs FL
efficiently while actively managing clients based on their resource conditions.
Specifically, FedCS solves a client selection problem with resource
constraints, which allows the server to aggregate as many client updates as
possible and to accelerate performance improvement in ML models. We conducted
an experimental evaluation using publicly-available large-scale image datasets
to train deep neural networks in simulated MEC environments. The experimental
results show that FedCS is able to complete its training process in a
significantly shorter time compared to the original FL protocol.
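The client selection idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that local updates run in parallel, that uploads share one channel and therefore run one after another, and that the server already has per-client time estimates. The names `Client`, `round_time`, and `select_clients` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    name: str
    update_s: float  # estimated local training time in seconds (assumed known)
    upload_s: float  # estimated model upload time in seconds (assumed known)


def round_time(elapsed, client):
    """Time at which `client` would finish uploading if scheduled next.

    Assumption: all clients start training at time 0 in parallel, while
    uploads are sequential, so an upload starts once both the channel is
    free (`elapsed`) and the client's local update is done.
    """
    return max(elapsed, client.update_s) + client.upload_s


def select_clients(clients, deadline_s):
    """Greedily add the client that finishes earliest until the deadline is hit."""
    selected = []
    elapsed = 0.0
    remaining = list(clients)
    while remaining:
        best = min(remaining, key=lambda c: round_time(elapsed, c))
        finish = round_time(elapsed, best)
        if finish > deadline_s:
            break  # the earliest finisher misses the deadline, so all others do too
        selected.append(best)
        elapsed = finish
        remaining.remove(best)
    return selected, elapsed


candidates = [
    Client("a", update_s=2.0, upload_s=1.0),
    Client("b", update_s=10.0, upload_s=1.0),  # slow device
    Client("c", update_s=1.0, upload_s=5.0),   # poor channel
    Client("d", update_s=3.0, upload_s=1.0),
]
chosen, total = select_clients(candidates, deadline_s=8.0)
print([c.name for c in chosen], total)
```

With an 8-second deadline the sketch keeps the two clients that are fast on both counts and drops the slow device and the client with the poor channel, which is the kind of trade-off the abstract describes.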
Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models
Conventional deep neural networks (DNN) for speech acoustic modeling rely on
Gaussian mixture models (GMM) and hidden Markov models (HMM) to obtain binary
class labels as the targets for DNN training. Subword classes in speech
recognition systems correspond to context-dependent tied states or senones. The
present work addresses some limitations of GMM-HMM senone alignments for DNN
training. We hypothesize that the senone probabilities obtained from a DNN
trained with binary labels can provide more accurate targets to learn better
acoustic models. However, DNN outputs bear inaccuracies which are exhibited as
high dimensional unstructured noise, whereas the informative components are
structured and low-dimensional. We exploit principal component analysis (PCA)
and sparse coding to characterize the senone subspaces. Enhanced probabilities
obtained from low-rank and sparse reconstructions are used as soft-targets for
DNN acoustic modeling, which also enables training with untranscribed data.
Experiments conducted on the AMI corpus show a 4.6% relative reduction in word
error rate.
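As a rough illustration of the low-rank part of this idea, the sketch below projects DNN posteriors onto a few principal components and renormalizes the result into soft targets. It makes several assumptions that the abstract does not state: a single global PCA rather than per-senone subspaces, PCA applied in the log domain, and a softmax to map the reconstruction back to probabilities. The function name `low_rank_soft_targets` is hypothetical, and the sparse coding variant is not shown.

```python
import numpy as np


def low_rank_soft_targets(posteriors, rank):
    """Return low-rank reconstructed posteriors of shape (frames, senones).

    posteriors: array of shape (frames, senones), each row summing to 1.
    rank: number of principal components to keep (assumed hyperparameter).
    """
    # Work with log posteriors (an assumption of this sketch).
    log_p = np.log(np.clip(posteriors, 1e-10, None))
    mean = log_p.mean(axis=0, keepdims=True)
    centered = log_p - mean

    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank]

    # Project onto the leading components and map back.
    recon = centered @ basis.T @ basis + mean

    # Softmax turns the reconstruction into a valid probability vector per frame.
    recon = recon - recon.max(axis=1, keepdims=True)
    soft = np.exp(recon)
    return soft / soft.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
toy_posteriors = rng.dirichlet(np.ones(5), size=20)  # 20 frames, 5 toy "senones"
soft_targets = low_rank_soft_targets(toy_posteriors, rank=2)
print(soft_targets.shape)
```

Keeping every component reproduces the input posteriors, so the rank controls how much of the unstructured variation is discarded before the targets are reused for training.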
A Systematic Review of The Potential Use of Neurofeedback in Patients with Schizophrenia.
Schizophrenia (SCZ) is a neurodevelopmental disorder characterized by positive symptoms (hallucinations and delusions), negative symptoms (anhedonia, social withdrawal) and marked cognitive deficits (memory, executive function, and attention). Current mainstays of treatment, including medications and psychotherapy, do not adequately address cognitive symptoms, even though cognitive functioning is essential for everyday life. However, recent advances in computational neurobiology have rekindled interest in neurofeedback (NF), a form of self-regulation or neuromodulation, as a potential means of alleviating cognitive symptoms in patients with SCZ. Therefore, we conducted a systematic review of the literature for NF studies in SCZ to identify lessons learned and steps to move the field forward. Our findings reveal that NF studies to date consist mostly of case studies and small-sample, single-group studies. Although there have been few randomized clinical trials, the results suggest that NF is feasible and that it leads to measurable changes in brain function. These findings constitute early proof-of-concept data that need to be followed up by larger, randomized clinical trials testing the efficacy of NF against well-thought-out placebos. We hope that such an undertaking by the field will lead to innovative solutions that address refractory symptoms and improve everyday functioning in patients with SCZ.