FusionSense: Emotion Classification using Feature Fusion of Multimodal Data and Deep learning in a Brain-inspired Spiking Neural Network
Using multimodal signals for emotion recognition is one of the emerging trends in affective computing. Several studies have used state-of-the-art deep learning methods and combined physiological signals, such as the electroencephalogram (EEG), electrocardiogram (ECG), and skin temperature, with facial expressions, voice, and posture, to name a few, in order to classify emotions. Spiking neural networks (SNNs) represent the third generation of neural networks and employ biologically plausible models of neurons. SNNs have been shown to handle spatio-temporal data efficiently, and this is essentially the kind of data encountered in the emotion recognition problem. In this work, for the first time, we propose applying SNNs to the emotion recognition problem on a multimodal dataset. Specifically, we use the NeuCube framework, which employs an evolving SNN architecture, to classify emotional valence, and we evaluate the performance of our approach on the MAHNOB-HCI dataset. The multimodal data used in our work consists of facial expressions along with physiological signals such as ECG, skin temperature, skin conductance, respiration signal, mouth length, and pupil size. We perform classification under Leave-One-Subject-Out (LOSO) cross-validation. Our results show that the proposed approach achieves an accuracy of 73.15% for binary valence classification when applying feature-level fusion, which is comparable to other deep learning methods. We achieve this accuracy without using EEG, which other deep learning methods have relied on to reach this level of accuracy. In conclusion, we have demonstrated that SNNs can be used successfully for emotion recognition with multimodal data, and we provide directions for future research on SNNs in affective computing.
In addition to the good accuracy, the SNN recognition system is incrementally trainable on new data in an adaptive way. It requires only one-pass training, which makes it suitable for practical and on-line applications. These features are not manifested in other methods for this problem. Peer reviewed
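As an illustration of the evaluation protocol named in the abstract above, the following is a minimal sketch of feature-level fusion followed by Leave-One-Subject-Out cross-validation. It is not the authors' pipeline: the nearest-centroid classifier is a simple stand-in for the NeuCube SNN, the toy feature values and subject IDs are invented for the example, and all function names (`fuse_features`, `loso_splits`, `loso_accuracy`) are hypothetical.

```python
from collections import defaultdict


def fuse_features(*modalities):
    """Feature-level fusion: concatenate each sample's vectors across modalities."""
    return [sum((list(v) for v in sample), []) for sample in zip(*modalities)]


def loso_splits(subjects):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        yield held_out, train, test


def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT an SNN): store the mean feature vector per class."""
    sums = {}
    counts = defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [a + b for a, b in zip(sums[label], x)]
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def loso_accuracy(X, y, subjects):
    """Train on all subjects but one, test on the held-out subject, pool the results."""
    correct = 0
    for _, train, test in loso_splits(subjects):
        centroids = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(nearest_centroid_predict(centroids, X[i]) == y[i] for i in test)
    return correct / len(y)


# Toy data (invented): one facial feature and two physiological features per sample.
face = [[0.9], [0.1], [0.8], [0.2], [0.7], [0.3]]
physio = [[0.8, 0.7], [0.2, 0.1], [0.9, 0.8], [0.1, 0.2], [0.7, 0.9], [0.3, 0.2]]
valence = [1, 0, 1, 0, 1, 0]                    # binary valence labels
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]  # subject ID for each sample

fused = fuse_features(face, physio)
accuracy = loso_accuracy(fused, valence, subjects)
print(f"LOSO accuracy on toy data: {accuracy:.2f}")
```

The key property of LOSO is that no sample from the held-out subject appears in the training fold, so the score reflects generalization to unseen people rather than to unseen trials of known people.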
The Profiling Potential of Computer Vision and the Challenge of Computational Empiricism
Computer vision and other biometrics data science applications have commenced
a new project of profiling people. Rather than using 'transaction generated
information', these systems measure the 'real world' and produce an assessment
of the 'world state' - in this case an assessment of some individual trait.
Instead of using proxies or scores to evaluate people, they increasingly deploy
a logic of revealing the truth about reality and the people within it. While
these profiling knowledge claims are sometimes tentative, they increasingly
suggest that only through computation can these excesses of reality be captured
and understood. This article explores the bases of those claims in the systems
of measurement, representation, and classification deployed in computer vision.
It asks if there is something new in this type of knowledge claim, sketches an
account of a new form of computational empiricism being operationalised, and
questions what kind of human subject is being constructed by these
technological systems and practices. Finally, the article explores legal
mechanisms for contesting the emergence of computational empiricism as the
dominant knowledge platform for understanding the world and the people within
it.
Multi-type Disentanglement without Adversarial Training
Controlling the style of natural language by disentangling the latent space
is an important step towards interpretable machine learning. After the latent
space is disentangled, the style of a sentence can be transformed by tuning the
style representation without affecting other features of the sentence. Previous
works usually use adversarial training to guarantee that disentangled vectors
do not affect each other. However, adversarial methods are difficult to train,
especially when there are multiple features (e.g., sentiment, or tense, which
we call style types in this paper), because each feature requires a separate
discriminator for extracting a disentangled style vector corresponding to that
feature. In this paper, we propose a unified distribution-controlling method,
which provides each specific style value (the value of style types, e.g.,
positive sentiment, or past tense) with a unique representation. This method
contributes a solid theoretical basis to avoid adversarial training in
multi-type disentanglement. We also propose multiple loss functions to achieve
a style-content disentanglement as well as a disentanglement among multiple
style types. In addition, we observe that if two different style types always
have some specific style values that occur together in the dataset, they will
affect each other when transferring the style values. We call this phenomenon
training bias, and we propose a loss function to alleviate such training bias
while disentangling multiple types. We conduct experiments on two datasets
(Yelp service reviews and Amazon product reviews) to evaluate the
style-disentangling effect and the unsupervised style transfer performance on
two style types: sentiment and tense. The experimental results show the
effectiveness of our model.
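The training bias described in this abstract arises when particular values of two style types almost always occur together in the dataset. The sketch below is only a diagnostic for spotting such pairs from label counts; it is not the loss function the paper proposes, and the label lists and function names (`cooccurrence`, `conditional`) are invented for illustration.

```python
from collections import Counter


def cooccurrence(labels_a, labels_b):
    """Count how often each pair of style values appears on the same sentence."""
    return Counter(zip(labels_a, labels_b))


def conditional(labels_a, labels_b):
    """Estimate P(b | a) from counts; values near 1.0 flag pairs that nearly always co-occur."""
    pair_counts = cooccurrence(labels_a, labels_b)
    a_counts = Counter(labels_a)
    return {(a, b): n / a_counts[a] for (a, b), n in pair_counts.items()}


# Toy labels (invented): every positive sentence happens to be in the present tense.
sentiment = ["positive", "positive", "positive", "negative", "negative", "negative"]
tense = ["present", "present", "present", "past", "past", "present"]

probs = conditional(sentiment, tense)
for pair, p in sorted(probs.items()):
    print(pair, round(p, 2))
```

In this toy data, positive sentiment never appears with past tense, so a model trained on it could entangle the two style types when transferring either one.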