Planning Based System for Child-Robot Interaction in Dynamic Play Environments
This paper describes the initial steps towards the design of a robotic system
that intends to perform actions autonomously in a naturalistic play
environment. At the same time it aims for social human-robot interaction (HRI),
focusing on children. We draw on existing theories of child development and on
dimensional models of emotions to explore the design of a dynamic interaction
framework for natural child-robot interaction. In this dynamic setting, the
social HRI is defined by the ability of the system to take into consideration
the socio-emotional state of the user and to plan accordingly by selecting
suitable strategies for execution. The robot needs a temporal planning
system, which combines features of task-oriented actions and principles of
social human-robot interaction. We present initial results of an empirical
study for the evaluation of the proposed framework in the context of a
collaborative sorting game.
Towards Speech Emotion Recognition "in the wild" using Aggregated Corpora and Deep Multi-Task Learning
One of the challenges in Speech Emotion Recognition (SER) "in the wild" is
the large mismatch between training and test data (e.g. speakers and tasks). In
order to improve the generalisation capabilities of the emotion models, we
propose to use Multi-Task Learning (MTL) and use gender and naturalness as
auxiliary tasks in deep neural networks. This method was evaluated in
within-corpus and various cross-corpus classification experiments that simulate
conditions "in the wild". In comparison to Single-Task Learning (STL) based
state-of-the-art methods, we found that our proposed MTL method improved
performance significantly. Particularly, models using both gender and
naturalness achieved more gains than those using either gender or naturalness
separately. This benefit was also found in the high-level representations of
the feature space, obtained from our proposed method, where discriminative
emotional clusters could be observed.
Comment: Published in the proceedings of INTERSPEECH, Stockholm, September,
201
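The multi-task idea in the abstract above can be illustrated with a minimal sketch of a combined training loss. The abstract does not state how the main and auxiliary losses are combined, so the weighted sum, the `aux_weight` parameter, and the function names below are assumptions made for illustration only.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under a probability vector."""
    return -math.log(probs[label])


def multitask_loss(emotion, gender, naturalness, aux_weight=0.5):
    """Combine the main (emotion) loss with two auxiliary losses.

    Each argument is a (predicted_probabilities, true_label) pair.
    aux_weight is a hypothetical hyperparameter, not taken from the paper.
    """
    main = cross_entropy(*emotion)
    aux = cross_entropy(*gender) + cross_entropy(*naturalness)
    return main + aux_weight * aux


# One utterance: uncertain emotion prediction, confident auxiliary predictions.
loss = multitask_loss(
    emotion=([0.5, 0.5], 0),
    gender=([0.9, 0.1], 0),
    naturalness=([0.8, 0.2], 0),
)
print(loss)
```

Setting `aux_weight` to zero recovers the single-task loss, which makes the sketch convenient for comparing the two regimes.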
Learning spectro-temporal features with 3D CNNs for speech emotion recognition
In this paper, we propose to use deep 3-dimensional convolutional networks
(3D CNNs) in order to address the challenge of modelling spectro-temporal
dynamics for speech emotion recognition (SER). Compared to a hybrid of
Convolutional Neural Network and Long-Short-Term-Memory (CNN-LSTM), our
proposed 3D CNNs simultaneously extract short-term and long-term spectral
features with a moderate number of parameters. We evaluated our proposed and
other state-of-the-art methods in a speaker-independent manner using aggregated
corpora that give a large and diverse set of speakers. We found that 1) shallow
temporal and moderately deep spectral kernels of a homogeneous architecture are
optimal for the task; and 2) our 3D CNNs are more effective for
spectro-temporal feature learning compared to other methods. Finally, we
visualised the feature space obtained with our proposed method using
t-distributed stochastic neighbour embedding (t-SNE) and could observe distinct
clusters of emotions.
Comment: ACII, 2017, San Antoni
Dirichlet process approach for radio-based simultaneous localization and mapping
Due to 5G millimeter wave (mmWave), spatial channel parameters are becoming
highly resolvable, enabling accurate vehicle localization and mapping. We
propose a novel method of radio simultaneous localization and mapping (SLAM)
with the Dirichlet process (DP). The DP, which can estimate the number of
clusters as well as clustering, is capable of identifying the locations of
reflectors by classifying signals when such 5G signals are reflected and
received from various objects. We generate birth points using the measurements
from 5G mmWave signals received by the vehicle and classify objects by
clustering birth points generated over time. Each time we use the DP clustering
method, we can map landmarks in the environment in challenging situations where
false alarms exist in the measurements and change the cardinality of received
signals. Simulation results demonstrate the performance of the proposed scheme.
Compared with SLAM based on the Rao-Blackwellized probability hypothesis
density filter, the proposed scheme shows a slight drop in SLAM performance
but a significant reduction in computational complexity.
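The property highlighted above, that the number of clusters is estimated rather than fixed in advance, can be sketched with DP-means, a hard-assignment simplification of Dirichlet process mixture clustering. This is a stand-in for illustration, not the method of the paper: the distance threshold `lam`, the 2D points, and the function name are assumptions, and the threshold is applied to the Euclidean distance rather than its square.

```python
import math


def dp_means(points, lam, iters=10):
    """Cluster points without fixing the number of clusters.

    A point farther than lam from every centroid opens a new cluster.
    Returns the list of centroids.
    """
    centroids = [tuple(points[0])]
    for _ in range(iters):
        members = [[] for _ in centroids]
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            k = min(range(len(centroids)), key=dists.__getitem__)
            if dists[k] > lam:
                # No existing cluster is close enough: open a new one at p.
                centroids.append(tuple(p))
                members.append([])
                k = len(centroids) - 1
            members[k].append(p)
        # Recompute each centroid as the mean of its members; drop empty clusters.
        centroids = [
            tuple(sum(xs) / len(m) for xs in zip(*m)) for m in members if m
        ]
    return centroids


# Hypothetical birth points scattered around two reflectors.
birth_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
print(dp_means(birth_points, lam=1.0))
```

A larger `lam` merges nearby groups into fewer landmarks, while a smaller one lets isolated measurements such as false alarms form their own clusters.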
Robot response behaviors to accommodate hearing problems
One requirement that arises for a social (semi-autonomous telepresence) robot aimed at conversations with the elderly is to accommodate hearing problems. In this paper we compare two approaches to this requirement: (1) moving closer, mimicking the leaning behavior commonly observed in elderly people with hearing problems, and (2) turning up the volume, which is a more mechanical solution. Our findings with elderly participants show that they preferred turning up the volume, rating it significantly higher.
Learning spectral-temporal features with 3D CNNs for speech emotion recognition
In this paper, we propose to use deep 3-dimensional convolutional networks (3D CNNs) in order to address the challenge of modelling spectro-temporal dynamics for speech emotion recognition (SER). Compared to a hybrid of Convolutional Neural Network and Long-Short-Term-Memory (CNN-LSTM), our proposed 3D CNNs simultaneously extract short-term and long-term spectral features with a moderate number of parameters. We evaluated our proposed and other state-of-the-art methods in a speaker-independent manner using aggregated corpora that give a large and diverse set of speakers. We found that 1) shallow temporal and moderately deep spectral kernels of a homogeneous architecture are optimal for the task; and 2) our 3D CNNs are more effective for spectro-temporal feature learning compared to other methods. Finally, we visualised the feature space obtained with our proposed method using t-distributed stochastic neighbour embedding (t-SNE) and could observe distinct clusters of emotions.
Targeting histone deacetylases to modulate graft-versus-host disease and graft-versus-leukemia
Allogeneic hematopoietic stem cell transplantation (allo-HSCT) is the main therapeutic strategy for patients with both malignant and nonmalignant disorders. The therapeutic benefits of allo-HSCT in malignant disorders are primarily derived from the graft-versus-leukemia (GvL) effect, in which T cells in the donor graft recognize and eradicate residual malignant cells. However, the same donor T cells can also recognize normal host tissues as foreign, leading to the development of graft-versus-host disease (GvHD), which is difficult to separate from GvL and is the most frequent and serious complication following allo-HSCT. Inhibition of donor T cell toxicity helps in reducing GvHD but also restricts GvL activity. Therefore, developing a novel therapeutic strategy that selectively suppresses GvHD without affecting GvL is essential. Recent studies have shown that inhibition of histone deacetylases (HDACs) not only inhibits the growth of tumor cells but also regulates the cytotoxic activity of T cells. Here, we compile the known therapeutic potential of HDAC inhibitors in preventing several stages of GvHD pathogenesis. Furthermore, we also review the current clinical features of HDAC inhibitors in preventing and treating GvHD as well as maintaining GvL.
Cooperative mmWave PHD-SLAM with Moving Scatterers
Using the multiple-model (MM) probability hypothesis density (PHD) filter,
millimeter wave (mmWave) radio simultaneous localization and mapping (SLAM) in
vehicular scenarios is susceptible to movements of objects, in particular
vehicles driving in parallel with the ego vehicle. We propose and evaluate two
countermeasures to track vehicle scatterers (VSs) in mmWave radio MM-PHD-SLAM.
First, locally at each vehicle, we generate and treat the VS map PHD in the
context of Bayesian recursion, and modify vehicle state correction with the VS
map PHD. Second, in the global map fusion process at the base station, we
average the VS map PHD and upload it with self-vehicle posterior density,
compute fusion weights, and prune the target with low Gaussian weight in the
context of arithmetic average-based map fusion. From simulation results, the
proposed cooperative mmWave radio MM-PHD-SLAM filter is shown to outperform the
previous filter in VS scenarios.
Automatic analysis of children’s engagement using interactional network features
We explored the automatic analysis of vocal non-verbal cues of a group of children in the context of engagement and collaborative play. For the current study, we defined two types of engagement in groups of children: harmonised and unharmonised. A spontaneous audiovisual corpus with groups of children who collaboratively build a 3D puzzle was collected. With this corpus, we modelled the interactions among children using network-based features representing the centrality and similarity of interactions. The centrality measures how strongly interactions among group members are concentrated on a specific speaker, while the similarity measures how similar the interactions are. We examined their discriminative characteristics in harmonised and unharmonised engagement situations. High centrality and low similarity values were found in unharmonised engagement situations. In harmonised engagement situations, we found low centrality and high similarity values. These results suggest that interactional network features are promising for the development of automatic detection of engagement at the group level.
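The two network features described above can be sketched as follows. The abstract does not give their exact definitions, so this sketch assumes a Freeman-style centralization over per-speaker interaction totals for centrality and the mean pairwise cosine similarity between speakers' interaction rows for similarity. The interaction matrices and function names are hypothetical.

```python
import math
from itertools import combinations


def centralization(matrix):
    """How concentrated interactions are on one speaker, in [0, 1].

    matrix[i][j] counts interactions directed from speaker i to speaker j.
    """
    totals = [sum(row) for row in matrix]
    n, grand = len(totals), sum(totals)
    if n < 2 or grand == 0:
        return 0.0
    shares = [t / grand for t in totals]
    top = max(shares)
    # Summed gap to the most active speaker, divided by its maximum (n - 1).
    return sum(top - s for s in shares) / (n - 1)


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_similarity(matrix):
    """Average cosine similarity over all pairs of speakers' rows."""
    pairs = list(combinations(matrix, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical groups of three children.
balanced = [[0, 2, 2], [2, 0, 2], [2, 2, 0]]   # everyone interacts equally
dominated = [[0, 5, 5], [1, 0, 0], [1, 0, 0]]  # one child dominates

for name, m in (("balanced", balanced), ("dominated", dominated)):
    print(name, centralization(m), mean_similarity(m))
```

Under these assumed definitions the dominated group scores higher on centrality and lower on similarity than the balanced group, which matches the direction of the findings reported for unharmonised and harmonised engagement.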