Deep HMResNet Model for Human Activity-Aware Robotic Systems
Endowing robotic systems with cognitive capabilities for recognizing daily human activities is an important challenge that requires sophisticated and novel approaches. Most existing approaches explore pattern recognition techniques that are generally based on hand-crafted or learned features. In this paper, a novel Hierarchical Multichannel Deep Residual Network (HMResNet) model is proposed for robotic systems to recognize daily human activities in ambient environments. The introduced model comprises multilevel fusion layers. At the feature level, the proposed Multichannel 1D Deep Residual Network is combined with a bottleneck MLP neural network to automatically extract robust features regardless of the hardware configuration; at the decision level, it is fully connected to an MLP neural network that recognizes daily human activities. Empirical experiments on real-world datasets and an online demonstration are used to validate the proposed model. The results demonstrate that the proposed model outperforms the baseline models in daily human activity recognition.
Comment: Presented at AI-HRI AAAI-FSS, 2018 (arXiv:1809.06606)
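A minimal PyTorch sketch of the multilevel-fusion idea described in this abstract: one 1D residual branch per sensor channel, feature-level fusion through a bottleneck MLP, and a decision-level MLP classifier. Layer sizes, block counts, and class names are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of a hierarchical multichannel 1D residual network with
# feature-level (bottleneck MLP) and decision-level (MLP) fusion.
# All dimensions below are assumptions for illustration.
import torch
import torch.nn as nn


class ResBlock1D(nn.Module):
    """Basic 1D residual block: two convolutions plus a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # residual (skip) connection


class HMResNetSketch(nn.Module):
    """One residual branch per sensor channel, fused by a bottleneck MLP
    at the feature level and classified by an MLP at the decision level."""
    def __init__(self, n_channels=3, n_classes=6):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=3, padding=1),
                ResBlock1D(16),
                ResBlock1D(16),
                nn.AdaptiveAvgPool1d(1),   # one feature vector per branch
            )
            for _ in range(n_channels)
        ])
        self.bottleneck = nn.Sequential(   # feature-level fusion
            nn.Linear(16 * n_channels, 32),
            nn.ReLU(),
            nn.Linear(32, 16),
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(   # decision-level MLP
            nn.Linear(16, 32),
            nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):
        # x: (batch, n_channels, seq_len), one row per sensor axis
        feats = [branch(x[:, i:i + 1, :]).flatten(1)
                 for i, branch in enumerate(self.branches)]
        fused = self.bottleneck(torch.cat(feats, dim=1))
        return self.classifier(fused)


logits = HMResNetSketch()(torch.randn(4, 3, 128))  # e.g. 3-axis accelerometer windows
```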
Neural Pattern Recognition on Multichannel Input Representation
This article presents a new neural pattern recognition architecture for multichannel data representation. The architecture employs generalized ART modules as building blocks to construct a supervised learning system that generates recognition codes on channels dynamically selected in context, using serial and parallel match tracking led by inter-ART vigilance signals.
Sharp Corporation, Information Technology Research Laboratories, Nara, Japan
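For readers unfamiliar with ART building blocks, the following is a minimal fuzzy ART sketch in Python (NumPy) showing the category-choice and vigilance-test mechanics such modules rely on; the paper's generalized ART modules and inter-ART match tracking are considerably more elaborate, and all parameter values here are illustrative assumptions.

```python
# Minimal fuzzy ART module: complement coding, choice function,
# vigilance test with reset, and a new category committed on failure.
import numpy as np


class FuzzyART:
    def __init__(self, input_dim, rho=0.75, alpha=0.001, beta=1.0):
        self.rho, self.alpha, self.beta = rho, alpha, beta
        self.w = np.empty((0, 2 * input_dim))  # complement-coded weights

    def _complement_code(self, x):
        return np.concatenate([x, 1.0 - x])

    def train(self, x):
        i = self._complement_code(np.asarray(x, dtype=float))
        if len(self.w):
            # Choice function ranks existing categories.
            match = np.minimum(i, self.w).sum(axis=1)
            order = np.argsort(-(match / (self.alpha + self.w.sum(axis=1))))
            for j in order:
                # Vigilance test: on failure, reset and try the next category.
                if match[j] / i.sum() >= self.rho:
                    self.w[j] = self.beta * np.minimum(i, self.w[j]) \
                        + (1 - self.beta) * self.w[j]
                    return j
        # No category passed vigilance: commit a new one.
        self.w = np.vstack([self.w, i])
        return len(self.w) - 1


art = FuzzyART(input_dim=2)
print(art.train([0.2, 0.9]), art.train([0.25, 0.85]), art.train([0.9, 0.1]))
```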
Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking
Public speaking is an important aspect of human communication and
interaction. The majority of computational work on public speaking concentrates
on analyzing the spoken content and the verbal behavior of the speakers. While
the success of public speaking largely depends on the content of the talk and
the verbal behavior, non-verbal (visual) cues, such as gestures and physical
appearance, also play a significant role. This paper investigates the importance
of visual cues by estimating their contribution towards predicting the
popularity of a public lecture. For this purpose, we constructed a large
database of more than TED talk videos. As a measure of popularity of the
TED talks, we leverage the corresponding (online) viewers' ratings from
YouTube. Visual cues related to facial and physical appearance, facial
expressions, and pose variations are extracted from the video frames using
convolutional neural network (CNN) models. Thereafter, an attention-based long
short-term memory (LSTM) network is proposed to predict the video popularity
from the sequence of visual features. The proposed network achieves
state-of-the-art prediction accuracy indicating that visual cues alone contain
highly predictive information about the popularity of a talk. Furthermore, our
network learns a human-like attention mechanism, which is particularly useful
for interpretability: it shows how attention varies over time and across
different visual cues, indicating their relative importance.
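A minimal PyTorch sketch of an attention-based LSTM regressor over a sequence of per-frame CNN features, in the spirit of the approach described above. The feature dimensionality, hidden size, and single-score output are assumptions for illustration; the paper's exact attention formulation may differ.

```python
# Attention pooling over LSTM states computed from per-frame CNN features,
# followed by a regression head predicting a popularity score.
import torch
import torch.nn as nn


class AttentionLSTMSketch(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)   # scores each time step
        self.head = nn.Linear(hidden_dim, 1)   # predicts a popularity score

    def forward(self, feats):
        # feats: (batch, n_frames, feat_dim), e.g. pooled CNN activations
        h, _ = self.lstm(feats)                          # (batch, T, hidden)
        weights = torch.softmax(self.attn(h), dim=1)     # (batch, T, 1)
        context = (weights * h).sum(dim=1)               # attention pooling
        return self.head(context).squeeze(-1), weights.squeeze(-1)


model = AttentionLSTMSketch()
score, attn = model(torch.randn(2, 50, 2048))   # 2 talks, 50 frames each
print(score.shape, attn.shape)                  # torch.Size([2]) torch.Size([2, 50])
```

The returned attention weights are what make the model inspectable: plotting them against time shows which frames, and hence which visual cues, drive the prediction.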