Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking
Public speaking is an important aspect of human communication and
interaction. The majority of computational work on public speaking concentrates
on analyzing the spoken content, and the verbal behavior of the speakers. While
the success of public speaking largely depends on the content of the talk, and
the verbal behavior, non-verbal (visual) cues, such as gestures and physical
appearance also play a significant role. This paper investigates the importance
of visual cues by estimating their contribution towards predicting the
popularity of a public lecture. For this purpose, we constructed a large
database of TED talk videos. As a measure of the popularity of the
TED talks, we leverage the corresponding (online) viewers' ratings from
YouTube. Visual cues related to facial and physical appearance, facial
expressions, and pose variations are extracted from the video frames using
convolutional neural network (CNN) models. Thereafter, an attention-based long
short-term memory (LSTM) network is proposed to predict the video popularity
from the sequence of visual features. The proposed network achieves
state-of-the-art prediction accuracy, indicating that visual cues alone contain
highly predictive information about the popularity of a talk. Furthermore, our
network learns a human-like attention mechanism, which is particularly useful
for interpretability: it shows how attention varies over time and across
different visual cues, indicating their relative importance.
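The attention mechanism described above amounts to a learned weighted pooling over per-frame features. The following minimal NumPy sketch (the query vector `w` stands in for a hypothetical learned parameter, and the random matrix `h` for per-frame LSTM hidden states) shows how frame-level attention weights are computed and used, and why they are directly interpretable as relative frame importance:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(hidden_states, w):
    # hidden_states: (T, d) per-frame features (e.g. LSTM hidden states)
    # w: (d,) hypothetical learned attention query vector
    scores = hidden_states @ w       # unnormalized relevance of each frame
    alpha = softmax(scores)          # attention weights over time, sum to 1
    context = alpha @ hidden_states  # weighted sum -> clip-level representation
    return context, alpha

rng = np.random.default_rng(0)
h = rng.standard_normal((5, 8))      # 5 frames of 8-dim visual features
w = rng.standard_normal(8)
ctx, alpha = attention_pool(h, w)    # alpha reveals which frames mattered
```

Because `alpha` is a proper distribution over frames, plotting it against time (or comparing it across cue types) gives exactly the kind of interpretability the abstract refers to.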
From Unimodal to Multimodal: improving the sEMG-Based Pattern Recognition via deep generative models
Multimodal hand gesture recognition (HGR) systems can achieve higher
recognition accuracy than unimodal ones. However, acquiring multimodal gesture recognition data
typically requires users to wear additional sensors, thereby increasing
hardware costs. This paper proposes a novel generative approach to improve
Surface Electromyography (sEMG)-based HGR accuracy via virtual Inertial
Measurement Unit (IMU) signals. Specifically, we first trained a deep generative
model, based on the intrinsic correlation between forearm sEMG and
forearm IMU signals, to generate virtual forearm IMU signals from the input
forearm sEMG signals. Subsequently, the sEMG signals and virtual IMU
signals were fed into a multimodal Convolutional Neural Network (CNN) model for
gesture recognition. To evaluate the performance of the proposed approach, we
conducted experiments on 6 databases, including 5 publicly available databases
and our collected database comprising 28 subjects performing 38 gestures,
containing both sEMG and IMU data. The results show that our proposed approach
outperforms the sEMG-based unimodal HGR method, with accuracy gains of
2.15%-13.10%. This demonstrates that incorporating virtual IMU signals,
generated by deep generative models, can significantly enhance the accuracy of
sEMG-based HGR. The proposed approach represents a successful attempt to
transition from unimodal to multimodal HGR without additional sensor
hardware.
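The pipeline just described can be sketched in a few lines. This is a toy illustration only: a least-squares linear map stands in for the paper's deep generative model, and the data are synthetic, but the structure (learn sEMG-to-IMU mapping on paired data, then fuse real sEMG with generated virtual IMU for classification) is the same:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy paired training windows: 16-channel sEMG and 6-channel IMU features,
# with a correlation between the two modalities plus noise.
emg = rng.standard_normal((200, 16))
imu = emg @ rng.standard_normal((16, 6)) + 0.05 * rng.standard_normal((200, 6))

# Stand-in "generative model": least-squares map from sEMG to virtual IMU.
W, *_ = np.linalg.lstsq(emg, imu, rcond=None)

def virtual_imu(emg_window):
    """Generate virtual IMU features from an sEMG window (no IMU sensor needed)."""
    return emg_window @ W

# Multimodal input for a downstream classifier: real sEMG + virtual IMU.
fused = np.concatenate([emg, virtual_imu(emg)], axis=1)
```

At inference time only the sEMG sensor is worn; the IMU channel is synthesized, which is exactly how the approach avoids extra hardware.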
Recognizing Handshapes using Small Datasets
Advances in convolutional neural networks have made possible significant improvements in the state-of-the-art in image classification.
However, their success in a particular field depends on the availability of labeled data for training. Handshape recognition from images, an important subtask of both gesture and sign language recognition, suffers from such a lack of data. Furthermore, hands are highly deformable objects, so handshape classification models require larger datasets.
We analyze state-of-the-art models for image classification, as well as data augmentation schemes and models designed specifically for small datasets. In particular, we perform experiments with Wide-DenseNet, a state-of-the-art convolutional architecture, and Prototypical Networks, a state-of-the-art few-shot meta-learning model. In both cases, we also quantify the impact of data augmentation on accuracy.
Our results show that on small and simple datasets such as CIARP, all models and variations achieve perfect accuracy, so the utility of the dataset as a benchmark is highly doubtful, despite its 6000 samples.
On the other hand, on small but complex datasets such as LSA16 (800 samples), specialized methods such as Prototypical Networks do have an advantage over other methods. On RWTH, another complex and small dataset with close to 4000 samples, a traditional state-of-the-art method such as Wide-DenseNet surpasses all other models. Also, data augmentation consistently increases accuracy for Wide-DenseNet, but not for Prototypical Networks.
XX Workshop de Agentes y Sistemas Inteligentes. Red de Universidades con Carreras en Informática.
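The classification rule in Prototypical Networks is simple enough to sketch: each class prototype is the mean embedding of its support examples, and a query is assigned to the nearest prototype. The NumPy illustration below uses hand-picked 2-D points as stand-ins for the output of a learned embedding network:

```python
import numpy as np

def prototypes(support, labels):
    """Class prototypes = mean support embedding per class."""
    classes = np.unique(labels)
    protos = np.stack([support[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(query, classes, protos):
    """Assign a query embedding to its nearest prototype (squared Euclidean)."""
    dists = ((protos - query) ** 2).sum(axis=1)
    return classes[dists.argmin()]

# Toy 2-way, 3-shot episode with well-separated clusters.
support = np.array([[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
                    [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
classes, protos = prototypes(support, labels)
pred = classify(np.array([4.8, 5.2]), classes, protos)  # nearest to class 1
```

Because prediction needs only a handful of support examples per class, this style of model is a natural fit for the 800-sample LSA16 regime described above.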
Myoelectric Control Systems for Hand Rehabilitation Device: A Review
One of the challenges in hand rehabilitation devices is to create a smooth interaction between the device and the user. Smooth interaction can be achieved by considering the myoelectric signal generated by the user's muscles. To this end, the so-called myoelectric control system (MCS) has been developed since the 1940s. Various MCSs have been proposed, developed, tested, and implemented in hand rehabilitation devices for different purposes. This article presents a review of MCSs in existing hand rehabilitation devices. MCSs can be grouped into two main categories: non-pattern-recognition and pattern-recognition approaches. In terms of implementation, they can be classified as MCSs for prosthetic hands and for exoskeleton hands. The main challenge for MCSs today is robustness, which hampers their adoption in clinical applications.
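The non-pattern-recognition category typically includes simple onset/threshold control: the sEMG signal is rectified and smoothed into an envelope, and the device actuates while the envelope exceeds a threshold. The sketch below is illustrative only; the window length and threshold are hypothetical values, not figures from the review:

```python
import numpy as np

def emg_envelope(signal, window=50):
    """Full-wave rectification followed by moving-average smoothing."""
    rectified = np.abs(signal)
    kernel = np.ones(window) / window
    return np.convolve(rectified, kernel, mode="same")

def onset_control(signal, threshold=0.3, window=50):
    """Non-pattern-recognition control: device is 'on' while envelope > threshold."""
    return emg_envelope(signal, window) > threshold

# Toy signal: rest, then a burst of muscle activity, then rest again.
rng = np.random.default_rng(2)
sig = np.concatenate([0.05 * rng.standard_normal(200),   # rest
                      1.0 * rng.standard_normal(200),    # contraction
                      0.05 * rng.standard_normal(200)])  # rest
active = onset_control(sig)  # boolean actuation command per sample
```

Pattern-recognition MCSs replace the threshold rule with a classifier over windowed sEMG features, which enables multiple gestures at the cost of the robustness issues the review highlights.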
A Review on Human-Computer Interaction and Intelligent Robots
In the field of artificial intelligence, human–computer interaction (HCI) technology and the related intelligent robot technologies are essential and active areas of research. From the perspectives of software algorithms and hardware systems, these technologies aim to build a natural HCI environment. The purpose of this research is to provide an overview of HCI and intelligent robots. This research highlights the existing technologies of listening, speaking, reading, writing, and other senses, which are widely used in human interaction. Based on these same technologies, this research introduces some intelligent robot systems and platforms. This paper also forecasts some vital challenges in researching HCI and intelligent robots. The authors hope that this work will help researchers in the field to acquire the necessary information and technologies to conduct more advanced research.