RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes.
Comment: 8 pages excluding references (CVPR style)
Talking With Your Hands: Scaling Hand Gestures and Recognition With CNNs
The use of hand gestures provides a natural alternative to cumbersome
interface devices for Human-Computer Interaction (HCI) systems. As the
technology advances and communication between humans and machines becomes more
complex, HCI systems should also be scaled accordingly in order to accommodate
the introduced complexities. In this paper, we propose a methodology to scale
hand gestures by forming them with predefined gesture-phonemes, and a
convolutional neural network (CNN) based framework to recognize hand gestures
by learning only their constituent gesture-phonemes. The total number of
possible hand gestures can be increased exponentially by increasing the number
of used gesture-phonemes. For this objective, we introduce a new benchmark
dataset named Scaled Hand Gestures Dataset (SHGD) with only gesture-phonemes in
its training set and 3-tuple gestures in the test set. In our experimental
analysis, we recognize hand gestures containing one and three
gesture-phonemes with accuracies of 98.47% (in 15 classes) and 94.69% (in 810
classes), respectively. Our dataset, code and pretrained models are publicly
available.
Comment: Accepted to ICCV 2019 workshop - Observing and Understanding Hands in
Action (HANDS 2019)
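As a rough illustration of the scaling claim in this abstract: with G gesture-phonemes and gestures formed as length-L tuples, the vocabulary grows as G^L while the recognizer only learns the G phoneme classes. The sketch below is illustrative only; the phoneme names are invented, and the SHGD-specific tuple constraints that yield the 810-class figure are not modeled.

```python
from itertools import product

# Illustrative sketch only: vocabulary size grows exponentially in the
# tuple length, while the classifier still sees just the phoneme classes.
def vocabulary_size(num_phonemes: int, tuple_length: int) -> int:
    return num_phonemes ** tuple_length

# Hypothetical phoneme inventory (names are not from SHGD).
phonemes = ["swipe_left", "swipe_right", "push", "pull", "circle"]

print(vocabulary_size(len(phonemes), 3))   # 125 composite gestures from 5 phonemes

# Enumerate every 3-tuple gesture composable from the inventory.
gestures = list(product(phonemes, repeat=3))
print(len(gestures), gestures[0])          # 125 ('swipe_left', 'swipe_left', 'swipe_left')
```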
Deep Facial Expression Recognition: A Survey
With the transition of facial expression recognition (FER) from
laboratory-controlled to challenging in-the-wild conditions and the recent
success of deep learning techniques in various fields, deep neural networks
have increasingly been leveraged to learn discriminative representations for
automatic FER. Recent deep FER systems generally focus on two important issues:
overfitting caused by a lack of sufficient training data and
expression-unrelated variations, such as illumination, head pose and identity
bias. In this paper, we provide a comprehensive survey on deep FER, including
datasets and algorithms that provide insights into these intrinsic problems.
First, we describe the standard pipeline of a deep FER system, with the related
background knowledge and suggestions for applicable implementations at each
stage. We then introduce the available datasets that are widely used in the
literature and provide accepted data selection and evaluation principles for
these datasets. For the state of the art in deep FER, we review existing novel
deep neural networks and related training strategies that are designed for FER
based on both static images and dynamic image sequences, and discuss their
advantages and limitations. Competitive performances on widely used benchmarks
are also summarized in this section. We then extend our survey to additional
related issues and application scenarios. Finally, we review the remaining
challenges and corresponding opportunities in this field as well as future
directions for the design of robust deep FER systems.
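To make the surveyed pipeline concrete, here is a minimal toy sketch of the deep feature learning and classification stages operating on aligned face crops; the architecture, input size, and 7-class label set are assumptions for illustration, not a model from the survey.

```python
import torch
import torch.nn as nn

# Toy FER network: convolutional feature learning followed by a linear
# classifier over basic-expression classes. Inputs are assumed to be
# 48x48 grayscale face crops produced by a detection/alignment stage.
class TinyFERNet(nn.Module):
    def __init__(self, num_classes: int = 7):   # e.g. 6 basic emotions + neutral
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 12 * 12, num_classes)

    def forward(self, x):                        # x: (N, 1, 48, 48)
        return self.classifier(self.features(x).flatten(1))

model = TinyFERNet()
logits = model(torch.randn(4, 1, 48, 48))        # stand-in for aligned faces
print(logits.shape)                              # torch.Size([4, 7])
```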
A Dataset of Naturally Occurring, Whole-Body Background Activity to Reduce Gesture Conflicts
In real settings, natural body movements can be erroneously recognized by
whole-body input systems as explicit input actions. We call body activity not
intended as input actions "background activity." We argue that understanding
background activity is crucial to the success of always-available whole-body
input in the real world. To operationalize this argument, we contribute a
reusable study methodology and software tools to generate standardized
background activity datasets composed of data from multiple Kinect cameras, a
Vicon tracker, and two high-definition video cameras. Using our methodology, we
create an example background activity dataset for a television-oriented living
room setting. We use this dataset to demonstrate how it can be used to redesign
a gestural interaction vocabulary to minimize conflicts with the real world.
The software tools and initial living room dataset are publicly available
(http://www.dgp.toronto.edu/~dustin/backgroundactivity/).
Intelligent Approaches to interact with Machines using Hand Gesture Recognition in Natural way: A Survey
Hand gesture recognition (HGR) is one of the main areas of research for
engineers, scientists and bioinformaticians. HGR is a natural way of
human-machine interaction, and today many researchers in academia and industry
are working on applications that make interaction easier, more natural and
more convenient, without requiring any extra wearable device. HGR applications
range from game control to vision-enabled robot control, and from virtual
reality to smart home systems. In this paper we discuss work in the area of
hand gesture recognition, focusing on intelligent approaches, including
soft-computing-based methods such as artificial neural networks, fuzzy logic
and genetic algorithms. Image preprocessing methods for segmentation and hand
image construction are also studied. Most researchers use fingertips for hand
detection in appearance-based modeling. Finally, a comparison of results
reported by different researchers is presented.
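The segmentation and fingertip-detection steps this survey covers can be sketched with standard OpenCV primitives; the skin-color thresholds and defect-depth cutoff below are illustrative assumptions, not values from any surveyed paper.

```python
import cv2
import numpy as np

# Classic appearance-based chain: HSV skin segmentation, largest-contour
# hand hypothesis, then fingertip candidates from convexity defects.
def fingertip_candidates(bgr_frame: np.ndarray):
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 30, 60), (20, 150, 255))   # rough skin range
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return []
    hand = max(contours, key=cv2.contourArea)    # assume largest blob is the hand
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)
    tips = []
    if defects is not None:
        for start, end, farthest, depth in defects[:, 0]:
            if depth > 10000:                    # deep valley between fingers
                tips.append(tuple(hand[start][0]))
    return tips
```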
A Face-to-Face Neural Conversation Model
Neural networks have recently become good at engaging in dialog. However,
current approaches are based solely on verbal text, lacking the richness of a
real face-to-face conversation. We propose a neural conversation model that
aims to read and generate facial gestures alongside text. This allows our
model to adapt its response based on the "mood" of the conversation. In
particular, we introduce an RNN encoder-decoder that exploits the movement of
facial muscles, as well as the verbal conversation. The decoder consists of two
layers, where the lower layer aims at generating the verbal response and coarse
facial expressions, while the second layer fills in the subtle gestures, making
the generated output smoother and more natural. We train our neural network by
having it "watch" 250 movies. We showcase our joint face-text model in
generating more natural conversations through automatic metrics and a human
study. We demonstrate an example application with a face-to-face chatting
avatar.
Comment: Published at CVPR 2018
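A very loose sketch of the two-layer decoding idea described above: a lower recurrent layer emits word logits plus a coarse expression code, and a second layer refines it into finer gesture parameters. All module choices and sizes are assumptions; this is not the authors' architecture.

```python
import torch
import torch.nn as nn

# Two-stage decoder sketch: layer 1 -> words + coarse expression,
# layer 2 -> subtle facial-gesture parameters conditioned on layer 1.
class TwoStageDecoder(nn.Module):
    def __init__(self, vocab=1000, hid=256, coarse=8, fine=32):
        super().__init__()
        self.low = nn.GRU(hid, hid, batch_first=True)
        self.word_head = nn.Linear(hid, vocab)
        self.coarse_head = nn.Linear(hid, coarse)
        self.high = nn.GRU(hid + coarse, hid, batch_first=True)
        self.fine_head = nn.Linear(hid, fine)

    def forward(self, ctx):                      # ctx: (N, T, hid) encoder states
        h1, _ = self.low(ctx)
        words = self.word_head(h1)               # verbal response logits
        coarse = self.coarse_head(h1)            # coarse facial expression
        h2, _ = self.high(torch.cat([h1, coarse], dim=-1))
        return words, coarse, self.fine_head(h2) # + subtle gesture params

dec = TwoStageDecoder()
w, c, f = dec(torch.randn(2, 10, 256))
print(w.shape, c.shape, f.shape)
```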
Automatic 4D Facial Expression Recognition via Collaborative Cross-domain Dynamic Image Network
This paper proposes a novel 4D Facial Expression Recognition (FER) method
using Collaborative Cross-domain Dynamic Image Network (CCDN). Given 4D data
of face scans, we first compute its geometrical images, and then combine their
correlated information in the proposed cross-domain image representations. The
acquired set is then used to generate cross-domain dynamic images (CDI) via
rank pooling that encapsulates facial deformations over time in terms of a
single image. For the training phase, these CDIs are fed into an end-to-end
deep learning model, and the resultant predictions collaborate over multi-views
for performance gain in expression classification. Furthermore, we propose a 4D
augmentation scheme that not only expands the training data scale but also
introduces significant facial muscle movement patterns to improve the FER
performance. Results from extensive experiments on the commonly used BU-4DFE
dataset under widely adopted settings show that our proposed method outperforms
the state-of-the-art 4D FER methods by achieving an accuracy of 96.5%,
indicating its effectiveness.
Comment: Published in the 30th British Machine Vision Conference (BMVC) 2019
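The rank-pooling step can be illustrated with the closed-form approximate rank pooling used for Bilen et al.'s dynamic images, which collapses a frame sequence into one image via fixed harmonic-number weights; whether CCDN uses exactly this variant is an assumption.

```python
import numpy as np

# Approximate rank pooling: weight each frame by alpha_t and sum, where
# alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}) and H_t = sum_{i<=t} 1/i.
def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """frames: (T, H, W) or (T, H, W, C) stack of geometrical images."""
    T = frames.shape[0]
    H = np.concatenate([[0.0], np.cumsum(1.0 / np.arange(1, T + 1))])
    t = np.arange(1, T + 1)
    alpha = 2.0 * (T - t + 1) - (T + 1) * (H[T] - H[t - 1])
    return np.tensordot(alpha, frames.astype(np.float64), axes=(0, 0))

seq = np.random.rand(16, 64, 64)   # stand-in for 16 geometrical images
print(dynamic_image(seq).shape)    # (64, 64)
```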
Deep video gesture recognition using illumination invariants
In this paper we present architectures based on deep neural nets for gesture
recognition in videos, which are invariant to local scaling. We amalgamate
autoencoder and predictor architectures using an adaptive weighting scheme
to cope with a reduced-size labeled dataset while enriching our models from
enormous unlabeled sets. We further improve robustness to lighting conditions
by introducing a new adaptive filter based on temporal local scale
normalization. We provide superior results over known methods, including
recently reported approaches based on neural nets.
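The paper names its filter only as temporal local scale normalization; as a generic stand-in, the sketch below divides each frame by a causal moving temporal average so that slowly varying illumination cancels out. The window size is an assumption.

```python
import numpy as np

# Generic temporal normalization sketch: per-pixel division by a local
# temporal mean suppresses slow lighting changes across frames.
def temporal_normalize(frames: np.ndarray, window: int = 5, eps: float = 1e-6):
    """frames: (T, H, W) grayscale video with values in [0, 1]."""
    out = np.empty_like(frames, dtype=np.float64)
    for t in range(frames.shape[0]):
        lo = max(0, t - window + 1)
        local_mean = frames[lo:t + 1].mean(axis=0)   # causal local scale
        out[t] = frames[t] / (local_mean + eps)
    return out

video = np.random.rand(30, 64, 64)
print(temporal_normalize(video).shape)               # (30, 64, 64)
```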
Principal motion components for gesture recognition using a single-example
This paper introduces principal motion components (PMC), a new method for
one-shot gesture recognition. In the considered scenario a single
training-video is available for each gesture to be recognized, which limits the
application of traditional techniques (e.g., HMMs). In PMC, a 2D map of motion
energy is obtained for each pair of consecutive frames in a video. Motion maps
associated with a video are processed to obtain a PCA model, which is used for
recognition under a reconstruction-error approach. The main benefits of the
proposed approach are its simplicity, ease of implementation, competitive
performance and efficiency. We report experimental results in one-shot gesture
recognition using the ChaLearn Gesture Dataset; a benchmark comprising more
than 50,000 gestures, recorded as both RGB and depth video with a Kinect
camera. Results obtained with PMC are competitive with alternative methods
proposed for the same dataset.
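Following the abstract's description, a minimal NumPy sketch of the PMC idea: motion-energy maps from consecutive frame differences, one PCA model per one-shot training video, and classification by the smallest reconstruction error. Details such as the number of principal components are assumptions.

```python
import numpy as np

# Motion maps: absolute differences of consecutive frames, flattened.
def motion_maps(frames: np.ndarray) -> np.ndarray:
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))
    return diffs.reshape(diffs.shape[0], -1)

# Per-gesture PCA model: mean map plus top-k principal components.
def pca_model(maps: np.ndarray, k: int = 5):
    mu = maps.mean(axis=0)
    _, _, vt = np.linalg.svd(maps - mu, full_matrices=False)
    return mu, vt[:k]

# Reconstruction error of a query video under a gesture's PCA model.
def reconstruction_error(maps, model):
    mu, comps = model
    centered = maps - mu
    recon = centered @ comps.T @ comps
    return np.mean(np.sum((centered - recon) ** 2, axis=1))

# One training video per gesture (one-shot); classify by smallest error.
train = {g: pca_model(motion_maps(np.random.rand(20, 32, 32))) for g in range(3)}
query = motion_maps(np.random.rand(20, 32, 32))
print(min(train, key=lambda g: reconstruction_error(query, train[g])))
```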
UAV-GESTURE: A Dataset for UAV Control and Gesture Recognition
Current UAV-recorded datasets are mostly limited to action recognition and
object tracking, whereas gesture signal datasets have mostly been recorded
indoors. Currently, there is no publicly available outdoor video dataset for
UAV command signals. Gesture signals can be used effectively with UAVs by
leveraging the UAVs' visual sensors and operational simplicity. To fill this gap
and enable research in wider application areas, we present a UAV gesture
signals dataset recorded in an outdoor setting. We selected 13 gestures
suitable for basic UAV navigation and command from general aircraft handling
and helicopter handling signals. We provide 119 high-definition video clips
consisting of 37,151 frames. The overall baseline gesture recognition
performance, computed using a Pose-based Convolutional Neural Network (P-CNN),
is 91.9%. All the frames are annotated with body joints and gesture classes in
order to extend the dataset's applicability to a wider research area including
gesture recognition, action recognition, human pose recognition and situation
awareness.
Comment: 12 pages, 4 figures, UAVision workshop, ECCV 2018