26,896 research outputs found
Complexity-Aware Assignment of Latent Values in Discriminative Models for Accurate Gesture Recognition
Many of the state-of-the-art algorithms for gesture recognition are based on
Conditional Random Fields (CRFs). Successful approaches, such as the
Latent-Dynamic CRFs, extend the CRF by incorporating latent variables, whose
values are mapped to the values of the labels. In this paper we propose a novel
methodology to set the latent values according to the gesture complexity. We
use an heuristic that iterates through the samples associated with each label
value, stimating their complexity. We then use it to assign the latent values
to the label values. We evaluate our method on the task of recognizing human
gestures from video streams. The experiments were performed in binary datasets,
generated by grouping different labels. Our results demonstrate that our
approach outperforms the arbitrary one in many cases, increasing the accuracy
by up to 10%.Comment: Conference paper published at 2016 29th SIBGRAPI, Conference on
Graphics, Patterns and Images (SIBGRAPI). 8 pages, 7 figure
Spatio-Temporal Facial Expression Recognition Using Convolutional Neural Networks and Conditional Random Fields
Automated Facial Expression Recognition (FER) has been a challenging task for
decades. Many of the existing works use hand-crafted features such as LBP, HOG,
LPQ, and Histogram of Optical Flow (HOF) combined with classifiers such as
Support Vector Machines for expression recognition. These methods often require
rigorous hyperparameter tuning to achieve good results. Recently Deep Neural
Networks (DNN) have shown to outperform traditional methods in visual object
recognition. In this paper, we propose a two-part network consisting of a
DNN-based architecture followed by a Conditional Random Field (CRF) module for
facial expression recognition in videos. The first part captures the spatial
relation within facial images using convolutional layers followed by three
Inception-ResNet modules and two fully-connected layers. To capture the
temporal relation between the image frames, we use linear chain CRF in the
second part of our network. We evaluate our proposed network on three publicly
available databases, viz. CK+, MMI, and FERA. Experiments are performed in
subject-independent and cross-database manners. Our experimental results show
that cascading the deep network architecture with the CRF module considerably
increases the recognition of facial expressions in videos and in particular it
outperforms the state-of-the-art methods in the cross-database experiments and
yields comparable results in the subject-independent experiments.Comment: To appear in 12th IEEE Conference on Automatic Face and Gesture
Recognition Worksho
- …