Discriminatively Trained Latent Ordinal Model for Video Classification
We study the problem of video classification for facial analysis and human
action recognition. We propose a novel weakly supervised learning method that
models the video as a sequence of automatically mined, discriminative
sub-events (e.g. onset and offset phases for "smile", running and jumping for
"highjump"). The proposed model is inspired by recent work on Multiple
Instance Learning and latent SVM/HCRF; it extends such frameworks to
approximately model the ordinal aspect of the videos. We obtain consistent
improvements over relevant competitive baselines on four challenging and
publicly available video based facial analysis datasets for prediction of
expression, clinical pain and intent in dyadic conversations and on three
challenging human action datasets. We also validate the method with qualitative
results and show that they largely support the intuitions behind the method.
Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text
overlap with arXiv:1604.0150
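To make the idea of ordered sub-events concrete, the following is a minimal sketch and not the paper's formulation: it assumes each sub-event is a linear template scored against per-frame features, and it enforces a strict temporal order with dynamic programming, whereas the abstract says the ordinal aspect is modeled only approximately. The function name `best_ordered_score` and both argument names are hypothetical.

```python
import numpy as np


def best_ordered_score(frame_feats, templates):
    """Best total score when template k must fire strictly after template k-1.

    frame_feats: (T, D) array, one feature vector per frame (assumed input).
    templates:   (K, D) array, one linear template per sub-event (assumed).
    """
    scores = frame_feats @ templates.T  # (T, K) frame-vs-template scores
    n_frames, n_templates = scores.shape
    if n_templates > n_frames:
        raise ValueError("need at least as many frames as sub-events")

    # best[t] = best score with the current template placed at frame t
    best = scores[:, 0].astype(float)
    for k in range(1, n_templates):
        # prev[t] = best score of the previous template over frames before t
        prev = np.full(n_frames, -np.inf)
        prev[1:] = np.maximum.accumulate(best)[:-1]
        best = prev + scores[:, k]
    return float(best.max())
```

In a latent-variable classifier of this kind, the frame positions would be the latent variables and a score like this would be thresholded or compared across classes; learning the templates is outside the scope of the sketch.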
Objective Classes for Micro-Facial Expression Recognition
Micro-expressions are brief spontaneous facial expressions that appear on a
face when a person conceals an emotion, making them different to normal facial
expressions in subtlety and duration. Currently, emotion classes within the
CASME II dataset are based on Action Units and self-reports, creating conflicts
during machine learning training. We will show that classifying expressions
using Action Units, instead of predicted emotion, removes the potential bias of
human reporting. The proposed classes are tested using LBP-TOP, HOOF and HOG 3D
feature descriptors. The experiments are evaluated on two benchmark FACS coded
datasets: CASME II and SAMM. The best result achieves 86.35% accuracy when
classifying the proposed 5 classes on CASME II using HOG 3D, outperforming the
result of the state-of-the-art 5-class emotional-based classification in CASME
II. Results indicate that classification based on Action Units provides an
objective method to improve micro-expression recognition.
Comment: 11 pages, 4 figures and 5 tables. This paper will be submitted for
journal review
Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation
Deep neural networks with alternating convolutional, max-pooling and
decimation layers are widely used in state-of-the-art architectures for
computer vision. Max-pooling purposefully discards precise spatial information
in order to create features that are more robust, and are typically organized as
lower resolution spatial feature maps. On some tasks, such as whole-image
classification, max-pooling derived features are well suited; however, for
tasks requiring precise localization, such as pixel level prediction and
segmentation, max-pooling destroys exactly the information required to perform
well. Precise localization may be preserved by shallow convnets without pooling
but at the expense of robustness. Can we have our max-pooled multi-layered cake
and eat it too? Several papers have proposed summation and concatenation based
methods for combining upsampled coarse, abstract features with finer features
to produce robust pixel level predictions. Here we introduce another model ---
dubbed Recombinator Networks --- where coarse features inform finer features
early in their formation such that finer features can make use of several
layers of computation in deciding how to use coarse features. The model is
trained once, end-to-end and performs better than summation-based
architectures, reducing the error from the previous state of the art on two
facial keypoint datasets, AFW and AFLW, by 30% and beating the current
state-of-the-art on 300W without using extra data. We improve performance even
further by adding a denoising prediction model based on a novel convnet
formulation.
Comment: accepted in CVPR 201
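The concatenation-based merging mentioned in the abstract can be illustrated with a small NumPy sketch. This is an assumption-laden illustration of the merge step only, not the Recombinator architecture: it uses nearest-neighbour upsampling, channel-first `(C, H, W)` arrays, and the hypothetical helper names `upsample_nearest` and `merge_coarse_into_fine`. In a real network, convolutional layers would follow the merge so that finer features can decide how to use the coarse ones.

```python
import numpy as np


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return fmap.repeat(factor, axis=1).repeat(factor, axis=2)


def merge_coarse_into_fine(coarse, fine):
    """Upsample a coarse map to the fine resolution and stack the channels.

    coarse: (C1, H, W); fine: (C2, H * f, W * f) for an integer factor f.
    Returns an array of shape (C1 + C2, H * f, W * f).
    """
    factor = fine.shape[1] // coarse.shape[1]
    if (coarse.shape[1] * factor, coarse.shape[2] * factor) != fine.shape[1:]:
        raise ValueError("fine map must be an integer multiple of coarse map")
    upsampled = upsample_nearest(coarse, factor)
    return np.concatenate([upsampled, fine], axis=0)
```

A summation-based variant would replace the concatenation with an element-wise add, which requires the two maps to have the same number of channels.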
Efficient smile detection by Extreme Learning Machine
Smile detection is a specialized task in facial expression analysis with applications such as photo selection, user experience analysis, and patient monitoring. As one of the most important and informative expressions, a smile conveys underlying emotional states such as joy, happiness, and satisfaction. In this paper, an efficient smile detection approach based on Extreme Learning Machine (ELM) is proposed. Faces are first detected, and a holistic flow-based face registration is applied that does not need any manual labeling or key point detection. ELM is then used to train the classifier. The proposed smile detector is tested with different feature descriptors on publicly available databases including real-world face images. Comparisons against benchmark classifiers, including Support Vector Machine (SVM) and Linear Discriminant Analysis (LDA), suggest that the proposed ELM-based smile detector generally performs better and is very efficient. Compared to state-of-the-art smile detectors, the proposed method achieves competitive results without preprocessing and manual registration.
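For readers unfamiliar with ELM, the sketch below shows the general technique: a single hidden layer with random, fixed weights and output weights obtained in closed form by regularised least squares. It is a generic illustration, not the paper's detector; the class name `ELMClassifier`, the tanh activation, the hidden-layer size and the ridge term `reg` are all assumptions, and face detection, registration and feature extraction are omitted.

```python
import numpy as np


class ELMClassifier:
    """Minimal Extreme Learning Machine for classification (illustrative)."""

    def __init__(self, n_hidden=64, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.seed = seed

    def _hidden(self, X):
        # Random projection followed by a fixed non-linearity
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.classes_ = np.unique(y)
        # Hidden weights are drawn once and never trained
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # One-hot targets, one column per class
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Output weights: ridge-regularised least-squares solution
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]
```

Because training reduces to one linear solve, fitting is fast compared with iteratively trained classifiers, which is consistent with the efficiency claim in the abstract.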