A dynamic texture based approach to recognition of facial actions and their temporal models
In this work, we propose a dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modeling the dynamics and the appearance in the face region of an input video are compared: an extended version of Motion History Images (MHIs) and a novel method based on Nonrigid Registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domains. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the proposed method achieved an average event recognition accuracy of 89.2 percent for the MHI method and 94.3 percent for the FFD method. The generalization performance of the FFD method has been tested using the Cohn-Kanade database. Finally, we also explored the performance on spontaneous expressions in the Sensitive Artificial Listener data set.
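To make the Motion History Image idea concrete, here is a minimal sketch of the basic MHI update rule: a pixel that moved is set to a maximum value `tau`, and every other pixel decays by one per frame. This is not the extended variant the abstract refers to. The function names, the frame-differencing motion test, and the `tau` and `diff_threshold` parameters are illustrative assumptions.

```python
def update_mhi(mhi, prev_frame, frame, tau, diff_threshold):
    """Return the next motion history image (basic form, an assumption)."""
    new_mhi = []
    for r, row in enumerate(frame):
        new_row = []
        for c, value in enumerate(row):
            # Simple frame differencing stands in for a real motion detector.
            moved = abs(value - prev_frame[r][c]) >= diff_threshold
            # Moving pixels are reset to tau; static pixels decay toward 0.
            new_row.append(tau if moved else max(0, mhi[r][c] - 1))
        new_mhi.append(new_row)
    return new_mhi


def motion_history(frames, tau=3, diff_threshold=10):
    """Accumulate an MHI over a list of grayscale frames (lists of rows)."""
    height, width = len(frames[0]), len(frames[0][0])
    mhi = [[0] * width for _ in range(height)]
    for prev_frame, frame in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, prev_frame, frame, tau, diff_threshold)
    return mhi


# Toy input: one bright pixel moving left to right across a 1x4 strip.
frames = [
    [[255, 0, 0, 0]],
    [[0, 255, 0, 0]],
    [[0, 0, 255, 0]],
    [[0, 0, 0, 255]],
]
print(motion_history(frames))  # [[1, 2, 3, 3]]
```

Larger values mark more recent motion, so the intensity gradient in the result encodes the direction of movement, which is what orientation histogram descriptors can then summarize.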
Convolutional Neural Network on Three Orthogonal Planes for Dynamic Texture Classification
Dynamic Textures (DTs) are sequences of images of moving scenes that exhibit
certain stationarity properties in time such as smoke, vegetation and fire. The
analysis of DT is important for recognition, segmentation, synthesis or
retrieval for a range of applications including surveillance, medical imaging
and remote sensing. Deep learning methods have shown impressive results and are
now the new state of the art for a wide range of computer vision tasks
including image and video recognition and segmentation. In particular,
Convolutional Neural Networks (CNNs) have recently proven to be well suited for
texture analysis with a design similar to a filter bank approach. In this
paper, we develop a new approach to DT analysis based on a CNN method applied
on three orthogonal planes xy, xt and yt. We train CNNs on spatial frames
and temporal slices extracted from the DT sequences and combine their outputs
to obtain a competitive DT classifier. Our results on a wide range of commonly
used DT classification benchmark datasets prove the robustness of our approach.
Significant improvement of the state of the art is shown on the larger
datasets. Comment: 19 pages, 10 figures
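As a rough illustration of the three orthogonal planes, the sketch below extracts one xy, one xt and one yt slice from a tiny video stored as nested lists indexed `video[t][y][x]`. The storage layout and the function names are assumptions made for this example; the abstract does not describe how slices are sampled or how the CNN outputs are combined.

```python
def xy_slice(video, t):
    """Spatial frame at time t: rows are y, columns are x."""
    return [row[:] for row in video[t]]


def xt_slice(video, y):
    """Temporal slice at a fixed row y: rows are t, columns are x."""
    return [frame[y][:] for frame in video]


def yt_slice(video, x):
    """Temporal slice at a fixed column x: rows are t, columns are y."""
    return [[row[x] for row in frame] for frame in video]


# Toy video with 2 frames, 2 rows and 3 columns.
# Each pixel value encodes its own position as 100*t + 10*y + x.
video = [
    [[100 * t + 10 * y + x for x in range(3)] for y in range(2)]
    for t in range(2)
]

print(xy_slice(video, 1))  # [[100, 101, 102], [110, 111, 112]]
print(xt_slice(video, 1))  # [[10, 11, 12], [110, 111, 112]]
print(yt_slice(video, 2))  # [[2, 12], [102, 112]]
```

Each slice is an ordinary 2D image, which is why a standard image CNN can be trained on any of the three planes.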
Investigation of Different Skeleton Features for CNN-based 3D Action Recognition
Deep learning techniques are being used in skeleton based action recognition
tasks and outstanding performance has been reported. Compared with RNN based
methods which tend to overemphasize temporal information, CNN-based approaches
can jointly capture spatio-temporal information from texture color images
encoded from skeleton sequences. There are several skeleton-based features that
have proven effective in RNN-based and handcrafted-feature-based methods.
However, it remains unknown whether they are suitable for CNN-based approaches.
This paper proposes to encode five spatial skeleton features into images with
different encoding methods. In addition, the performance implication of
different joints used for feature extraction is studied. The proposed method
achieved state-of-the-art performance on the NTU RGB+D dataset for 3D human action
analysis. An accuracy of 75.32% was achieved in the Large Scale 3D Human Activity
Analysis Challenge in Depth Videos.
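One generic way to turn a skeleton sequence into a color image is sketched below: each joint becomes an image row, each frame a column, and the (x, y, z) coordinates are rescaled to 0..255 and used as the three color channels. This is an assumed encoding for illustration only. The abstract does not specify its five features or its encoding methods, and the function name is hypothetical.

```python
def encode_skeleton_sequence(sequence):
    """Map sequence[t][j] = (x, y, z) to image[j][t] = (r, g, b) in 0..255."""
    values = [c for frame in sequence for joint in frame for c in joint]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    num_joints = len(sequence[0])
    image = []
    for j in range(num_joints):
        # One image row per joint, one pixel per frame.
        row = [
            tuple(round(255 * (c - lo) / span) for c in frame[j])
            for frame in sequence
        ]
        image.append(row)
    return image


# Toy sequence: 2 frames, 2 joints, coordinates between 0 and 5.
sequence = [
    [(0.0, 1.0, 2.0), (3.0, 4.0, 5.0)],
    [(1.0, 1.0, 2.0), (3.0, 5.0, 5.0)],
]
image = encode_skeleton_sequence(sequence)
print(image[0])  # [(0, 51, 102), (51, 51, 102)]
print(image[1])  # [(153, 204, 255), (153, 255, 255)]
```

Because time runs along one image axis and joints along the other, 2D convolutions over such an image see spatial and temporal structure together.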
Automatic affective dimension recognition from naturalistic facial expressions based on wavelet filtering and PLS regression
Continuous automatic recognition of affective dimensions from facial expressions in naturalistic contexts is a very challenging research topic, and an important one for human-computer interaction. In this paper, an automatic recognition system is proposed to continuously predict affective dimensions such as Arousal, Valence and Dominance in naturalistic facial expression videos. First, visual and vocal features are extracted from the image frames and audio segments of the videos. Second, a digital filtering method based on the wavelet transform is applied to remove irrelevant noise in the feature space. Third, Partial Least Squares regression is used to predict the affective dimensions from both the video and audio modalities. Finally, the two modalities are combined in a decision fusion process to boost overall performance. The proposed method is tested on the fourth international Audio/Visual Emotion Recognition Challenge (AVEC2014) dataset and compares well with other state-of-the-art methods in the affect recognition sub-challenge.
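Two steps of the pipeline above can be sketched in a few lines, under loudly stated assumptions: the wavelet filter is stood in for by a single-level Haar transform with hard thresholding of the detail coefficients, and decision fusion by a weighted average of per-frame predictions. The abstract names neither the wavelet family nor the fusion rule, and the function names and parameters here are hypothetical. The PLS regression step is omitted.

```python
def haar_denoise(signal, threshold):
    """Single-level Haar filter: zero out small detail coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    filtered = []
    for i in range(0, len(signal), 2):
        approx = (signal[i] + signal[i + 1]) / 2  # local average
        detail = (signal[i] - signal[i + 1]) / 2  # local difference
        if abs(detail) < threshold:
            detail = 0.0  # treat small fluctuations as noise
        # Inverse transform of the (possibly modified) coefficients.
        filtered.extend([approx + detail, approx - detail])
    return filtered


def fuse_predictions(video_pred, audio_pred, video_weight=0.5):
    """Decision-level fusion as a weighted average (an assumption)."""
    audio_weight = 1.0 - video_weight
    return [
        video_weight * v + audio_weight * a
        for v, a in zip(video_pred, audio_pred)
    ]


print(haar_denoise([1.0, 3.0, 10.0, 0.0], threshold=2.0))
# [2.0, 2.0, 10.0, 0.0]
print(fuse_predictions([2.0, 4.0], [4.0, 0.0]))
# [3.0, 2.0]
```

In the first call the small jump between 1.0 and 3.0 is smoothed away while the large jump between 10.0 and 0.0 is preserved, which is the behavior a noise filter on a feature trajectory is meant to have.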
Machine Analysis of Facial Expressions
No abstract