A dynamic framework based on local Zernike Moment and motion history image for facial expression recognition

Authors: Fan Xijian, Tjahjadi Tardi
Publication venue: 'Elsevier BV'
Publication date: 01/04/2017

A dynamic descriptor facilitates robust recognition of facial expressions in video sequences. The two main current approaches to recognition are basic emotion recognition and recognition based on Facial Action Coding System (FACS) action units. In this paper we focus on basic emotion recognition and propose a spatio-temporal feature based on the local Zernike moment in the spatial domain, using motion change frequency. We also design a dynamic feature comprising the motion history image and entropy. To recognise a facial expression, a weighting strategy based on the latter feature and sub-division of the image frame is applied to the former to enhance the dynamic information of the facial expression, followed by the application of a classical support vector machine. Experiments on the CK+ and MMI datasets using a leave-one-out cross-validation scheme demonstrate that the integrated framework achieves better performance than using the individual descriptors separately. Compared with six state-of-the-art methods, the proposed framework demonstrates superior performance.
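The abstract names a motion history image (MHI) and entropy as the ingredients of its dynamic feature but does not spell out the formulas. As a rough illustration only, the sketch below uses the commonly cited MHI update rule (set a pixel to `tau` where motion is detected, otherwise decay it by one) and the Shannon entropy of the resulting values. The function names, the frame-difference motion test, and the parameters `tau` and `threshold` are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def motion_history_image(frames, tau, threshold):
    """Build an MHI from a list of grayscale frames (2D lists of numbers).

    Assumed rule: a pixel is set to tau when the absolute difference
    between consecutive frames reaches the threshold, and otherwise
    decays by 1 towards 0.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    mhi = [[0] * cols for _ in range(rows)]
    for prev, curr in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                if abs(curr[r][c] - prev[r][c]) >= threshold:
                    mhi[r][c] = tau  # recent motion: reset to the maximum
                else:
                    mhi[r][c] = max(0, mhi[r][c] - 1)  # no motion: decay
    return mhi


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical 2x2 clip: one pixel changes at step 1, another at step 2.
frames = [
    [[0, 0], [0, 0]],
    [[10, 0], [0, 0]],
    [[10, 0], [0, 10]],
]
mhi = motion_history_image(frames, tau=3, threshold=5)
flat = [v for row in mhi for v in row]
print(mhi, shannon_entropy(flat))
```

One plausible use, again an assumption rather than the authors' stated procedure, is to compute such an entropy per image sub-block and use it to weight the spatial features from that block.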

Warwick Research Archives Portal Repository

Spatiotemporal Augmentation on Selective Frequencies for Video Representation Learning

Authors: Han Dongyoon, Kim Jinhyung, Kim Junmo, Kim Taeoh, Shim Minho, Wee Dongyoon
Publication date: 08/04/2022

Recent self-supervised video representation learning methods focus on maximizing the similarity between multiple augmented views of the same video and rely largely on the quality of the generated views. In this paper, we propose frequency augmentation (FreqAug), a spatio-temporal data augmentation method in the frequency domain for video representation learning. FreqAug stochastically removes undesirable information from the video by filtering out specific frequency components, so that the learned representation captures essential features of the video for various downstream tasks. Specifically, FreqAug pushes the model to focus more on dynamic features than on static features in the video by dropping spatial or temporal low-frequency components. In other words, learning invariance between the remaining frequency components results in a high-frequency-enhanced representation with less static bias. To verify the generality of the proposed method, we experiment with FreqAug on multiple self-supervised learning frameworks along with standard augmentations. Transferring the improved representation to five video action recognition and two temporal action localization downstream tasks shows consistent improvements over baselines.
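To make the idea of dropping temporal low-frequency components concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation: it applies a naive discrete Fourier transform to each pixel's time series, zeroes the bins below a cutoff, and inverts the transform, doing so with probability `p`. The names `temporal_high_pass` and `freq_aug_temporal`, the cutoff convention, and the default values are all assumptions chosen for illustration.

```python
import cmath
import random


def temporal_high_pass(signal, cutoff):
    """Zero DFT bins whose frequency index is below cutoff, then invert.

    Bin k of an n-point DFT is treated as frequency index min(k, n - k),
    so cutoff=1 removes only the constant (static) component.
    """
    n = len(signal)
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if min(k, n - k) < cutoff:
            spectrum[k] = 0
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def freq_aug_temporal(clip, cutoff=1, p=0.5, rng=random):
    """With probability p, high-pass filter every pixel of clip[t][row][col]."""
    if rng.random() >= p:
        return clip  # augmentation skipped for this sample
    frames, rows, cols = len(clip), len(clip[0]), len(clip[0][0])
    out = [[[0.0] * cols for _ in range(rows)] for _ in range(frames)]
    for r in range(rows):
        for c in range(cols):
            series = temporal_high_pass([clip[t][r][c] for t in range(frames)], cutoff)
            for t in range(frames):
                out[t][r][c] = series[t]
    return out


# Hypothetical 4-frame, 1x2 clip: a static pixel (5) and a flickering pixel.
clip = [[[5, 0]], [[5, 4]], [[5, 0]], [[5, 4]]]
filtered = freq_aug_temporal(clip, cutoff=1, p=1.0)
print(filtered)
```

In this toy clip the static pixel is suppressed to approximately zero while the flickering pixel keeps its oscillation around zero, which is the "less static bias" effect the abstract describes. A practical version would use an FFT library and could filter along the spatial axes in the same way.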

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications