Modelling Temporal Information Using Discrete Fourier Transform for Video Classification
Video classification has recently attracted intensive research efforts. However,
most existing works are based on frame-level visual features, which may fail
to model temporal information, e.g. characteristics accumulated over time.
To capture temporal information in videos, we propose to analyse features in
the frequency domain obtained via the discrete Fourier transform (DFT features).
Frame-level features are first extracted by a pre-trained deep convolutional
neural network (CNN). These time-domain features are then interpolated and
transformed into DFT features. The CNN and DFT features are further encoded
using different pooling methods and fused for video classification. In this
way, static image features extracted from a pre-trained deep CNN and temporal
information represented by DFT features are jointly considered for video
classification. We evaluate our method on video emotion classification and
action recognition. Experimental results demonstrate that combining DFT
features effectively captures temporal information and therefore improves the
performance of both tasks. Our approach achieves state-of-the-art performance
on the largest video emotion dataset (VideoEmotion-8) and competitive results
on UCF-101.