Saliency-guided video classification via adaptively weighted learning
Video classification is useful in many practical applications, and recent
advances in deep learning have greatly improved its accuracy. However, existing works
often model video frames indiscriminately, even though, from the view of motion, video
frames decompose naturally into salient and non-salient areas. Salient
and non-salient areas should be modeled with different networks, because the former
present both appearance and motion information, while the latter present static
background information. To address this problem, in this paper, video saliency
is first predicted from optical flow without supervision. Then two streams of
3D CNN are trained individually for raw frames and optical flow on salient
areas, and another 2D CNN is trained for raw frames on non-salient areas. Because
these three streams play different roles for each class, the
weights of each stream are adaptively learned for each class. Experimental
results show that saliency-guided modeling and adaptively weighted learning can
reinforce each other, and we achieve state-of-the-art results.
Comment: 6 pages, 1 figure, accepted by ICME 201
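The per-class stream weighting described in this abstract can be illustrated with a minimal sketch. It assumes each of the three streams outputs one score per class and that a weight per (stream, class) pair is already available; the abstract says these weights are learned, but the learning procedure is not given here, so the values below are made up for illustration. The names `fuse_streams` and `predict` are hypothetical and do not come from the paper.

```python
from typing import List


def fuse_streams(scores: List[List[float]], weights: List[List[float]]) -> List[float]:
    """Combine per-stream class scores with per-class stream weights.

    scores[s][c]  : score that stream s assigns to class c
    weights[s][c] : weight of stream s for class c (assumed given, not learned here)
    """
    num_classes = len(scores[0])
    fused = []
    for c in range(num_classes):
        # Each class has its own weighting over the streams.
        fused.append(sum(w[c] * s[c] for s, w in zip(scores, weights)))
    return fused


def predict(scores: List[List[float]], weights: List[List[float]]) -> int:
    """Return the index of the class with the highest fused score."""
    fused = fuse_streams(scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Illustrative values only: three streams, two classes.
example_scores = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]
example_weights = [[0.5, 0.2], [0.3, 0.6], [0.2, 0.2]]
example_fused = fuse_streams(example_scores, example_weights)
example_label = predict(example_scores, example_weights)
```

With these illustrative numbers the second stream dominates the second class, so the fused decision follows it even though the first stream prefers the first class.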
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis has evolved from early schemes that were often limited to controlled
environments to today's advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are achieved ever more rapidly, quickly rendering
obsolete what used to be good. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable fallbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader.
LEARNet Dynamic Imaging Network for Micro Expression Recognition
Unlike prevalent facial expressions, micro expressions have subtle,
involuntary muscle movements which are short-lived in nature. These minute
muscle movements reflect true emotions of a person. Due to the short duration
and low intensity, these micro-expressions are very difficult to perceive and
interpret correctly. In this paper, we propose a dynamic representation of
micro-expressions to preserve facial movement information of a video in a
single frame. We also propose a Lateral Accretive Hybrid Network (LEARNet) to
capture micro-level features of an expression in the facial region. The LEARNet
refines the salient expression features in an accretive manner by incorporating
accretion layers (AL) in the network. The response of the AL holds the hybrid
feature maps generated by prior laterally connected convolution layers.
Moreover, the LEARNet architecture incorporates a cross-decoupled relationship
between convolution layers, which helps in preserving the tiny but influential
facial muscle change information. The visual responses of the proposed LEARNet
depict the effectiveness of the system by preserving both high- and micro-level
edge features of facial expression. The effectiveness of the proposed LEARNet
is evaluated on four benchmark datasets: CASME-I, CASME-II, CAS(ME)^2 and SMIC.
The experimental results show a significant improvement of
4.03%, 1.90%, 1.79% and 2.82% as compared with ResNet on the CASME-I, CASME-II,
CAS(ME)^2 and SMIC datasets, respectively.
Comment: Dynamic imaging, accretion, lateral, micro expression recognitio
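The idea of collapsing a video into a single frame that keeps movement information can be sketched as a temporally weighted sum of frames. The abstract does not state how LEARNet computes its dynamic representation, so the linear weights used below (alpha_t = 2t - T - 1, a simple form of approximate rank pooling) are an assumption chosen for illustration, and `dynamic_image` is a hypothetical name. Frames are plain nested lists of grayscale values to keep the sketch dependency-free.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def dynamic_image(frames: List[Frame]) -> Frame:
    """Summarize a frame sequence as one frame via a weighted temporal sum.

    Assumed weighting: alpha_t = 2t - T - 1 for t = 1..T, so later frames
    get positive weights, earlier frames negative ones, and the weights
    sum to zero. Pixels that never change therefore cancel out.
    """
    total = len(frames)
    if total == 0:
        raise ValueError("at least one frame is required")
    height, width = len(frames[0]), len(frames[0][0])
    out = [[0.0] * width for _ in range(height)]
    for t, frame in enumerate(frames, start=1):
        alpha = 2 * t - total - 1
        for i in range(height):
            for j in range(width):
                out[i][j] += alpha * frame[i][j]
    return out


# Illustrative 1x1 "videos": one with changing intensity, one static.
moving = dynamic_image([[[1.0]], [[2.0]], [[4.0]]])
static = dynamic_image([[[5.0]], [[5.0]], [[5.0]]])
```

Because the assumed weights sum to zero, a static region contributes nothing to the output, while a region whose intensity changes over time leaves a non-zero trace in the single summary frame.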