Two Stream LSTM: A Deep Fusion Framework for Human Action Recognition
In this paper we address the problem of human action recognition from video
sequences. Inspired by the exemplary results obtained via automatic feature
learning and deep learning approaches in computer vision, we focus on
learning salient spatial features with a convolutional neural network (CNN)
and then mapping their temporal relationships with Long Short-Term Memory
(LSTM) networks. Our contribution is a deep fusion framework that combines
spatial features from CNNs with temporal features from LSTM models more
effectively. We also extensively evaluate their strengths and weaknesses.
We find that by combining both sets of features,
the fully connected features effectively act as an attention mechanism to
direct the LSTM to interesting parts of the convolutional feature sequence. The
significance of our fusion method is its simplicity and effectiveness compared
to other state-of-the-art methods. The evaluation results demonstrate that this
hierarchical multi-stream fusion method outperforms single-stream mapping
methods and achieves high accuracy, surpassing current state-of-the-art methods
on three widely used databases: UCF11, UCFSports, and jHMDB.
Comment: Published as a conference paper at WACV 201