Two-Stream Convolutional Networks for Action Recognition in Videos
We investigate architectures of discriminatively trained deep Convolutional
Networks (ConvNets) for action recognition in video. The challenge is to
capture the complementary information on appearance from still frames and
motion between frames. We also aim to generalise the best performing
hand-crafted features within a data-driven learning framework.
Our contribution is three-fold. First, we propose a two-stream ConvNet
architecture which incorporates spatial and temporal networks. Second, we
demonstrate that a ConvNet trained on multi-frame dense optical flow is able to
achieve very good performance in spite of limited training data. Finally, we
show that multi-task learning, applied to two different action classification
datasets, can be used to increase the amount of training data and improve the
performance on both.
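
To make the first two contributions concrete, here is a minimal sketch of how a temporal-stream input might be assembled from several optical flow fields and how the two streams' class scores might be combined. The abstract does not specify either step, so the details are assumptions for illustration: the function names, the interleaved horizontal/vertical channel layout, the toy scores, and fusion by averaging softmax outputs are all hypothetical, and no ConvNet is actually run.

```python
import math

def stack_flow(flows):
    """Stack L flow fields into 2L channels.

    Each flow field is a (dx, dy) pair of H x W grids. The interleaved
    channel order (dx1, dy1, dx2, dy2, ...) is an assumed layout.
    """
    channels = []
    for dx, dy in flows:
        channels.append(dx)
        channels.append(dy)
    return channels

def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_by_averaging(spatial_scores, temporal_scores):
    """Late fusion (assumed): average the two streams' class posteriors."""
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    return [(a + b) / 2 for a, b in zip(p_spatial, p_temporal)]

# Hypothetical toy data: 3 flow fields of size 2 x 2, and 3 action classes.
flows = [([[0.1, 0.0], [0.0, 0.1]], [[0.0, 0.2], [0.2, 0.0]]) for _ in range(3)]
temporal_input = stack_flow(flows)  # 6 channels, each 2 x 2

# Stand-ins for the raw outputs of the spatial and temporal networks.
fused = fuse_by_averaging([2.0, 0.5, 0.1], [0.3, 1.8, 0.2])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this sketch the spatial stream would see a single still frame while the temporal stream sees the stacked flow channels, which is one way to keep appearance and motion information separate until the final prediction.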
Our architecture is trained and evaluated on the standard video actions
benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of
the art. It also exceeds by a large margin previous attempts to use deep nets
for video classification.