Temporal Bilinear Networks for Video Action Recognition
Temporal modeling in videos is a fundamental yet challenging problem in
computer vision. In this paper, we propose a novel Temporal Bilinear (TB) model
to capture the temporal pairwise feature interactions between adjacent frames.
Compared with some existing temporal methods that are limited to linear
transformations, our TB model considers explicit quadratic bilinear
transformations in the temporal domain for motion evolution and sequential
relation modeling. We further leverage a factorized bilinear model with linear
complexity and a bottleneck network design to build our TB blocks, which also
constrains the parameters and computation cost. We consider two schemes for
combining TB blocks with the original 2D spatial convolutions, namely wide
and deep Temporal Bilinear Networks (TBN). Finally,
we perform experiments on several widely adopted datasets including Kinetics,
UCF101 and HMDB51. The effectiveness of our TBNs is validated by comprehensive
ablation analyses and comparisons with various state-of-the-art methods.
Comment: Accepted by AAAI 201
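
To make the idea of a factorized pairwise interaction between adjacent frames concrete, the following is a minimal NumPy sketch. It is not the paper's TB block: it uses a generic low-rank (Hadamard-product) factorization, and the function name `temporal_bilinear`, the matrices `U`, `V`, `P`, and all sizes are assumptions chosen for illustration. Spatial convolutions and the bottleneck design are omitted.

```python
import numpy as np

def temporal_bilinear(frames, U, V, P):
    """Low-rank bilinear interaction between adjacent frames (illustrative only).

    frames: (T, C) per-frame feature vectors
    U, V:   (C, R) hypothetical projection matrices, R is the assumed rank
    P:      (R, D) hypothetical output projection
    Returns an array of shape (T-1, D), one output per adjacent frame pair.
    """
    left = frames[:-1] @ U       # project frame t      -> (T-1, R)
    right = frames[1:] @ V       # project frame t + 1  -> (T-1, R)
    # The element-wise product supplies the quadratic (pairwise) term.
    return (left * right) @ P    # (T-1, D)

rng = np.random.default_rng(0)
T, C, R, D = 8, 16, 4, 16        # assumed sizes, for illustration only
frames = rng.standard_normal((T, C))
U = rng.standard_normal((C, R))
V = rng.standard_normal((C, R))
P = rng.standard_normal((R, D))

out = temporal_bilinear(frames, U, V, P)
```

In this sketch each output channel equals a full bilinear form between frame t and frame t + 1 whose C x C weight matrix has rank at most R. The factorized form stores 2CR + RD weights instead of C * C * D, so for a fixed rank its cost grows linearly in the feature dimension rather than quadratically.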