Compressed Video Action Recognition
Training robust deep video representations has proven to be much more
challenging than learning deep image representations. This is in part due to
the enormous size of raw video streams and the high temporal redundancy; the
true and interesting signal is often drowned in too much irrelevant data.
Motivated by the fact that video compression (using H.264, HEVC, etc.) can
reduce superfluous information by up to two orders of magnitude, we propose
to train a deep network directly on the compressed video.
This representation has a higher information density, and we found the
training to be easier. In addition, the signals in a compressed video provide
free, albeit noisy, motion information. We propose novel techniques to use them
effectively. Our approach is about 4.6 times faster than Res3D and 2.7 times
faster than ResNet-152. On the task of action recognition, our approach
outperforms all the other methods on the UCF-101, HMDB-51, and Charades
datasets.
Comment: CVPR 2018 (Selected for spotlight presentation)
Perceptual Perspective Taking and Action Recognition
Robots that operate in social environments need to be able to recognise and understand the actions of other robots, and humans, in order to facilitate learning through imitation and collaboration. The success of the simulation theory approach to action recognition and imitation relies on the ability to take the perspective of other people, so as to generate simulated actions from their point of view. In this paper, simulation of visual perception is used to re-create the visual egocentric sensory space and egocentric behaviour space of an observed agent, and through this increase the accuracy of action recognition. To demonstrate the approach, experiments are performed with a robot attributing perceptions to and recognising the actions of a second robot.