Compressed Video Action Recognition

Author: Hu Hexiang
Krähenbühl Philipp
Manmatha R.
Smola Alexander J.
Wu Chao-Yuan
Zaheer Manzil
Publication date: 29/03/2018

Training robust deep video representations has proven to be much more challenging than learning deep image representations. This is in part due to the enormous size of raw video streams and the high temporal redundancy; the true and interesting signal is often drowned in too much irrelevant data. Motivated by the fact that video compression (using H.264, HEVC, etc.) can reduce this superfluous information by up to two orders of magnitude, we propose to train a deep network directly on the compressed video. This representation has a higher information density, and we found the training to be easier. In addition, the signals in a compressed video provide free, albeit noisy, motion information. We propose novel techniques to use them effectively. Our approach is about 4.6 times faster than Res3D and 2.7 times faster than ResNet-152. On the task of action recognition, our approach outperforms all the other methods on the UCF-101, HMDB-51, and Charades datasets.

Comment: CVPR 2018 (Selected for spotlight presentation)

arXiv.org e-Print Archive

Crossref

Long-term Tracking in the Wild: A Benchmark

Author: A Li
AWM Smeulders
DA Ross
JF Henriques
L Bertinetto
L Wasserman
M Kristan
M Mueller
O Russakovsky
P Liang
T Fawcett
Y Wu
Z Kalal
Publication date: 01/01/2018

We introduce the OxUvA dataset and benchmark for evaluating single-object tracking algorithms. Benchmarks have enabled great strides in the field of object tracking by defining standardized evaluations on large sets of diverse videos. However, these works have focused exclusively on sequences that are just tens of seconds in length and in which the target is always visible. Consequently, most researchers have designed methods tailored to this "short-term" scenario, which is poorly representative of practitioners' needs. Aiming to address this disparity, we compile a long-term, large-scale tracking dataset of sequences with average length greater than two minutes and with frequent target object disappearance. The OxUvA dataset is much larger than the object tracking datasets of recent years: it comprises 366 sequences spanning 14 hours of video. We assess the performance of several algorithms, considering both the ability to locate the target and to determine whether it is present or absent. Our goal is to offer the community a large and diverse benchmark to enable the design and evaluation of tracking methods ready to be used "in the wild". The project website is http://oxuva.net

Comment: To appear at ECCV 201

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

International Migration, Integration and Social Cohesion online publications