The TRECVID 2007 BBC rushes summarization evaluation pilot
This paper provides an overview of a pilot evaluation of
video summaries using rushes from several BBC dramatic series. It was carried out under the auspices of TRECVID.
Twenty-two research teams submitted video summaries, each limited to 4% of the
source duration, for 42 individual rushes video files, with the aim of
compressing out redundant and insignificant material.
The output of two baseline systems built on straightforward
content reduction techniques was contributed by Carnegie
Mellon University as a control. Procedures for developing
ground truth lists of important segments from each video
were developed at Dublin City University and applied to
the BBC video. At NIST each summary was judged by
three humans with respect to how much of the ground truth
was included, how easy the summary was to understand,
and how much repeated material the summary contained.
Additional objective measures included how long it took the system to create
the summary, how long it took the assessor to judge it against the ground
truth, and the summary's duration. Assessor agreement on finding desired
segments averaged 78%, and the results indicate that while it is difficult to
exceed the performance of the baselines, a few systems did.
Hidden Two-Stream Convolutional Networks for Action Recognition
Analyzing videos of human actions involves understanding the temporal
relationships among video frames. State-of-the-art action recognition
approaches rely on traditional optical flow estimation methods to pre-compute
motion information for CNNs. Such a two-stage approach is computationally
expensive, storage demanding, and not end-to-end trainable. In this paper, we
present a novel CNN architecture that implicitly captures motion information
between adjacent frames. We name our approach hidden two-stream CNNs because it
only takes raw video frames as input and directly predicts action classes
without explicitly computing optical flow. Our end-to-end approach is 10x
faster than its two-stage baseline. Experimental results on four challenging
action recognition datasets: UCF101, HMDB51, THUMOS14 and ActivityNet v1.2 show
that our approach significantly outperforms the previous best real-time
approaches.
Comment: Accepted at ACCV 2018, camera ready. Code available at
https://github.com/bryanyzhu/Hidden-Two-Strea
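The abstract's core idea, a learned motion branch replacing pre-computed optical flow, can be sketched minimally. This is not the paper's architecture: the stream functions below are crude numpy stand-ins (a frame difference in place of the learned MotionNet, channel means in place of CNN classifiers), and the equal-weight late fusion is an assumption.

```python
import numpy as np

# Hedged sketch of a "hidden" two-stream pipeline: the temporal stream
# consumes raw adjacent frames via a learned motion branch, so no optical
# flow is computed explicitly. All functions are illustrative stand-ins.

rng = np.random.default_rng(0)

def spatial_stream(frame):
    # Stand-in for an appearance CNN: per-channel mean as "class scores".
    return frame.mean(axis=(0, 1))  # shape (C,)

def motion_branch(frame_a, frame_b):
    # Stand-in for the learned motion representation (no explicit flow):
    # here, a simple frame difference.
    return frame_b - frame_a

def temporal_stream(motion):
    # Stand-in for the temporal CNN operating on the motion representation.
    return np.abs(motion).mean(axis=(0, 1))

frames = rng.random((2, 8, 8, 3))  # two adjacent 8x8 RGB frames
spatial_scores = spatial_stream(frames[1])
temporal_scores = temporal_stream(motion_branch(frames[0], frames[1]))
fused = 0.5 * (spatial_scores + temporal_scores)  # assumed late fusion
print(fused.shape)  # (3,)
```

In the actual model both branches are CNNs trained end to end, which is what makes the approach roughly 10x faster than a two-stage flow-then-CNN pipeline.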
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition
Motion has been shown to be useful for video understanding, where motion is
typically represented by optical flow. However, computing flow from video
frames is very time-consuming. Recent works directly leverage the motion
vectors and residuals readily available in the compressed video to represent
motion at no cost. While this avoids flow computation, it also hurts accuracy
since the motion vector is noisy and has substantially reduced resolution,
which makes it a less discriminative motion representation. To remedy these
issues, we propose a lightweight generator network that reduces noise in
motion vectors and captures fine motion details, achieving a more
Discriminative Motion Cue (DMC) representation. Since optical flow is a more
accurate motion representation, we train the DMC generator to approximate flow
using a reconstruction loss and a generative adversarial loss, jointly with the
downstream action classification task. Extensive evaluations on three action
recognition benchmarks (HMDB-51, UCF-101, and a subset of Kinetics) confirm the
effectiveness of our method. Our full system, consisting of the generator and
the classifier, is coined DMC-Net; it obtains accuracy close to that of
flow-based methods while running two orders of magnitude faster at inference
time.
Comment: Accepted by CVPR'1
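The joint objective described in this abstract, a reconstruction loss toward optical flow plus an adversarial loss, trained together with the classification task, can be written out as a small sketch. The loss weights, tensor shapes, and the scalar discriminator score are illustrative assumptions, not the paper's values.

```python
import numpy as np

# Hedged sketch of the DMC-Net training objective from the abstract:
# classification + reconstruction-toward-flow + adversarial terms.
# Weights lambda_rec / lambda_adv are assumed for illustration.

def reconstruction_loss(dmc, flow):
    # Encourage the generated motion cue to approximate optical flow (MSE).
    return np.mean((dmc - flow) ** 2)

def adversarial_loss(disc_score_on_fake):
    # Generator term of a GAN loss: push discriminator output toward "real".
    return -np.log(disc_score_on_fake + 1e-8)

def classification_loss(logits, label):
    # Cross-entropy on the downstream action classification task.
    z = logits - logits.max()               # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

rng = np.random.default_rng(0)
dmc = rng.random((4, 4, 2))    # generated discriminative motion cue (toy size)
flow = rng.random((4, 4, 2))   # pre-computed optical flow (training time only)
logits = rng.random(5)         # classifier output over 5 toy action classes

lambda_rec, lambda_adv = 1.0, 0.1  # assumed weights
total = (classification_loss(logits, label=2)
         + lambda_rec * reconstruction_loss(dmc, flow)
         + lambda_adv * adversarial_loss(0.7))
print(total > 0)  # True
```

Note that flow supervision is needed only during training; at inference the generator runs directly on the compressed-domain motion vectors, which is what yields the reported speedup.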