Hidden Two-Stream Convolutional Networks for Action Recognition
Analyzing videos of human actions involves understanding the temporal
relationships among video frames. State-of-the-art action recognition
approaches rely on traditional optical flow estimation methods to pre-compute
motion information for CNNs. Such a two-stage approach is computationally
expensive, storage demanding, and not end-to-end trainable. In this paper, we
present a novel CNN architecture that implicitly captures motion information
between adjacent frames. We name our approach hidden two-stream CNNs because it
only takes raw video frames as input and directly predicts action classes
without explicitly computing optical flow. Our end-to-end approach is 10x
faster than its two-stage baseline. Experimental results on four challenging
action recognition datasets (UCF101, HMDB51, THUMOS14, and ActivityNet v1.2) show
that our approach significantly outperforms the previous best real-time
approaches.
Comment: Accepted at ACCV 2018, camera ready. Code available at
https://github.com/bryanyzhu/Hidden-Two-Strea
Siamese Instance Search for Tracking
In this paper we present a tracker that is radically different from
state-of-the-art trackers: we apply no model updating, no occlusion detection,
no combination of trackers, no geometric matching, and still deliver
state-of-the-art tracking performance, as demonstrated on the popular online
tracking benchmark (OTB) and six very challenging YouTube videos. The presented
tracker simply matches the initial patch of the target in the first frame with
candidates in a new frame and returns the most similar patch by a learned
matching function. The strength of the matching function comes from being
extensively trained generically, i.e., without any data of the target, using a
Siamese deep neural network, which we design for tracking. Once learned, the
matching function is used as is, without any adaptation, to track previously
unseen targets. It turns out that the learned matching function is so powerful
that a simple tracker built upon it, coined Siamese INstance search Tracker,
SINT, which only uses the original observation of the target from the first
frame, suffices to reach state-of-the-art performance. Further, we show the
proposed tracker even allows for target re-identification after the target was
absent for a complete video shot.
Comment: This paper is accepted to the IEEE Conference on Computer Vision and
Pattern Recognition, 201
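The matching step described in the abstract above (compare the first-frame target with every candidate in a new frame and return the most similar one, with no model updating) can be illustrated with a minimal sketch. This is not the authors' code: the names `track` and `cosine_similarity` are hypothetical, patches are assumed to be already encoded as plain feature vectors, and cosine similarity stands in for the learned Siamese matching function.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Stand-in for the learned matching function (an assumption, not SINT's network)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def track(
    target: Vector,
    frames: Sequence[Sequence[Vector]],
    match: Callable[[Vector, Vector], float] = cosine_similarity,
) -> List[int]:
    """Return, for each frame, the index of the candidate most similar to the target.

    The target comes from the first frame only and is never updated,
    mirroring the "no model updating" property stated in the abstract.
    Each frame is assumed to contain at least one candidate.
    """
    picks = []
    for candidates in frames:
        scores = [match(target, c) for c in candidates]
        picks.append(max(range(len(scores)), key=scores.__getitem__))
    return picks


# Hypothetical feature vectors for the target and two frames of candidates.
target = [1.0, 0.0]
frames = [
    [[0.0, 1.0], [0.9, 0.1]],
    [[1.0, 0.05], [-1.0, 0.0], [0.5, 0.5]],
]
print(track(target, frames))
```

Passing a different `match` callable swaps in another similarity function without changing the tracking loop, which is the design point the abstract emphasizes: the tracker itself stays simple and all the strength sits in the matching function.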