Hidden Two-Stream Convolutional Networks for Action Recognition
Analyzing videos of human actions involves understanding the temporal
relationships among video frames. State-of-the-art action recognition
approaches rely on traditional optical flow estimation methods to pre-compute
motion information for CNNs. Such a two-stage approach is computationally
expensive, storage demanding, and not end-to-end trainable. In this paper, we
present a novel CNN architecture that implicitly captures motion information
between adjacent frames. We name our approach hidden two-stream CNNs because it
only takes raw video frames as input and directly predicts action classes
without explicitly computing optical flow. Our end-to-end approach is 10x
faster than its two-stage baseline. Experimental results on four challenging
action recognition datasets (UCF101, HMDB51, THUMOS14 and ActivityNet v1.2) show
that our approach significantly outperforms the previous best real-time
approaches.
Comment: Accepted at ACCV 2018, camera ready. Code available at
https://github.com/bryanyzhu/Hidden-Two-Strea
DDFlow: Learning Optical Flow with Unlabeled Data Distillation
We present DDFlow, a data distillation approach to learning optical flow
estimation from unlabeled data. The approach distills reliable predictions from
a teacher network, and uses these predictions as annotations to guide a student
network to learn optical flow. Unlike existing work relying on hand-crafted
energy terms to handle occlusion, our approach is data-driven, and learns
optical flow for occluded pixels. This enables us to train our model with a
much simpler loss function, and achieve a much higher accuracy. We conduct a
rigorous evaluation on the challenging Flying Chairs, MPI Sintel, KITTI 2012
and 2015 benchmarks, and show that our approach significantly outperforms all
existing unsupervised learning methods, while running in real time.
Comment: 8 pages, AAAI 1
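The teacher-to-student step described in this abstract can be illustrated with a small sketch. It is not the DDFlow implementation: the function name `distillation_loss`, the per-pixel `reliable` flags, and the choice of an L1 penalty are all assumptions made for illustration, since the abstract does not specify the loss.

```python
def distillation_loss(student_flow, teacher_flow, reliable):
    """Mean L1 gap between student and teacher flow over trusted pixels.

    student_flow, teacher_flow: lists of (u, v) flow vectors, one per pixel.
    reliable: list of 0/1 flags marking pixels where the teacher prediction
    is treated as an annotation (hypothetical input, assumed precomputed).
    """
    total, count = 0.0, 0
    for (su, sv), (tu, tv), keep in zip(student_flow, teacher_flow, reliable):
        if keep:
            # Only reliable teacher predictions supervise the student.
            total += abs(su - tu) + abs(sv - tv)
            count += 1
    return total / count if count else 0.0


teacher = [(1.0, 0.0), (2.0, 1.0), (0.0, 0.0)]
student = [(1.5, 0.0), (2.0, 0.0), (5.0, 5.0)]
loss = distillation_loss(student, teacher, [1, 1, 0])
```

The third pixel is flagged unreliable, so its large disagreement does not contribute to the loss.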
FlowNet: Learning Optical Flow with Convolutional Networks
Convolutional neural networks (CNNs) have recently been very successful in a
variety of computer vision tasks, especially those linked to recognition.
Optical flow estimation has not been among the tasks where CNNs were
successful. In this paper we construct appropriate CNNs which are capable of
solving the optical flow estimation problem as a supervised learning task. We
propose and compare two architectures: a generic architecture and another one
including a layer that correlates feature vectors at different image locations.
Since existing ground truth data sets are not sufficiently large to train a
CNN, we generate a synthetic Flying Chairs dataset. We show that networks
trained on this unrealistic data still generalize very well to existing
datasets such as Sintel and KITTI, achieving competitive accuracy at frame
rates of 5 to 10 fps.
Comment: Added supplementary materia
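The layer that "correlates feature vectors at different image locations" can be sketched as a dot product between a feature vector in the first map and feature vectors at displaced positions in the second. This is a minimal pure-Python illustration, not the FlowNet code: the function name `correlation`, the `max_disp` parameter, the single-pixel patch size, and zero scores for out-of-bounds positions are assumptions.

```python
def correlation(f1, f2, max_disp):
    """Dot-product correlation between two feature maps.

    f1, f2: nested lists of shape [H][W][C].
    Returns out[y][x], a list of (2 * max_disp + 1) ** 2 scores, one per
    displacement (dy, dx) in row-major order. Displacements that fall
    outside f2 score 0.0 (an assumed border convention).
    """
    h, w = len(f1), len(f1[0])
    offsets = [(dy, dx)
               for dy in range(-max_disp, max_disp + 1)
               for dx in range(-max_disp, max_disp + 1)]
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            scores = []
            for dy, dx in offsets:
                y2, x2 = y + dy, x + dx
                if 0 <= y2 < h and 0 <= x2 < w:
                    scores.append(sum(a * b for a, b in zip(f1[y][x], f2[y2][x2])))
                else:
                    scores.append(0.0)
            row.append(scores)
        out.append(row)
    return out


# A 1x3 feature map and a copy shifted one pixel to the right.
f1 = [[[1, 0], [0, 1], [1, 1]]]
f2 = [[[0, 0], [1, 0], [0, 1]]]
scores = correlation(f1, f2, 1)
```

With `max_disp=1` the displacement (0, +1) sits at index 5, and that is where the first two pixels score highest, matching the one-pixel shift.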
Occlusion Aware Unsupervised Learning of Optical Flow
It has been recently shown that a convolutional neural network can learn
optical flow estimation with unsupervised learning. However, the performance of
unsupervised methods still shows a relatively large gap compared with their
supervised counterparts. Occlusion and large motion are among the major factors
that limit current methods for unsupervised learning of optical flow. In this
work we introduce a new method that models occlusion explicitly, together with
a new warping scheme that facilitates the learning of large motion. Our method
shows promising results on the Flying Chairs, MPI-Sintel and KITTI benchmark
datasets. In particular, on the KITTI dataset, where abundant unlabeled samples
exist, our unsupervised method outperforms its counterpart trained with
supervised learning.
Comment: CVPR 2018 Camera-read
Unsupervised Learning of Edges
Data-driven approaches for edge detection have proven effective and achieve
top results on modern benchmarks. However, all current data-driven edge
detectors require manual supervision for training in the form of hand-labeled
region segments or object boundaries. Specifically, human annotators mark
semantically meaningful edges which are subsequently used for training. Is this
form of strong, high-level supervision actually necessary to learn to
accurately detect edges? In this work we present a simple yet effective
approach for training edge detectors without human supervision. To this end we
utilize motion, and more specifically, the only input to our method is noisy
semi-dense matches between frames. We begin with only a rudimentary knowledge
of edges (in the form of image gradients), and alternate between improving
motion estimation and edge detection in turn. Using a large corpus of video
data, we show that edge detectors trained using our unsupervised scheme
approach the performance of the same methods trained with full supervision
(within 3-5%). Finally, we show that when using a deep network for the edge
detector, our approach provides a novel pre-training scheme for object
detection.
Comment: Camera ready version for CVPR 201