Deep Optical Flow Estimation Via Multi-Scale Correspondence Structure Learning
As an important and challenging problem in computer vision, learning-based
optical flow estimation aims to discover the intrinsic correspondence structure
between two adjacent video frames through statistical learning. Therefore, a
key issue to solve in this area is how to effectively model the multi-scale
correspondence structure properties in an adaptive end-to-end learning fashion.
Motivated by this observation, we propose an end-to-end multi-scale
correspondence structure learning (MSCSL) approach for optical flow estimation.
In principle, the proposed MSCSL approach is capable of effectively capturing
the multi-scale inter-image-correlation correspondence structures within a
multi-level feature space from deep learning. Moreover, the proposed MSCSL
approach builds a spatial Conv-GRU neural network to adaptively model the
intrinsic dependency relationships among these multi-scale correspondence
structures. Finally, the above procedures for correspondence structure learning
and multi-scale dependency modeling are implemented in a unified end-to-end
deep learning framework. Experimental results on several benchmark datasets
demonstrate the effectiveness of the proposed approach.
Comment: 7 pages, 3 figures, 2 table
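As an illustrative sketch only (not the MSCSL implementation), inter-image correlation at several scales can be computed as a cost volume: for each pixel of the first feature map, take the channel-averaged dot product with every pixel of the second map inside a displacement window. Assumptions: feature maps are plain H×W×C nested Python lists, coarser scales are produced here by 2×2 average pooling rather than by a learned multi-level CNN, and the names `correlation`, `downsample`, and `multi_scale_correlation` are hypothetical. The spatial Conv-GRU that models dependencies across scales is not shown.

```python
from typing import List

# Feature map as nested lists: feature[y][x] is a length-C channel vector.
Feature = List[List[List[float]]]


def correlation(f1: Feature, f2: Feature, max_disp: int) -> Feature:
    """Cost volume: out[y][x][k] is the channel-averaged dot product between
    f1[y][x] and f2[y + dy][x + dx] for the k-th displacement (dy, dx),
    enumerated row-major over [-max_disp, max_disp]^2. Positions that fall
    outside f2 get a cost of 0."""
    h, w, c = len(f1), len(f1[0]), len(f1[0][0])
    out: Feature = []
    for y in range(h):
        row = []
        for x in range(w):
            costs = []
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        dot = sum(a * b for a, b in zip(f1[y][x], f2[yy][xx]))
                        costs.append(dot / c)
                    else:
                        costs.append(0.0)
            row.append(costs)
        out.append(row)
    return out


def downsample(f: Feature) -> Feature:
    """2x2 average pooling; assumes even height and width."""
    h, w, c = len(f), len(f[0]), len(f[0][0])
    return [
        [
            [
                (f[2 * y][2 * x][k] + f[2 * y + 1][2 * x][k]
                 + f[2 * y][2 * x + 1][k] + f[2 * y + 1][2 * x + 1][k]) / 4.0
                for k in range(c)
            ]
            for x in range(w // 2)
        ]
        for y in range(h // 2)
    ]


def multi_scale_correlation(f1: Feature, f2: Feature, levels: int,
                            max_disp: int) -> List[Feature]:
    """One cost volume per scale, finest first. Coarser scales come from
    pooling here; a learned model would use deeper CNN feature levels."""
    volumes = []
    for level in range(levels):
        volumes.append(correlation(f1, f2, max_disp))
        if level + 1 < levels:
            f1, f2 = downsample(f1), downsample(f2)
    return volumes
```

In a design like the one described, the per-scale cost volumes would then be fed to a recurrent unit such as a Conv-GRU; this sketch stops at producing them.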
EV-FlowNet: Self-Supervised Optical Flow Estimation for Event-based Cameras
Event-based cameras have shown great promise in a variety of situations where
frame-based cameras suffer, such as high-speed motions and high-dynamic-range
scenes. However, developing algorithms for event measurements requires a new
class of hand-crafted algorithms. Deep learning has shown great success in
providing model-free solutions to many problems in the vision community, but
existing networks have been developed with frame-based images in mind, and
events lack the wealth of labeled data that images offer for supervised
training. To address these points, we present EV-FlowNet, a novel
self-supervised deep learning pipeline for optical flow estimation for
event-based cameras. In particular, we introduce an image-based representation of a
given event stream, which is fed into a self-supervised neural network as the
sole input. The corresponding grayscale images captured from the same camera at
the same time as the events are then used as a supervisory signal to provide a
loss function at training time, given the estimated flow from the network. We
show that the resulting network is able to accurately predict optical flow from
events only in a variety of different scenes, with performance competitive with
image-based networks. This method not only allows for accurate estimation of
dense optical flow, but also provides a framework for the transfer of other
self-supervised methods to the event-based domain.
Comment: 9 pages, 5 figures, 1 table. Accompanying video:
https://youtu.be/eMHZBSoq0sE. Dataset:
https://daniilidis-group.github.io/mvsec/, Robotics: Science and Systems 201
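A minimal sketch of the two ingredients described above, assuming events arrive as `(x, y, t, polarity)` tuples: the event stream is turned into a 4-channel image (per-pixel positive and negative event counts, plus the normalized timestamp of the most recent positive and negative event), and the self-supervised signal is a photometric loss that warps the second grayscale frame by the predicted flow and compares it with the first. The channel layout, the plain L1 penalty, and the function names are assumptions for illustration; any additional loss terms used in the paper (for example, a smoothness term) are not reproduced here.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, int, float, int]  # (x, y, timestamp, polarity: +1 or -1)
Image = List[List[float]]


def events_to_image(events: Sequence[Event], height: int,
                    width: int) -> List[List[List[float]]]:
    """Assumed per-pixel layout:
    [positive count, negative count, latest positive t, latest negative t],
    with timestamps normalized to [0, 1] over the event window."""
    img = [[[0.0, 0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]
    if not events:
        return img
    t0 = min(e[2] for e in events)
    span = (max(e[2] for e in events) - t0) or 1.0  # avoid division by zero
    for x, y, t, polarity in events:
        ch = 0 if polarity > 0 else 1
        img[y][x][ch] += 1.0
        # Keep the most recent normalized timestamp for this polarity.
        img[y][x][2 + ch] = max(img[y][x][2 + ch], (t - t0) / span)
    return img


def bilinear(img: Image, x: float, y: float) -> float:
    """Bilinearly sample a grayscale image, clamping coordinates to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = img[y0][x0] * (1 - ax) + img[y0][x1] * ax
    bottom = img[y1][x0] * (1 - ax) + img[y1][x1] * ax
    return top * (1 - ay) + bottom * ay


def photometric_loss(img1: Image, img2: Image,
                     flow: List[List[Tuple[float, float]]]) -> float:
    """Mean L1 difference between img1(p) and img2(p + flow(p)),
    where flow[y][x] = (u, v) in pixels."""
    h, w = len(img1), len(img1[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            u, v = flow[y][x]
            total += abs(img1[y][x] - bilinear(img2, x + u, y + v))
    return total / (h * w)
```

Bilinear sampling keeps the warp differentiable with respect to the flow, which is what allows a loss of this kind to train a network without ground-truth flow. In this simplified encoding, a pixel with no events and the earliest event in the window both carry a timestamp of 0.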
Hidden Two-Stream Convolutional Networks for Action Recognition
Analyzing videos of human actions involves understanding the temporal
relationships among video frames. State-of-the-art action recognition
approaches rely on traditional optical flow estimation methods to pre-compute
motion information for CNNs. Such a two-stage approach is computationally
expensive, storage demanding, and not end-to-end trainable. In this paper, we
present a novel CNN architecture that implicitly captures motion information
between adjacent frames. We name our approach hidden two-stream CNNs because it
only takes raw video frames as input and directly predicts action classes
without explicitly computing optical flow. Our end-to-end approach is 10x
faster than its two-stage baseline. Experimental results on four challenging
action recognition datasets (UCF101, HMDB51, THUMOS14, and ActivityNet v1.2) show
that our approach significantly outperforms the previous best real-time
approaches.
Comment: Accepted at ACCV 2018, camera ready. Code available at
https://github.com/bryanyzhu/Hidden-Two-Strea
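The data flow described in this abstract can be sketched with toy callables: a spatial stream classifies one frame, a motion network produces flow-like maps directly from the raw frames (so no optical flow is precomputed), a temporal stream classifies those maps, and the two streams' class probabilities are averaged (late fusion). `hidden_two_stream`, `spatial_net`, `motion_net`, `temporal_net`, and the fusion weight `w_spatial` are hypothetical placeholders, not the authors' code or API.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hidden_two_stream(
    frames: Sequence[object],
    spatial_net: Callable[[object], Sequence[float]],
    motion_net: Callable[[Sequence[object]], object],
    temporal_net: Callable[[object], Sequence[float]],
    w_spatial: float = 0.5,
) -> Tuple[int, List[float]]:
    """Hypothetical end-to-end pipeline: raw frames in, action class out."""
    # Spatial stream: appearance from a single (middle) frame.
    spatial_probs = softmax(spatial_net(frames[len(frames) // 2]))
    # Motion is estimated implicitly from raw frames; no precomputed optical flow.
    motion = motion_net(frames)
    temporal_probs = softmax(temporal_net(motion))
    # Late fusion: weighted average of the two streams' class probabilities.
    fused = [w_spatial * s + (1.0 - w_spatial) * t
             for s, t in zip(spatial_probs, temporal_probs)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused
```

For example, with scalar toy frames `[0.0, 1.0, 2.0]`, a `motion_net` that returns consecutive frame differences, and a `temporal_net` returning `[sum(m), 0.0]`, an uninformative spatial stream leaves the decision to the temporal stream, which selects class 0. In a real model each callable would be a trainable network, and keeping them in one graph is what makes the pipeline end-to-end trainable.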
Large-Scale Mapping of Human Activity using Geo-Tagged Videos
This paper is the first work to perform spatio-temporal mapping of human
activity using the visual content of geo-tagged videos. We utilize a recent
deep-learning-based video analysis framework, termed hidden two-stream
networks, to recognize a range of activities in YouTube videos. This framework
is efficient and can run in real time or faster, which is important for
recognizing events as they occur in streaming video or for reducing latency in
analyzing already captured video. This is, in turn, important for using video
in smart-city applications. We perform a series of experiments to show our
approach is able to accurately map activities both spatially and temporally. We
also demonstrate the advantages of using the visual content over the
tags/titles.
Comment: Accepted at ACM SIGSPATIAL 201
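One plausible way to turn per-video recognitions into a spatio-temporal map is to bin each geo-tagged prediction by a latitude/longitude grid cell and an hour-of-day bin, then count activities per bin. The record layout `(lat, lon, timestamp, activity)`, the 0.01° cell size, the hourly bins, and the function names `map_activities` and `dominant_activity` are assumptions for illustration, not the paper's pipeline.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

Record = Tuple[float, float, datetime, str]  # (lat, lon, timestamp, activity)
Key = Tuple[int, int, int]  # (lat cell, lon cell, time bin)


def map_activities(records: Iterable[Record], cell_deg: float = 0.01,
                   bin_hours: int = 1) -> Dict[Key, Counter]:
    """Count recognized activities per (lat/lon grid cell, hour-of-day bin)."""
    grid: Dict[Key, Counter] = defaultdict(Counter)
    for lat, lon, ts, activity in records:
        lat_cell = math.floor(lat / cell_deg)
        lon_cell = math.floor(lon / cell_deg)
        time_bin = ts.hour // bin_hours  # ignores the date; hour of day only
        grid[(lat_cell, lon_cell, time_bin)][activity] += 1
    return grid


def dominant_activity(grid: Dict[Key, Counter]) -> Dict[Key, str]:
    """Most frequent activity in each occupied spatio-temporal bin."""
    return {key: counts.most_common(1)[0][0] for key, counts in grid.items()}
```

Fixed-degree cells are simple but not equal-area (they shrink in the east-west direction away from the equator); a projected grid or geohash would be an alternative if cell area matters.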