Two-Stream Convolutional Networks for Action Recognition in Videos
We investigate architectures of discriminatively trained deep Convolutional
Networks (ConvNets) for action recognition in video. The challenge is to
capture the complementary information on appearance from still frames and
motion between frames. We also aim to generalise the best performing
hand-crafted features within a data-driven learning framework.
Our contribution is three-fold. First, we propose a two-stream ConvNet
architecture which incorporates spatial and temporal networks. Second, we
demonstrate that a ConvNet trained on multi-frame dense optical flow is able to
achieve very good performance in spite of limited training data. Finally, we
show that multi-task learning, applied to two different action classification
datasets, can be used to increase the amount of training data and improve the
performance on both.
Our architecture is trained and evaluated on the standard video actions
benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of
the art. It also outperforms previous attempts to use deep nets for video
classification by a large margin.
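To make the architecture concrete, here is a minimal PyTorch sketch of the two-stream idea: a spatial stream over a single RGB frame and a temporal stream over a stack of dense optical flow fields, fused by averaging class scores. The layer sizes and the flow-stack length are illustrative placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class StreamConvNet(nn.Module):
    """One stream: a small conv stack plus a classifier.
    Layer sizes are illustrative, not the paper's exact network."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 96, kernel_size=7, stride=2), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, stride=2), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

class TwoStreamNet(nn.Module):
    """Spatial stream sees a single RGB frame; the temporal stream sees a
    stack of dense optical flow fields (here 10 fields -> 20 channels for
    the x and y components). Class scores are fused by averaging, one of
    the fusion schemes a two-stream setup can use."""
    def __init__(self, num_classes, flow_stack=10):
        super().__init__()
        self.spatial = StreamConvNet(3, num_classes)
        self.temporal = StreamConvNet(2 * flow_stack, num_classes)

    def forward(self, rgb, flow):
        return (self.spatial(rgb).softmax(-1)
                + self.temporal(flow).softmax(-1)) / 2

net = TwoStreamNet(num_classes=101)      # e.g. UCF-101
rgb = torch.randn(1, 3, 224, 224)        # one still frame
flow = torch.randn(1, 20, 224, 224)      # 10 stacked flow fields (x and y)
scores = net(rgb, flow)                  # fused class probabilities
```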
End-to-End Learning of Video Super-Resolution with Motion Compensation
Learning approaches have shown great success in the task of super-resolving
an image given a low resolution input. Video super-resolution aims for
exploiting additionally the information from multiple images. Typically, the
images are related via optical flow and consecutive image warping. In this
paper, we provide an end-to-end video super-resolution network that, in
contrast to previous works, includes the estimation of optical flow in the
overall network architecture. We analyze the usage of optical flow for video
super-resolution and find that common off-the-shelf image warping does not
allow video super-resolution to benefit much from optical flow. We rather
propose an operation for motion compensation that performs warping from low to
high resolution directly. We show that with this network configuration, video
super-resolution can benefit from optical flow and we obtain state-of-the-art
results on the popular test sets. We also show that the processing of whole
images rather than independent patches is responsible for a large increase in
accuracy.

Comment: Accepted to GCPR201
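As a rough illustration of warping from low to high resolution in one step, the sketch below (assuming PyTorch; the function name and interpolation choices are ours, not the paper's exact operation) upsamples and rescales a low-resolution flow field to the high-resolution grid, then samples the low-resolution frame directly at the displaced high-resolution positions:

```python
import torch
import torch.nn.functional as F

def warp_lr_to_hr(lr_frame, lr_flow, scale):
    """Warp a low-resolution neighbour frame directly onto the
    high-resolution grid: the flow is upsampled (and its magnitudes
    rescaled) to HR, then used to sample the LR frame at sub-pixel
    positions. A sketch of the idea, not the paper's operation."""
    b, _, h, w = lr_frame.shape
    H, W = h * scale, w * scale
    # Upsample the flow to HR and scale displacements to HR pixel units.
    flow_hr = F.interpolate(lr_flow, size=(H, W), mode='bilinear',
                            align_corners=False) * scale
    # Base HR sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    grid = torch.stack((xs, ys), dim=-1).float()   # (H, W, 2)
    grid = grid + flow_hr.permute(0, 2, 3, 1)      # displace by the flow
    # Normalise to [-1, 1]; sampling the LR image at these HR positions
    # performs upsampling and motion compensation in a single step.
    grid[..., 0] = grid[..., 0] / (W - 1) * 2 - 1
    grid[..., 1] = grid[..., 1] / (H - 1) * 2 - 1
    return F.grid_sample(lr_frame, grid, mode='bilinear',
                         align_corners=True)
```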
Integrated 2-D Optical Flow Sensor
I present a new focal-plane analog VLSI sensor that estimates optical flow in two visual dimensions. The chip significantly improves on previous approaches with respect to both the applied model of optical flow estimation and the actual hardware implementation. Its distributed computational architecture consists of an array of locally connected motion units that collectively solve for the unique optimal optical flow estimate. The novel gradient-based motion model assumes visual motion to be translational, smooth and biased. The model guarantees that the estimation problem is computationally well-posed regardless of the visual input. Model parameters can be globally adjusted, leading to a rich output behavior. Varying the smoothness strength, for example, can provide a continuous spectrum of motion estimates, ranging from normal to global optical flow. Unlike approaches that rely on the explicit matching of brightness edges in space or time, the applied gradient-based model assures spatiotemporal continuity of the visual information. The non-linear coupling of the individual motion units improves the resulting optical flow estimate because it reduces spatial smoothing across large velocity differences. Extended measurements of a 30x30 array prototype sensor under real-world conditions demonstrate the validity of the model and the robustness and functionality of the implementation.
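The model described reads like a regularized gradient-based (Horn-Schunck-style) energy with an added bias term. As a software analogue (our sketch, not the chip's circuit; parameter names and the iteration scheme are assumptions), such an energy can be minimized iteratively:

```python
import numpy as np

def biased_flow(Ix, Iy, It, rho=1.0, sigma=0.05, v0=(0.0, 0.0), iters=200):
    """Gradient-based flow with smoothness and bias terms, in the spirit
    of the sensor's model:
        E = sum (Ix*u + Iy*v + It)^2
            + rho   * ((u - u_bar)^2 + (v - v_bar)^2)   # smoothness
            + sigma * ((u - u0)^2   + (v - v0)^2)       # bias
    The bias term keeps the problem well-posed for any input; sweeping rho
    moves the estimate from normal flow (small rho) toward a single global
    flow (large rho). A software sketch, not the chip's implementation."""
    u = np.full(Ix.shape, v0[0], dtype=float)
    v = np.full(Ix.shape, v0[1], dtype=float)
    alpha = rho + sigma
    for _ in range(iters):
        # Four-neighbour average implements the discrete smoothness term.
        ubar = 0.25 * sum(np.roll(u, s, a) for s in (1, -1) for a in (0, 1))
        vbar = 0.25 * sum(np.roll(v, s, a) for s in (1, -1) for a in (0, 1))
        # Fold the bias reference into the smoothed estimate (an exact
        # rewrite of the combined quadratic smoothness + bias penalty).
        ut = (rho * ubar + sigma * v0[0]) / alpha
        vt = (rho * vbar + sigma * v0[1]) / alpha
        # Pointwise closed-form update of the quadratic data term.
        r = (Ix * ut + Iy * vt + It) / (alpha + Ix**2 + Iy**2)
        u, v = ut - Ix * r, vt - Iy * r
    return u, v
```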
An improved 2D optical flow sensor for motion segmentation
A functional focal-plane implementation of a 2D optical flow system is presented that detects and
preserves motion discontinuities. The system is composed of two different network layers of analog
computational units arranged in retinotopic order. The units in the first layer (the optical flow
network) estimate the local optical flow field in two visual dimensions, where the strength of
their nearest-neighbor connections determines the amount of motion integration. Whereas in an
earlier implementation \cite{Stocker_Douglas99} the connection strength was set constant across
the entire image space, it is now \emph{dynamically and locally} controlled by the second network
layer (the motion discontinuities network), which is recurrently connected to the optical flow
network. The connection strengths in the optical flow network are modulated such that visual
motion integration is ideally facilitated only within image areas that are likely to represent
common motion sources.
Results of an experimental aVLSI chip illustrate the potential of the approach and its
functionality under real-world conditions.
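A minimal software analogue of the two-layer idea (our sketch, not the chip's circuitry): derive per-connection weights from local velocity differences and use them to gate the neighbour averaging of the flow network, so smoothing is suppressed where a motion discontinuity is likely. The exponential weighting and the threshold `theta` are illustrative assumptions.

```python
import numpy as np

def discontinuity_gated_average(u, v, theta=0.5):
    """Second-layer sketch: per-edge weights fall towards zero where
    neighbouring velocities differ strongly, so motion integration is
    confined to areas likely to share a common motion source."""
    num_u = np.zeros_like(u); num_v = np.zeros_like(v); den = np.zeros_like(u)
    for s, a in ((1, 0), (-1, 0), (1, 1), (-1, 1)):
        un, vn = np.roll(u, s, a), np.roll(v, s, a)
        # Gate each nearest-neighbour connection by the velocity difference.
        w = np.exp(-((u - un)**2 + (v - vn)**2) / theta**2)
        num_u += w * un; num_v += w * vn; den += w
    return num_u / (den + 1e-9), num_v / (den + 1e-9)
```

These gated averages would take the place of the plain four-neighbour averages in a smoothness-based flow update like the one sketched above, reducing spatial smoothing across large velocity differences.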
- …