Location Dependency in Video Prediction
Deep convolutional neural networks are used to address many computer vision
problems, including video prediction. The task of video prediction requires
analyzing the video frames, temporally and spatially, and constructing a model
of how the environment evolves. Convolutional neural networks are spatially
invariant, though, which prevents them from modeling location-dependent
patterns. In this work, we propose location-biased convolutional
layers to overcome this limitation. The effectiveness of location bias is
evaluated on two architectures: the Video Ladder Network (VLN) and the
Convolutional Predictive Gating Pyramid (Conv-PGP). The results indicate that
encoding location-dependent features is crucial for video prediction: our
proposed methods significantly outperform their spatially invariant counterparts.
Comment: International Conference on Artificial Neural Networks. Springer, Cham, 201
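
One simple way to break a convolution's spatial invariance is to append normalized coordinate channels to the input, so that subsequent filters can condition on absolute position. The sketch below illustrates that general idea; the paper's actual location-biased layers may implement the bias differently, and the function name is hypothetical.

```python
import numpy as np

def add_location_channels(frames):
    """Append normalized y/x coordinate channels to a batch of frames.

    Hedged sketch of one way to expose absolute location to a conv layer
    (the paper's location-biased layers may differ).
    frames: array of shape (N, H, W, C), channels-last.
    Returns an array of shape (N, H, W, C + 2).
    """
    n, h, w, c = frames.shape
    ys = np.linspace(-1.0, 1.0, h).reshape(1, h, 1, 1)  # row coordinate
    xs = np.linspace(-1.0, 1.0, w).reshape(1, 1, w, 1)  # column coordinate
    y_ch = np.broadcast_to(ys, (n, h, w, 1))
    x_ch = np.broadcast_to(xs, (n, h, w, 1))
    return np.concatenate([frames, y_ch, x_ch], axis=-1)
```

A standard convolution applied to the augmented tensor can then learn location-dependent responses from the extra channels.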
Deep Optical Flow Estimation Via Multi-Scale Correspondence Structure Learning
As an important and challenging problem in computer vision, learning based
optical flow estimation aims to discover the intrinsic correspondence structure
between two adjacent video frames through statistical learning. A key issue in
this area is therefore how to effectively model the multi-scale correspondence
structure in an adaptive, end-to-end learning fashion. Motivated by this
observation, we propose an end-to-end multi-scale correspondence structure
learning (MSCSL) approach for optical flow estimation. The proposed MSCSL
approach effectively captures the multi-scale inter-image correspondence
structures within a multi-level deep feature space. Moreover, the proposed MSCSL
approach builds a spatial Conv-GRU neural network model to adaptively model the
intrinsic dependency relationships among these multi-scale correspondence
structures. Finally, the above procedures for correspondence structure learning
and multi-scale dependency modeling are implemented in a unified end-to-end
deep learning framework. Experimental results on several benchmark datasets
demonstrate the effectiveness of the proposed approach.
Comment: 7 pages, 3 figures, 2 tables
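
The gating logic of a spatial Conv-GRU can be sketched as follows. To keep the update equations visible without a full convolution routine, this hedged sketch uses 1x1 convolutions (a per-pixel channel-mixing `einsum`); the paper's Conv-GRU presumably uses larger spatial kernels, and all names here are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_gru_step(h, x, Wz, Uz, Wr, Ur, Wh, Uh):
    """One spatial ConvGRU update on (H, W, C) feature maps.

    Hedged sketch: the 1x1 "convolutions" below are channel-mixing
    matrices of shape (C_out, C_in) applied at every pixel; a real
    Conv-GRU would use spatial kernels instead.
    """
    def conv1x1(W, t):
        return np.einsum('oc,hwc->hwo', W, t)

    z = sigmoid(conv1x1(Wz, x) + conv1x1(Uz, h))        # update gate
    r = sigmoid(conv1x1(Wr, x) + conv1x1(Ur, h))        # reset gate
    h_tilde = np.tanh(conv1x1(Wh, x) + conv1x1(Uh, r * h))  # candidate state
    return (1 - z) * h + z * h_tilde                    # gated blend
```

Applied recurrently across scales, such a cell can propagate correspondence evidence between the multi-scale feature maps.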
Lossless Intra Coding in HEVC with 3-tap Filters
This paper presents a pixel-by-pixel spatial prediction method for lossless
intra coding within High Efficiency Video Coding (HEVC). A well-known previous
pixel-by-pixel spatial prediction method uses only two neighboring pixels for
prediction, based on the angular projection idea borrowed from block-based
intra prediction in lossy coding. This paper explores a method that uses three
neighboring pixels for prediction according to a two-dimensional correlation
model, where the neighbor pixels used and the prediction weights change with the
intra mode. To find the best prediction weights for each intra mode, a
two-stage offline optimization algorithm is used and a number of implementation
aspects are discussed to simplify the proposed prediction method. The proposed
method is implemented in the HEVC reference software and experimental results
show that the explored 3-tap filtering method can achieve an average 11.34%
bitrate reduction over the default lossless intra coding in HEVC. The proposed
method also decreases average decoding time by 12.7% while increasing average
encoding time by 9.7%.
Comment: 10 pages, 7 figures
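
The core of pixel-by-pixel 3-tap prediction can be sketched as below. This is a hedged illustration with one fixed weight set and a mid-gray border value; in the paper, the neighbors and weights vary per intra mode and are chosen by offline optimization.

```python
def predict_3tap(img, w=(0.5, 0.5, 0.0)):
    """Pixel-by-pixel 3-tap spatial prediction over one image.

    Hedged sketch: each pixel is predicted from its left, top and
    top-left causal neighbors with fixed weights w; the assumed border
    value 128 (mid-gray for 8-bit video) stands in for HEVC's actual
    boundary handling. img: list of rows of pixel values.
    Returns (prediction, residual); a lossless coder would entropy-code
    the residual, and the decoder reconstructs exactly.
    """
    h, width = len(img), len(img[0])
    pred = [[0.0] * width for _ in range(h)]
    for y in range(h):
        for x in range(width):
            left = img[y][x - 1] if x > 0 else 128
            top = img[y - 1][x] if y > 0 else 128
            tl = img[y - 1][x - 1] if x > 0 and y > 0 else 128
            pred[y][x] = w[0] * left + w[1] * top + w[2] * tl
    resid = [[img[y][x] - pred[y][x] for x in range(width)]
             for y in range(h)]
    return pred, resid
```

Because the prediction uses only already-decoded (causal) neighbors, the decoder can reproduce it exactly and recover the original pixels from the residuals.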
Recurrent 3D Pose Sequence Machines
Recovering 3D human articulated pose from monocular image sequences is very
challenging due to diverse appearances, viewpoints, and occlusions, and because
the 3D pose is inherently ambiguous given only monocular imagery. It is
thus critical to exploit rich spatial and temporal long-range dependencies
among body joints for accurate 3D pose sequence prediction. Existing approaches
usually manually design some elaborate prior terms and human body kinematic
constraints for capturing structures, which are often insufficient to exploit
all intrinsic structures and not scalable for all scenarios. In contrast, this
paper presents a Recurrent 3D Pose Sequence Machine (RPSM) to automatically
learn the image-dependent structural constraint and sequence-dependent temporal
context by using a multi-stage sequential refinement. At each stage, our RPSM
is composed of three modules to predict the 3D pose sequences based on the
previously learned 2D pose representations and 3D poses: (i) a 2D pose module
extracting the image-dependent pose representations, (ii) a 3D pose recurrent
module regressing 3D poses and (iii) a feature adaption module serving as a
bridge between module (i) and (ii) to enable the representation transformation
from 2D to 3D domain. These three modules are then assembled into a sequential
prediction framework to refine the predicted poses with multiple recurrent
stages. Extensive evaluations on the Human3.6M dataset and HumanEva-I dataset
show that our RPSM outperforms all state-of-the-art approaches for 3D pose
estimation.
Comment: Published in CVPR 201
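
The multi-stage refinement control flow described above can be sketched as a simple loop. This is a hedged outline of the sequencing only; in the paper each stage is itself the composition of the 2D pose, feature adaption, and 3D recurrent modules, which are not modeled here.

```python
def multistage_refine(init_pose, stages):
    """Sequentially refine a pose estimate through a list of stages.

    Hedged sketch of the RPSM-style control flow: each stage is any
    callable mapping the current pose estimate to an updated one.
    Returns the final pose and the per-stage history of estimates.
    """
    pose = init_pose
    history = [pose]
    for stage in stages:
        pose = stage(pose)      # one refinement stage
        history.append(pose)
    return pose, history
```

Keeping the per-stage history makes it easy to supervise intermediate predictions, a common choice in multi-stage refinement frameworks.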