11,534 research outputs found
Compensating for motion estimation inaccuracies in DVC
Distributed video coding is a relatively new video coding approach, where compression is achieved by performing motion estimation at the decoder. Current techniques for decoder-side motion estimation make use of assumptions such as linear motion between the reference frames. It is only after the frame is partially decoded that some of the errors are corrected. In this paper, we propose a new approach with multiple predictors, accounting for inaccuracies in the decoder-side motion estimation process during the decoding. Each of the predictors is assigned a weight, and the correlation between the original frame at the encoder and the set of predictors at the decoder is modeled at the decoder. This correlation information is then used during the decoding process. Results indicate average quality gains up to 0.4 dB
Deep Video Generation, Prediction and Completion of Human Action Sequences
Current deep learning results on video generation are limited while there are
only a few first results on video prediction and no relevant significant
results on video completion. This is due to the severe ill-posedness inherent
in these three problems. In this paper, we focus on human action videos, and
propose a general, two-stage deep framework to generate human action videos
with no constraints or arbitrary number of constraints, which uniformly address
the three problems: video generation given no input frames, video prediction
given the first few frames, and video completion given the first and last
frames. To make the problem tractable, in the first stage we train a deep
generative model that generates a human pose sequence from random noise. In the
second stage, a skeleton-to-image network is trained, which is used to generate
a human action video given the complete human pose sequence generated in the
first stage. By introducing the two-stage strategy, we sidestep the original
ill-posed problems while producing for the first time high-quality video
generation/prediction/completion results of much longer duration. We present
quantitative and qualitative evaluation to show that our two-stage approach
outperforms state-of-the-art methods in video generation, prediction and video
completion. Our video result demonstration can be viewed at
https://iamacewhite.github.io/supp/index.htmlComment: Under review for CVPR 2018. Haoye and Chunyan have equal contributio
Fast Shadow Detection from a Single Image Using a Patched Convolutional Neural Network
In recent years, various shadow detection methods from a single image have
been proposed and used in vision systems; however, most of them are not
appropriate for the robotic applications due to the expensive time complexity.
This paper introduces a fast shadow detection method using a deep learning
framework, with a time cost that is appropriate for robotic applications. In
our solution, we first obtain a shadow prior map with the help of multi-class
support vector machine using statistical features. Then, we use a semantic-
aware patch-level Convolutional Neural Network that efficiently trains on
shadow examples by combining the original image and the shadow prior map.
Experiments on benchmark datasets demonstrate the proposed method significantly
decreases the time complexity of shadow detection, by one or two orders of
magnitude compared with state-of-the-art methods, without losing accuracy.Comment: 6 pages, 5 figures, Submitted to IROS 201
DeMoN: Depth and Motion Network for Learning Monocular Stereo
In this paper we formulate structure from motion as a learning problem. We
train a convolutional network end-to-end to compute depth and camera motion
from successive, unconstrained image pairs. The architecture is composed of
multiple stacked encoder-decoder networks, the core part being an iterative
network that is able to improve its own predictions. The network estimates not
only depth and motion, but additionally surface normals, optical flow between
the images and confidence of the matching. A crucial component of the approach
is a training loss based on spatial relative differences. Compared to
traditional two-frame structure from motion methods, results are more accurate
and more robust. In contrast to the popular depth-from-single-image networks,
DeMoN learns the concept of matching and, thus, better generalizes to
structures not seen during training.Comment: Camera ready version for CVPR 2017. Supplementary material included.
Project page:
http://lmb.informatik.uni-freiburg.de/people/ummenhof/depthmotionnet
- …