
    Location Dependency in Video Prediction

    Deep convolutional neural networks are used to address many computer vision problems, including video prediction. The task of video prediction requires analyzing the video frames, temporally and spatially, and constructing a model of how the environment evolves. Convolutional neural networks, however, are spatially invariant, which prevents them from modeling location-dependent patterns. In this work, we propose location-biased convolutional layers to overcome this limitation. The effectiveness of location bias is evaluated on two architectures: Video Ladder Network (VLN) and Convolutional Predictive Gating Pyramid (Conv-PGP). The results indicate that encoding location-dependent features is crucial for the task of video prediction. Our proposed methods significantly outperform spatially invariant models.
    Comment: International Conference on Artificial Neural Networks. Springer, Cham, 201
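
    The abstract does not say how the location bias enters the convolution. Below is a minimal sketch of one common way to make a convolutional layer location-aware, concatenating normalized coordinate channels to the input before a standard convolution (in the spirit of CoordConv), assuming PyTorch; the class name and parameters are illustrative, not the paper's layer:

    import torch
    import torch.nn as nn

    class LocationBiasedConv2d(nn.Module):
        """Convolution with explicit location information (illustrative sketch).

        Normalized (x, y) coordinate maps are concatenated to the input so the
        filter can learn location-dependent responses; this is an assumption
        about the mechanism, not the paper's exact layer.
        """
        def __init__(self, in_channels, out_channels, kernel_size, padding=0):
            super().__init__()
            # +2 input channels for the x and y coordinate maps
            self.conv = nn.Conv2d(in_channels + 2, out_channels,
                                  kernel_size, padding=padding)

        def forward(self, x):
            b, _, h, w = x.shape
            # Coordinate maps in [-1, 1], broadcast over the batch
            ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
            xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
            return self.conv(torch.cat([x, xs, ys], dim=1))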

    Deep Optical Flow Estimation Via Multi-Scale Correspondence Structure Learning

    As an important and challenging problem in computer vision, learning-based optical flow estimation aims to discover the intrinsic correspondence structure between two adjacent video frames through statistical learning. Therefore, a key issue in this area is how to effectively model the multi-scale correspondence structure properties in an adaptive, end-to-end learning fashion. Motivated by this observation, we propose an end-to-end multi-scale correspondence structure learning (MSCSL) approach for optical flow estimation. In principle, the proposed MSCSL approach is capable of effectively capturing the multi-scale inter-image-correlation correspondence structures within a multi-level feature space from deep learning. Moreover, the proposed MSCSL approach builds a spatial Conv-GRU neural network model to adaptively model the intrinsic dependency relationships among these multi-scale correspondence structures. Finally, the above procedures for correspondence structure learning and multi-scale dependency modeling are implemented in a unified end-to-end deep learning framework. Experimental results on several benchmark datasets demonstrate the effectiveness of the proposed approach.
    Comment: 7 pages, 3 figures, 2 tables
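
    The abstract names a spatial Conv-GRU as the component that models dependencies among the multi-scale correspondence structures. Below is a minimal sketch of a generic convolutional GRU cell, assuming PyTorch; it only illustrates the gating structure and is not the paper's exact model:

    import torch
    import torch.nn as nn

    class ConvGRUCell(nn.Module):
        """Generic convolutional GRU cell (illustrative, not the paper's model).

        The standard GRU gate equations are kept, but every dense matrix product
        is replaced by a 2D convolution so the hidden state stays a feature map.
        """
        def __init__(self, in_channels, hidden_channels, kernel_size=3):
            super().__init__()
            padding = kernel_size // 2
            self.hidden_channels = hidden_channels
            # Update and reset gates computed jointly from [input, hidden]
            self.gates = nn.Conv2d(in_channels + hidden_channels,
                                   2 * hidden_channels, kernel_size, padding=padding)
            # Candidate hidden state
            self.cand = nn.Conv2d(in_channels + hidden_channels,
                                  hidden_channels, kernel_size, padding=padding)

        def forward(self, x, h=None):
            if h is None:
                h = x.new_zeros(x.size(0), self.hidden_channels, x.size(2), x.size(3))
            z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
            h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
            return (1 - z) * h + z * h_tilde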

    Lossless Intra Coding in HEVC with 3-tap Filters

    This paper presents a pixel-by-pixel spatial prediction method for lossless intra coding within High Efficiency Video Coding (HEVC). A well-known previous pixel-by-pixel spatial prediction method uses only two neighboring pixels for prediction, based on the angular projection idea borrowed from block-based intra prediction in lossy coding. This paper explores a method that uses three neighboring pixels for prediction according to a two-dimensional correlation model, where the neighboring pixels used and the prediction weights change depending on the intra mode. To find the best prediction weights for each intra mode, a two-stage offline optimization algorithm is used, and a number of implementation aspects are discussed to simplify the proposed prediction method. The proposed method is implemented in the HEVC reference software, and experimental results show that the explored 3-tap filtering method achieves an average 11.34% bitrate reduction over the default lossless intra coding in HEVC. The proposed method also decreases average decoding time by 12.7% while increasing average encoding time by 9.7%.
    Comment: 10 pages, 7 figures
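
    To make the 3-tap idea concrete, here is a minimal sketch of pixel-by-pixel prediction from three causal neighbours (left, above, above-left) with one fixed weight set; the default weights and boundary handling are illustrative assumptions, whereas the actual method selects the neighbours and offline-optimized weights per intra mode:

    import numpy as np

    def predict_pixel_3tap(block, i, j, weights):
        """Predict pixel (i, j) from three causal neighbours (illustrative)."""
        w_left, w_above, w_corner = weights
        left   = float(block[i, j - 1]) if j > 0 else 128.0      # assumed default for missing neighbours
        above  = float(block[i - 1, j]) if i > 0 else 128.0
        corner = float(block[i - 1, j - 1]) if i > 0 and j > 0 else 128.0
        pred = w_left * left + w_above * above + w_corner * corner
        return int(np.clip(round(pred), 0, 255))

    def lossless_residuals(block, weights=(1.0, 1.0, -1.0)):
        """Residual block; adding the predictions back reconstructs the block exactly."""
        h, w = block.shape
        res = np.zeros((h, w), dtype=np.int16)
        for i in range(h):
            for j in range(w):
                res[i, j] = int(block[i, j]) - predict_pixel_3tap(block, i, j, weights)
        return res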

    Recurrent 3D Pose Sequence Machines

    Recovering 3D human articulated poses from monocular image sequences is very challenging due to diverse appearances, viewpoints, and occlusions, and because the 3D pose is inherently ambiguous from monocular imagery. It is thus critical to exploit rich spatial and temporal long-range dependencies among body joints for accurate 3D pose sequence prediction. Existing approaches usually manually design elaborate prior terms and human body kinematic constraints to capture structure, which are often insufficient to exploit all intrinsic structures and do not scale to all scenarios. In contrast, this paper presents a Recurrent 3D Pose Sequence Machine (RPSM) that automatically learns the image-dependent structural constraint and the sequence-dependent temporal context via multi-stage sequential refinement. At each stage, our RPSM is composed of three modules that predict the 3D pose sequences based on the previously learned 2D pose representations and 3D poses: (i) a 2D pose module extracting the image-dependent pose representations, (ii) a 3D pose recurrent module regressing 3D poses, and (iii) a feature adaption module serving as a bridge between modules (i) and (ii) to enable the representation transformation from the 2D to the 3D domain. These three modules are then assembled into a sequential prediction framework that refines the predicted poses over multiple recurrent stages. Extensive evaluations on the Human3.6M and HumanEva-I datasets show that our RPSM outperforms all state-of-the-art approaches for 3D pose estimation.
    Comment: Published in CVPR 201
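
    The three-module, multi-stage design can be sketched as a skeleton, assuming PyTorch; every module body, layer size, and name below is an assumption for illustration and is much simpler than the published RPSM:

    import torch
    import torch.nn as nn

    class RPSMSketch(nn.Module):
        """Skeleton of a multi-stage 2D-to-3D pose pipeline (illustrative only)."""
        def __init__(self, num_joints=17, feat_dim=256, num_stages=3):
            super().__init__()
            self.num_stages = num_stages
            self.pose2d = nn.Sequential(                          # (i) 2D pose representation
                nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.adapt = nn.Linear(feat_dim + num_joints * 3, feat_dim)  # (iii) 2D-to-3D feature adaption
            self.rnn = nn.LSTMCell(feat_dim, feat_dim)            # (ii) 3D pose recurrent module
            self.regress = nn.Linear(feat_dim, num_joints * 3)

        def forward(self, frames):                                # frames: (T, 3, H, W)
            h = c = frames.new_zeros(1, self.rnn.hidden_size)
            pose3d = frames.new_zeros(1, self.regress.out_features)
            outputs = []
            for t in range(frames.size(0)):
                feat2d = self.pose2d(frames[t:t + 1])
                for _ in range(self.num_stages):                  # multi-stage sequential refinement
                    fused = torch.relu(self.adapt(torch.cat([feat2d, pose3d], dim=1)))
                    h, c = self.rnn(fused, (h, c))
                    pose3d = self.regress(h)
                outputs.append(pose3d)
            return torch.stack(outputs)                           # (T, 1, num_joints * 3)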