1,382 research outputs found

    Occlusion Aware Unsupervised Learning of Optical Flow

    Full text link
    It has been recently shown that a convolutional neural network can learn optical flow estimation with unsupervised learning. However, the performance of the unsupervised methods still has a relatively large gap compared to its supervised counterpart. Occlusion and large motion are some of the major factors that limit the current unsupervised learning of optical flow methods. In this work we introduce a new method which models occlusion explicitly and a new warping way that facilitates the learning of large motion. Our method shows promising results on Flying Chairs, MPI-Sintel and KITTI benchmark datasets. Especially on KITTI dataset where abundant unlabeled samples exist, our unsupervised method outperforms its counterpart trained with supervised learning.Comment: CVPR 2018 Camera-read

    MRF Stereo Matching with Statistical Estimation of Parameters

    Get PDF
    For about the last ten years, stereo matching in computer vision has been treated as a combinatorial optimization problem. Assuming that the points in stereo images form a Markov Random Field (MRF), a variety of combinatorial optimization algorithms has been developed to optimize their underlying cost functions. In many of these algorithms, the MRF parameters of the cost functions have often been manually tuned or heuristically determined for achieving good performance results. Recently, several algorithms for statistical, hence, automatic estimation of the parameters have been published. Overall, these algorithms perform well in labeling, but they lack in performance for handling discontinuity in labeling along the surface borders. In this dissertation, we develop an algorithm for optimization of the cost function with automatic estimation of the MRF parameters – the data and smoothness parameters. Both the parameters are estimated statistically and applied in the cost function with support of adaptive neighborhood defined based on color similarity. With the proposed algorithm, discontinuity handling with higher consistency than of the existing algorithms is achieved along surface borders. The data parameters are pre-estimated from one of the stereo images by applying a hypothesis, called noise equivalence hypothesis, to eliminate interdependency between the estimations of the data and smoothness parameters. The smoothness parameters are estimated applying a combination of maximum likelihood and disparity gradient constraint, to eliminate nested inference for the estimation. The parameters for handling discontinuities in data and smoothness are defined statistically as well. We model cost functions to match the images symmetrically for improved matching performance and also to detect occlusions. Finally, we fill the occlusions in the disparity map by applying several existing and proposed algorithms and show that our best proposed segmentation based least squares algorithm performs better than the existing algorithms. We conduct experiments with the proposed algorithm on publicly available ground truth test datasets provided by the Middlebury College. Experiments show that results better than the existing algorithms’ are delivered by the proposed algorithm having the MRF parameters estimated automatically. In addition, applying the parameter estimation technique in existing stereo matching algorithm, we observe significant improvement in computational time

    DIGITAL INPAINTING ALGORITHMS AND EVALUATION

    Get PDF
    Digital inpainting is the technique of filling in the missing regions of an image or a video using information from surrounding area. This technique has found widespread use in applications such as restoration, error recovery, multimedia editing, and video privacy protection. This dissertation addresses three significant challenges associated with the existing and emerging inpainting algorithms and applications. The three key areas of impact are 1) Structure completion for image inpainting algorithms, 2) Fast and efficient object based video inpainting framework and 3) Perceptual evaluation of large area image inpainting algorithms. One of the main approach of existing image inpainting algorithms in completing the missing information is to follow a two stage process. A structure completion step, to complete the boundaries of regions in the hole area, followed by texture completion process using advanced texture synthesis methods. While the texture synthesis stage is important, it can be argued that structure completion aspect is a vital component in improving the perceptual image inpainting quality. To this end, we introduce a global structure completion algorithm for completion of missing boundaries using symmetry as the key feature. While existing methods for symmetry completion require a-priori information, our method takes a non-parametric approach by utilizing the invariant nature of curvature to complete missing boundaries. Turning our attention from image to video inpainting, we readily observe that existing video inpainting techniques have evolved as an extension of image inpainting techniques. As a result, they suffer from various shortcoming including, among others, inability to handle large missing spatio-temporal regions, significantly slow execution time making it impractical for interactive use and presence of temporal and spatial artifacts. To address these major challenges, we propose a fundamentally different method based on object based framework for improving the performance of video inpainting algorithms. We introduce a modular inpainting scheme in which we first segment the video into constituent objects by using acquired background models followed by inpainting of static background regions and dynamic foreground regions. For static background region inpainting, we use a simple background replacement and occasional image inpainting. To inpaint dynamic moving foreground regions, we introduce a novel sliding-window based dissimilarity measure in a dynamic programming framework. This technique can effectively inpaint large regions of occlusions, inpaint objects that are completely missing for several frames, change in size and pose and has minimal blurring and motion artifacts. Finally we direct our focus on experimental studies related to perceptual quality evaluation of large area image inpainting algorithms. The perceptual quality of large area inpainting technique is inherently a subjective process and yet no previous research has been carried out by taking the subjective nature of the Human Visual System (HVS). We perform subjective experiments using eye-tracking device involving 24 subjects to analyze the effect of inpainting on human gaze. We experimentally show that the presence of inpainting artifacts directly impacts the gaze of an unbiased observer and this in effect has a direct bearing on the subjective rating of the observer. Specifically, we show that the gaze energy in the hole regions of an inpainted image show marked deviations from normal behavior when the inpainting artifacts are readily apparent

    Context-aware Synthesis for Video Frame Interpolation

    Get PDF
    Video frame interpolation algorithms typically estimate optical flow or its variations and then use it to guide the synthesis of an intermediate frame between two consecutive original frames. To handle challenges like occlusion, bidirectional flow between the two input frames is often estimated and used to warp and blend the input frames. However, how to effectively blend the two warped frames still remains a challenging problem. This paper presents a context-aware synthesis approach that warps not only the input frames but also their pixel-wise contextual information and uses them to interpolate a high-quality intermediate frame. Specifically, we first use a pre-trained neural network to extract per-pixel contextual information for input frames. We then employ a state-of-the-art optical flow algorithm to estimate bidirectional flow between them and pre-warp both input frames and their context maps. Finally, unlike common approaches that blend the pre-warped frames, our method feeds them and their context maps to a video frame synthesis neural network to produce the interpolated frame in a context-aware fashion. Our neural network is fully convolutional and is trained end to end. Our experiments show that our method can handle challenging scenarios such as occlusion and large motion and outperforms representative state-of-the-art approaches.Comment: CVPR 2018, http://graphics.cs.pdx.edu/project/ctxsy

    Single View Reconstruction for Human Face and Motion with Priors

    Get PDF
    Single view reconstruction is fundamentally an under-constrained problem. We aim to develop new approaches to model human face and motion with model priors that restrict the space of possible solutions. First, we develop a novel approach to recover the 3D shape from a single view image under challenging conditions, such as large variations in illumination and pose. The problem is addressed by employing the techniques of non-linear manifold embedding and alignment. Specifically, the local image models for each patch of facial images and the local surface models for each patch of 3D shape are learned using a non-linear dimensionality reduction technique, and the correspondences between these local models are then learned by a manifold alignment method. Local models successfully remove the dependency of large training databases for human face modeling. By combining the local shapes, the global shape of a face can be reconstructed directly from a single linear system of equations via least square. Unfortunately, this learning-based approach cannot be successfully applied to the problem of human motion modeling due to the internal and external variations in single view video-based marker-less motion capture. Therefore, we introduce a new model-based approach for capturing human motion using a stream of depth images from a single depth sensor. While a depth sensor provides metric 3D information, using a single sensor, instead of a camera array, results in a view-dependent and incomplete measurement of object motion. We develop a novel two-stage template fitting algorithm that is invariant to subject size and view-point variations, and robust to occlusions. Starting from a known pose, our algorithm first estimates a body configuration through temporal registration, which is used to search the template motion database for a best match. The best match body configuration as well as its corresponding surface mesh model are deformed to fit the input depth map, filling in the part that is occluded from the input and compensating for differences in pose and body-size between the input image and the template. Our approach does not require any makers, user-interaction, or appearance-based tracking. Experiments show that our approaches can achieve good modeling results for human face and motion, and are capable of dealing with variety of challenges in single view reconstruction, e.g., occlusion
    • …
    corecore