
    Learning how to be robust: Deep polynomial regression

    Polynomial regression is a recurring problem with many applications; in computer vision it often appears in motion analysis. Whatever the application, standard methods for fitting polynomial models tend to produce biased results when the input data is heavily contaminated by outliers, and the problem becomes even harder when the outliers are strongly structured. Departing from problem-tailored heuristics for robust estimation of parametric models, we explore deep convolutional neural networks. Our work aims to find a generic approach for training deep regression models without explicit supervised annotation. We avoid the need for a tailored loss function on the regression parameters by attaching to our model a differentiable, hard-wired decoder corresponding to the polynomial operation at hand. We demonstrate the value of our findings by comparing with standard robust regression methods. Furthermore, we show how to apply such models to a real computer vision problem, namely video stabilization. Qualitative and quantitative experiments show that neural networks can learn robustness for general polynomial regression, with results that clearly surpass those of traditional robust estimation methods. Comment: 18 pages, conferenc
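
The decoder idea can be illustrated with a minimal sketch under the simplest possible assumptions. A fixed Vandermonde matrix acts as the hard-wired decoder that turns predicted coefficients into curve values, and a mean-squared loss is taken on those decoded values. The function names (`vandermonde`, `decode`, `loss_and_grad`) are hypothetical, and this is not the authors' network or loss. It only shows why such a decoder is differentiable and what gradient it passes back to the coefficients.

```python
# Hypothetical sketch: a fixed ("hard-wired") polynomial decoder and the
# gradient of a loss defined on its decoded output, not on the coefficients.

def vandermonde(xs, degree):
    """Rows [1, x, x^2, ..., x^degree] for each sample point x."""
    return [[x ** k for k in range(degree + 1)] for x in xs]

def decode(theta, V):
    """Evaluate the polynomial with coefficients theta at every row of V."""
    return [sum(v * t for v, t in zip(row, theta)) for row in V]

def loss_and_grad(theta, V, target):
    """Mean squared error on decoded values and its gradient w.r.t. theta.

    The decoder is linear in theta, so d(decoded_i)/d(theta_k) = V[i][k]
    and the gradient is (2/n) * V^T (V theta - target).
    """
    n = len(V)
    residual = [p - y for p, y in zip(decode(theta, V), target)]
    loss = sum(r * r for r in residual) / n
    grad = [2.0 / n * sum(V[i][k] * residual[i] for i in range(n))
            for k in range(len(theta))]
    return loss, grad
```

Because the decoder is linear in the coefficients, its Jacobian is simply the Vandermonde matrix. A network placed in front of it can therefore be trained through the decoded output. A finite-difference check on `loss_and_grad` is a quick way to confirm the analytic gradient.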

    Video Frame Interpolation via Adaptive Separable Convolution

    Standard video frame interpolation methods first estimate optical flow between input frames and then synthesize an intermediate frame guided by motion. Recent approaches merge these two steps into a single convolution process by convolving input frames with spatially adaptive kernels that account for motion and re-sampling simultaneously. These methods require large kernels to handle large motion, and the resulting memory demand limits the number of pixels whose kernels can be estimated at once. To address this problem, this paper formulates frame interpolation as local separable convolution over input frames using pairs of 1D kernels. Compared to regular 2D kernels, the 1D kernels require significantly fewer parameters to be estimated. Our method uses a deep fully convolutional neural network that takes two input frames and estimates pairs of 1D kernels for all pixels simultaneously. Since our method estimates the kernels and synthesizes the whole video frame at once, it allows a perceptual loss to be incorporated to train the neural network to produce visually pleasing frames. This deep neural network is trained end-to-end using widely available video data without any human annotation. Both qualitative and quantitative experiments show that our method provides a practical solution to high-quality video frame interpolation. Comment: ICCV 2017, http://graphics.cs.pdx.edu/project/sepconv
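
The parameter saving from separable kernels can be checked with a small sketch. It assumes each frame's 2D kernel is the outer product of a vertical 1D kernel `kv` and a horizontal 1D kernel `kh`. It also assumes an output pixel is the sum of the two frames' local patches, each weighted by its own kernel pair. The helper names and the tiny 5x5 patch size are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of per-pixel separable local convolution.
# Assumption: the 2D kernel for each input frame is outer(kv, kh).

def outer(kv, kh):
    """Build the full 2D kernel from a vertical and a horizontal 1D kernel."""
    return [[a * b for b in kh] for a in kv]

def apply_2d(kernel, patch):
    """Local convolution at one pixel: weighted sum of the patch (n*n weights)."""
    return sum(k * p
               for krow, prow in zip(kernel, patch)
               for k, p in zip(krow, prow))

def apply_separable(kv, kh, patch):
    """Same result as apply_2d(outer(kv, kh), patch) using only 2n weights:
    weight each row horizontally with kh, then combine the rows with kv."""
    row_responses = [sum(h * p for h, p in zip(kh, prow)) for prow in patch]
    return sum(v * r for v, r in zip(kv, row_responses))

def interpolate_pixel(kernel_pairs, patches):
    """Sum the contributions of both input frames at one output pixel.
    kernel_pairs: [(kv1, kh1), (kv2, kh2)]; patches: [P1, P2] centred on the pixel."""
    return sum(apply_separable(kv, kh, patch)
               for (kv, kh), patch in zip(kernel_pairs, patches))
```

With a 5-pixel support the separable form needs 10 weights per frame instead of 25. In general it needs 2n weights versus n*n, so the saving grows with kernel size. This is the memory argument behind estimating kernels for every pixel at once.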

    Geodesic Distance Histogram Feature for Video Segmentation

    This paper proposes a geodesic-distance-based feature that encodes global information to improve video segmentation algorithms. The feature is a joint histogram of intensity and geodesic distances, where the geodesic distances are computed as the shortest paths between superpixels via their boundaries. We also incorporate adaptive voting weights and spatial pyramid configurations to add spatial information to the geodesic histogram feature, and show that this further improves results. The feature is generic and can be used as part of various algorithms. In experiments, we test the geodesic histogram feature by incorporating it into two existing video segmentation frameworks. This leads to significantly better performance on 3D video segmentation benchmarks on two datasets.
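
A minimal sketch of the two ingredients assumes a superpixel adjacency graph whose edge weights stand in for boundary costs (for example, boundary strength between neighbouring superpixels). Dijkstra's algorithm gives geodesic distances from a reference superpixel. A normalized joint histogram then bins each superpixel by (mean intensity, geodesic distance). The graph format, bin counts, and the `max_dist` cutoff are hypothetical choices. The paper's adaptive voting weights and spatial pyramid are omitted.

```python
import heapq

def geodesic_distances(adjacency, source):
    """Shortest-path (geodesic) distance from `source` to every reachable
    superpixel. adjacency: {node: [(neighbour, boundary_cost), ...]}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path was already found
        for v, cost in adjacency.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def joint_histogram(intensity, dist, n_int_bins, n_dist_bins, max_dist):
    """Normalized 2D histogram over (intensity, geodesic distance).
    intensity: {node: mean intensity in [0, 1]}; distances >= max_dist
    fall into the last distance bin."""
    hist = [[0.0] * n_dist_bins for _ in range(n_int_bins)]
    for node, d in dist.items():
        i = min(int(intensity[node] * n_int_bins), n_int_bins - 1)
        j = min(int(d / max_dist * n_dist_bins), n_dist_bins - 1)
        hist[i][j] += 1.0
    total = sum(sum(row) for row in hist)
    return [[c / total for c in row] for row in hist] if total else hist
```

Because the distance is accumulated along boundaries rather than measured in image space, two superpixels that are close in pixels but separated by a strong edge end up far apart. That is what lets the histogram encode global structure.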

    Visual motion processing and human tracking behavior

    The accurate visual tracking of a moving object is a fundamental human skill that reduces the relative slip and instability of the object's image on the retina, thus granting stable, high-quality vision. To optimize tracking performance over time, a quick estimate of the object's global motion properties needs to be fed to the oculomotor system and dynamically updated. Concurrently, performance can be greatly improved in terms of latency and accuracy by taking predictive cues into account, especially under variable visibility conditions and in the presence of ambiguous retinal information. Here, we review several recent studies focusing on the integration of retinal and extra-retinal information for the control of human smooth pursuit. By dynamically probing tracking performance with well-established paradigms from the visual perception and oculomotor literature, we provide the basis for testing theoretical hypotheses within the framework of dynamic probabilistic inference. In particular, we present the applications of these results in light of state-of-the-art computer vision algorithms.
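
Dynamic probabilistic inference of target motion is often illustrated with a Kalman filter. The sketch is a generic scalar filter, not a model from the reviewed studies. It assumes target velocity follows a random walk (process noise `q`) and that retinal measurements are noisy (measurement noise `r`). When the target is invisible (`None`), the filter keeps its prediction while its uncertainty grows. This is a crude analogue of predictive pursuit during occlusion. The function name and default parameters are hypothetical.

```python
def track_velocity(measurements, q=0.01, r=1.0, v0=0.0, p0=1.0):
    """Hypothetical scalar Kalman filter for target velocity.
    measurements: list of floats, or None when the target is not visible.
    Returns a list of (velocity_estimate, variance) after each step."""
    v, p = v0, p0
    history = []
    for z in measurements:
        # Predict: velocity is assumed to persist; uncertainty grows by q.
        p = p + q
        if z is not None:
            # Update: blend prediction and measurement by the Kalman gain.
            k = p / (p + r)
            v = v + k * (z - v)
            p = (1.0 - k) * p
        history.append((v, p))
    return history
```

The gain `k` weights new retinal evidence against the internal prediction. Noisier measurements (larger `r`) make the estimate lean more on prediction, trading latency for stability.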