    Hierarchical inference of disparity

    Disparity-selective cells in V1 respond to correlated stimulation of their left and right receptive fields, which do not necessarily correspond to the same object in the 3D scene; i.e., these cells respond equally to false and correct stereo matches. Neurons in the extrastriate visual area V2, on the other hand, show much stronger responses to correct visual matches [Bakin et al., 2000]. This indicates that part of the stereo correspondence problem is solved during disparity processing across these two areas. However, the mechanisms the brain employs to accomplish this task are not yet understood. Existing computational models are mostly based on cooperative computations in V1 [Marr and Poggio, 1976; Read and Cumming, 2007], without exploiting the potential benefits of the hierarchical structure between V1 and V2. Here we propose a two-layer graphical model for disparity estimation from stereo. The lower layer matches the linear responses of neurons with Gabor receptive fields across images. Nodes in the upper layer infer a sparse code of the disparity map and act as priors that help disambiguate false from correct matches. When learned on natural disparity maps, the receptive fields of the sparse code converge to oriented depth edges, consistent with electrophysiological studies in macaque [von der Heydt et al., 2000]. Moreover, when such a code is used for depth inference in our two-layer model, the resulting disparity map for the Tsukuba stereo pair [Middlebury database] has 40% fewer false matches than the solution given by the first layer alone. Our model thus demonstrates hierarchical disparity computation and leads to testable predictions about V1-V2 interactions.
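    The first layer's role can be sketched in a toy 1-D setting. The snippet below is a hypothetical illustration, not the paper's implementation: it builds an even-symmetric Gabor receptive field, computes linear responses for a left signal and each disparity-shifted version of the right signal, and scores candidate disparities by the squared difference of the responses. The names `gabor` and `layer1_matching_cost` and all parameter values are our own.

```python
import numpy as np

def gabor(x, freq=0.5, sigma=2.0):
    # Even-symmetric Gabor receptive field (illustrative parameters).
    return np.exp(-x**2 / (2 * sigma**2)) * np.cos(2 * np.pi * freq * x)

def layer1_matching_cost(left, right, max_disp, filt):
    """Layer-1 cost: squared difference of linear Gabor responses
    between the left signal and the right signal shifted by each
    candidate disparity."""
    resp_left = np.convolve(left, filt, mode="same")
    costs = []
    for d in range(max_disp + 1):
        resp_right = np.convolve(np.roll(right, d), filt, mode="same")
        costs.append(np.sum((resp_left - resp_right) ** 2))
    return np.array(costs)

rng = np.random.default_rng(0)
scene = rng.standard_normal(256)
true_disp = 5
left = scene
right = np.roll(scene, -true_disp)   # right view shifted by the true disparity

filt = gabor(np.arange(-8, 9))
costs = layer1_matching_cost(left, right, max_disp=10, filt=filt)
print(int(np.argmin(costs)))  # recovered disparity: 5
```

    In natural images many disparities produce near-identical costs (false matches); the paper's upper layer supplies the sparse prior that breaks such ties.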

    Learning sparse representations of depth

    This paper introduces a new method for learning and inferring sparse representations of depth (disparity) maps. The proposed algorithm relaxes the usual assumption of a stationary noise model in sparse coding, which enables learning from data corrupted with spatially varying noise or uncertainty, as typically produced by laser range scanners or structured-light depth cameras. Sparse representations are learned from the Middlebury database disparity maps and then exploited in a two-layer graphical model for inferring depth from stereo, by including a sparsity prior on the learned features. Since they capture higher-order dependencies in the depth structure, these priors can complement the smoothness priors commonly used in depth inference based on Markov Random Field (MRF) models. Inference on the proposed graph is achieved using an alternating iterative optimization technique, where the first layer is solved using an existing MRF-based stereo matching algorithm, then held fixed while the second layer is solved using the proposed non-stationary sparse coding algorithm. This leads to a general method for improving the solutions of state-of-the-art MRF-based depth estimation algorithms. Our experimental results first show that depth inference using learned representations leads to state-of-the-art denoising of depth maps obtained from laser range scanners and a time-of-flight camera. Furthermore, we show that adding sparse priors improves the results of two depth estimation methods: the classical graph cut algorithm of Boykov et al. and the more recent algorithm of Woodford et al.
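    A non-stationary noise model amounts to weighting the reconstruction error per pixel. The sketch below solves a diagonally weighted l1 sparse coding problem with plain ISTA; it is a minimal stand-in for the paper's algorithm, and the function names, weights, and parameters are illustrative assumptions.

```python
import numpy as np

def soft(v, t):
    # Soft-thresholding, the proximal operator of the l1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def weighted_ista(x, D, w, lam=0.05, n_iter=200):
    """Sparse coding under per-pixel noise weights w (non-stationary
    model): minimize (1/2)||sqrt(w)*(x - D a)||^2 + lam*||a||_1."""
    L = np.linalg.norm(D.T @ np.diag(w) @ D, 2)  # Lipschitz constant
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (w * (D @ a - x))           # gradient of the data term
        a = soft(a - grad / L, lam / L)
    return a

rng = np.random.default_rng(1)
D = rng.standard_normal((64, 32))
D /= np.linalg.norm(D, axis=0)
a_true = np.zeros(32)
a_true[[3, 17]] = [1.5, -2.0]
x = D @ a_true
w = np.ones(64)
w[:16] = 0.1        # first 16 pixels deemed unreliable (e.g. scanner dropout)
a = weighted_ista(x, D, w)
print(np.flatnonzero(np.abs(a) > 0.5))  # recovered support
```

    Down-weighting unreliable pixels lets the reliable ones dominate the code, instead of forcing the dictionary to explain the noise.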

    Nonparametric Least Squares Regression for Image Reconstruction on the Sphere

    This paper addresses the problem of interpolating signals defined on a 2D sphere from non-uniform samples. We present an interpolation method based on locally weighted linear and nonlinear regression, which takes into account the differing importance of neighboring samples for signal reconstruction. We show that, for an optimal kernel function variance, the proposed method performs interpolation more accurately than the nearest neighbor method, especially in noisy conditions. Moreover, the method does not suffer from the memory limitations that place an upper bound on the number of interpolation points.
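    A simplified, zeroth-order variant of such locally weighted regression (a kernel-weighted average rather than the paper's full linear and nonlinear fits) can be sketched as follows; the function name, kernel width, and test signal are all illustrative.

```python
import numpy as np

def kernel_interpolate(points, values, query, sigma=0.3):
    """Locally weighted estimate at `query`: each sample is weighted by a
    Gaussian kernel of its geodesic (great-circle) distance to the query,
    so nearby samples dominate the reconstruction."""
    d = np.arccos(np.clip(points @ query, -1.0, 1.0))  # geodesic distances
    w = np.exp(-d**2 / (2 * sigma**2))
    return np.sum(w * values) / np.sum(w)

rng = np.random.default_rng(0)
points = rng.standard_normal((2000, 3))
points /= np.linalg.norm(points, axis=1, keepdims=True)  # non-uniform samples
values = points[:, 2]                 # toy signal f(p) = z-coordinate
query = np.array([0.0, 0.0, 1.0])     # north pole, true value 1.0
est = kernel_interpolate(points, values, query)
print(round(float(est), 2))
```

    A full locally weighted *linear* fit would additionally remove the first-order smoothing bias visible here; the kernel variance `sigma` plays the role of the tunable parameter the abstract refers to.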

    Distributed multi-view image coding with learned dictionaries

    This paper addresses the problem of distributed image coding in camera networks. The correlation between multiple images of a scene captured from different viewpoints can be efficiently modeled by local geometric transforms of prominent image features. Such features can be efficiently represented by sparse approximation algorithms using geometric dictionaries of various waveforms, called atoms. When the dictionaries are built on geometric transformations of some generating functions, the features in different images can be paired by simple local geometric transforms, such as scaling, rotation, or translation. The construction of the dictionary, however, represents a trade-off between approximation performance, which generally improves with the size of the dictionary, and the cost of coding the atom indices. We propose a learning algorithm for the construction of dictionaries adapted to stereo omnidirectional images. The algorithm is based on a maximum likelihood solution that results in atoms adapted to both image approximation and stereo matching. We then use the learned dictionary in a Wyner-Ziv multi-view image coder built on a geometric correlation model. The experimental results show that the learned dictionary improves the rate-distortion performance of the Wyner-Ziv coder at low bit rates compared to a baseline parametric dictionary.
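    The pairing of features across views via local transforms can be illustrated with a toy dictionary generated by translating a single mother function, decomposed by plain matching pursuit. This is a sketch under our own simplifying assumptions (1-D signals, translation only), not the paper's learning algorithm.

```python
import numpy as np

def mother(x, sigma=2.0):
    # Generating function; its translated copies form the dictionary.
    return np.exp(-x**2 / (2 * sigma**2))

N = 64
grid = np.arange(N)
D = np.stack([mother(grid - t) for t in range(N)], axis=1)
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms

def matching_pursuit(x, D, n_atoms=1):
    """Greedy sparse approximation: repeatedly pick the atom most
    correlated with the residual."""
    residual = x.copy()
    picks = []
    for _ in range(n_atoms):
        corr = D.T @ residual
        k = int(np.argmax(np.abs(corr)))
        picks.append(k)
        residual = residual - corr[k] * D[:, k]
    return picks

# The same scene feature appears at position 20 in view 1 and,
# translated by the view change, at position 27 in view 2.
view1_feature = D[:, 20]
view2_feature = D[:, 27]
k1 = matching_pursuit(view1_feature, D)[0]
k2 = matching_pursuit(view2_feature, D)[0]
print(k2 - k1)  # the local transform (a translation of 7) pairing the atoms
```

    Because atoms are indexed by their generating transform, the correlation between views reduces to a small transform of atom parameters, which is what the geometric correlation model exploits.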

    Dictionary learning: What is the right representation for my signal?


    Geometry-based distributed coding of multi-view omnidirectional images

    This paper presents a distributed and occlusion-robust coding scheme for multi-view omnidirectional images, which relies on the geometry of the 3D scene. The Wyner-Ziv coder uses a multi-view correlation model that relates 3D features in different images through local geometric transforms in order to perform coset code design and coset decoding for each feature. The meaningful image features are extracted by a sparse decomposition over a dictionary of localized geometric atoms. In such a decomposition, however, occlusions or weakly correlated features appear as independent elements in the encoded stream, which can lead to erroneous reconstruction at the decoder. To mitigate this problem, we propose to retain a controlled amount of redundancy by sending additional syndrome bits, computed by channel coding across the atoms of the Wyner-Ziv image. This offers resilience against occlusions, and against inaccuracies in the view correlation model. The experimental results demonstrate the coding performance of the proposed scheme at low bit rates, where it performs close to the joint encoding strategy.
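    The coset-coding principle behind such a Wyner-Ziv coder can be illustrated with scalar indices: the encoder transmits only the coset index (syndrome), and the decoder resolves the remaining ambiguity using the correlated view as side information. All names and sizes below are hypothetical, chosen only to make the mechanism concrete.

```python
def encode_syndrome(q_index, n_cosets):
    # Encoder sends only the coset index, not the full quantizer index.
    return q_index % n_cosets

def decode_with_side_info(syndrome, side_info_index, n_cosets, n_levels):
    """Decoder picks, within the signalled coset, the codeword closest
    to the side information (here: the correlated view's estimate)."""
    candidates = [i for i in range(n_levels) if i % n_cosets == syndrome]
    return min(candidates, key=lambda i: abs(i - side_info_index))

n_levels, n_cosets = 16, 4
source_index = 9   # quantized feature parameter in the Wyner-Ziv view
side_info = 10     # correlated estimate derived from the reference view
syn = encode_syndrome(source_index, n_cosets)
rec = decode_with_side_info(syn, side_info, n_cosets, n_levels)
print(syn, rec)    # 2 bits sent instead of 4; decoding recovers index 9
```

    Decoding fails exactly when the side information falls outside the source's coset cell, which is why occluded or weakly correlated atoms need the extra syndrome bits the paper proposes.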

    Sparse stereo image coding with learned dictionaries

    This paper proposes a framework for stereo image coding with effective representation of the geometry in 3D scenes. We propose a joint sparse approximation framework for pairs of perspective images, represented as linear expansions of atoms selected from a dictionary of geometric functions learned on a database of stereo perspective images. We then present a coding solution where atoms are selected iteratively as a trade-off between distortion and consistency of the geometry information. Experimental results on stereo images from the Middlebury database show that the new coder achieves better rate-distortion performance than the MPEG-4 Part 10 scheme at all rates. In addition to good rate-distortion performance, our flexible framework permits building consistent image representations that capture the geometry of the scene. It thus represents a promising step toward multi-view coding algorithms whose compressed streams inherently contain rich information about 3D geometry.
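    The distortion-consistency trade-off in atom selection can be sketched with a toy 1-D dictionary: the selection score mixes correlation with the residual (distortion reduction) and a penalty on deviation from the atom position predicted from the other view (geometric consistency). The penalty form and all names here are our own illustration, not the paper's criterion.

```python
import numpy as np

def gauss_atom(grid, pos, sigma=2.0):
    a = np.exp(-(grid - pos) ** 2 / (2 * sigma**2))
    return a / np.linalg.norm(a)

N = 64
grid = np.arange(N)
D = np.stack([gauss_atom(grid, p) for p in range(N)], axis=1)

def select_atom(residual, predicted_pos, lam):
    """Score each atom by |correlation with residual| minus a consistency
    penalty growing with distance from the position predicted by the
    geometry of the other view."""
    corr = np.abs(D.T @ residual)
    penalty = lam * np.abs(np.arange(N) - predicted_pos)
    return int(np.argmax(corr - penalty))

# Noisy signal whose true feature sits at position 30; the other view's
# geometry predicts position 31.
rng = np.random.default_rng(2)
x = D[:, 30] + 0.1 * rng.standard_normal(N)
sel_plain = select_atom(x, predicted_pos=31, lam=0.0)     # distortion only
sel_consistent = select_atom(x, predicted_pos=31, lam=0.05)
print(sel_plain, sel_consistent)
```

    With clean data both criteria agree; the consistency term matters when noise or ambiguity makes several atoms explain the residual almost equally well.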

    3D Face Recognition using Sparse Spherical Representations

    This paper addresses the problem of 3D face recognition using spherical sparse representations. We first propose a fully automated registration process that aligns the 3D face point clouds. These point clouds are then represented as signals on the 2D sphere, in order to take advantage of the geometry information. Simultaneous sparse approximations implement a dimensionality reduction process by subspace projection: each face is typically represented by a few spherical basis functions that capture the salient facial characteristics. The dimensionality reduction step preserves the discriminant facial information and permits effective matching in the reduced space, where it can further be combined with LDA for improved recognition performance. We evaluate the 3D face recognition algorithm on the FRGC v1.0 data set, where it outperforms classical state-of-the-art solutions based on PCA or LDA applied to depth face images.
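    A simplified version of this pipeline (simultaneous sparse approximation to find a shared low-dimensional code, then nearest-neighbor matching in the reduced space) can be sketched with a random dictionary standing in for spherical basis functions. All names, sizes, and the greedy variant used are illustrative assumptions.

```python
import numpy as np

def simultaneous_omp(X, D, n_atoms):
    """Greedy simultaneous sparse approximation: pick atoms that jointly
    approximate all columns of X, so every signal shares one support."""
    residual = X.copy()
    support = []
    coef = None
    for _ in range(n_atoms):
        scores = np.sum((D.T @ residual) ** 2, axis=1)  # total correlation
        scores[support] = -np.inf                       # no repeats
        support.append(int(np.argmax(scores)))
        Ds = D[:, support]
        coef, *_ = np.linalg.lstsq(Ds, X, rcond=None)   # refit on support
        residual = X - Ds @ coef
    return support, coef

rng = np.random.default_rng(3)
D = rng.standard_normal((100, 40))
D /= np.linalg.norm(D, axis=0)
# Gallery: 3 "faces" living in a common 4-atom subspace (a toy stand-in
# for registered faces expanded in spherical basis functions).
gallery = D[:, [2, 11, 25, 33]] @ rng.standard_normal((4, 3))
support, codes = simultaneous_omp(gallery, D, n_atoms=4)
# Probe: a noisy scan of face 1, matched by nearest neighbor in the
# reduced coefficient space.
probe = gallery[:, 1] + 0.05 * rng.standard_normal(100)
p_coef, *_ = np.linalg.lstsq(D[:, support], probe, rcond=None)
match = int(np.argmin(np.linalg.norm(codes - p_coef[:, None], axis=0)))
print(match)  # matched gallery identity
```

    Because all faces share one support, each face reduces to a short coefficient vector, and the matching (optionally followed by LDA) happens entirely in that low-dimensional space.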