4,900 research outputs found

    Matching neural paths: transfer from recognition to correspondence search

    Full text link
    Many machine learning tasks require finding per-part correspondences between objects. In this work we focus on low-level correspondences - a highly ambiguous matching problem. We propose to use a hierarchical semantic representation of the objects, coming from a convolutional neural network, to solve this ambiguity. Training it for low-level correspondence prediction directly might not be an option in some domains where the ground-truth correspondences are hard to obtain. We show how transfer from recognition can be used to avoid such training. Our idea is to mark parts as "matching" if their features are close to each other at all the levels of convolutional feature hierarchy (neural paths). Although the overall number of such paths is exponential in the number of layers, we propose a polynomial algorithm for aggregating all of them in a single backward pass. The empirical validation is done on the task of stereo correspondence and demonstrates that we achieve competitive results among the methods which do not use labeled target domain data.Comment: Accepted at NIPS 201

    Real-Time Dense Stereo Matching With ELAS on FPGA Accelerated Embedded Devices

    Full text link
    For many applications in low-power real-time robotics, stereo cameras are the sensors of choice for depth perception as they are typically cheaper and more versatile than their active counterparts. Their biggest drawback, however, is that they do not directly sense depth maps; instead, these must be estimated through data-intensive processes. Therefore, appropriate algorithm selection plays an important role in achieving the desired performance characteristics. Motivated by applications in space and mobile robotics, we implement and evaluate a FPGA-accelerated adaptation of the ELAS algorithm. Despite offering one of the best trade-offs between efficiency and accuracy, ELAS has only been shown to run at 1.5-3 fps on a high-end CPU. Our system preserves all intriguing properties of the original algorithm, such as the slanted plane priors, but can achieve a frame rate of 47fps whilst consuming under 4W of power. Unlike previous FPGA based designs, we take advantage of both components on the CPU/FPGA System-on-Chip to showcase the strategy necessary to accelerate more complex and computationally diverse algorithms for such low power, real-time systems.Comment: 8 pages, 7 figures, 2 table

    Deep Eyes: Binocular Depth-from-Focus on Focal Stack Pairs

    Full text link
    Human visual system relies on both binocular stereo cues and monocular focusness cues to gain effective 3D perception. In computer vision, the two problems are traditionally solved in separate tracks. In this paper, we present a unified learning-based technique that simultaneously uses both types of cues for depth inference. Specifically, we use a pair of focal stacks as input to emulate human perception. We first construct a comprehensive focal stack training dataset synthesized by depth-guided light field rendering. We then construct three individual networks: a Focus-Net to extract depth from a single focal stack, a EDoF-Net to obtain the extended depth of field (EDoF) image from the focal stack, and a Stereo-Net to conduct stereo matching. We show how to integrate them into a unified BDfF-Net to obtain high-quality depth maps. Comprehensive experiments show that our approach outperforms the state-of-the-art in both accuracy and speed and effectively emulates human vision systems

    Analysis and approximation of some Shape-from-Shading models for non-Lambertian surfaces

    Full text link
    The reconstruction of a 3D object or a scene is a classical inverse problem in Computer Vision. In the case of a single image this is called the Shape-from-Shading (SfS) problem and it is known to be ill-posed even in a simplified version like the vertical light source case. A huge number of works deals with the orthographic SfS problem based on the Lambertian reflectance model, the most common and simplest model which leads to an eikonal type equation when the light source is on the vertical axis. In this paper we want to study non-Lambertian models since they are more realistic and suitable whenever one has to deal with different kind of surfaces, rough or specular. We will present a unified mathematical formulation of some popular orthographic non-Lambertian models, considering vertical and oblique light directions as well as different viewer positions. These models lead to more complex stationary nonlinear partial differential equations of Hamilton-Jacobi type which can be regarded as the generalization of the classical eikonal equation corresponding to the Lambertian case. However, all the equations corresponding to the models considered here (Oren-Nayar and Phong) have a similar structure so we can look for weak solutions to this class in the viscosity solution framework. Via this unified approach, we are able to develop a semi-Lagrangian approximation scheme for the Oren-Nayar and the Phong model and to prove a general convergence result. Numerical simulations on synthetic and real images will illustrate the effectiveness of this approach and the main features of the scheme, also comparing the results with previous results in the literature.Comment: Accepted version to Journal of Mathematical Imaging and Vision, 57 page

    Semantic Visual Localization

    Full text link
    Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, e.g., in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches failed. Our method leverages a novel generative model for descriptor learning, trained on semantic scene completion as an auxiliary task. The resulting 3D descriptors are robust to missing observations by encoding high-level 3D geometric and semantic information. Experiments on several challenging large-scale localization datasets demonstrate reliable localization under extreme viewpoint, illumination, and geometry changes
    • …
    corecore