
    2T-UNET: A Two-Tower UNet with Depth Clues for Robust Stereo Depth Estimation

    Stereo correspondence matching is an essential part of the multi-step stereo depth estimation process. This paper revisits the depth estimation problem and avoids the explicit stereo matching step by using a simple two-tower convolutional neural network called 2T-UNet. The idea behind 2T-UNet is to replace cost volume construction with twin convolution towers, which are allowed to have different weights. The inputs to the twin encoders in 2T-UNet also differ from those of existing stereo methods. A stereo network generally takes a left and right image pair as input to determine the scene geometry. In 2T-UNet, however, the right stereo image is one input, and the left stereo image together with its monocular depth clue information is the other. Depth clues provide complementary suggestions that help improve the quality of the predicted scene geometry. 2T-UNet surpasses state-of-the-art monocular and stereo depth estimation methods on the challenging Scene Flow dataset, both quantitatively and qualitatively. The architecture performs very well on complex natural scenes, highlighting its usefulness for various real-time applications. Pretrained weights and code will be made available.
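    As a rough illustration of the input arrangement described above, the following Python sketch builds the two tower inputs: the right image alone, and the left image stacked with a one-channel monocular depth cue. The helper name `make_tower_inputs`, the channel-list image layout, and the assumption that the depth cue is a precomputed per-pixel map (for example from a separate monocular network) are hypothetical and not taken from the paper; the towers themselves are not modeled.

```python
from typing import List, Tuple

# Hypothetical layout: an image is a list of channels, each an H x W grid of floats.
Channel = List[List[float]]
Image = List[Channel]


def _copy(channel: Channel) -> Channel:
    """Return a row-by-row copy so the tower inputs do not alias the caller's data."""
    return [row[:] for row in channel]


def make_tower_inputs(left_rgb: Image, right_rgb: Image,
                      left_mono_depth: Channel) -> Tuple[Image, Image]:
    """Build (right_input, left_input) for a two-tower network (hypothetical helper).

    right_input: the right RGB image only (3 channels).
    left_input:  the left RGB image plus an assumed precomputed monocular
                 depth cue appended as a 4th channel.
    """
    h, w = len(left_mono_depth), len(left_mono_depth[0])
    for img in (left_rgb, right_rgb):
        if len(img) != 3 or any(
            len(ch) != h or any(len(row) != w for row in ch) for ch in img
        ):
            raise ValueError("expected 3-channel images matching the depth cue size")
    right_input = [_copy(ch) for ch in right_rgb]
    left_input = [_copy(ch) for ch in left_rgb] + [_copy(left_mono_depth)]
    return right_input, left_input


if __name__ == "__main__":
    rgb = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(3)]
    depth = [[1.0, 2.0], [3.0, 4.0]]
    right_in, left_in = make_tower_inputs(rgb, rgb, depth)
    print(len(right_in), len(left_in))  # 3 4
```

    In this sketch's reading, one consequence of allowing the towers to have unshared weights is that the first convolution of each tower can accept a different number of input channels (3 versus 4).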

    FPGA synthesis of a stereo image matching architecture for autonomous mobile robots

    This paper describes a hardware proposal to speed up image matching in stereo vision systems such as those used by autonomous mobile robots. The proposal combines a classical window-based matching approach with a preceding stage in which key points are selected from each image of the stereo pair. In this first step, key points are extracted with a method based on the SIFT algorithm. In the second step, window-based matching is then applied only to the set of selected key points instead of to the whole images. For images in which 1% of the pixels are key points, this method speeds up matching by four orders of magnitude. The proposed architecture is more parallelizable than the original SIFT and faster than a full-image window matching approach. It has been implemented on a low-power Virtex-6 FPGA and achieves an image matching speed above 30 fps. This work has been funded by Spanish government project TEC2015-66878-C3-2-R (MINECO/FEDER, UE).
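    The two-step scheme above can be sketched in software as follows. This Python sketch is not the paper's FPGA design: it assumes key points have already been extracted (the SIFT-based stage is not modeled), uses a sum-of-absolute-differences (SAD) window cost as an assumed matching criterion, and compares every left key point with every right key point. The helper `comparison_counts` illustrates one reading of the four-orders-of-magnitude figure: if both approaches compare all candidates in one image against all candidates in the other, keeping 1% of the pixels divides the number of window comparisons by 100^2 = 10^4. All function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale image indexed [row][col]
Point = Tuple[int, int]   # key point as (row, col)


def sad(left: Image, right: Image, p: Point, q: Point, r: int) -> int:
    """Sum of absolute differences between (2r+1)x(2r+1) windows centred at p and q."""
    total = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            total += abs(left[p[0] + dy][p[1] + dx] - right[q[0] + dy][q[1] + dx])
    return total


def match_keypoints(left: Image, right: Image, left_kps: List[Point],
                    right_kps: List[Point], r: int = 1) -> List[Tuple[Point, Point]]:
    """Pair each left key point with the right key point of lowest SAD cost.

    Key points are assumed to lie at least r pixels away from the image border.
    """
    return [(p, min(right_kps, key=lambda q: sad(left, right, p, q, r)))
            for p in left_kps]


def comparison_counts(num_pixels: int, num_keypoints: int) -> Tuple[int, int]:
    """Window comparisons when every candidate is tried against every candidate:
    all pixels (full image) versus key points only."""
    return num_pixels ** 2, num_keypoints ** 2


if __name__ == "__main__":
    # A 640x480 image with 1% of its pixels kept as key points.
    full, kp = comparison_counts(640 * 480, 640 * 480 // 100)
    print(full // kp)  # 10000
```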

    Selection and Recognition of Landmarks Using Terrain Spatiograms

    A team of robots exploring and mapping an area may need to share information about landmarks in order to register their local maps and plan effective exploration strategies. In previous papers, we introduced a combined image and spatial representation for landmarks: terrain spatiograms. We showed that, for manually selected views, terrain spatiograms provide an effective shared representation that allows for occlusion filtering and the combination of multiple views. In this paper, we present a landmark saliency architecture (LSA) for automatically selecting candidate landmarks. Using a dataset of 21 outdoor stereo images generated by LSA, we show that the terrain spatiogram representation reliably recognizes automatically selected landmarks. The terrain spatiogram results are shown to improve on two purely appearance-based approaches: template matching and image histogram matching.
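    To make the contrast with histogram matching concrete, here is a minimal Python sketch of a generic first-order spatiogram: for each intensity bin it stores the fraction of pixels and their mean position. This is not the authors' terrain spatiogram, whose exact definition is not given here; the 3-D positions (assumed to come from stereo), the Bhattacharyya-style bin overlap, and the Gaussian spatial weight with bandwidth `sigma` are all assumptions, and the function names are hypothetical. Setting `use_space=False` reduces the score to a plain histogram comparison, which stands in for the image histogram matching baseline.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]        # assumed 3-D pixel position, e.g. from stereo
Bin = Tuple[float, Optional[Point]]       # (fraction of pixels, mean position or None)


def spatiogram(values: List[int], points: List[Point], bins: int,
               max_value: int = 256) -> List[Bin]:
    """First-order spatiogram: per-bin pixel fraction and mean position.

    values are intensities in [0, max_value); points[i] is the position of pixel i.
    """
    counts = [0] * bins
    sums = [[0.0, 0.0, 0.0] for _ in range(bins)]
    for v, p in zip(values, points):
        b = v * bins // max_value
        counts[b] += 1
        for i in range(3):
            sums[b][i] += p[i]
    n = len(values)
    return [
        (counts[b] / n,
         tuple(s / counts[b] for s in sums[b]) if counts[b] else None)
        for b in range(bins)
    ]


def similarity(a: List[Bin], b: List[Bin], sigma: float = 1.0,
               use_space: bool = True) -> float:
    """Sum over bins of sqrt(p_a * p_b), optionally weighted by a Gaussian of the
    distance between the two bins' mean positions. Identical spatiograms score 1.0."""
    total = 0.0
    for (pa, ma), (pb, mb) in zip(a, b):
        if pa == 0 or pb == 0:
            continue  # bin is empty in one of the two spatiograms
        weight = 1.0
        if use_space:
            d2 = sum((x - y) ** 2 for x, y in zip(ma, mb))
            weight = math.exp(-d2 / (2 * sigma ** 2))
        total += math.sqrt(pa * pb) * weight
    return total
```

    With `use_space=False`, two landmarks with identical intensity distributions score as a perfect match even if their pixels lie in different places; the spatial weight is what lets this kind of representation tell them apart.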