
    V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints

    We introduce a learning-based depth map fusion framework that accepts a set of depth and confidence maps generated by a Multi-View Stereo (MVS) algorithm as input and improves them. This is accomplished by integrating volumetric visibility constraints that encode long-range surface relationships across different views into an end-to-end trainable architecture. We also introduce a depth search window estimation sub-network, trained jointly with the larger fusion sub-network, to reduce the depth hypothesis search space along each ray. Our method learns to model depth consensus and violations of visibility constraints directly from the data, effectively removing the need to fine-tune fusion parameters. Extensive experiments on MVS datasets show substantial improvements in the accuracy of the output fused depth and confidence maps.
    Comment: ICCV 2023
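    As a rough, hypothetical sketch of the fusion problem this abstract describes, the snippet below combines several per-view depth maps and their confidences into one map by confidence-weighted consensus. The agreement_thresh parameter and the hand-rolled weighting are illustrative stand-ins; V-FUSE itself is a learned, end-to-end network with volumetric visibility constraints, which is not reproduced here.

```python
# Hypothetical baseline: confidence-weighted fusion of per-view depth maps
# already aligned to a reference view. Illustrates only the inputs/outputs
# the abstract names; the learned consensus model is not reproduced.
import numpy as np

def fuse_depth_maps(depths, confs, agreement_thresh=0.01):
    """depths, confs: lists of (H, W) arrays aligned to the reference view.
    Returns a fused depth map and a fused confidence map."""
    depths = np.stack(depths)          # (V, H, W)
    confs = np.stack(confs)            # (V, H, W)
    ref = depths[0]                    # crude anchor for consensus
    # A view "agrees" if its depth is within a relative threshold of the
    # anchor (a crude stand-in for learned depth consensus).
    agree = np.abs(depths - ref) < agreement_thresh * np.maximum(ref, 1e-6)
    w = confs * agree                  # zero out disagreeing views
    w_sum = w.sum(axis=0)
    fused_depth = np.where(
        w_sum > 0,
        (w * depths).sum(axis=0) / np.maximum(w_sum, 1e-6),
        ref,
    )
    fused_conf = w_sum / len(depths)   # fraction of confidence that agreed
    return fused_depth, fused_conf
```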

    Learning to Detect Ground Control Points for Improving the Accuracy of Stereo Matching

    While machine learning has been instrumental to the ongoing progress in most areas of computer vision, it has not been applied to the problem of stereo matching with similar frequency or success. We present a supervised learning approach for predicting the correctness of stereo matches based on a random forest and a set of features that capture various forms of information about each pixel. We show highly competitive results in predicting the correctness of matches and in confidence estimation, which allows us to rank pixels according to the reliability of their assigned disparities. Moreover, we show how these confidence values can be used to improve the accuracy of disparity maps by integrating them with an MRF-based stereo algorithm. This is an important distinction from the current literature, which has mainly focused on sparsification, i.e., removing potentially erroneous disparities to generate quasi-dense disparity maps.
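    The pipeline this abstract outlines maps naturally onto off-the-shelf tools. The sketch below uses scikit-learn and an invented feature set (the paper's actual features are not reproduced) to train a random forest that predicts match correctness, then uses the predicted probability as a per-pixel confidence for ranking.

```python
# Illustrative sketch, not the authors' exact features or labels: train a
# random forest to predict whether a pixel's disparity is correct, then use
# the predicted probability as a confidence score for ranking pixels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# X: (N, F) per-pixel features, e.g. matching cost, left-right consistency,
# local cost curvature (a hypothetical feature set).
# y: (N,) binary labels, 1 if |disparity - ground truth| <= 1 pixel.
X_train = np.random.rand(10000, 6)                    # placeholder features
y_train = (np.random.rand(10000) > 0.3).astype(int)   # placeholder labels

clf = RandomForestClassifier(n_estimators=100, max_depth=12, n_jobs=-1)
clf.fit(X_train, y_train)

# Confidence = P(match is correct); sort pixels by it to sparsify a
# disparity map or to re-weight unary terms in an MRF-based refinement.
confidence = clf.predict_proba(X_train)[:, 1]
ranking = np.argsort(-confidence)
```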

    Detecting and Parsing Architecture at City Scale from Range Data

    We present a method for detecting and parsing buildings from unorganized 3D point clouds into a compact, hierarchical representation that is useful for high-level tasks. The input is a set of range measurements that cover a large-scale urban environment. The desired output is a set of parse trees, such that each tree represents a semantic decomposition of a building: the nodes are roof surfaces as well as volumetric parts inferred from the observable surfaces. We model the above problem using a simple and generic grammar and use an efficient dependency parsing algorithm to generate the desired semantic description. We show how to learn the parameters of this simple grammar in order to produce correct parses of complex structures. We are able to apply our model to large point clouds and parse an entire city.
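    As a minimal illustration of the output representation described above (not of the grammar or the parsing algorithm), the hypothetical data structure below encodes one parse tree per building, with roof surfaces and inferred volumetric parts as nodes.

```python
# Hypothetical sketch of the paper's output: a parse tree per building whose
# nodes are roof surfaces and inferred volumetric parts. The grammar and the
# dependency-parsing algorithm themselves are not reproduced.
from dataclasses import dataclass, field

@dataclass
class ParseNode:
    label: str                                    # e.g. "building", "roof_plane", "volume"
    points: list = field(default_factory=list)    # supporting 3D points
    children: list = field(default_factory=list)  # semantic sub-parts

    def add(self, child: "ParseNode") -> "ParseNode":
        self.children.append(child)
        return child

# One tree per detected building; a city-scale parse is a forest of these.
building = ParseNode("building")
roof = building.add(ParseNode("roof_plane"))
prism = building.add(ParseNode("volume"))  # part inferred from the roof
```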

    A Collaborative Visual Localization Scheme for a Low-Cost Heterogeneous Robotic Team with Non-Overlapping Perspectives

    This paper presents and evaluates a relative localization scheme for a heterogeneous team of low-cost mobile robots. An error-state, complementary Kalman filter was developed to fuse the analytically derived uncertainty of stereoscopic pose measurements of an aerial robot, made by a ground robot, with the inertial/visual proprioceptive measurements of both robots. Results show that the sources of error (image quantization, asynchronous sensors, and a non-stationary bias) were sufficiently modeled to estimate the pose of the aerial robot. In both simulation and experiments, we demonstrate the proposed methodology with a heterogeneous robot team consisting of a UAV and a UGV, tasked with collaboratively localizing themselves while avoiding obstacles in an unknown environment. The team is able to identify a goal location and obstacles in the environment and plan a path for the UGV to the goal location. The results demonstrate localization accuracies of 2 cm to 4 cm, on average, while the robots operate between 1 m and 4 m apart.
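    A minimal sketch of the filtering idea, assuming a linear error-state model with identity dynamics: inertial dead-reckoning grows the error covariance, and the ground robot's stereoscopic measurement corrects it. The noise values below are invented placeholders; the paper's filter additionally models image quantization, sensor asynchrony, and a non-stationary bias.

```python
# Minimal error-state Kalman filter sketch under assumed linear dynamics.
# Not the authors' filter: it omits the quantization, asynchrony, and bias
# models the paper derives.
import numpy as np

class ErrorStateKF:
    def __init__(self):
        self.x = np.zeros(3)          # error state: 3D position error
        self.P = np.eye(3) * 0.5      # error covariance
        self.Q = np.eye(3) * 0.01     # process noise (inertial drift/step)
        self.R = np.eye(3) * 0.04     # stereo measurement noise (assumed)
        self.H = np.eye(3)            # measurement observes position error

    def predict(self):
        # Identity dynamics for the error state; uncertainty grows.
        self.P = self.P + self.Q

    def update(self, residual):
        """residual = stereo-measured position minus dead-reckoned position."""
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ (residual - self.H @ self.x)
        self.P = (np.eye(3) - K @ self.H) @ self.P
```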

    On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey

    Stereo matching is one of the longest-standing problems in computer vision, with close to 40 years of study and research. Throughout the years, the paradigm has shifted from local, pixel-level decisions to various forms of discrete and continuous optimization to data-driven, learning-based methods. Recently, the rise of machine learning and the rapid proliferation of deep learning have enhanced stereo matching with new exciting trends and applications unthinkable until a few years ago. Interestingly, the relationship between these two worlds is two-way. While machine, and especially deep, learning advanced the state of the art in stereo matching, stereo itself enabled new ground-breaking methodologies such as self-supervised monocular depth estimation based on deep networks. In this paper, we review recent research in the field of learning-based depth estimation from single and binocular images, highlighting the synergies, the successes achieved so far, and the open challenges the community will face in the immediate future.
    Comment: Accepted to TPAMI. Paper version of our CVPR 2019 tutorial: "Learning-based depth estimation from stereo and monocular images: successes, limitations and future challenges" (https://sites.google.com/view/cvpr-2019-depth-from-image/home)
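    To make the stereo-to-monocular connection mentioned above concrete, here is a minimal PyTorch sketch of the photometric loss at the heart of stereo-supervised monocular depth estimation: predict a left-view disparity, warp the right image into the left view, and penalize the reconstruction error. The shapes, coordinate conventions, and L1 penalty are illustrative assumptions, not taken from any specific surveyed method.

```python
# Illustrative photometric reconstruction loss for stereo-supervised
# monocular depth training. Assumes a rectified pair and disparity given
# in normalized image coordinates; both are simplifying assumptions.
import torch
import torch.nn.functional as F

def photometric_loss(left, right, disp):
    """left, right: (B, 3, H, W) rectified stereo pair.
    disp: (B, 1, H, W) predicted left-view disparity in [-1, 1] units."""
    B, _, H, W = left.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
    )
    grid = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2).clone()
    grid[..., 0] = grid[..., 0] - disp.squeeze(1)   # shift x by disparity
    # Sample the right image at the shifted coordinates to synthesize
    # the left view, then penalize the L1 reconstruction error.
    right_warped = F.grid_sample(right, grid, align_corners=True)
    return (left - right_warped).abs().mean()
```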

    Underwater Exploration and Mapping

    This paper analyzes the open challenges of exploring and mapping in the underwater realm, with the goal of identifying research opportunities that will enable an Autonomous Underwater Vehicle (AUV) to robustly explore different environments. A taxonomy of environments based on their 3D structure is presented, together with an analysis of how that structure influences camera placement. The difference between exploration and coverage is presented, along with how each dictates different motion strategies. Loop closure, while critical for the accuracy of the resulting map, proves particularly challenging due to the limited field of view and the sensitivity to viewing direction. Experimental results of enforcing loop closures in underwater caves demonstrate a novel navigation strategy. Dense 3D mapping, both online and offline, as well as other sensor configurations, are discussed following the presented taxonomy. Experimental results from field trials illustrate the above analysis.