
    Large Scale SfM with the Distributed Camera Model

    We introduce the distributed camera model, a novel model for Structure-from-Motion (SfM). This model describes image observations in terms of light rays with ray origins and directions rather than pixels. As such, the proposed model is capable of describing a single camera or multiple cameras simultaneously as the collection of all light rays observed. We show how the distributed camera model is a generalization of the standard camera model and describe a general formulation and solution to the absolute camera pose problem that works for standard or distributed cameras. The proposed method computes a solution that is up to 8 times more efficient and robust to rotation singularities in comparison with gDLS. Finally, this method is used in a novel large-scale incremental SfM pipeline where distributed cameras are accurately and robustly merged together. This pipeline is a direct generalization of traditional incremental SfM; however, instead of growing the reconstruction by adding one camera at a time, it grows the reconstruction by adding a distributed camera. Our pipeline produces highly accurate reconstructions efficiently by avoiding the need for many bundle adjustment iterations and is capable of computing a 3D model of Rome from over 15,000 images in just 22 minutes. Comment: Published at 2016 3DV Conference
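The ray-based description of observations can be illustrated with a minimal sketch. It is not the authors' implementation: the `Ray` and `PinholeCamera` types, the function names, and the simple pinhole model without distortion are all assumptions made for illustration. Each pixel observation becomes a ray (origin plus unit direction) in a shared rig frame, so several cameras collapse into one list of rays.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class Ray:
    origin: Vec3     # ray origin in the shared rig frame
    direction: Vec3  # unit-length direction in the shared rig frame


@dataclass(frozen=True)
class PinholeCamera:
    # Hypothetical minimal camera: single focal length, no lens distortion.
    focal: float
    cx: float
    cy: float
    rotation: tuple[Vec3, Vec3, Vec3]  # camera-to-rig rotation, row-major
    center: Vec3                       # camera center in the rig frame


def pixel_to_ray(cam: PinholeCamera, u: float, v: float) -> Ray:
    """Back-project pixel (u, v) into a ray expressed in the rig frame."""
    x = (u - cam.cx) / cam.focal
    y = (v - cam.cy) / cam.focal
    norm = math.sqrt(x * x + y * y + 1.0)
    d_cam = (x / norm, y / norm, 1.0 / norm)
    # Rotate the camera-frame direction into the rig frame.
    d_rig = tuple(sum(row[k] * d_cam[k] for k in range(3)) for row in cam.rotation)
    return Ray(cam.center, d_rig)


def distributed_camera(observations) -> list[Ray]:
    """Collect (camera, u, v) observations from any number of cameras as rays."""
    return [pixel_to_ray(cam, u, v) for cam, u, v in observations]
```

With this representation a single standard camera is the special case in which all rays share one origin.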

    Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery

    One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft tissues. This information is a prerequisite to the registration of multi-modal patient-specific data for enhancing the surgeon’s navigation capabilities by observing beyond exposed tissue surfaces and for providing intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This paper reviews the state-of-the-art methods for optical intra-operative 3D reconstruction in laparoscopic surgery and discusses the technical challenges and future perspectives towards clinical translation. With the recent paradigm shift of surgical practice towards MIS and new developments in 3D optical imaging, this is a timely discussion about technologies that could facilitate complex CAS procedures in dynamic and deformable anatomical regions.

    Robust Rotation Synchronization via Low-rank and Sparse Matrix Decomposition

    This paper deals with the rotation synchronization problem, which arises in global registration of 3D point-sets and in structure from motion. The problem is formulated in an unprecedented way as a "low-rank and sparse" matrix decomposition that handles both outliers and missing data. A minimization strategy, dubbed R-GoDec, is also proposed and evaluated experimentally against state-of-the-art algorithms on simulated and real data. The results show that R-GoDec is the fastest among the robust algorithms. Comment: The material contained in this paper is part of a manuscript submitted to CVI
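To make the problem statement concrete, the sketch below recovers absolute rotations from pairwise relative rotations. It is not R-GoDec and performs no low-rank and sparse decomposition: it is a naive spanning-tree baseline, swapped in only to illustrate the problem, and it has no protection against outliers. The convention R_ij = R_i R_j^T and all function names are assumptions. Missing pairs are simply absent from the input.

```python
import math
from collections import deque

Mat3 = list[list[float]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a: Mat3) -> Mat3:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def rot_z(theta: float) -> Mat3:
    """Rotation by theta radians about the z axis (used to build examples)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def synchronize_by_spanning_tree(n: int, relative: dict[tuple[int, int], Mat3]) -> dict[int, Mat3]:
    """Propagate rotations from node 0 along a breadth-first spanning tree.

    `relative[(i, j)]` is assumed to hold R_ij = R_i R_j^T. The gauge is fixed
    by setting R_0 to the identity. Nodes unreachable from node 0 are omitted.
    """
    neighbors: dict[int, list[tuple[int, Mat3]]] = {i: [] for i in range(n)}
    for (i, j), r_ij in relative.items():
        neighbors[j].append((i, r_ij))             # R_i = R_ij R_j
        neighbors[i].append((j, transpose(r_ij)))  # R_j = R_ij^T R_i
    absolute: dict[int, Mat3] = {0: [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
    queue = deque([0])
    while queue:
        known = queue.popleft()
        for other, step in neighbors[known]:
            if other not in absolute:
                absolute[other] = matmul(step, absolute[known])
                queue.append(other)
    return absolute
```

Because every estimate depends on a single chain of measurements, one bad relative rotation corrupts everything downstream of it, which is the weakness that robust formulations such as the one in the abstract are meant to address.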

    Camera Marker Networks for Pose Estimation and Scene Understanding in Construction Automation and Robotics.

    The construction industry faces challenges that include high rates of workplace injuries and fatalities, stagnant productivity, and skill shortages. Automation and Robotics in Construction (ARC) has been proposed in the literature as a potential solution that makes machinery easier to collaborate with, facilitates better decision-making, or enables autonomous behavior. However, there are two primary technical challenges in ARC: 1) unstructured and featureless environments; and 2) differences between the as-designed and the as-built. It is therefore impossible to directly replicate conventional automation methods adopted in industries such as manufacturing on construction sites. In particular, two fundamental problems, pose estimation and scene understanding, must be addressed to realize the full potential of ARC. This dissertation proposes a pose estimation and scene understanding framework that addresses the identified research gaps by exploiting cameras, markers, and planar structures to mitigate the identified technical challenges. A fast plane extraction algorithm is developed for efficient modeling and understanding of built environments. A marker registration algorithm is designed for robust, accurate, cost-efficient, and rapidly reconfigurable pose estimation in unstructured and featureless environments. Camera marker networks are then established for unified and systematic design, estimation, and uncertainty analysis in larger scale applications. The proposed algorithms' efficiency has been validated through comprehensive experiments. Specifically, the speed, accuracy and robustness of the fast plane extraction and the marker registration have been demonstrated to be superior to existing state-of-the-art algorithms. These algorithms have also been implemented in two groups of ARC applications to demonstrate the proposed framework's effectiveness, wherein the applications themselves have significant social and economic value. The first group is related to in-situ robotic machinery, including an autonomous manipulator for assembling digital architecture designs on construction sites to help improve productivity and quality; and an intelligent guidance and monitoring system for articulated machinery such as excavators to help improve safety. The second group emphasizes human-machine interaction to make ARC more effective, including a mobile Building Information Modeling and way-finding platform with discrete location recognition to increase indoor facility management efficiency; and a 3D scanning and modeling solution for rapid and cost-efficient dimension checking and concise as-built modeling. PhD, Civil Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113481/1/cforrest_1.pd

    EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization

    Visual localization is the task of estimating a 6-DoF camera pose of a query image within a provided 3D reference map. Thanks to recent advances in various 3D sensors, 3D point clouds are becoming a more accurate and affordable option for building the reference map, but matching the points of 3D point clouds with pixels in 2D images for visual localization remains challenging. Existing approaches that jointly learn 2D-3D feature matching suffer from low inlier counts due to representational differences between the two modalities, and the methods that bypass this problem by recasting it as classification suffer from poor refinement. In this work, we propose EP2P-Loc, a novel large-scale visual localization method that mitigates such appearance discrepancy and enables end-to-end training for pose estimation. To increase the number of inliers, we propose a simple algorithm to remove invisible 3D points in the image, and find all 2D-3D correspondences without keypoint detection. To reduce memory usage and search complexity, we take a coarse-to-fine approach where we extract patch-level features from 2D images, then perform 2D patch classification on each 3D point, and obtain the exact corresponding 2D pixel coordinates through positional encoding. Finally, for the first time in this task, we employ a differentiable PnP for end-to-end training. In the experiments on newly curated large-scale indoor and outdoor benchmarks based on 2D-3D-S and KITTI, we show that our method achieves state-of-the-art performance compared to existing visual localization and image-to-point cloud registration methods. Comment: Accepted to ICCV 202
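The relationship between a 3D point, its pixel, and the coarse patch that contains the pixel can be sketched as follows. This is a geometric illustration only, not the EP2P-Loc method: nothing is learned, the visibility test is a plain field-of-view check that ignores occlusion, and the function name, the pinhole parameters, and the row-major patch indexing are assumptions.

```python
import math
from typing import Optional


def project_to_patch(
    point_cam: tuple[float, float, float],
    focal: float,
    cx: float,
    cy: float,
    width: int,
    height: int,
    patch: int,
) -> Optional[tuple[tuple[float, float], int]]:
    """Project a camera-frame 3D point; return ((u, v), patch_index) or None.

    None is returned for points behind the camera or outside the image.
    Patches are `patch` x `patch` pixels and indexed row by row.
    """
    x, y, z = point_cam
    if z <= 0.0:
        return None  # behind the camera
    u = focal * x / z + cx
    v = focal * y / z + cy
    if not (0.0 <= u < width and 0.0 <= v < height):
        return None  # projects outside the image bounds
    patches_per_row = math.ceil(width / patch)
    index = int(v // patch) * patches_per_row + int(u // patch)
    return (u, v), index
```

In a coarse-to-fine scheme of the kind the abstract describes, the patch index would be the coarse target and the pixel coordinates the fine one.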

    Revisiting Absolute Pose Regression

    Images provide direct evidence for the position and orientation of the camera in space, known as camera pose. Traditionally, the problem of estimating the camera pose requires reference data for determining image correspondence and leveraging geometric relationships between features in the image. Recent advances in deep learning have led to a new class of methods that regress the pose directly from a single image. This thesis proposes methods for absolute camera pose regression. Absolute pose regression estimates the pose of a camera from a single image as the output of a fixed computation pipeline. These methods have many practical benefits over traditional methods, such as constant inference speed and simplicity of use. However, they also have severe limitations, the most significant of which are high pose error and the fact that a network must be trained for each new scene. Despite the negatives, absolute pose regression is an exciting line of research with many potential use cases. Our work focuses on three areas. First, we investigate the use of absolute pose regression across multiple scenes. We propose a method for using a mostly shared network to perform pose regression across multiple scenes without significant increase in pose error relative to per-scene networks. With this approach, we also show how the features learned during multi-scene training do not transfer to pose regression in new scenes. Next, we propose a new convolutional network to improve the accuracy of absolute pose regression. The new network takes inspiration from traditional methods to design a network explicitly for camera pose regression. As opposed to the black box approaches used by other methods, our method results in a significant decrease in pose error. Finally, we show an application of the new method to share network weights to estimate camera pose in multiple scenes. Due to the more explicit design of the network, it is naturally partitioned into scene-dependent and scene-agnostic layers, allowing us to transfer pretrained weights to novel scenes without needing to retrain the entire network. The contribution of this thesis is a novel architecture for absolute pose regression which directly uses well-known geometric relations, resulting in higher pose accuracy and allowing for localization within novel scenes without needing to retrain the full network.
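The abstract refers to pose error without defining it. A common way to measure it, shown here as a minimal sketch rather than as the thesis's evaluation code, is a translation error (Euclidean distance between camera positions) together with a rotation error (the angle of the residual rotation). The function names and the rotation-matrix representation are assumptions.

```python
import math

Mat3 = list[list[float]]


def translation_error(t_est: tuple[float, float, float], t_gt: tuple[float, float, float]) -> float:
    """Euclidean distance between estimated and ground-truth positions."""
    return math.dist(t_est, t_gt)


def rotation_error_deg(r_est: Mat3, r_gt: Mat3) -> float:
    """Angle in degrees of the rotation taking r_est to r_gt.

    Uses trace(R_est^T R_gt) = 1 + 2 cos(angle); the trace of A^T B equals
    the sum of elementwise products of A and B.
    """
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against values slightly outside [-1, 1] from rounding.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))
```

Reporting the two numbers separately avoids having to pick a weighting between metres and degrees.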