
    Cluster-Wise Ratio Tests for Fast Camera Localization

    Feature point matching for camera localization suffers from scalability problems. Even when feature descriptors associated with 3D scene points are locally unique, as coverage grows, similar or repeated features become increasingly common. As a result, the standard distance ratio-test used to identify reliable image feature points is overly restrictive and rejects many good candidate matches. We propose a simple coarse-to-fine strategy that uses conservative approximations to robust local ratio-tests that can be computed efficiently using global approximate k-nearest neighbor search. We treat these forward matches as votes in camera pose space and use them to prioritize back-matching within candidate camera pose clusters, exploiting feature co-visibility captured by clustering the 3D model camera pose graph. This approach achieves state-of-the-art camera localization results on a variety of popular benchmarks, outperforming several methods that use more complicated data structures and that make more restrictive assumptions on camera pose. We also carry out diagnostic analyses on a difficult test dataset containing globally repetitive structure that suggest our approach successfully adapts to the challenges of large-scale image localization
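As a point of reference for the abstract above, here is a minimal sketch of the standard global distance ratio test that it describes as overly restrictive. It is not the paper's cluster-wise method. The function name `ratio_test_matches`, the 0.8 threshold, and the toy 2D descriptors are assumptions made for illustration, and brute-force search stands in for approximate k-nearest-neighbor search.

```python
import math

def ratio_test_matches(query_descs, model_descs, ratio=0.8):
    """Return (query_idx, model_idx) pairs that pass the distance ratio test.

    Hypothetical helper: brute-force nearest neighbors, illustrative threshold.
    """
    matches = []
    for qi, q in enumerate(query_descs):
        # Distances from this query descriptor to every model descriptor, ascending.
        dists = sorted((math.dist(q, m), mi) for mi, m in enumerate(model_descs))
        if len(dists) < 2:
            continue
        (d1, best), (d2, _) = dists[0], dists[1]
        # Accept only if the best match is clearly closer than the runner-up.
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches

# Toy model: one distinctive descriptor and two near-duplicates (repeated structure).
model = [(0.0, 0.0), (10.0, 0.0), (10.0, 0.1)]
query = [(0.1, 0.0), (10.0, 0.05)]
matches = ratio_test_matches(query, model)
print(matches)
```

In this toy setup the second query lies between two near-identical model descriptors, so the global test rejects it even though either candidate would be a plausible match. That is the failure mode the abstract attributes to repeated features in large models.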

    From small to large baseline multiview stereo : dealing with blur, clutter and occlusions

    This thesis addresses the problem of reconstructing a three-dimensional (3D) digital model of a scene from a collection of two-dimensional (2D) images taken of it. To address this fundamental computer vision problem, we propose three algorithms, which are the main contributions of this thesis. First, we solve multiview stereo with the off-axis aperture camera. This system has a very small baseline, as images are captured from viewpoints close to each other. The key idea is to change the size or the 3D location of the aperture of the camera so as to extract selected portions of the scene. Our imaging model takes both defocus and stereo information into account and allows shape reconstruction and image restoration to be solved in one go. The off-axis aperture camera can be used in a small-scale space where the camera motion is constrained by the surrounding environment, such as in 3D endoscopy. Second, to solve multiview stereo with a large baseline, we present a framework that poses the problem of recovering a 3D surface in the scene as a regularized minimal partition problem of a visibility function. The formulation is convex and hence guarantees that the solution converges to the global minimum. Our formulation is robust to view-varying extensive occlusions, clutter and image noise. At no stage during the estimation process does the method rely on the visual hull, 2D silhouettes, approximate depth maps, or knowing which views are dependent (i.e., overlapping) and which are independent (i.e., non-overlapping). Furthermore, the degenerate solution, the null surface, is not included as a global solution in this formulation. One limitation of this algorithm is that its computational complexity grows with the number of views that we combine simultaneously. To address this limitation, we propose a third formulation. In this formulation, the visibility functions are integrated within a narrow band around the estimated surface by assigning weights to each point along optical rays. This thesis presents technical descriptions of each algorithm and detailed analyses to show how these algorithms improve on existing reconstruction techniques.

    Hybrid Scene Compression for Visual Localization

    Localizing an image with respect to a 3D scene model represents a core task for many computer vision applications. An increasing number of real-world applications of visual localization on mobile devices, e.g., Augmented Reality or autonomous robots such as drones or self-driving cars, demand localization approaches that minimize storage and bandwidth requirements. Compressing the 3D models used for localization thus becomes a practical necessity. In this work, we introduce a new hybrid compression algorithm that uses a given memory limit in a more effective way. Rather than treating all 3D points equally, it represents a small set of points with full appearance information and an additional, larger set of points with compressed information. This enables our approach to obtain a more complete scene representation without increasing the memory requirements, leading to superior performance compared to previous compression schemes. As part of our contribution, we show how to handle ambiguous matches arising from point compression during RANSAC. Besides outperforming previous compression techniques in terms of pose accuracy under the same memory constraints, our compression scheme itself is also more efficient. Furthermore, the localization rates and accuracy obtained with our approach are comparable to state-of-the-art feature-based methods, while using a small fraction of the memory. Comment: Published at CVPR 201

    Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling

    We study 3D shape modeling from a single image and make contributions to it in three aspects. First, we present Pix3D, a large-scale benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications in shape-related tasks including reconstruction, retrieval, viewpoint estimation, etc. Building such a large-scale dataset, however, is highly challenging; existing datasets either contain only synthetic data, or lack precise alignment between 2D images and 3D shapes, or only have a small number of images. Second, we calibrate the evaluation criteria for 3D shape reconstruction through behavioral studies, and use them to objectively and systematically benchmark cutting-edge reconstruction algorithms on Pix3D. Third, we design a novel model that simultaneously performs 3D reconstruction and pose estimation; our multi-task learning approach achieves state-of-the-art performance on both tasks. Comment: CVPR 2018. The first two authors contributed equally to this work. Project page: http://pix3d.csail.mit.ed

    Multidimensional Optical Sensing and Imaging Systems (MOSIS): From Macro to Micro Scales

    Multidimensional optical imaging systems for information processing and visualization technologies have numerous applications in fields such as manufacturing, medical sciences, entertainment, robotics, surveillance, and defense. Among different three-dimensional (3-D) imaging methods, integral imaging is a promising multiperspective sensing and display technique. Compared with other 3-D imaging techniques, integral imaging can capture a scene using an incoherent light source and generate real 3-D images for observation without any special viewing devices. This review paper describes passive multidimensional imaging systems combined with different integral imaging configurations. One example is the integral-imaging-based multidimensional optical sensing and imaging systems (MOSIS), which can be used for 3-D visualization, seeing through obscurations, material inspection, and object recognition from microscales to long range imaging. This system utilizes many degrees of freedom such as time and space multiplexing, depth information, polarimetric, temporal, photon flux and multispectral information based on integral imaging to record and reconstruct the multidimensionally integrated scene. Image fusion may be used to integrate the multidimensional images obtained by polarimetric sensors, multispectral cameras, and various multiplexing techniques. The multidimensional images contain substantially more information compared with two-dimensional (2-D) images or conventional 3-D images. In addition, we present recent progress and applications of 3-D integral imaging including human gesture recognition in the time domain, depth estimation, mid-wave-infrared photon counting, 3-D polarimetric imaging for object shape and material identification, dynamic integral imaging implemented with liquid-crystal devices, and 3-D endoscopy for healthcare applications.
    B. Javidi wishes to acknowledge support by the National Science Foundation (NSF) under Grant NSF/IIS-1422179, and DARPA and US Army under contract number W911NF-13-1-0485. The work of P. Latorre Carmona, A. Martínez-Uso, J. M. Sotoca and F. Pla was supported by the Spanish Ministry of Economy under the project ESP2013-48458-C4-3-P, and by MICINN under the project MTM2013-48371-C2-2-PDGI, by Generalitat Valenciana under the project PROMETEO-II/2014/062, and by Universitat Jaume I through project P11B2014-09. The work of M. Martínez-Corral and G. Saavedra was supported by the Spanish Ministry of Economy and Competitiveness under the grant DPI2015-66458-C2-1R, and by the Generalitat Valenciana, Spain under the project PROMETEOII/2014/072

    Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization

    Image-based camera relocalization is an important problem in computer vision and robotics. Recent works utilize convolutional neural networks (CNNs) to regress for pixels in a query image their corresponding 3D world coordinates in the scene. The final pose is then solved via a RANSAC-based optimization scheme using the predicted coordinates. Usually, the CNN is trained with ground truth scene coordinates, but it has also been shown that the network can discover 3D scene geometry automatically by minimizing single-view reprojection loss. However, due to the deficiencies of the reprojection loss, the network needs to be carefully initialized. In this paper, we present a new angle-based reprojection loss, which resolves the issues of the original reprojection loss. With this new loss function, the network can be trained without careful initialization, and the system achieves more accurate results. The new loss also enables us to utilize available multi-view constraints, which further improve performance. Comment: ECCV 2018 Workshop (Geometry Meets Deep Learning)
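To make the contrast in the abstract above concrete, the sketch below compares a plain pinhole reprojection error with one plausible angle-based alternative: the angle between the viewing ray through the observed pixel and the direction to the predicted 3D point in the camera frame. This is an assumption-laden illustration, not necessarily the paper's exact loss. The function names, the focal length `f`, and the principal point `c` are hypothetical.

```python
import math

def norm(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(a * a for a in v))

def reprojection_error(X_cam, pixel, f, c):
    """Pixel distance between the pinhole projection of X_cam and the observed pixel."""
    x, y, z = X_cam
    u = f * x / z + c[0]
    v = f * y / z + c[1]
    return math.hypot(u - pixel[0], v - pixel[1])

def angle_error(X_cam, pixel, f, c):
    """Angle (radians) between the pixel's viewing ray and the direction to X_cam."""
    ray = ((pixel[0] - c[0]) / f, (pixel[1] - c[1]) / f, 1.0)
    dot = sum(a * b for a, b in zip(ray, X_cam))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm(ray) * norm(X_cam))))
    return math.acos(cos)

f, c = 500.0, (320.0, 240.0)      # assumed intrinsics
pixel = (320.0, 240.0)            # observed pixel at the principal point

in_front = (0.0, 0.0, 2.0)        # prediction on the viewing ray
behind = (0.1, 0.0, -2.0)         # prediction behind the camera

print(reprojection_error(behind, pixel, f, c), angle_error(behind, pixel, f, c))
```

For the point behind the camera, the reprojection error is only 25 pixels because the division by a negative depth mirrors the point back into the image, while the angle is close to pi. A poorly initialized network can produce such predictions, which is one commonly cited reason a pixel-space reprojection loss can mislead training.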

    Multi-Scale 3D Scene Flow from Binocular Stereo Sequences

    Scene flow methods estimate the three-dimensional motion field for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation. This paper describes an alternative formulation for dense scene flow estimation that provides reliable results using only two cameras by fusing stereo and optical flow estimation into a single coherent framework. Internally, the proposed algorithm generates probability distributions for optical flow and disparity. Taking into account the uncertainty in the intermediate stages allows for more reliable estimation of the 3D scene flow than previous methods allow. To handle the aperture problems inherent in the estimation of optical flow and disparity, a multi-scale method along with a novel region-based technique is used within a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization, two problems commonly associated with the basic multi-scale approaches. Experiments with synthetic and real test data demonstrate the strength of the proposed approach. National Science Foundation (CNS-0202067, IIS-0208876); Office of Naval Research (N00014-03-1-0108)
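The relation between optical flow, disparity, and 3D scene flow mentioned in the abstract above can be sketched for a single pixel. The example assumes a rectified pinhole stereo pair with focal length `f` (pixels), baseline `B` (meters), and principal point `c`; these names and values are illustrative. It is deterministic, whereas the paper works with probability distributions over flow and disparity.

```python
def backproject(u, v, d, f, B, c):
    """3D point for pixel (u, v) with disparity d in a rectified stereo pair."""
    Z = f * B / d                      # depth from disparity
    return ((u - c[0]) * Z / f, (v - c[1]) * Z / f, Z)

def scene_flow(u, v, d, flow, dd, f, B, c):
    """3D motion of a pixel given its optical flow (du, dv) and disparity change dd."""
    p0 = backproject(u, v, d, f, B, c)
    p1 = backproject(u + flow[0], v + flow[1], d + dd, f, B, c)
    return tuple(b - a for a, b in zip(p0, p1))

f, B, c = 500.0, 0.5, (320.0, 240.0)   # assumed calibration
# A point at the image center whose disparity doubles between frames:
# depth halves from 2 m to 1 m, so it moves 1 m toward the camera.
motion = scene_flow(320.0, 240.0, 125.0, (0.0, 0.0), 125.0, f, B, c)
print(motion)
```

Because depth depends on the reciprocal of disparity, small disparity errors at long range turn into large errors in the Z component, which is one motivation for carrying uncertainty through the intermediate stages as the abstract describes.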