
    On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks

    Learning-based methods for dense 3D vision problems typically train on 3D sensor data, and each sensing principle used to measure distance brings its own advantages and drawbacks. These trade-offs are typically neither compared nor discussed in the literature, owing to a lack of multi-modal datasets. Texture-less regions are problematic for structure from motion and stereo, reflective materials pose issues for active sensing, and distances to translucent objects are intricate to measure with existing hardware. Training on inaccurate or corrupt data induces model bias and hampers generalisation capabilities, and these effects remain unnoticed if the sensor measurement is treated as ground truth during evaluation. This paper investigates the effect of sensor errors on the dense 3D vision tasks of depth estimation and reconstruction. We rigorously show the significant impact of sensor characteristics on the learned predictions and observe generalisation issues arising from various technologies in everyday household environments. For evaluation, we introduce a carefully designed dataset (available at https://github.com/Junggy/HAMMER-dataset) comprising measurements from commodity sensors, namely D-ToF, I-ToF, passive/active stereo, and monocular RGB+P. Our study quantifies the considerable impact of sensor noise and paves the way to improved dense vision estimates and targeted data fusion. Comment: Accepted at CVPR 2023, Main Paper + Supp. Mat. arXiv admin note: substantial text overlap with arXiv:2205.0456
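    To make the evaluation concrete, here is a minimal sketch of the kind of per-sensor comparison the paper argues for: scoring each commodity sensor's depth map against high-quality ground truth instead of treating any single sensor as ground truth. The metrics are standard (RMSE and absolute relative error); the sensor names and synthetic depth maps below are illustrative placeholders, not the HAMMER dataset's actual format.

```python
import numpy as np

def depth_errors(pred: np.ndarray, gt: np.ndarray):
    """RMSE and absolute relative error over valid (gt > 0) pixels."""
    mask = (gt > 0) & np.isfinite(pred)
    diff = pred[mask] - gt[mask]
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    abs_rel = float(np.mean(np.abs(diff) / gt[mask]))
    return rmse, abs_rel

# Toy usage: synthetic depth maps stand in for real captures (in metres).
gt = np.full((480, 640), 1.5)
for name, noise in [("d_tof", 0.01), ("i_tof", 0.04), ("stereo", 0.02)]:
    pred = gt + np.random.default_rng(0).normal(0.0, noise, gt.shape)
    print(name, depth_errors(pred, gt))
```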

    Jigsaw: Learning to Assemble Multiple Fractured Objects

    Automated assembly of 3D fractures is essential in orthopedics, archaeology, and our daily life. This paper presents Jigsaw, a novel framework for assembling physically broken 3D objects from multiple pieces. Our approach leverages hierarchical features of global and local geometry to match and align the fracture surfaces. The framework consists of three components: (1) surface segmentation to separate fracture surfaces from original surfaces, (2) multi-part matching to find correspondences among fracture-surface points, and (3) robust global alignment to recover the global poses of the pieces. We show how to jointly learn segmentation and matching, and how to seamlessly integrate feature matching and rigidity constraints. We evaluate Jigsaw on the Breaking Bad dataset and achieve superior performance compared to state-of-the-art methods. Our method also generalizes well to diverse fracture modes, objects, and unseen instances. To the best of our knowledge, this is the first learning-based method designed specifically for 3D fracture assembly over multiple pieces. Comment: 17 pages, 9 figures
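    As a concrete reference point for the alignment stage, the sketch below recovers a rigid pose from matched fracture-surface points with the classic Kabsch/Procrustes solution. This is a generic building block under the assumption of known correspondences, not Jigsaw's actual learned matcher or its robust solver.

```python
import numpy as np

def kabsch(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (R, t) with R @ src_i + t ~= dst_i."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c

# Toy usage: recover a known pose from noisy correspondences.
rng = np.random.default_rng(0)
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:                      # keep a proper rotation
    R_true[:, 0] *= -1
t_true = np.array([0.1, -0.2, 0.3])
src = rng.normal(size=(100, 3))
dst = src @ R_true.T + t_true + 0.001 * rng.normal(size=(100, 3))
R, t = kabsch(src, dst)
```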

    ObjectMatch: Robust Registration using Canonical Object Correspondences

    We present ObjectMatch, a semantic and object-centric camera pose estimator for RGB-D SLAM pipelines. Modern camera pose estimators rely on direct correspondences between overlapping regions of frames; consequently, they cannot align camera frames with little or no overlap. In this work, we propose to leverage indirect correspondences obtained via semantic object identification. For instance, when an object is seen from the front in one frame and from the back in another, we can still provide pose constraints through canonical object correspondences. We first propose a neural network that predicts such correspondences at the per-pixel level, and we then combine them in our energy formulation with state-of-the-art keypoint matching, solved with a joint Gauss-Newton optimization. In a pairwise setting, our method improves the registration recall of state-of-the-art feature matching, including from 24% to 45% on pairs with 10% or less inter-frame overlap. In registering RGB-D sequences, our method outperforms cutting-edge SLAM baselines in challenging low-frame-rate scenarios, achieving more than a 35% reduction in trajectory error in multiple scenes. Comment: Project Page: http://cangumeli.github.io/ObjectMatch Video: https://www.youtube.com/watch?v=kuXoKVrzUR
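    The sketch below illustrates the core idea of mixing the two constraint types in one least-squares pose refinement: direct keypoint matches and indirect matches induced by shared canonical object coordinates both become point-to-point residuals on the relative pose. Names such as canon_src/canon_dst are illustrative, and SciPy's generic solver stands in for the paper's joint Gauss-Newton optimization.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(x, kp_src, kp_dst, canon_src, canon_dst, w_obj=0.5):
    """x = [rotvec (3), translation (3)] mapping frame-A points into frame B."""
    R = Rotation.from_rotvec(x[:3]).as_matrix()
    t = x[3:]
    r_kp = (kp_src @ R.T + t) - kp_dst                    # direct keypoint matches
    r_obj = w_obj * ((canon_src @ R.T + t) - canon_dst)   # object-level matches
    return np.concatenate([r_kp.ravel(), r_obj.ravel()])

# Toy usage with synthetic correspondences under a known relative pose.
rng = np.random.default_rng(1)
kp_src, canon_src = rng.normal(size=(20, 3)), rng.normal(size=(10, 3))
R_gt = Rotation.from_rotvec([0.1, -0.2, 0.3]).as_matrix()
t_gt = np.array([0.5, 0.0, -0.1])
kp_dst, canon_dst = kp_src @ R_gt.T + t_gt, canon_src @ R_gt.T + t_gt
fit = least_squares(residuals, np.zeros(6),
                    args=(kp_src, kp_dst, canon_src, canon_dst))
```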

    Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge

    Robot warehouse automation has attracted significant interest in recent years, perhaps most visibly in the Amazon Picking Challenge (APC) [1]. A fully autonomous warehouse pick-and-place system requires robust vision that reliably recognizes and locates objects amid cluttered environments, self-occlusions, sensor noise, and a large variety of objects. In this paper we present an approach that leverages multi-view RGB-D data and self-supervised, data-driven learning to overcome those difficulties. The approach was part of the MIT-Princeton Team system that took 3rd and 4th place in the stowing and picking tasks, respectively, at APC 2016. In the proposed approach, we segment and label multiple views of a scene with a fully convolutional neural network, and then fit pre-scanned 3D object models to the resulting segmentation to obtain the 6D object pose. Training a deep neural network for segmentation typically requires a large amount of training data, so we propose a self-supervised method to generate a large labeled dataset without tedious manual segmentation. We demonstrate that our system can reliably estimate the 6D pose of objects under a variety of scenarios. All code, data, and benchmarks are available at http://apc.cs.princeton.edu
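    One plausible ingredient of such self-supervised label generation is differencing a scene's depth against a pre-captured empty-scene depth map when a single known object is placed per capture. The sketch below is only an assumption about the general shape of that idea, not the published pipeline.

```python
import numpy as np

def auto_label(depth_scene: np.ndarray, depth_empty: np.ndarray,
               object_id: int, thresh: float = 0.01) -> np.ndarray:
    """Label pixels whose depth differs from the empty-scene capture."""
    valid = (depth_scene > 0) & (depth_empty > 0)
    changed = valid & (np.abs(depth_scene - depth_empty) > thresh)  # metres
    labels = np.zeros(depth_scene.shape, dtype=np.int32)
    labels[changed] = object_id          # one known object per capture
    return labels
```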

    Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature

    Point cloud registration has seen recent success with several learning-based methods that focus on correspondence matching and, as such, optimize only for this objective. Following the learning step of correspondence matching, they evaluate the estimated rigid transformation with a RANSAC-like framework. While this is an indispensable component of these methods, it prevents fully end-to-end training and leaves the objective of minimizing the pose error unaddressed. We present a novel solution, Q-REG, which utilizes rich geometric information to estimate the rigid pose from a single correspondence. Q-REG makes it possible to formalize robust estimation as an exhaustive search, hence enabling end-to-end training that optimizes over both objectives: correspondence matching and rigid pose estimation. We demonstrate in experiments that Q-REG is agnostic to the correspondence matching method and provides consistent improvements, both when used only at inference time and when used in end-to-end training. It sets a new state of the art on the 3DMatch, KITTI, and ModelNet benchmarks.
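    The single-correspondence idea can be made concrete: if every matched point carries a full local frame (its normal plus a principal-curvature direction), a single match already determines both rotation and translation, so robust estimation can exhaustively score each correspondence. The sketch below shows that frame construction in generic form; Q-REG's actual features and scoring are more involved.

```python
import numpy as np

def local_frame(normal: np.ndarray, curv_dir: np.ndarray) -> np.ndarray:
    """Orthonormal frame from the normal and a principal-curvature direction."""
    n = normal / np.linalg.norm(normal)
    c = curv_dir - np.dot(curv_dir, n) * n   # project into the tangent plane
    c /= np.linalg.norm(c)
    return np.stack([c, np.cross(n, c), n], axis=1)  # columns: x, y, z axes

def pose_from_one_match(p_src, frame_src, p_dst, frame_dst):
    """Rigid pose implied by one correspondence with matching local frames."""
    R = frame_dst @ frame_src.T              # align the two local frames
    t = p_dst - R @ p_src
    return R, t
```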

    High-fidelity Human Body Modelling from User-generated Data

    PhD thesis. Building high-fidelity human body models of real people benefits a variety of applications, such as fashion, health, entertainment, education, and ergonomics. The goal of this thesis is to build visually plausible human body models from two kinds of user-generated data: low-quality point clouds and low-resolution 2D images. Due to advances in 3D scanning technology and the growing availability of cost-effective 3D scanners to general users, a full human body scan can be acquired within two minutes. However, because of the imperfections of scanning devices, occlusion, self-occlusion, and untrained scanning operation, the acquired scans tend to be full of noise, holes (missing data), outliers, and distorted parts. In this thesis, the establishment of shape correspondences for human body meshes is first investigated, and a robust, shape-aware approach is proposed to detect accurate shape correspondences for closed human body meshes. By investigating the vertex movements of 200 human body meshes, a robust non-rigid mesh registration method is proposed that combines a human body shape model with traditional non-rigid ICP. To facilitate the development and benchmarking of registration methods on Kinect Fusion data, a dataset of user-generated scans is built, named the Kinect-based 3D Human Body (K3D-hub) Dataset, captured with one Microsoft Kinect for Xbox 360. Besides building 3D human body models from point clouds, the problem of estimating accurate 3D human body models from single 2D images is also tackled. The state-of-the-art parametric 3D human body model SMPL is fitted to 2D joints as well as to the boundary of the human body, with Fast Region-based CNN and deep CNN based methods adopted to automatically detect the 2D joints and boundary in each image. Considering the commonly encountered scenario where people hold stable poses most of the time, a stable pose prior derived from the CMU motion capture (mocap) dataset is introduced to further improve the accuracy of pose estimation.
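    A minimal sketch of the 2D-joint fitting step reads as follows: minimize the reprojection error of the model's 3D joints against detected 2D joints. Here smpl_joints is a hypothetical linear stand-in that only keeps the example self-contained; a real pipeline would call an actual SMPL implementation and add the boundary term and stable-pose prior described above.

```python
import numpy as np
from scipy.optimize import least_squares

N_JOINTS, N_PARAMS = 24, 10
_B = np.random.default_rng(0).normal(size=(N_JOINTS * 3, N_PARAMS))

def smpl_joints(params: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: 3D joints as a fixed linear function of params."""
    return (_B @ params).reshape(N_JOINTS, 3)

def reproj_residuals(x, joints2d, focal=1000.0):
    """x = [model params, depth tz]; simple pinhole projection of the joints."""
    params, tz = x[:N_PARAMS], x[N_PARAMS]
    j3d = smpl_joints(params) + np.array([0.0, 0.0, tz])  # keep in front of camera
    proj = focal * j3d[:, :2] / j3d[:, 2:3]
    return (proj - joints2d).ravel()

# Toy usage: joints2d would come from a 2D joint detector (e.g. a CNN).
joints2d = np.zeros((N_JOINTS, 2))
x0 = np.concatenate([np.zeros(N_PARAMS), [5.0]])
fit = least_squares(reproj_residuals, x0, args=(joints2d,))
```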

    Registration of non-rigidly deforming objects

    This thesis investigates the current state of the art in registration of non-rigidly deforming shapes, with particular attention to the problem of non-isometry. First, a method to address locally anisotropic deformation is proposed. The subsequent evaluation of this method highlights a lack of resources for evaluating such approaches, so three novel registration/shape-correspondence benchmark datasets are developed for assessing different aspects of non-rigid deformation. Deficiencies in current evaluative measures are identified, leading to the development of a new performance measure that effectively communicates the density and distribution of correspondences. Finally, the problem of transferring skull-orbit labels between scans is examined on a database of unlabelled skulls, and a novel pipeline that mitigates errors caused by coarse representations is proposed.
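    In the spirit of the proposed measure, the sketch below computes a coverage curve: the fraction of target points lying within radius r of some mapped correspondence, swept over radii, which conveys both density and distribution. Euclidean distance is used for simplicity; a geodesic radius would be the more faithful choice, and the exact measure in the thesis may differ.

```python
import numpy as np
from scipy.spatial import cKDTree

def coverage_curve(target_pts, mapped_pts, radii):
    """Fraction of target points within r of any correspondence, per radius r."""
    d, _ = cKDTree(mapped_pts).query(target_pts)   # nearest-correspondence distance
    return [float(np.mean(d <= r)) for r in radii]

# Toy usage: sparse correspondences covering only part of the target surface.
rng = np.random.default_rng(0)
target = rng.uniform(size=(2000, 3))
mapped = target[rng.choice(2000, size=50, replace=False)]
print(coverage_curve(target, mapped, radii=[0.05, 0.1, 0.2]))
```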

    Towards Quantitative Endoscopy with Vision Intelligence

    In this thesis, we work on topics related to quantitative endoscopy with vision-based intelligence. Specifically, our work revolves around video reconstruction in endoscopy, where challenges such as texture scarcity, illumination variation, and multimodality prevent prior works from operating effectively and robustly. To this end, we propose to combine the expressivity of deep learning approaches with the rigour and accuracy of non-linear optimization algorithms, developing a series of methods that confront these challenges towards quantitative endoscopy. We first propose a retrospective sparse reconstruction method that estimates a high-accuracy, high-density point cloud and a highly complete camera trajectory from a monocular endoscopic video, with state-of-the-art performance. To enable this, a deep image feature descriptor is developed that replaces the hand-crafted local descriptor and boosts feature-matching performance in a typical sparse reconstruction algorithm. A retrospective surface reconstruction pipeline is then proposed to estimate a textured surface model from a monocular endoscopic video, involving self-supervised depth and descriptor learning and a surface fusion technique. We show that the proposed method outperforms a popular dense reconstruction method and that the estimated reconstructions are in good agreement with surface models obtained from CT scans. To align video-reconstructed surface models with pre-operative imaging such as CT, we introduce a global point cloud registration algorithm that is robust to the resolution mismatch that often occurs in such multi-modal scenarios. Specifically, a geometric feature descriptor is developed in which a novel network normalization technique helps a 3D network produce more consistent and distinctive geometric features for samples with different resolutions; in our evaluation, the proposed descriptor achieves state-of-the-art performance. Last but not least, a real-time SLAM system is developed that estimates surface geometry and camera trajectory from a monocular endoscopic video, using deep representations for geometry and appearance together with non-linear factor-graph optimization. We show that the proposed SLAM system performs favorably compared with a state-of-the-art feature-based SLAM system.
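    As a small illustration of how a learned descriptor slots into a sparse-reconstruction matcher, the sketch below performs mutual-nearest-neighbour matching between two images' descriptor sets, a common replacement for hand-crafted descriptor matching. The random descriptors are placeholders; the thesis learns them from endoscopic video.

```python
import numpy as np

def mutual_nn_matches(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    """Index pairs (i, j) where a_i and b_j are each other's nearest neighbour."""
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T                                  # cosine similarity matrix
    nn_ab = sim.argmax(axis=1)                     # best b for each a
    nn_ba = sim.argmax(axis=0)                     # best a for each b
    i = np.arange(len(a))
    keep = nn_ba[nn_ab] == i                       # mutual-consistency check
    return np.stack([i[keep], nn_ab[keep]], axis=1)

# Toy usage with random placeholder descriptors.
rng = np.random.default_rng(0)
matches = mutual_nn_matches(rng.normal(size=(500, 128)),
                            rng.normal(size=(480, 128)))
```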