On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks
Learning-based methods for dense 3D vision problems typically train on
3D sensor data. Each principle of measuring distance has its own advantages
and drawbacks, which are rarely compared or discussed in the literature due to
a lack of multi-modal datasets. Texture-less regions are problematic for
structure from motion and stereo, reflective materials pose issues for active
sensing, and distances to translucent objects are difficult to measure with
existing hardware. Training on inaccurate or corrupt data induces model bias
and hampers generalisation, and these effects remain unnoticed if the sensor
measurement is treated as ground truth during evaluation. This paper
investigates the effect of sensor errors on the dense 3D vision tasks of depth
estimation and reconstruction. We rigorously show the significant impact of
sensor characteristics on the learned predictions and observe generalisation
issues arising from various sensing technologies in everyday household
environments. For evaluation, we introduce a carefully
designed dataset (available at
https://github.com/Junggy/HAMMER-dataset) comprising measurements from
commodity sensors, namely D-ToF, I-ToF, passive/active stereo, and monocular
RGB+P. Our study quantifies the considerable sensor noise impact and paves the
way to improved dense vision estimates and targeted data fusion.
Comment: Accepted at CVPR 2023, Main Paper + Supp. Mat. arXiv admin note: substantial text overlap with arXiv:2205.0456
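The evaluation pitfall noted above, a model scored against the sensor's own noisy measurement rather than the true geometry, can be sketched numerically. All depths and noise levels below are invented for illustration:

```python
import numpy as np

# Synthetic illustration: a prediction that is closer to the true geometry
# than the sensor is can still look worse when scored against the sensor.
rng = np.random.default_rng(0)
true_depth = rng.uniform(0.5, 5.0, size=10_000)                        # metres
pred = true_depth + rng.normal(0.0, 0.02, size=true_depth.shape)       # accurate model
sensor_gt = true_depth + rng.normal(0.0, 0.05, size=true_depth.shape)  # noisier sensor

mae_vs_true = np.abs(pred - true_depth).mean()
mae_vs_sensor = np.abs(pred - sensor_gt).mean()
print(f"MAE vs. true geometry:         {mae_vs_true:.4f}")
print(f"MAE vs. sensor 'ground truth': {mae_vs_sensor:.4f}")
```

Here the prediction is genuinely closer to the true geometry than the sensor measurement is, yet its apparent error is substantially inflated when the noisy measurement serves as ground truth.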
Jigsaw: Learning to Assemble Multiple Fractured Objects
Automated assembly of 3D fractures is essential in orthopedics, archaeology,
and our daily life. This paper presents Jigsaw, a novel framework for
assembling physically broken 3D objects from multiple pieces. Our approach
leverages hierarchical features of global and local geometry to match and align
the fracture surfaces. Our framework consists of three components: (1) surface
segmentation to separate fracture from original parts, (2) multi-part matching
to find correspondences among fracture-surface points, and (3) robust global
alignment to recover the global poses of the pieces. We show how to jointly
learn segmentation and matching and seamlessly integrate feature matching and
rigidity constraints. We evaluate Jigsaw on the Breaking Bad dataset and
achieve superior performance compared to state-of-the-art methods. Our method
also generalizes well to diverse fracture modes, objects, and unseen instances.
To the best of our knowledge, this is the first learning-based method designed
specifically for 3D fracture assembly over multiple pieces.
Comment: 17 pages, 9 figures
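The multi-part matching stage can be illustrated with a common baseline: mutual nearest neighbours in a learned per-point feature space. The feature dimensionality and the mutual-NN rule here are illustrative assumptions, not the paper's exact matcher:

```python
import numpy as np

def mutual_nearest_matches(feat_a, feat_b):
    """Mutual nearest-neighbour correspondences between two point sets,
    given one feature descriptor per point (rows = points)."""
    sim = feat_a @ feat_b.T          # similarity matrix (dot product)
    best_b = sim.argmax(1)           # best match in b for each point of a
    best_a = sim.argmax(0)           # best match in a for each point of b
    # keep only pairs that pick each other
    return [(i, j) for i, j in enumerate(best_b) if best_a[j] == i]
```

The mutual check is a cheap way to suppress one-sided, ambiguous matches before feeding correspondences to a global alignment step.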
ObjectMatch: Robust Registration using Canonical Object Correspondences
We present ObjectMatch, a semantic and object-centric camera pose estimator
for RGB-D SLAM pipelines. Modern camera pose estimators rely on direct
correspondences of overlapping regions between frames; however, they cannot
align camera frames with little or no overlap. In this work, we propose to
leverage indirect correspondences obtained via semantic object identification.
For instance, when an object is seen from the front in one frame and from the
back in another frame, we can provide additional pose constraints through
canonical object correspondences. We first propose a neural network to predict
such correspondences on a per-pixel level, which we then combine in our energy
formulation with state-of-the-art keypoint matching solved with a joint
Gauss-Newton optimization. In a pairwise setting, our method improves the
registration recall of state-of-the-art feature matching, e.g. from 24% to
45% on pairs with 10% or less inter-frame overlap. In registering RGB-D
sequences, our method outperforms cutting-edge SLAM baselines in challenging,
low-frame-rate scenarios, achieving more than 35% reduction in trajectory error
in multiple scenes.
Comment: Project Page: http://cangumeli.github.io/ObjectMatch Video: https://www.youtube.com/watch?v=kuXoKVrzUR
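The joint Gauss-Newton solve mentioned above operates on a richer energy than plain point alignment, but its core iteration can be sketched for the simplest case: 3D point correspondences with a small-angle so(3) update. This is a generic sketch, not ObjectMatch's full formulation with canonical-object terms:

```python
import numpy as np

def skew(v):
    """3x3 cross-product matrix of a 3-vector."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    K = skew(w)
    if theta < 1e-12:
        return np.eye(3) + K
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * K @ K)

def gauss_newton_pose(src, dst, iters=15):
    """Gauss-Newton for (R, t) minimizing sum_i ||R s_i + t - d_i||^2."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        q = src @ R.T                       # rotated source points
        r = (q + t - dst).ravel()           # stacked residuals, shape (3N,)
        # linearization: d r_i / d omega = -skew(q_i),  d r_i / d t = I
        J = np.concatenate([np.hstack([-skew(qi), np.eye(3)]) for qi in q])
        dx, *_ = np.linalg.lstsq(J, -r, rcond=None)
        R = exp_so3(dx[:3]) @ R             # multiplicative rotation update
        t = t + dx[3:]
    return R, t
```

In a real energy formulation, the same linearization is simply stacked with additional residual blocks (e.g. keypoint and canonical-correspondence terms) before each solve.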
Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge
Robot warehouse automation has attracted significant interest in recent years, perhaps most visibly in the Amazon Picking Challenge (APC) [1]. A fully autonomous warehouse pick-and-place system requires robust vision that reliably recognizes and locates objects amid cluttered environments, self-occlusions, sensor noise, and a large variety of objects. In this paper we present an approach that leverages multi-view RGB-D data and self-supervised, data-driven learning to overcome these difficulties. The approach was part of the MIT-Princeton Team system that took 3rd and 4th place in the stowing and picking tasks, respectively, at APC 2016. In the proposed approach, we segment and label multiple views of a scene with a fully convolutional neural network, and then fit pre-scanned 3D object models to the resulting segmentation to obtain the 6D object pose. Training a deep neural network for segmentation typically requires a large amount of training data, so we propose a self-supervised method to generate a large labeled dataset without tedious manual segmentation. We demonstrate that our system can reliably estimate the 6D pose of objects under a variety of scenarios. All code, data, and benchmarks are available at http://apc.cs.princeton.edu
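Fitting a pre-scanned model to the segmented points, as in the pipeline above, is conventionally done with a variant of ICP. Below is a minimal point-to-point sketch with brute-force nearest neighbours and a closed-form Kabsch update; the team's actual fitting procedure may differ:

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form least-squares rigid transform (R, t) mapping P onto Q."""
    cp, cq = P.mean(0), Q.mean(0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflections
    R = Vt.T @ S @ U.T
    return R, cq - R @ cp

def icp(model, scene, iters=20):
    """Minimal point-to-point ICP aligning `model` to `scene`
    (brute-force nearest neighbours; fine for small clouds)."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        moved = model @ R.T + t
        d2 = ((moved[:, None, :] - scene[None, :, :]) ** 2).sum(-1)
        nn = scene[d2.argmin(1)]          # nearest scene point per model point
        dR, dt = kabsch(moved, nn)        # incremental rigid update
        R, t = dR @ R, dR @ t + dt
    return R, t
```

Like any local method, this sketch needs a reasonable initial pose; production systems typically seed it from the segmentation or a global alignment stage and use a spatial index instead of the brute-force distance matrix.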
Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature
Point cloud registration has seen recent success with several learning-based
methods that focus on correspondence matching and, as such, optimize only for
this objective. Following the learning step of correspondence matching, they
evaluate the estimated rigid transformation with a RANSAC-like framework. While
this is an indispensable component of these methods, it prevents fully
end-to-end training, leaving the objective of minimizing the pose error
unaddressed. We present a novel solution, Q-REG, which utilizes rich geometric
information to estimate the rigid pose from a single correspondence. Q-REG
allows us to formalize robust estimation as an exhaustive search, hence
enabling end-to-end training that optimizes over both objectives of
correspondence matching and rigid pose estimation. We demonstrate in the
experiments that Q-REG is agnostic to the correspondence matching method and
provides consistent improvement both when used only in inference and in
end-to-end training. It sets a new state of the art on the 3DMatch, KITTI, and
ModelNet benchmarks.
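Estimating a rigid pose from a single correspondence becomes concrete once each matched point carries an orthonormal local reference frame (e.g. assembled from the surface normal and principal curvature directions). A minimal sketch under that assumption, not Q-REG's exact construction:

```python
import numpy as np

def pose_from_one_correspondence(p_src, F_src, p_dst, F_dst):
    """Rigid pose from a single match whose points carry orthonormal
    local reference frames (columns = frame axes, e.g. normal and
    principal curvature directions)."""
    R = F_dst @ F_src.T      # rotation carrying the source frame onto the target frame
    t = p_dst - R @ p_src    # translation fixing the matched point
    return R, t
```

Because one hypothesis per correspondence fully determines the pose, robust estimation reduces to exhaustively scoring each candidate, which is what makes the search differentiable end to end.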
High-fidelity Human Body Modelling from User-generated Data
PhD thesis
Building high-fidelity human body models of real people benefits a variety of applications in fashion, health, entertainment, education and ergonomics. The goal of this thesis is to build visually plausible human body models from two kinds of user-generated data: low-quality point clouds and low-resolution 2D images. Due to advances in 3D scanning technology and the growing availability of cost-effective 3D scanners to general users, a full human body scan can be acquired within two minutes. However, due to the imperfections of scanning devices, occlusion, self-occlusion and untrained scanning operation, the acquired scans tend to be full of noise, holes (missing data), outliers and distorted parts. This thesis first investigates the establishment of shape correspondences for human body meshes. A robust and shape-aware approach is proposed to detect accurate shape correspondences for closed human body meshes. By investigating the vertex movements of 200 human body meshes, a robust non-rigid mesh registration method is proposed which combines a human body shape model with traditional non-rigid ICP. To facilitate the development and benchmarking of registration methods on Kinect Fusion data, a dataset of user-generated scans, named the Kinect-based 3D Human Body (K3D-hub) Dataset, is built with one Microsoft Kinect for XBOX 360. Besides building 3D human body models from point clouds, the thesis also tackles the problem of estimating accurate 3D human body models from single 2D images. A state-of-the-art parametric 3D human body model, SMPL, is fitted to 2D joints as well as the boundary of the human body. Fast Region-based CNN and deep CNN based methods are adopted to automatically detect the 2D joints and boundary for each human body image.
Considering the commonly encountered scenario where people are in stable poses most of the time, a stable pose prior is introduced from the CMU motion capture (mocap) dataset to further improve the accuracy of pose estimation.
Registration of non-rigidly deforming objects
This thesis investigates the current state-of-the-art in registration of non-rigidly deforming shapes. In particular, the problem of non-isometry is considered. First, a method to address locally anisotropic deformation is proposed. The subsequent evaluation of this method highlights a lack of resources for evaluating such methods.
Three novel registration/shape correspondence benchmark datasets are developed for assessing different aspects of non-rigid deformation. Deficiencies in current evaluative measures are identified, leading to the development of a new performance measure that effectively communicates the density and distribution of correspondences.
Finally, the problem of transferring skull orbit labels between scans is examined on a database of unlabelled skulls. A novel pipeline that mitigates errors caused by coarse representations is proposed.
Towards Quantitative Endoscopy with Vision Intelligence
In this thesis, we work on topics related to quantitative endoscopy with vision-based intelligence. Specifically, our works revolve around video reconstruction in endoscopy, where challenges such as texture scarcity, illumination variation and multimodality prevent prior works from operating effectively and robustly. To this end, we propose to combine the expressivity of deep learning approaches with the rigour and accuracy of non-linear optimization algorithms, developing a series of methods to confront these challenges towards quantitative endoscopy. We first propose a retrospective sparse reconstruction method that estimates a high-accuracy, high-density point cloud and a highly complete camera trajectory from a monocular endoscopic video with state-of-the-art performance. To enable this, a deep image feature descriptor is developed to replace the hand-crafted local descriptor and boost feature matching performance in a typical sparse reconstruction algorithm. A retrospective surface reconstruction pipeline is then proposed to estimate a textured surface model from a monocular endoscopic video, involving self-supervised depth and descriptor learning and a surface fusion technique. We show that the proposed method outperforms a popular dense reconstruction method and that the estimated reconstructions are in good agreement with surface models obtained from CT scans. To align video-reconstructed surface models with pre-operative imaging such as CT, we introduce a global point cloud registration algorithm that is robust to the resolution mismatch that often occurs in such multi-modal scenarios. Specifically, a geometric feature descriptor is developed in which a novel network normalization technique helps a 3D network produce more consistent and distinctive geometric features for samples with different resolutions.
The proposed geometric descriptor achieves state-of-the-art performance in our evaluation. Last but not least, a real-time SLAM system that estimates surface geometry and camera trajectory from a monocular endoscopic video is developed, using deep representations for geometry and appearance together with non-linear factor graph optimization. We show that the proposed SLAM system performs favorably compared with a state-of-the-art feature-based SLAM system.
Understanding and Facilitating Human-AI Teaming for Real-World Computer Vision Tasks
Recent machine learning research has demonstrated that many task-specific AI models now reach or surpass human performance on static benchmarks. However, in real-world applications where human users collaborate with, or rely on, AIs, key questions remain: Do these advancements in AI models inherently improve the user experience or augment users' capabilities? When and how should we partner users with AI to form effective human-AI teams? This dissertation explores new forms of human-AI collaboration in the context of real-world computer vision tasks. We demonstrate different user roles in diverse AI-assisted workflows -- from passive recipients of AI model outputs to active participants who steer the shaping of the model. 1) We developed intuitive user interfaces to make deep learning accessible to end users, in this case astrophysicists, without requiring machine learning expertise. The end-to-end model enhances the accuracy of automated processing of daily space observations from 20+ telescopes globally, and the streamlined interface injects confidence into researchers' AI-supported analysis of scientific imagery. 2) We proposed the concept of "restrained and zealous AIs" to harness the complementary strengths in human-AI teams. Insights from a month-long user study involving 78 professional data annotators suggest that recommendations from ill-suited AI counterparts may detrimentally affect users' skills. 3) Finally, we introduced the novel concept of "in-situ learning" in augmented reality, where the user interacts with physical objects to train spatially-aware AI models that remember the personalized environment and objects for various tasks. Each project brings the end user into a more active and engaged role in the inference, training, and evaluation processes of human-in-the-loop machine learning.
In summary, this dissertation provides insights into good practices for teaming humans with AI in real-world collaboration, informing the design of future AI-assisted systems.