L6DNet: Light 6 DoF Network for Robust and Precise Object Pose Estimation with Small Datasets
Estimating the 3D pose of an object is a challenging task that can be
considered within augmented reality or robotic applications. In this paper, we
propose a novel approach to perform 6 DoF object pose estimation from a single
RGB-D image. We adopt a hybrid pipeline in two stages: data-driven and
geometric respectively. The data-driven step consists of a classification CNN
to estimate the object 2D location in the image from local patches, followed by
a regression CNN trained to predict the 3D location of a set of keypoints in
the camera coordinate system. To extract the pose information, the geometric
step consists of aligning the 3D points in the camera coordinate system with
the corresponding 3D points in the world coordinate system by minimizing a
registration error, thus computing the pose. Our experiments on the standard
LineMod dataset show that our approach is more robust and accurate than
state-of-the-art methods. The approach is also validated to achieve a 6 DoF
positioning task by visual servoing.
Comment: This work has been accepted at IEEE Robotics and Automation Letters.
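The geometric step described above, aligning camera-frame keypoints with their world-frame counterparts by minimizing a registration error, has a closed-form least-squares solution via the Kabsch algorithm. The following is a minimal NumPy sketch of that idea, not the paper's actual implementation:

```python
import numpy as np

def kabsch_align(P_cam, P_world):
    """Least-squares rigid alignment (Kabsch): find R, t such that
    P_cam ~= R @ P_world + t, minimizing the registration error."""
    P_cam = np.asarray(P_cam, dtype=float)
    P_world = np.asarray(P_world, dtype=float)
    mu_c = P_cam.mean(axis=0)
    mu_w = P_world.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (P_world - mu_w).T @ (P_cam - mu_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) solution
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_c - R @ mu_w
    return R, t
```

Given the regressed camera-frame keypoints and the known model keypoints, the returned (R, t) is the object pose.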
Robust Digital-Twin Localization via An RGBD-based Transformer Network and A Comprehensive Evaluation on a Mobile Dataset
The potential of digital-twin technology, involving the creation of precise
digital replicas of physical objects, to reshape AR experiences in 3D object
tracking and localization scenarios is significant. However, enabling robust 3D
object tracking in dynamic mobile AR environments remains a formidable
challenge. These scenarios often require a more robust pose estimator capable
of handling the inherent sensor-level measurement noise. In this paper,
recognizing the challenges of comprehensive solutions in existing literature,
we propose a transformer-based 6DoF pose estimator designed to achieve
state-of-the-art accuracy under real-world noisy data. To systematically
validate the new solution's performance against the prior art, we also
introduce a novel RGBD dataset called Digital Twin Tracking Dataset v2 (DTTD2),
which is focused on digital-twin object tracking scenarios. Expanded from an
existing DTTD v1 (DTTD1), the new dataset adds digital-twin data captured using
a cutting-edge mobile RGBD sensor suite on the Apple iPhone 14 Pro, expanding the
applicability of our approach to iPhone sensor data. Through extensive
experimentation and in-depth analysis, we illustrate the effectiveness of our
methods under significant depth data errors, surpassing the performance of
existing baselines. Code and dataset are made publicly available at:
https://github.com/augcog/DTTD
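Pose estimators on LineMod-style benchmarks such as DTTD are commonly compared with the ADD metric (average distance of model points under the predicted versus ground-truth pose). The sketch below shows the conventional formulation, including the usual 10%-of-diameter accuracy threshold; these are standard community choices, not necessarily this paper's exact evaluation protocol:

```python
import numpy as np

def add_metric(R_pred, t_pred, R_gt, t_gt, model_pts):
    """ADD: mean distance between the object's model points
    transformed by the predicted pose and by the ground-truth pose."""
    pred = model_pts @ R_pred.T + t_pred
    gt = model_pts @ R_gt.T + t_gt
    return np.linalg.norm(pred - gt, axis=1).mean()

def add_accuracy(errors, diameters, threshold=0.10):
    """Fraction of test poses whose ADD error is below
    `threshold` * object diameter (the common 10% rule)."""
    errors = np.asarray(errors, dtype=float)
    diameters = np.asarray(diameters, dtype=float)
    return float(np.mean(errors < threshold * diameters))
```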
Estimation of Absolute Scale in Monocular SLAM Using Synthetic Data
This paper addresses the problem of scale estimation in monocular SLAM by
estimating absolute distances between camera centers of consecutive image
frames. These estimates would improve the overall performance of classical (not
deep) SLAM systems and allow metric feature locations to be recovered from a
single monocular camera. We propose several network architectures that lead to
an improvement of scale estimation accuracy over the state of the art. In
addition, we exploit a possibility to train the neural network only with
synthetic data derived from a computer graphics simulator. Our key insight is
that, using only synthetic training inputs, we can achieve similar scale
estimation accuracy as that obtained from real data. This fact indicates that
fully annotated simulated data is a viable alternative to existing
deep-learning-based SLAM systems trained on real (unlabeled) data. Our
experiments with unsupervised domain adaptation also show that the difference
in visual appearance between simulated and real data does not affect scale
estimation results. Our method operates with low-resolution images (0.03MP),
which makes it practical for real-time SLAM applications with a monocular
camera.
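One way the predicted absolute inter-frame distances could be used is to rescale the scale-ambiguous monocular translations when composing a trajectory: keep the estimated translation direction, but fix its magnitude to the network's metric prediction. This is an illustrative sketch under assumed interfaces, not the paper's code:

```python
import numpy as np

def rescale_trajectory(rel_poses, predicted_dists):
    """Compose a metric trajectory from scale-ambiguous relative poses.

    rel_poses: list of (R, t) relative poses from monocular SLAM,
               where only the direction of t is trustworthy.
    predicted_dists: metric distance between consecutive camera
               centers (hypothetical output of the scale network).
    Returns an (N+1, 3) array of camera centers in a metric frame."""
    centers = [np.zeros(3)]
    R_acc = np.eye(3)  # accumulated orientation of the current frame
    for (R, t), d in zip(rel_poses, predicted_dists):
        direction = t / np.linalg.norm(t)  # keep direction, fix magnitude
        centers.append(centers[-1] + R_acc @ (d * direction))
        R_acc = R_acc @ R
    return np.stack(centers)
```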
Semantic localisation via globally unique instance segmentation
In this work we propose a novel approach to semantic localisation. Our work is motivated by the need for environment perception techniques which not only perform self-localisation within a map but also simultaneously recognise surrounding objects. Such capabilities are crucial for computer vision applications which interact with the environment: autonomous driving, augmented reality or robotics. In order to achieve this goal we propose a solution which consists of three key steps. Firstly, a database of panoramic RGB images and corresponding globally unique, per-pixel object instance labels is built for the desired environment, where we typically consider objects from static categories such as "building" or "tree". Secondly, a semantic segmentation network capable of predicting more than 3000 labels is trained on the collected data. Finally, for a given panoramic query image, the corresponding instance label image predicted by the network is used for semantic matching within the database. The matching is performed in two stages: (i) a fast retrieval of a small subset of database images (~100) with highly overlapping instance label histograms, followed by (ii) an explicit approximate 3 DoF (yaw, pitch, roll) alignment of the selected subset of images and the query image. We evaluate our approach in challenging indoor and outdoor navigation scenarios, achieving better or similar performance when compared to state-of-the-art image retrieval-based localisation approaches using key-point matching and image-level embedding. Our contributions include: (i) a description of a novel semantic localisation approach using globally unique instance segmentation, (ii) corresponding quantitative and qualitative analysis and (iii) a novel CamVid-360 dataset containing 986 labelled instances of buildings, trees, road signs and poles.
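The fast retrieval stage (i) can be realised by comparing normalised instance-label histograms, for example with histogram intersection. A minimal sketch under assumed names, not the authors' implementation:

```python
import numpy as np

def instance_histogram(label_img, num_labels):
    """Normalised histogram of per-pixel instance labels for one image."""
    h = np.bincount(label_img.ravel(), minlength=num_labels).astype(float)
    return h / h.sum()

def retrieve_top_k(query_hist, db_hists, k=100):
    """Stage (i): rank database images by histogram intersection with
    the query's instance-label histogram and keep the top-k subset."""
    scores = np.minimum(db_hists, query_hist).sum(axis=1)
    return np.argsort(scores)[::-1][:k]
```

The returned subset would then be passed to the explicit 3 DoF alignment stage (ii).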
Autonomous Localization Of A Uav In A 3d Cad Model
This thesis presents a novel method of indoor localization and autonomous navigation of Unmanned Aerial Vehicles (UAVs) within a building, given a prebuilt Computer Aided Design (CAD) model of the building. The proposed system is novel in that it leverages machine learning and traditional computer vision techniques to provide a robust method of localizing and navigating a drone autonomously in indoor and GPS-denied environments, exploiting preexisting knowledge of the environment. The goal of this work is to devise a method that enables a UAV to deduce its current pose within a CAD model quickly and accurately while also maintaining efficient use of resources. A 3-dimensional CAD model of the building to be navigated is provided as input to the system along with the required goal position. Initially, the UAV has no knowledge of its location within the building. The system, comprising a stereo camera system and an Inertial Measurement Unit (IMU) as its sensors, then generates a globally consistent map of its surroundings using a Simultaneous Localization and Mapping (SLAM) algorithm. In addition to the map, it also stores spatially correlated 3D features. These 3D features are used to generate correspondences between the SLAM map and the 3D CAD model. The correspondences are then used to estimate a transformation between the SLAM map and the 3D CAD model, thus effectively localizing the UAV in the 3D CAD model. Our method successfully localized the UAV in the test building in an average of 15 seconds across the scenarios tested, contingent upon the abundance of target features in the observed data. Due to the absence of a motion capture system, the results were verified by placing tags at strategic known locations on the ground in the building and measuring the error between the projection of the current UAV location on the ground and the tag.
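The step of turning noisy 3D feature correspondences into a SLAM-map-to-CAD-model transformation is typically made robust to outlier matches by wrapping a closed-form rigid fit in a RANSAC loop. The following is a hedged sketch of that general technique, with illustrative parameters, not the thesis's actual implementation:

```python
import numpy as np

def fit_rigid(src, dst):
    """Closed-form least-squares rigid transform (SVD) mapping src -> dst."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s

def ransac_align(slam_pts, cad_pts, iters=500, tol=0.05, rng=None):
    """Robustly estimate the SLAM-map-to-CAD transform from 3D feature
    correspondences, rejecting outlier matches via RANSAC."""
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(slam_pts), dtype=bool)
    for _ in range(iters):
        # Minimal sample: 3 correspondences determine a rigid transform
        idx = rng.choice(len(slam_pts), size=3, replace=False)
        R, t = fit_rigid(slam_pts[idx], cad_pts[idx])
        err = np.linalg.norm(slam_pts @ R.T + t - cad_pts, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the consensus set for the final estimate
    return fit_rigid(slam_pts[best_inliers], cad_pts[best_inliers])
```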
Collaborative SLAM using a swarm intelligence-inspired exploration method
Master's thesis in Mechatronics (MAS500).
Efficient exploration in multi-robot SLAM is a challenging task. This thesis describes the design of algorithms that would enable Loomo robots to collaboratively explore an unknown environment. A pose graph-based SLAM algorithm using the on-board sensors of the Loomo was developed from scratch. A YOLOv3-tiny neural network has been trained to recognize other Loomos, and an exploration simulation has been developed to test exploration methods. The bots in the simulation are controlled using swarm intelligence-inspired rules. The system is not finished, and further work is needed to combine the work done in this thesis into a collaborative SLAM system that runs on the Loomo robots.
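Swarm intelligence-inspired exploration rules of the kind described are usually built from simple local behaviours, such as separation from nearby teammates and attraction toward unexplored frontier points. The sketch below illustrates that general idea; the rule names, weights, and interface are assumptions, not the thesis's actual controller:

```python
import numpy as np

def swarm_velocity(pos, neighbors, frontier, sep_radius=1.5,
                   w_sep=1.0, w_frontier=0.5):
    """Illustrative swarm-style exploration rule for one robot:
    repelled from teammates within sep_radius, attracted toward
    the nearest unexplored frontier point. All quantities are 2D."""
    v = np.zeros(2)
    for n in neighbors:
        offset = pos - n
        d = np.linalg.norm(offset)
        if 0 < d < sep_radius:
            v += w_sep * offset / d**2       # separation: push apart
    to_frontier = frontier - pos
    d = np.linalg.norm(to_frontier)
    if d > 0:
        v += w_frontier * to_frontier / d    # attraction toward frontier
    return v
```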