Supervised Autonomous Locomotion and Manipulation for Disaster Response with a Centaur-like Robot
Mobile manipulation is one of the key challenges in the field of
search and rescue (SAR) robotics, requiring robots with flexible locomotion and
manipulation abilities. Since the tasks are mostly unknown in advance, the
robot has to adapt to a wide variety of terrains and workspaces during a
mission. The centaur-like robot Centauro has a hybrid legged-wheeled base and
an anthropomorphic upper body to carry out complex tasks in environments too
dangerous for humans. Due to its high number of degrees of freedom, controlling
the robot with direct teleoperation approaches is challenging and exhausting.
Supervised autonomy is a promising approach to increase the quality and speed
of control while retaining the flexibility to solve unknown tasks. We developed a
set of operator assistance functionalities with different levels of autonomy to
control the robot for challenging locomotion and manipulation tasks. The
integrated system was evaluated in disaster response scenarios and showed
promising performance.
Comment: In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, October 2018
Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild
While 6D object pose estimation has wide applications across computer vision
and robotics, it remains far from being solved due to the lack of annotations.
The problem becomes even more challenging when moving to category-level 6D
pose, which requires generalization to unseen instances. Current approaches are
restricted by their reliance on annotations from simulation or from humans.
In this paper, we overcome this barrier by introducing a self-supervised
learning approach trained directly on large-scale real-world object videos for
category-level 6D pose estimation in the wild. Our framework reconstructs the
canonical 3D shape of an object category and learns dense correspondences
between input images and the canonical shape via surface embedding. For
training, we propose novel geometric cycle-consistency losses that construct
cycles across 2D-3D spaces, across different instances, and across different
time steps. The learned correspondence can be applied to 6D pose estimation and
other downstream tasks such as keypoint transfer. Surprisingly, our method,
without any human annotations or simulators, can achieve on-par or even better
performance than previous supervised or semi-supervised methods on in-the-wild
images. Our project page is: https://kywind.github.io/self-pose
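The 2D-3D cycle idea can be illustrated with soft correspondences: a pixel embedding is softly matched to the canonical surface and back, and the cycle should land near the pixel it started from. Below is a toy numpy sketch of that single cycle; all names are illustrative, and the paper's actual losses additionally span different instances and time steps.

```python
import numpy as np

def cycle_consistency_loss(img_feats, surf_feats, pix_coords, temp=0.07):
    """Toy 2D -> 3D -> 2D cycle-consistency loss.

    img_feats:  (P, D) per-pixel embeddings
    surf_feats: (M, D) embeddings of canonical surface points
    pix_coords: (P, 2) pixel coordinates
    """
    def softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=-1, keepdims=True)

    sim = img_feats @ surf_feats.T       # (P, M) embedding similarities
    p2s = softmax(sim / temp)            # soft pixel -> surface matching
    s2p = softmax(sim.T / temp)          # soft surface -> pixel matching
    # Map each surface point to an expected pixel location, then complete
    # the cycle back to the image; a good embedding returns to the start.
    cycled = p2s @ (s2p @ pix_coords)
    return np.mean(np.sum((cycled - pix_coords) ** 2, axis=-1))
```

With discriminative embeddings the matchings become near one-hot and the loss approaches zero; ambiguous embeddings smear the cycle and are penalized.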
FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
In this paper, we focus on category-level 6D pose and size estimation from
a monocular RGB-D image. Previous methods suffer from inefficient
category-level pose feature extraction, which leads to low accuracy and slow
inference. To
tackle this problem, we propose a fast shape-based network (FS-Net) with
efficient category-level feature extraction for 6D pose estimation. First, we
design an orientation-aware autoencoder with 3D graph convolution for latent
feature extraction. The learned latent feature is insensitive to point shift
and object size thanks to the shift and scale-invariance properties of the 3D
graph convolution. Then, to efficiently decode category-level rotation
information from the latent feature, we propose a novel decoupled rotation
mechanism that employs two decoders to complementarily access the rotation
information. Meanwhile, we estimate translation and size by two residuals,
which are the difference between the mean of object points and ground truth
translation, and the difference between the mean size of the category and
ground truth size, respectively. Finally, to increase the generalization
ability of FS-Net, we propose an online box-cage based 3D deformation mechanism
to augment the training data. Extensive experiments on two benchmark datasets
show that the proposed method achieves state-of-the-art performance in both
category- and instance-level 6D object pose estimation. Especially in
category-level pose estimation, without extra synthetic data, our method
outperforms existing methods by 6.3% on the NOCS-REAL dataset.
Comment: Accepted by CVPR 2021, oral
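The residual formulation for translation and size described above is simple enough to sketch directly; a minimal numpy illustration (function and variable names are ours, not from the paper):

```python
import numpy as np

def recover_translation_and_size(points, t_residual, mean_size, s_residual):
    """Recover translation and size from predicted residuals (FS-Net style).

    points:     (N, 3) observed object point cloud
    t_residual: (3,) predicted offset from the point-cloud mean to the
                ground-truth translation
    mean_size:  (3,) mean bounding-box size of the object's category
    s_residual: (3,) predicted offset from the category mean size
    """
    translation = points.mean(axis=0) + t_residual
    size = mean_size + s_residual
    return translation, size
```

Predicting small residuals around strong geometric anchors (the point-cloud centroid, the category mean size) is an easier regression target than predicting absolute translation and size.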
RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery
While showing promising results, recent RGB-D camera-based category-level
object pose estimation methods have restricted applications due to the heavy
reliance on depth sensors. RGB-only methods offer an alternative but suffer
from inherent scale ambiguity stemming from monocular
observations. In this paper, we propose a novel pipeline that decouples the 6D
pose and size estimation to mitigate the influence of imperfect scales on rigid
transformations. Specifically, we leverage a pre-trained monocular estimator to
extract local geometric information, mainly facilitating the search for inlier
2D-3D correspondence. Meanwhile, a separate branch is designed to directly
recover the metric scale of the object based on category-level statistics.
Finally, we advocate using the RANSAC-PnP algorithm to robustly solve for 6D
object pose. Extensive experiments have been conducted on both synthetic and
real datasets, demonstrating the superior performance of our method over
previous state-of-the-art RGB-based approaches, especially in terms of rotation
accuracy. Code: https://github.com/goldoak/DMSR
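A RANSAC-wrapped PnP solver can be sketched generically with a linear (DLT) inner solver: repeatedly fit a projection matrix on minimal 6-point samples and keep the largest set of correspondences with low reprojection error. This is an illustrative numpy sketch, not the authors' implementation; in practice one would typically call a library routine such as OpenCV's solvePnPRansac.

```python
import numpy as np

def dlt_pnp(X, uv):
    """Linear PnP: fit a 3x4 projection matrix P with uv ~ P @ [X; 1]."""
    n = len(X)
    Xh = np.hstack([X, np.ones((n, 1))])
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh                      # p1 . X - u * (p3 . X) = 0
    A[0::2, 8:12] = -uv[:, :1] * Xh
    A[1::2, 4:8] = Xh                      # p2 . X - v * (p3 . X) = 0
    A[1::2, 8:12] = -uv[:, 1:2] * Xh
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)            # null vector = stacked rows of P

def reproj_error(P, X, uv):
    Xh = np.hstack([X, np.ones((len(X), 1))])
    proj = Xh @ P.T
    return np.linalg.norm(proj[:, :2] / proj[:, 2:3] - uv, axis=1)

def ransac_pnp(X, uv, iters=100, thresh=2.0, seed=0):
    """Fit on minimal 6-point samples; keep the largest inlier set."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(X), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(X), 6, replace=False)
        P = dlt_pnp(X[idx], uv[idx])
        inliers = reproj_error(P, X, uv) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return dlt_pnp(X[best], uv[best]), best  # refit on all inliers
```

The robustness matters here because the 2D-3D correspondences searched by the pipeline inevitably contain outliers, which a plain least-squares fit would absorb.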
Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation From Monocular RGB Image
Recently, RGB-D-based category-level 6D object pose estimation has achieved
promising improvements in performance; however, the requirement of depth
information prohibits broader applications. To alleviate this problem, this
paper proposes a novel approach named Object Level Depth reconstruction
Network (OLD-Net) taking only RGB images as input for category-level 6D object
pose estimation. We propose to directly predict object-level depth from a
monocular RGB image by deforming the category-level shape prior into
object-level depth and the canonical NOCS representation. Two novel modules
named Normalized Global Position Hints (NGPH) and Shape-aware Decoupled Depth
Reconstruction (SDDR) module are introduced to learn high fidelity object-level
depth and delicate shape representations. Finally, the 6D object pose is solved
by aligning the predicted canonical representation with the back-projected
object-level depth. Extensive experiments on the challenging CAMERA25 and
REAL275 datasets indicate that our model, though simple, achieves
state-of-the-art performance.
Comment: 19 pages, 7 figures, 4 tables
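Aligning a predicted canonical (NOCS) representation with back-projected object-level depth is typically done with a least-squares similarity transform, e.g. the Umeyama algorithm. A generic numpy sketch of that alignment step, under the assumption of known point-to-point correspondences (not the authors' code):

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Least-squares similarity transform: dst ~ s * R @ src + t.

    src: (N, 3) points in canonical (NOCS) space
    dst: (N, 3) corresponding back-projected depth points
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)          # cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                        # guard against reflections
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src    # optimal uniform scale
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

The recovered rotation R and translation t give the 6D pose, while the scale s accounts for the normalized canonical space.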
Category-Level 6D Object Pose and Size Estimation using Self-Supervised Deep Prior Deformation Networks
It is difficult to precisely annotate object instances and their semantics in
3D space, and as such, synthetic data are extensively used for these tasks,
e.g., category-level 6D object pose and size estimation. However, the easy
annotations in synthetic domains bring the downside effect of synthetic-to-real
(Sim2Real) domain gap. In this work, we aim to address this issue in the task
setting of Sim2Real, unsupervised domain adaptation for category-level 6D
object pose and size estimation. We propose a method that is built upon a novel
Deep Prior Deformation Network, shortened as DPDN. DPDN learns to deform
features of categorical shape priors to match those of object observations, and
is thus able to establish deep correspondence in the feature space for direct
regression of object poses and sizes. To reduce the Sim2Real domain gap, we
formulate a novel self-supervised objective upon DPDN via consistency
learning; more specifically, we apply two rigid transformations to each object
observation in parallel and feed them into DPDN to yield dual sets of
predictions; on top of this parallel learning, an inter-consistency term
enforces cross-consistency between the dual predictions, improving the
sensitivity of DPDN to pose changes, while individual intra-consistency terms
enforce self-adaptation within each branch. We train
DPDN on both training sets of the synthetic CAMERA25 and real-world REAL275
datasets; our results outperform the existing methods on REAL275 test set under
both the unsupervised and supervised settings. Ablation studies also verify the
efficacy of our designs. Our code is released publicly at
https://github.com/JiehongLin/Self-DPDN
Comment: Accepted by ECCV 2022
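The inter-consistency idea reduces to a geometric identity: if the same observation is transformed by two known rigid motions, the two predicted poses must differ by exactly the relative motion between them. A numpy sketch of that check (names and interfaces are ours, not the paper's):

```python
import numpy as np

def inter_consistency(pred_a, pred_b, T_a, T_b):
    """Residuals between dual pose predictions under known transforms.

    pred_a, pred_b: (R, t) predicted for the observation transformed by
                    T_a = (Ra, ta) and T_b = (Rb, tb) respectively.
    A consistent predictor satisfies
        R2 = R_rel @ R1  and  t2 = R_rel @ (t1 - ta) + tb,
    where R_rel = Rb @ Ra.T is the relative rotation between the copies.
    """
    (R1, t1), (R2, t2) = pred_a, pred_b
    (Ra, ta), (Rb, tb) = T_a, T_b
    R_rel = Rb @ Ra.T
    rot_res = np.linalg.norm(R_rel @ R1 - R2)
    trans_res = np.linalg.norm(R_rel @ (t1 - ta) + tb - t2)
    return rot_res, trans_res
```

In training, residuals of this kind can serve as a self-supervised loss: no pose labels are needed, since only the applied transformations (which are chosen by the training code) enter the identity.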