You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors
In this paper, we propose a novel local descriptor-based framework, called
You Only Hypothesize Once (YOHO), for the registration of two unaligned point
clouds. In contrast to most existing local descriptors, which rely on a fragile
local reference frame to gain rotation invariance, the proposed descriptor
achieves rotation invariance through recent advances in group-equivariant
feature learning, which brings greater robustness to point density and noise.
Meanwhile, the descriptor in YOHO also has a rotation-equivariant part, which
enables us to estimate the registration from just one correspondence
hypothesis. This property reduces the search space of feasible
transformations, thus greatly improving both the accuracy and the efficiency of
YOHO. Extensive experiments show that YOHO achieves superior performance with
far fewer RANSAC iterations on four widely used datasets: the
3DMatch/3DLoMatch datasets, the ETH dataset, and the WHU-TLS dataset. More
details are available on our project page: https://hpwang-whu.github.io/YOHO/.
Comment: Accepted by ACM Multimedia (MM) 2022. Project page: https://hpwang-whu.github.io/YOHO
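To illustrate the one-hypothesis idea, here is a minimal NumPy sketch (not the paper's implementation; all names and the toy data are illustrative): assuming each putative correspondence carries a rotation estimate from the equivariant descriptor part, a single point pair already determines a full rigid transform, which can then be scored by inlier counting instead of sampling multi-point subsets as in standard RANSAC.

```python
import numpy as np

def hypothesis_from_correspondence(p, q, R):
    """One correspondence plus a rotation estimate -> full rigid transform.
    R is assumed to come from the rotation-equivariant descriptor part."""
    t = q - R @ p
    return R, t

def count_inliers(src, dst, R, t, tau=0.1):
    """Score a hypothesis by how many correspondences it explains."""
    residuals = np.linalg.norm((src @ R.T + t) - dst, axis=1)
    return int((residuals < tau).sum())

# Toy example: noiseless correspondences under a known rigid transform.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
theta = 0.3
R_gt = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta),  np.cos(theta), 0.0],
                 [0.0,            0.0,           1.0]])
t_gt = np.array([1.0, -2.0, 0.5])
dst = src @ R_gt.T + t_gt

# A single correspondence (with its rotation estimate) yields the transform.
R, t = hypothesis_from_correspondence(src[0], dst[0], R_gt)
print(count_inliers(src, dst, R, t))  # prints 50: every correspondence fits
```

In the actual method the per-correspondence rotation comes from the descriptor itself, so the number of hypotheses to verify grows only linearly in the number of correspondences.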
DeltaConv: Anisotropic Operators for Geometric Deep Learning on Point Clouds
Learning from 3D point-cloud data has rapidly gained momentum, motivated by
the success of deep learning on images and the increased availability of
3D data. In this paper, we aim to construct anisotropic convolution layers that
work directly on the surface derived from a point cloud. This is challenging
because of the lack of a global coordinate system for tangential directions on
surfaces. We introduce DeltaConv, a convolution layer that combines geometric
operators from vector calculus to enable the construction of anisotropic
filters on point clouds. Because these operators are defined on scalar and
vector fields, we separate the network into a scalar stream and a vector stream,
which are connected by the operators. The vector stream enables the network to
explicitly represent, evaluate, and process directional information. Our
convolutions are robust and simple to implement and match or improve on
state-of-the-art approaches on several benchmarks, while also speeding up
training and inference.
Comment: 8 pages, 5 figures, 7 tables; ACM Transactions on Graphics 41, 4, Article 105 (SIGGRAPH 2022)
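As an illustration of the scalar/vector stream coupling, the following sketch uses a regular 2D grid as a stand-in for the paper's point-cloud operators (which are built on local tangent planes): the gradient maps the scalar stream to the vector stream, the divergence maps back, and their composition recovers the familiar Laplacian.

```python
import numpy as np

def grad(f):
    """Forward-difference gradient of a scalar field on a 2D grid:
    scalar stream -> vector stream (two channels)."""
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:-1, :] = f[1:, :] - f[:-1, :]
    gy[:, :-1] = f[:, 1:] - f[:, :-1]
    return np.stack([gx, gy])

def div(v):
    """Backward-difference divergence: vector stream -> scalar stream,
    chosen so that div(grad(f)) equals the standard 5-point Laplacian
    in the grid interior."""
    gx, gy = v
    dx = np.zeros_like(gx)
    dy = np.zeros_like(gy)
    dx[1:, :] = gx[1:, :] - gx[:-1, :]
    dy[:, 1:] = gy[:, 1:] - gy[:, :-1]
    return dx + dy

f = np.random.default_rng(1).normal(size=(8, 8))
lap = div(grad(f))

# Interior check against the 5-point Laplacian stencil.
i, j = 4, 4
stencil = f[i + 1, j] + f[i - 1, j] + f[i, j + 1] + f[i, j - 1] - 4 * f[i, j]
print(np.isclose(lap[i, j], stencil))  # True
```

Because the vector stream carries explicit directions, learned combinations of such operators can weight different tangential directions differently, which is what makes the resulting filters anisotropic rather than radially symmetric.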
Visual robot navigation with omnidirectional vision
In a world where service robots are increasingly becoming an inherent part of our lives, it has become essential to provide robots with superior perception capabilities and
acute semantic knowledge of the environment. In recent years, the computer vision field has advanced immensely, providing rich information at a fraction of the cost. It
has thereby become an essential part of many autonomous systems and the sensor of choice for tackling the most challenging perception problems. Nevertheless, it is still
challenging for a robot to extract meaningful information from an image signal (a high-dimensional, complex, and noisy data source). This dissertation presents several contributions
towards visual robot navigation relying solely on omnidirectional vision.
The first part of the thesis is devoted to robust free-space detection using omnidirectional images. By mimicking a range sensor, the free-space extraction in the omniview
constitutes a fundamental block in our system, allowing for collision-free navigation, localization, and map building. The uncertainty in the free-space classifications is handled
with fuzzy preference structures, which explicitly express it in terms of preference, conflict, and ignorance. In this way, we show it is possible to substantially reduce the classification
error by rejecting queries associated with a strong degree of conflict or ignorance.
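As a sketch of this rejection mechanism (one common fuzzy-preference formulation; the thesis' exact operators may differ, and the threshold is illustrative), two class-support degrees can be decomposed into strict preference, conflict, and ignorance, with ambiguous or unknown queries rejected:

```python
def preference_structure(s_free, s_occ):
    """Decompose two class-support degrees in [0, 1] into strict
    preference, conflict, and ignorance (a common fuzzy-preference
    decomposition; shown here for the free/occupied case)."""
    conflict = min(s_free, s_occ)         # both classes supported at once
    ignorance = 1.0 - max(s_free, s_occ)  # neither class well supported
    pref_free = s_free - conflict         # strict preference for "free"
    pref_occ = s_occ - conflict           # strict preference for "occupied"
    return pref_free, pref_occ, conflict, ignorance

def classify_with_reject(s_free, s_occ, tau=0.5):
    """Reject queries whose conflict or ignorance exceeds a threshold."""
    pref_free, pref_occ, conflict, ignorance = preference_structure(s_free, s_occ)
    if conflict > tau or ignorance > tau:
        return "reject"  # ambiguous (conflict) or unknown (ignorance) query
    return "free" if pref_free > pref_occ else "occupied"

print(classify_with_reject(0.9, 0.1))  # confident support: "free"
print(classify_with_reject(0.8, 0.8))  # high conflict:     "reject"
print(classify_with_reject(0.1, 0.2))  # high ignorance:    "reject"
```

The point of separating conflict from ignorance is that the two call for different remedies: conflicting queries need a better decision boundary, while ignorant ones need more training data or a different sensor cue.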
The motivation for using vision instead of classical proximity sensors becomes apparent after the incorporation of more semantic categories in the scene segmentation. We propose a multi-cue
classifier able to distinguish among the classes floor, vertical structures, and clutter. This result is further enhanced to extract the scene’s spatial layout and a surface reconstruction for
better spatial and context awareness. Our scheme corrects the problematic distortions induced by the hyperbolic mirror with a novel bird’s-eye-view formulation. The proposed framework is suitable
for self-supervised learning from 3D point cloud data.
Place context is integrated into the system by training a place category classifier able to distinguish among the categories: room, corridor, doorway, and open space. Hand-engineered features,
as well as those learned from data representations, are considered with different ensemble systems.
The last part of the thesis is concerned with local and map-based navigation. Several visual local semantic behaviors are derived by fusing the semantic scene segmentation with the semantic place
context. The advantage of the proposed local navigation is that the system can recover from errors caused by activating behaviors in the wrong context. Higher-level behaviors can also be
achieved by composing the basic ones. Finally, we propose different visual map-based navigation alternatives that match or improve on the results of classical proximity sensors; these
include map generation, particle-filter localization, and semantic map building.