4 research outputs found

    You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors

    In this paper, we propose a novel local descriptor-based framework, called You Only Hypothesize Once (YOHO), for the registration of two unaligned point clouds. In contrast to most existing local descriptors, which rely on a fragile local reference frame to gain rotation invariance, the proposed descriptor achieves rotation invariance through recent group-equivariant feature learning techniques, which brings more robustness to point density and noise. Meanwhile, the descriptor in YOHO also has a rotation-equivariant part, which enables us to estimate the registration from just one correspondence hypothesis. This property reduces the search space for feasible transformations, greatly improving both the accuracy and the efficiency of YOHO. Extensive experiments show that YOHO achieves superior performance with far fewer RANSAC iterations on four widely used datasets: 3DMatch/3DLoMatch, ETH, and WHU-TLS. More details are shown on our project page: https://hpwang-whu.github.io/YOHO/.
    Comment: Accepted by ACM Multimedia (MM) 2022. Project page: https://hpwang-whu.github.io/YOHO
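    The one-hypothesis idea can be illustrated with a short sketch: once a rotation estimate is available from the rotation-equivariant part of the descriptor, a single point correspondence already determines the full rigid transform, which can then be verified by inlier counting. The snippet below (NumPy) is a minimal illustration of that reasoning, not YOHO's actual pipeline; `R_est` is assumed to come from some descriptor-based rotation estimator.

```python
import numpy as np

def transform_from_one_correspondence(p, q, R_est):
    """Complete a rigid transform from a single correspondence p -> q,
    given a rotation estimate R_est (3x3) recovered from rotation-equivariant
    descriptors: q ~= R_est @ p + t, so t = q - R_est @ p."""
    t = q - R_est @ p
    return R_est, t

def count_inliers(src, dst, R, t, tau=0.1):
    """Verify a hypothesis by counting putative correspondences (rows of
    src/dst, shape (N, 3)) that align within distance tau after applying it."""
    residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
    return int((residuals < tau).sum())

# Hypothetical usage: score each single-correspondence hypothesis and keep the
# best one, instead of sampling multi-point subsets as in classical RANSAC:
# best = max(hypotheses, key=lambda h: count_inliers(src, dst, *h))
```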

    DeltaConv: Anisotropic Operators for Geometric Deep Learning on Point Clouds

    Learning from 3D point-cloud data has rapidly gained momentum, motivated by the success of deep learning on images and the increased availability of 3D data. In this paper, we aim to construct anisotropic convolution layers that work directly on the surface derived from a point cloud. This is challenging because of the lack of a global coordinate system for tangential directions on surfaces. We introduce DeltaConv, a convolution layer that combines geometric operators from vector calculus to enable the construction of anisotropic filters on point clouds. Because these operators are defined on scalar and vector fields, we separate the network into a scalar stream and a vector stream, which are connected by the operators. The vector stream enables the network to explicitly represent, evaluate, and process directional information. Our convolutions are robust and simple to implement, and they match or improve on state-of-the-art approaches on several benchmarks, while also speeding up training and inference.
    Comment: 8 pages, 5 figures, 7 tables; ACM Transactions on Graphics 41, 4, Article 105 (SIGGRAPH 2022)
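    To make the scalar/vector two-stream coupling concrete, here is a minimal sketch of a block in that spirit. It assumes the geometric operators have already been discretized per point cloud as matrices: a hypothetical `grad_op` maps per-point scalar features to vector-stream features, and `div_op` maps vector features back to scalars. The weight matrices stand in for small learned MLPs; this is an illustration of how the streams exchange information through the operators, not the paper's implementation.

```python
import numpy as np

def two_stream_block(s, v, grad_op, div_op, W_s, W_v):
    """One scalar/vector block in the spirit of DeltaConv (simplified sketch).

    s       : (N, Cs)          per-point scalar features
    v       : (M, Cv)          vector-stream features (stacked tangent components)
    grad_op : (M, N)           discrete gradient operator, scalars -> vectors
    div_op  : (N, M)           discrete divergence operator, vectors -> scalars
    W_v     : (Cv + Cs, Cv2)   weights standing in for the vector-stream MLP
    W_s     : (Cs + Cv2, Cs2)  weights standing in for the scalar-stream MLP
    """
    # Vector stream: combine existing vector features with gradients of scalars.
    v_new = np.concatenate([v, grad_op @ s], axis=1) @ W_v
    # Scalar stream: combine scalar features with the divergence of the vectors.
    s_new = np.tanh(np.concatenate([s, div_op @ v_new], axis=1) @ W_s)
    return s_new, v_new
```

    Keeping the pointwise nonlinearity only in the scalar stream is a deliberate choice in this sketch: applying it to raw tangent components would entangle directions, whereas the scalars are coordinate-free.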

    Visual robot navigation with omnidirectional vision

    In a world where service robots are increasingly becoming an inherent part of our lives, it has become essential to provide robots with superior perception capabilities and acute semantic knowledge of the environment. In recent years, the computer vision field has advanced immensely, providing rich information at a fraction of the cost. It has thereby become an essential part of many autonomous systems and the sensor of choice when tackling the most challenging perception problems. Nevertheless, it is still challenging for a robot to extract meaningful information from an image signal, which is high-dimensional, complex, and noisy. This dissertation presents several contributions towards visual robot navigation relying solely on omnidirectional vision.

    The first part of the thesis is devoted to robust free-space detection using omnidirectional images. By mimicking a range sensor, the free-space extraction in the omnidirectional view constitutes a fundamental block in our system, allowing for collision-free navigation, localization, and map building. The uncertainty in the free-space classifications is handled with fuzzy preference structures, which express it explicitly in terms of preference, conflict, and ignorance. This way, we show it is possible to substantially reduce the classification error by rejecting queries associated with a strong degree of conflict and ignorance.

    The motivation for using vision in contrast to classical proximity sensors becomes apparent after incorporating more semantic categories into the scene segmentation. We propose a multi-cue classifier able to distinguish among three classes: floor, vertical structures, and clutter. This result is further enhanced by extracting the scene's spatial layout and a surface reconstruction for better spatial and contextual awareness. Our scheme corrects the problematic distortions induced by the hyperbolic mirror with a novel bird's-eye-view formulation. The proposed framework is suitable for self-supervised learning from 3D point cloud data. Place context is integrated into the system by training a place category classifier able to distinguish among four categories: room, corridor, doorway, and open space. Hand-engineered features, as well as representations learned from the data, are considered with different ensemble systems.

    The last part of the thesis is concerned with local and map-based navigation. Several visual local semantic behaviors are derived by fusing the semantic scene segmentation with the semantic place context. The advantage of the proposed local navigation is that the system can recover from conflicts that arise when behaviors are activated in the wrong context. Higher-level behaviors can also be achieved by composing the basic ones. Finally, we propose different visual map-based navigation alternatives that reproduce or improve on the results of classical proximity sensors, including map generation, particle-filter localization, and semantic map building.
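    The rejection mechanism based on fuzzy preference structures can be sketched as follows. Assuming the free-space classifier outputs two membership degrees in [0, 1] (support for "free" and for "occupied"), one common decomposition derives conflict from joint support and ignorance from joint lack of support, and rejects a query when either is too high. The decomposition and threshold below are illustrative assumptions, not necessarily the thesis's exact formulation.

```python
def preference_structure(mu_free, mu_occupied):
    """Decompose two membership degrees in [0, 1] into preference, conflict,
    and ignorance (one common fuzzy-preference decomposition; the thesis may
    use a different one)."""
    conflict = min(mu_free, mu_occupied)          # both classes supported at once
    ignorance = 1.0 - max(mu_free, mu_occupied)   # neither class well supported
    preference = mu_free - mu_occupied            # signed support for "free"
    return preference, conflict, ignorance

def classify_with_reject(mu_free, mu_occupied, reject_threshold=0.5):
    """Label a cell as free/occupied, or reject the query when conflict or
    ignorance dominates, so ambiguous cells never drive the navigation."""
    pref, conflict, ignorance = preference_structure(mu_free, mu_occupied)
    if conflict > reject_threshold or ignorance > reject_threshold:
        return "reject"
    return "free" if pref > 0 else "occupied"

# Illustrative values: a confident reading passes, a contradictory one is rejected.
print(classify_with_reject(0.9, 0.1))    # -> "free"
print(classify_with_reject(0.7, 0.65))   # -> "reject" (high conflict)
```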