NeU-NBV: Next Best View Planning Using Uncertainty Estimation in Image-Based Neural Rendering
Autonomous robotic tasks require actively perceiving the environment to
achieve application-specific goals. In this paper, we address the problem of
positioning an RGB camera to collect the most informative images to represent
an unknown scene, given a limited measurement budget. We propose a novel
mapless planning framework to iteratively plan the next best camera view based
on collected image measurements. A key aspect of our approach is a new
technique for uncertainty estimation in image-based neural rendering, which
guides measurement acquisition at the most uncertain view among view
candidates, thus maximising the information value during data collection. By
incrementally adding new measurements into our image collection, our approach
efficiently explores an unknown scene in a mapless manner. We show that our
uncertainty estimation is generalisable and valuable for view planning in
unknown scenes. Our planning experiments using synthetic and real-world data
verify that our uncertainty-guided approach finds informative images leading to
more accurate scene representations when compared against baselines.
Comment: Accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 202
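The iterative planning loop described above can be summarised in a short sketch. The snippet below is a minimal illustration of uncertainty-guided next-best-view selection, assuming hypothetical render_with_uncertainty and acquire_image callables; none of the names are taken from the paper's code.

```python
# Minimal sketch of uncertainty-guided next-best-view planning. The renderer is
# assumed to return a per-pixel uncertainty map for a candidate camera pose.
import numpy as np

def select_next_best_view(render_with_uncertainty, collected_images, candidate_poses):
    """Pick the candidate pose whose rendering is most uncertain."""
    scores = []
    for pose in candidate_poses:
        _, uncertainty = render_with_uncertainty(collected_images, pose)
        scores.append(float(np.mean(uncertainty)))  # average per-pixel uncertainty
    return candidate_poses[int(np.argmax(scores))]

def plan(render_with_uncertainty, acquire_image, initial_images, candidate_poses, budget):
    """Iteratively add the most informative view until the measurement budget is spent."""
    images = list(initial_images)
    for _ in range(budget):
        pose = select_next_best_view(render_with_uncertainty, images, candidate_poses)
        images.append(acquire_image(pose))  # take a real measurement at that pose
    return images
```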
SeqOT: A Spatial-Temporal Transformer Network for Place Recognition Using Sequential LiDAR Data
Place recognition is an important component for autonomous vehicles to
achieve loop closing or global localization. In this paper, we tackle the
problem of place recognition based on sequential 3D LiDAR scans obtained by an
onboard LiDAR sensor. We propose a transformer-based network named SeqOT to
exploit the temporal and spatial information provided by sequential range
images generated from the LiDAR data. It uses multi-scale transformers to
generate a global descriptor for each sequence of LiDAR range images in an
end-to-end fashion. During online operation, our SeqOT finds similar places by
matching such descriptors between the current query sequence and those stored
in the map. We evaluate our approach on four datasets collected with different
types of LiDAR sensors in different environments. The experimental results show
that our method outperforms the state-of-the-art LiDAR-based place recognition
methods and generalizes well across different environments. Furthermore, our
method operates online faster than the frame rate of the sensor. The
implementation of our method is released as open source at:
https://github.com/BIT-MJY/SeqOT
Comment: Submitted to IEEE Transactions on Industrial Electronics
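The online retrieval step the abstract describes, matching a query descriptor against those stored in the map, could look roughly like the sketch below; the array layout and function names are assumptions and are not part of the SeqOT repository.

```python
# Minimal sketch of global-descriptor matching against a pre-built map database.
import numpy as np

def build_database(descriptors):
    """Stack per-sequence global descriptors into a searchable (N, D) matrix."""
    db = np.stack(descriptors).astype(np.float32)
    return db / np.linalg.norm(db, axis=1, keepdims=True)  # L2-normalise for cosine search

def retrieve(query_descriptor, database, top_k=1):
    """Return indices of the most similar map sequences for a query descriptor."""
    q = query_descriptor / np.linalg.norm(query_descriptor)
    similarities = database @ q  # cosine similarity against all stored places
    return np.argsort(-similarities)[:top_k]
```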
CVTNet: A Cross-View Transformer Network for Place Recognition Using LiDAR Data
LiDAR-based place recognition (LPR) is one of the most crucial components of
autonomous vehicles to identify previously visited places in GPS-denied
environments. Most existing LPR methods use mundane representations of the
input point cloud without considering different views, which may not fully
exploit the information from LiDAR sensors. In this paper, we propose a
cross-view transformer-based network, dubbed CVTNet, to fuse the range image
views (RIVs) and bird's eye views (BEVs) generated from the LiDAR data. It
extracts correlations within the views themselves using intra-transformers and
between the two different views using inter-transformers. Based on that, our
proposed CVTNet generates a yaw-angle-invariant global descriptor for each
laser scan end-to-end online and retrieves previously seen places by descriptor
matching between the current query scan and the pre-built database. We evaluate
our approach on three datasets collected with different sensor setups and
environmental conditions. The experimental results show that our method
outperforms the state-of-the-art LPR methods with strong robustness to
viewpoint changes and long time spans. Furthermore, our approach achieves good
real-time performance, running faster than the typical LiDAR frame rate.
The implementation of our method is released as open source at:
https://github.com/BIT-MJY/CVTNet
Comment: Accepted by IEEE Transactions on Industrial Informatics 202
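A rough skeleton of the intra-/inter-transformer fusion idea is sketched below in PyTorch, assuming pre-extracted RIV and BEV token features of shape (batch, tokens, dim); layer sizes, pooling, and names are illustrative and do not reflect the actual CVTNet architecture.

```python
# Minimal sketch: intra-view self-attention per view, then inter-view cross-attention,
# pooled into one global descriptor per scan.
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        # intra-view attention: correlations within each view
        self.intra_riv = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.intra_bev = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        # inter-view attention: correlations between the two views
        self.inter = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, riv_tokens, bev_tokens):
        riv = self.intra_riv(riv_tokens)
        bev = self.intra_bev(bev_tokens)
        fused, _ = self.inter(query=riv, key=bev, value=bev)  # RIV tokens attend to BEV tokens
        # pool over tokens to obtain one global descriptor per scan
        descriptor = torch.cat([fused.mean(dim=1), bev.mean(dim=1)], dim=-1)
        return nn.functional.normalize(descriptor, dim=-1)
```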
Long-Term Localization using Semantic Cues in Floor Plan Maps
Lifelong localization in a given map is an essential capability for
autonomous service robots. In this paper, we consider the task of long-term
localization in a changing indoor environment given sparse CAD floor plans. The
commonly used pre-built maps from the robot sensors may increase the cost and
time of deployment. Furthermore, their detailed nature requires that they are
updated when significant changes occur. We address the difficulty of
localization when the correspondence between the map and the observations is
low due to the sparsity of the CAD map and the changing environment. To
overcome both challenges, we propose to exploit semantic cues that are commonly
present in human-oriented spaces. These semantic cues can be detected in RGB
camera images using object detection and are matched against an
easy-to-update, abstract semantic map. The semantic information is integrated
into a Monte Carlo localization framework using a particle filter that operates
on 2D LiDAR scans and camera data. We provide a long-term localization solution
and a semantic map format for environments whose interior structure changes and
for which detailed geometric maps are not available. We evaluate
our localization framework on multiple challenging indoor scenarios in an
office environment, taken weeks apart. The experiments suggest that our
approach is robust to structural changes and can run on an onboard computer. We
released the open source implementation of our approach written in C++ together
with a ROS wrapper.
Comment: Under review for RA-
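The particle-filter weight update that integrates both sensing modalities could take the form sketched below; lidar_likelihood and semantic_likelihood are hypothetical placeholders standing in for the paper's actual observation models.

```python
# Minimal sketch of a Monte Carlo localization weight update combining a geometric
# LiDAR observation model with a semantic-cue matching term.
import numpy as np

def update_weights(particles, weights, lidar_scan, detected_objects,
                   lidar_likelihood, semantic_likelihood):
    """Re-weight particles by both geometric and semantic agreement with the map."""
    new_weights = np.empty_like(weights)
    for i, pose in enumerate(particles):
        p_lidar = lidar_likelihood(lidar_scan, pose)         # match scan against the floor plan
        p_sem = semantic_likelihood(detected_objects, pose)  # match detections against the semantic map
        new_weights[i] = weights[i] * p_lidar * p_sem
    total = new_weights.sum()
    return new_weights / total if total > 0 else np.full_like(weights, 1.0 / len(weights))
```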
Deep Reinforcement Learning for Flipper Control of Tracked Robots
The autonomous control of flippers plays an important role in enhancing the
intelligent operation of tracked robots within complex environments. While
existing methods mainly rely on hand-crafted control models, in this paper, we
introduce a novel approach that leverages deep reinforcement learning (DRL)
techniques for autonomous flipper control in complex terrains. Specifically, we
propose a new DRL network named AT-D3QN, which ensures safe and smooth flipper
control for tracked robots. It comprises two modules: a feature extraction and
fusion module for extracting and integrating robot and environment state
features, and a deep Q-Learning control generation module for incorporating
expert knowledge to obtain a smooth and efficient control strategy. To train
the network, a novel reward function is proposed, considering both learning
efficiency and passing smoothness. A simulation environment is constructed
using the Pymunk physics engine for training. We then directly apply the
trained model to a more realistic Gazebo simulation for quantitative analysis.
The consistently high performance of the proposed approach validates its
superiority over manual teleoperation.
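AT-D3QN builds on the dueling double-DQN (D3QN) family; a minimal sketch of that underlying update is given below, with placeholder network sizes and no claim to reproduce the paper's feature-extraction, fusion, or expert-knowledge modules.

```python
# Minimal sketch of a dueling Q-network and a double-DQN target computation.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                 # state value V(s)
        self.advantage = nn.Linear(hidden, num_actions)   # action advantages A(s, a)

    def forward(self, state):
        h = self.body(state)
        a = self.advantage(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)  # dueling aggregation

def double_dqn_target(online, target, reward, next_state, done, gamma=0.99):
    """Online net picks the next action, target net evaluates it (double DQN)."""
    with torch.no_grad():
        best_action = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```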
RDMNet: Reliable Dense Matching Based Point Cloud Registration for Autonomous Driving
Point cloud registration is an important task in robotics and autonomous
driving to estimate the ego-motion of the vehicle. Recent advances following
the coarse-to-fine manner show promising potential in point cloud registration.
However, existing methods rely on good superpoint correspondences, which are
hard to obtain reliably and efficiently, resulting in less robust and
accurate point cloud registration. In this paper, we propose a novel network,
named RDMNet, to find dense point correspondences coarse-to-fine and improve
final pose estimation based on such reliable correspondences. Our RDMNet uses a
newly devised 3D-RoFormer mechanism to first extract distinctive superpoints and
then generates reliable superpoint matches between two point clouds. The proposed
3D-RoFormer fuses 3D position information into the transformer network,
efficiently exploiting point clouds' contextual and geometric information to
generate robust superpoint correspondences. RDMNet then propagates the sparse
superpoint matches to dense point matches using neighborhood information
for accurate point cloud registration. We extensively evaluate our method on
multiple datasets from different environments. The experimental results
demonstrate that our method outperforms existing state-of-the-art approaches in
all tested datasets with strong generalization ability.
Comment: 11 pages, 9 figures
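The coarse-to-fine propagation from sparse superpoint matches to dense point matches might be sketched as below using a purely geometric neighborhood grouping; the grouping radius and nearest-neighbour pairing rule are assumptions, not the paper's learned matching.

```python
# Minimal sketch: expand each matched superpoint pair into dense point correspondences
# by pairing the points in their local neighborhoods.
import numpy as np
from scipy.spatial import cKDTree

def propagate_matches(src_points, tgt_points, superpoint_matches, radius=1.0):
    """Turn each matched superpoint pair into dense point correspondences."""
    src_tree, tgt_tree = cKDTree(src_points), cKDTree(tgt_points)
    correspondences = []
    for sp_src, sp_tgt in superpoint_matches:  # matched 3D superpoint coordinates
        src_idx = src_tree.query_ball_point(sp_src, r=radius)  # points near the source superpoint
        tgt_idx = tgt_tree.query_ball_point(sp_tgt, r=radius)  # points near the target superpoint
        if not src_idx or not tgt_idx:
            continue
        local_tree = cKDTree(tgt_points[tgt_idx])
        _, nearest = local_tree.query(src_points[src_idx])     # nearest-neighbour pairing
        correspondences += [(i, tgt_idx[j]) for i, j in zip(src_idx, nearest)]
    return np.array(correspondences)
```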
Explicit Interaction for Fusion-Based Place Recognition
Fusion-based place recognition is an emerging technique jointly utilizing
multi-modal perception data, to recognize previously visited places in
GPS-denied scenarios for robots and autonomous vehicles. Recent fusion-based
place recognition methods combine multi-modal features in implicit manners.
While achieving remarkable results, they do not explicitly consider what the
individual modality affords in the fusion system. Therefore, the benefit of
multi-modal feature fusion may not be fully explored. In this paper, we propose
a novel fusion-based network, dubbed EINet, to achieve explicit interaction of
the two modalities. EINet uses LiDAR ranges to supervise more robust vision
features for long time spans, and simultaneously uses camera RGB data to
improve the discrimination of LiDAR point clouds. In addition, we develop a new
benchmark for the place recognition task based on the nuScenes dataset. To
support future research with comprehensive comparisons on this benchmark, we
introduce both supervised and self-supervised training schemes alongside
evaluation protocols. We conduct extensive experiments on the proposed
benchmark, and the experimental results show that our EINet exhibits better
recognition performance as well as solid generalization ability compared to the
state-of-the-art fusion-based place recognition approaches. Our open-source
code and benchmark are released at: https://github.com/BIT-XJY/EINet
TFNet: Exploiting Temporal Cues for Fast and Accurate LiDAR Semantic Segmentation
LiDAR semantic segmentation plays a crucial role in enabling autonomous
driving and robots to understand their surroundings accurately and robustly.
There are different types of methods, such as point-based, range-image-based,
polar-based, and hybrid methods. Among these, range-image-based methods are
widely used due to their efficiency. However, they face a significant challenge
known as the "many-to-one" problem caused by the range image's limited
horizontal and vertical angular resolution. As a result, around 20% of the 3D
points can be occluded. In this paper, we present TFNet, a range-image-based
LiDAR semantic segmentation method that utilizes temporal information to
address this issue. Specifically, we incorporate a temporal fusion layer to
extract useful information from previous scans and integrate it with the
current scan. We then design a max-voting-based post-processing technique to
correct false predictions, particularly those caused by the "many-to-one"
issue. We evaluate the approach on two benchmarks and demonstrate that the
post-processing technique is generic and can be applied to various networks. We
will release our code and models.
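The max-voting post-processing could be sketched as below, assuming the per-point predictions from several scans have already been associated with the same 3D points; that association step is an assumption here, not the paper's exact scheme.

```python
# Minimal sketch of max-voting: each 3D point keeps the label that received the most
# votes across predictions gathered from several scans.
import numpy as np

def max_vote(per_scan_labels, num_classes):
    """per_scan_labels: (num_scans, num_points) integer labels for the same points.
    Returns the per-point majority label."""
    votes = np.zeros((per_scan_labels.shape[1], num_classes), dtype=np.int32)
    for labels in per_scan_labels:
        votes[np.arange(labels.shape[0]), labels] += 1  # accumulate one vote per scan
    return votes.argmax(axis=1)                          # pick the most-voted class per point
```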