2,846 research outputs found
Challenges and solutions for autonomous ground robot scene understanding and navigation in unstructured outdoor environments: A review
The capabilities of autonomous mobile robotic systems have been steadily improving due to recent advancements in computer science, engineering, and related disciplines such as cognitive science. In controlled environments, robots have achieved relatively high levels of autonomy. In more unstructured environments, however, the development of fully autonomous mobile robots remains challenging due to the complexity of understanding these environments. Many autonomous mobile robots use classical, learning-based, or hybrid approaches for navigation. More recent learning-based methods may replace the complete navigation pipeline or selected stages of the classical approach. For effective deployment, autonomous robots must understand their external environments at a sophisticated level according to their intended applications. Therefore, in addition to robot perception, scene analysis and higher-level scene understanding (e.g., traversable/non-traversable, rough or smooth terrain, etc.) are required for autonomous robot navigation in unstructured outdoor environments. This paper provides a comprehensive review and critical analysis of these methods in the context of their applications to the problems of robot perception and scene understanding in unstructured environments and the related problems of localisation, environment mapping and path planning. State-of-the-art sensor fusion methods and multimodal scene understanding approaches are also discussed and evaluated within this context. The paper concludes with an in-depth discussion of the current state of autonomous ground robot navigation in unstructured outdoor environments and the most promising future research directions for overcoming the remaining challenges.
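As a rough illustration of the hybrid pipelines surveyed above, the following Python sketch shows a learned traversability-estimation stage feeding a classical planning stage; all names, dimensions, and the placeholder heuristic are hypothetical and not taken from any specific system in the review.

```python
# Hypothetical sketch of a modular navigation pipeline in which a learned
# scene-understanding stage (traversability estimation) feeds a classical
# planner. All class and function names are illustrative.
import numpy as np

def estimate_traversability(image: np.ndarray, point_cloud: np.ndarray) -> np.ndarray:
    """Return a per-cell traversability cost in [0, 1] (0 = smooth, 1 = blocked).
    A learned model would replace this placeholder heuristic."""
    height_variance = point_cloud[:, 2].var()           # rough-terrain proxy
    return np.full((64, 64), min(1.0, float(height_variance)))

def plan_path(cost_map: np.ndarray, start, goal):
    """Classical stage: e.g. A* over the traversability cost map (stubbed here)."""
    return [start, goal]                                 # placeholder path

def navigation_step(image, point_cloud, pose, goal):
    cost_map = estimate_traversability(image, point_cloud)   # learned stage
    return plan_path(cost_map, pose, goal)                    # classical stage
```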
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
A robot that can carry out a natural-language instruction has been a dream
since before the Jetsons cartoon series imagined a life of leisure mediated by
a fleet of attentive robot helpers. It is a dream that remains stubbornly
distant. However, recent advances in vision and language methods have made
incredible progress in closely related areas. This is significant because a
robot interpreting a natural-language navigation instruction on the basis of
what it sees is carrying out a vision and language process that is similar to
Visual Question Answering. Both tasks can be interpreted as visually grounded
sequence-to-sequence translation problems, and many of the same methods are
applicable. To enable and encourage the application of vision and language
methods to the problem of interpreting visually-grounded navigation
instructions, we present the Matterport3D Simulator -- a large-scale
reinforcement learning environment based on real imagery. Using this simulator,
which can in future support a range of embodied vision and language tasks, we
provide the first benchmark dataset for visually-grounded natural language
navigation in real buildings -- the Room-to-Room (R2R) dataset.
Comment: CVPR 2018 Spotlight presentation
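For illustration of the visually grounded sequence-to-sequence formulation described in this abstract, the sketch below pairs an instruction encoder with an action decoder conditioned on per-step image features; it is not the authors' model, and all dimensions and layer choices are assumptions.

```python
# Minimal, hypothetical sketch of vision-and-language navigation as a
# visually grounded sequence-to-sequence problem.
import torch
import torch.nn as nn

class Seq2SeqVLNAgent(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128,
                 img_feat_dim=256, num_actions=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.instr_encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.step_cell = nn.GRUCell(img_feat_dim, hidden_dim)
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, instruction_tokens, image_features):
        # instruction_tokens: (B, T_instr); image_features: (B, T_nav, img_feat_dim)
        _, h = self.instr_encoder(self.embed(instruction_tokens))
        h = h.squeeze(0)                     # language context initialises the decoder
        logits = []
        for t in range(image_features.size(1)):
            h = self.step_cell(image_features[:, t], h)   # ground on the current view
            logits.append(self.action_head(h))
        return torch.stack(logits, dim=1)    # (B, T_nav, num_actions)
```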
4CNet: A Confidence-Aware, Contrastive, Conditional, Consistency Model for Robot Map Prediction in Multi-Robot Environments
Mobile robots in unknown cluttered environments with irregularly shaped
obstacles often face sensing, energy, and communication challenges which
directly affect their ability to explore these environments. In this paper, we
introduce a novel deep learning method, Confidence-Aware Contrastive
Conditional Consistency Model (4CNet), for mobile robot map prediction during
resource-limited exploration in multi-robot environments. 4CNet uniquely
incorporates: 1) a conditional consistency model for map prediction in
irregularly shaped unknown regions, 2) a contrastive map-trajectory pretraining
framework for a trajectory encoder that extracts spatial information from the
trajectories of nearby robots during map prediction, and 3) a confidence
network to measure the uncertainty of map prediction for effective exploration
under resource constraints. We incorporate 4CNet within our proposed robot
exploration with map prediction architecture, 4CNet-E. We then conduct
extensive comparison studies with 4CNet-E and state-of-the-art heuristic and
learning methods to investigate both map prediction and exploration performance
in environments consisting of uneven terrain and irregularly shaped obstacles.
Results showed that 4CNet-E obtained statistically significantly higher
prediction accuracy and area coverage with varying environment sizes, number of
robots, energy budgets, and communication limitations. Real-world mobile robot
experiments were performed and validated the feasibility and generalizability
of 4CNet-E for mobile robot map prediction and exploration.
Comment: 14 pages, 10 figures
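As a hedged illustration of how confidence-aware map prediction might inform exploration under resource constraints, the sketch below scores candidate frontiers by predicted information gain weighted by prediction confidence and a travel-cost penalty; the scoring rule and all names are assumptions, not 4CNet-E's actual formulation.

```python
# Illustrative frontier scoring that trades off predicted unknown area,
# prediction confidence, and a crude travel cost under an energy budget.
import numpy as np

def score_frontiers(pred_map, confidence, frontiers, robot_xy, energy_budget):
    """pred_map, confidence: (H, W) arrays; frontiers: list of (row, col)."""
    scores = []
    for (r, c) in frontiers:
        window = (slice(max(r - 5, 0), r + 5), slice(max(c - 5, 0), c + 5))
        unknown_gain = (pred_map[window] < 0.5).mean()        # predicted unexplored area
        trust = confidence[window].mean()                      # how much to believe the prediction
        travel = np.hypot(r - robot_xy[0], c - robot_xy[1])    # crude travel cost
        if travel > energy_budget:
            scores.append(-np.inf)                             # unreachable under the budget
        else:
            scores.append(trust * unknown_gain - 0.01 * travel)
    return frontiers[int(np.argmax(scores))]
```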
Learning Perception-Aware Agile Flight in Cluttered Environments
Recently, neural control policies have outperformed existing model-based planning-and-control methods for autonomously navigating quadrotors through cluttered environments in minimum time. However, they are not perception-aware, a crucial requirement in vision-based navigation due to the camera's limited field of view and the underactuated nature of a quadrotor. We propose a learning-based system that achieves perception-aware, agile flight in cluttered environments. Our method combines imitation learning with reinforcement learning (RL) by leveraging a privileged learning-by-cheating framework. Using RL, we first train a perception-aware teacher policy with full-state information to fly in minimum time through cluttered environments. Then, we use imitation learning to distill its knowledge into a vision-based student policy that only perceives the environment via a camera. Our approach tightly couples perception and control, showing a significant advantage in computation speed (10× faster) and success rate. We demonstrate the closed-loop control performance using hardware-in-the-loop simulation.
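The privileged learning-by-cheating step described above can be illustrated with the following minimal sketch, in which a full-state teacher supervises a vision-only student via an imitation (MSE) loss; the network sizes, dummy data, and training loop are placeholders rather than the authors' implementation.

```python
# Privileged teacher/student distillation sketch: the teacher sees the full
# simulator state, the student only a (dummy) depth image.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(18, 128), nn.ReLU(), nn.Linear(128, 4))   # full state -> commands
student = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128), nn.ReLU(), nn.Linear(128, 4))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(100):                               # distillation iterations
    full_state = torch.randn(64, 18)               # privileged state (dummy data)
    depth_image = torch.randn(64, 1, 32, 32)       # what the student actually sees (dummy data)
    with torch.no_grad():
        target_action = teacher(full_state)        # teacher assumed pre-trained with RL
    loss = nn.functional.mse_loss(student(depth_image), target_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```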
Change of Scenery: Unsupervised LiDAR Change Detection for Mobile Robots
This paper presents a fully unsupervised deep change detection approach for
mobile robots with 3D LiDAR. In unstructured environments, it is infeasible to
define a closed set of semantic classes. Instead, semantic segmentation is
reformulated as binary change detection. We develop a neural network,
RangeNetCD, that uses an existing point-cloud map and a live LiDAR scan to
detect scene changes with respect to the map. Using a novel loss function,
existing point-cloud semantic segmentation networks can be trained to perform
change detection without any labels or assumptions about local semantics. We
demonstrate the performance of this approach on data from challenging terrains;
mean intersection over union (mIoU) scores range between 67.4% and 82.2%
depending on the amount of environmental structure. This outperforms the
geometric baseline used in all experiments. The neural network runs faster than
10Hz and is integrated into a robot's autonomy stack to allow safe navigation
around obstacles that intersect the planned path. In addition, a novel method
for the rapid automated acquisition of per-point ground-truth labels is
described. Covering changed parts of the scene with retroreflective materials
and applying a threshold filter to the intensity channel of the LiDAR allows
for quantitative evaluation of the change detector.
Comment: 7 pages (6 content, 1 references), 7 figures. Submitted to the 2024 IEEE International Conference on Robotics and Automation (ICRA).
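The ground-truth labelling idea mentioned above can be illustrated by a simple intensity threshold: points returned from retroreflective material carry unusually high LiDAR intensity, so thresholding the intensity channel yields per-point change labels. The threshold value and array layout below are assumptions.

```python
# Intensity-threshold labelling sketch for retroreflective-covered changes.
import numpy as np

def label_changed_points(scan: np.ndarray, intensity_threshold: float = 0.9) -> np.ndarray:
    """scan: (N, 4) array of x, y, z, intensity with intensity normalised to [0, 1].
    Returns a boolean mask marking points on retroreflective (changed) surfaces."""
    intensity = scan[:, 3]
    return intensity > intensity_threshold

# Example: three points, only the last exceeds the threshold.
print(label_changed_points(np.array([[0, 0, 0, 0.1],
                                     [1, 0, 0, 0.4],
                                     [2, 0, 0, 0.95]])))
```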
GrASPE: Graph based Multimodal Fusion for Robot Navigation in Unstructured Outdoor Environments
We present a novel trajectory traversability estimation and planning
algorithm for robot navigation in complex outdoor environments. We incorporate
multimodal sensory inputs from an RGB camera, a 3D LiDAR, and the robot's odometry
sensor to train a prediction model to estimate candidate trajectories' success
probabilities based on partially reliable multi-modal sensor observations. We
encode high-dimensional multi-modal sensory inputs to low-dimensional feature
vectors using encoder networks and represent them as a connected graph to train
an attention-based Graph Neural Network (GNN) model to predict trajectory
success probabilities. We further analyze the image and point cloud data
separately to quantify sensor reliability to augment the weights of the feature
graph representation used in our GNN. During runtime, our model utilizes
multi-sensor inputs to predict the success probabilities of the trajectories
generated by a local planner to avoid potential collisions and failures. Our
algorithm demonstrates robust predictions when one or more sensor modalities
are unreliable or unavailable in complex outdoor environments. We evaluate our
algorithm's navigation performance using a Spot robot in real-world outdoor
environments.
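A hedged sketch of reliability-weighted multimodal fusion in the spirit of this abstract is given below: encoded per-modality features are fused with attention weights scaled by per-modality reliability scores before predicting a trajectory success probability. This is an illustrative simplification, not GrASPE's graph neural network.

```python
# Reliability-weighted attention fusion over per-modality feature vectors.
import torch
import torch.nn as nn

class ReliabilityWeightedFusion(nn.Module):
    def __init__(self, feat_dim=32):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)
        self.success_head = nn.Linear(feat_dim, 1)

    def forward(self, node_feats, reliability):
        # node_feats: (B, M, feat_dim) encoded RGB / LiDAR / odometry features
        # reliability: (B, M) scores in [0, 1] from a separate sensor-reliability analysis
        attn_logits = self.attn(node_feats).squeeze(-1)                    # (B, M)
        weights = torch.softmax(attn_logits + reliability.log().clamp(min=-10.0), dim=1)
        fused = (weights.unsqueeze(-1) * node_feats).sum(dim=1)            # (B, feat_dim)
        return torch.sigmoid(self.success_head(fused))                     # success probability
```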
Specifying and Interpreting Reinforcement Learning Policies through Simulatable Machine Learning
Human-AI collaborative policy synthesis is a procedure in which (1) a human
initializes an autonomous agent's behavior, (2) Reinforcement Learning improves
the human specified behavior, and (3) the agent can explain the final optimized
policy to the user. This paradigm leverages human expertise and facilitates a
greater insight into the learned behaviors of an agent. Existing approaches to
enabling collaborative policy specification involve black box methods which are
unintelligible and are not catered towards non-expert end-users. In this paper,
we develop a novel collaborative framework to enable humans to initialize and
interpret an autonomous agent's behavior, rooted in principles of
human-centered design. Through our framework, we enable humans to specify an
initial behavior model in the form of unstructured, natural language, which we
then convert to lexical decision trees. Next, we leverage these human-specified
policies to warm-start reinforcement learning, allowing the agent to further
optimize them.
Finally, to close the loop on human specification, we produce explanations of
the final learned policy in multiple modalities, giving the user a final
depiction of the agent's learned policy. We validate our approach by
showing that our model can produce >80% accuracy, and that human-initialized
policies are able to successfully warm-start RL. We then conduct a novel
human-subjects study quantifying the relative subjective and objective benefits
of varying XAI modalities (e.g., Tree, Language, and Program) for explaining
learned policies to end-users in terms of usability and interpretability, and
identify the circumstances that influence these measures. Our findings
emphasize the need for personalized explainable systems that can facilitate
user-centric policy explanations for a variety of end-users.
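As an illustration of warm-starting reinforcement learning from a human-specified decision tree, the sketch below initialises a tabular policy so that it agrees with the tree's actions with high probability; the tree, state features, and actions are hypothetical examples, not the framework's actual representation.

```python
# Warm-starting a tabular policy from a hypothetical human-specified decision tree,
# e.g. parsed from "if the light is red, stop; otherwise go".
decision_tree = {
    "feature": "light_is_red",
    "true": {"action": "stop"},
    "false": {"action": "go"},
}

ACTIONS = ["stop", "go"]

def tree_action(tree, state):
    """Walk the decision tree until a leaf action is reached."""
    while "action" not in tree:
        tree = tree["true"] if state[tree["feature"]] else tree["false"]
    return tree["action"]

def warm_start_policy(states, epsilon=0.1):
    """Initialise action probabilities so RL starts near the human policy."""
    policy = {}
    for state_key, state in states.items():
        preferred = tree_action(decision_tree, state)
        policy[state_key] = {a: (1 - epsilon if a == preferred else epsilon / (len(ACTIONS) - 1))
                             for a in ACTIONS}
    return policy

print(warm_start_policy({"s0": {"light_is_red": True}, "s1": {"light_is_red": False}}))
```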
Deep-learning the Latent Space of Light Transport
We suggest a method to directly deep-learn light transport, i.e., the mapping from a 3D geometry-illumination-material configuration to a shaded 2D image. While many previous learning methods have employed 2D convolutional neural networks applied to images, we show for the first time that light transport can be learned directly in 3D. The benefit of 3D over 2D is that the former can also correctly capture illumination effects related to occluded and/or semi-transparent geometry. To learn 3D light transport, we represent the 3D scene as an unstructured 3D point cloud, which is later, during rendering, projected to the 2D output image. Thus, we suggest a two-stage operator comprising a 3D network that first transforms the point cloud into a latent representation, which is later projected to the 2D output image using a dedicated 3D-to-2D network in a second step. We will show that our approach results in improved quality in terms of temporal coherence while retaining most of the computational efficiency of common 2D methods. As a consequence, the proposed two-stage operator serves as a valuable extension to modern deferred shading approaches.
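The two-stage operator described above can be sketched as follows: a per-point network maps each point of the scene to a latent code, the points are projected into the output image plane, and a small 2D network converts the splatted latent features into the shaded image. The dimensions and the naive nearest-pixel splatting are illustrative assumptions, not the paper's architecture.

```python
# Two-stage sketch: per-point latent codes splatted into an image grid,
# then decoded into a shaded RGB image by a small 2D network.
import torch
import torch.nn as nn

class LatentLightTransport(nn.Module):
    def __init__(self, point_dim=9, latent_dim=16, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.point_net = nn.Sequential(nn.Linear(point_dim, 64), nn.ReLU(),
                                       nn.Linear(64, latent_dim))        # stage 1: 3D -> latent
        self.image_net = nn.Sequential(nn.Conv2d(latent_dim, 32, 3, padding=1), nn.ReLU(),
                                       nn.Conv2d(32, 3, 3, padding=1))   # stage 2: latent -> RGB

    def forward(self, points, pixel_uv):
        # points: (N, point_dim) position/normal/material per point
        # pixel_uv: (N, 2) long tensor of integer pixel coordinates of each projected point
        latent = self.point_net(points)                                   # (N, latent_dim)
        canvas = torch.zeros(latent.size(1), self.img_size, self.img_size)
        canvas[:, pixel_uv[:, 1], pixel_uv[:, 0]] = latent.t()            # naive splat (last point wins)
        return self.image_net(canvas.unsqueeze(0))                        # (1, 3, H, W)
```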
- …