Predictive World Models from Real-World Partial Observations
Cognitive scientists believe adaptable intelligent agents like humans perform
reasoning through learned causal mental simulations of agents and environments.
The problem of learning such simulations is called predictive world modeling.
Recently, reinforcement learning (RL) agents leveraging world models have
achieved SOTA performance in game environments. However, understanding how to
apply the world modeling approach in complex real-world environments relevant
to mobile robots remains an open question. In this paper, we present a
framework for learning a probabilistic predictive world model for real-world
road environments. We implement the model using a hierarchical VAE (HVAE)
capable of predicting a diverse set of fully observed plausible worlds from
accumulated sensor observations. While prior HVAE methods require complete
states as ground truth for learning, we present a novel sequential training
method to allow HVAEs to learn to predict complete states from partially
observed states only. We experimentally demonstrate accurate spatial structure
prediction of deterministic regions achieving 96.21 IoU, and close the gap to
perfect prediction by 62% for stochastic regions using the best prediction. By
extending HVAEs to cases where complete ground truth states do not exist, we
facilitate continual learning of spatial prediction as a step towards realizing
explainable and comprehensive predictive world models for real-world mobile
robotics applications. Code is available at
https://github.com/robin-karlsson0/predictive-world-models.
Comment: Accepted for IEEE MOST 202
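The abstract above reports spatial prediction accuracy as intersection over union (IoU). The following is a minimal sketch of how IoU can be computed for two binary occupancy grids. The function name `iou`, the list-of-lists grid layout, and the toy grids are assumptions for illustration, and the paper's actual evaluation protocol is not described in this listing.

```python
def iou(pred, target):
    """Intersection over union of two equally sized binary grids.

    Grids are lists of rows holding 0/1 (or False/True) cells.
    This layout is an assumption for illustration only.
    """
    intersection = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            if p and t:
                intersection += 1  # cell occupied in both grids
            if p or t:
                union += 1  # cell occupied in at least one grid
    if union == 0:
        # Both grids are empty; treat them as a perfect match.
        return 1.0
    return intersection / union


# Toy example: 2 cells overlap, 4 cells are occupied in at least one grid.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = iou(pred, target)  # 2 / 4 = 0.5
```

A score reported as 96.21 corresponds to this ratio expressed as a percentage.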
BEVBert: Multimodal Map Pre-training for Language-guided Navigation
Large-scale pre-training has shown promising results on the
vision-and-language navigation (VLN) task. However, most existing pre-training
methods employ discrete panoramas to learn visual-textual associations. This
requires the model to implicitly correlate incomplete, duplicate observations
within the panoramas, which may impair an agent's spatial understanding. Thus,
we propose a new map-based pre-training paradigm that is spatial-aware for use
in VLN. Concretely, we build a local metric map to explicitly aggregate
incomplete observations and remove duplicates, while modeling navigation
dependency in a global topological map. This hybrid design can balance the
demand of VLN for both short-term reasoning and long-term planning. Then, based
on the hybrid map, we devise a pre-training framework to learn a multimodal map
representation, which enhances spatial-aware cross-modal reasoning, thereby
facilitating the language-guided navigation goal. Extensive experiments
demonstrate the effectiveness of the map-based pre-training route for VLN, and
the proposed method achieves state-of-the-art on four VLN benchmarks.
Comment: ICCV 2023, project page: https://github.com/MarSaKi/VLN-BEVBer
Probable Object Location (POLo) Score Estimation for Efficient Object Goal Navigation
To advance the field of autonomous robotics, particularly in object search
tasks within unexplored environments, we introduce a novel framework centered
around the Probable Object Location (POLo) score. Utilizing a 3D object
probability map, the POLo score allows the agent to make data-driven decisions
for efficient object search. We further enhance the framework's practicality by
introducing POLoNet, a neural network trained to approximate the
computationally intensive POLo score. Our approach addresses critical
limitations of both end-to-end reinforcement learning methods, which suffer
from memory decay over long-horizon tasks, and traditional map-based methods
that neglect visibility constraints. Our experiments, involving the first phase
of the OVMM 2023 challenge, demonstrate that an agent equipped with POLoNet
significantly outperforms a range of baseline methods, including end-to-end RL
techniques and prior map-based strategies. To provide a comprehensive
evaluation, we introduce new performance metrics that offer insights into the
efficiency and effectiveness of various agents in object goal navigation.
Comment: Under revie
Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights
Reducing traffic accidents is an important public safety challenge;
therefore, accident analysis and prediction has been a topic of much research
over the past few decades. Existing studies have important shortcomings: they
use small-scale datasets with limited coverage, depend on extensive sets of
data, and are not applicable for real-time purposes. To
address these challenges, we propose a new solution for real-time traffic
accident prediction using easy-to-obtain, but sparse data. Our solution relies
on a deep-neural-network model (which we have named DAP, for Deep Accident
Prediction), which utilizes a variety of data attributes such as traffic
events, weather data, points-of-interest, and time. DAP incorporates multiple
components including a recurrent (for time-sensitive data), a fully connected
(for time-insensitive data), and a trainable embedding component (to capture
spatial heterogeneity). To fill the data gap, we have, through a comprehensive
process of data collection, integration, and augmentation, created a
large-scale publicly available database of accident information named
US-Accidents. By employing the US-Accidents dataset and through an extensive
set of experiments across several large cities, we have evaluated our proposal
against several baselines. Our analysis and results show significant
improvements in predicting rare accident events. Further, we have shown the impact
of traffic information, time, and points-of-interest data for real-time
accident prediction.
Comment: In Proceedings of the 27th ACM SIGSPATIAL, International Conference
on Advances in Geographic Information Systems (2019). arXiv admin note:
substantial text overlap with arXiv:1906.0540
Model-Based Control Using Koopman Operators
This paper explores the application of Koopman operator theory to the control
of robotic systems. The operator is introduced as a method to generate
data-driven models that have utility for model-based control methods. We then
motivate the use of the Koopman operator towards augmenting model-based
control. Specifically, we illustrate how the operator can be used to obtain a
linearizable data-driven model for an unknown dynamical process that is useful
for model-based control synthesis. Simulated results show that with increasing
complexity in the choice of the basis functions, a closed-loop controller is
able to invert and stabilize cart- and VTOL-pendulum systems. Furthermore,
the specification of the basis functions is shown to be of importance when
generating a Koopman operator for specific robotic systems. Experimental
results with the Sphero SPRK robot explore the utility of the Koopman operator
in a reduced state representation setting where increased complexity in the
basis functions improves open- and closed-loop controller performance in various
terrains, including sand.
Comment: 8 page
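The abstract above describes obtaining a data-driven model by applying the Koopman operator to a chosen set of basis functions. The following is a minimal sketch of one common way to approximate such an operator from data: a least-squares fit over lifted state pairs, in the style of extended dynamic mode decomposition. The toy system `step`, the two-function basis in `lift`, and all names are assumptions for illustration. This is not the paper's implementation, and control inputs are omitted.

```python
def lift(x):
    """Hypothetical basis functions (observables): psi(x) = [x, x^2]."""
    return [x, x * x]


def fit_koopman(xs, ys):
    """Least-squares 2x2 matrix K such that lift(y) ~= K * lift(x).

    xs and ys are paired samples with y = f(x) for the unknown dynamics f.
    Solves K = A * G^-1, where G = sum psi(x) psi(x)^T and
    A = sum psi(y) psi(x)^T.
    """
    G = [[0.0, 0.0], [0.0, 0.0]]
    A = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        px, py = lift(x), lift(y)
        for i in range(2):
            for j in range(2):
                G[i][j] += px[i] * px[j]
                A[i][j] += py[i] * px[j]
    # Invert the 2x2 Gram matrix in closed form.
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    G_inv = [[G[1][1] / det, -G[0][1] / det],
             [-G[1][0] / det, G[0][0] / det]]
    return [[sum(A[i][k] * G_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def step(x):
    """Toy dynamics treated as unknown by the fit: x_next = 0.9 * x."""
    return 0.9 * x


# Collect one-step transition data and fit the lifted linear model.
xs = [0.5 + 0.1 * k for k in range(10)]
ys = [step(x) for x in xs]
K = fit_koopman(xs, ys)
# For this toy system the lifted dynamics are exactly linear, so K is
# close to [[0.9, 0.0], [0.0, 0.81]] up to floating-point error.
```

For this toy system the chosen basis makes the lifted dynamics exactly linear. For a real robot the fit is only approximate, and its quality depends on the choice of basis functions, which is the sensitivity the abstract points to.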