Search CORE

1,473 research outputs found

Predictive World Models from Real-World Partial Observations

Author: Carballo Alexander
Fujii Keisuke
Karlsson Robin
Ohtani Kento
Takeda Kazuya
Publication venue
Publication date: 25/04/2023
Field of study

Cognitive scientists believe adaptable intelligent agents like humans perform reasoning through learned causal mental simulations of agents and environments. The problem of learning such simulations is called predictive world modeling. Recently, reinforcement learning (RL) agents leveraging world models have achieved SOTA performance in game environments. However, understanding how to apply the world modeling approach in complex real-world environments relevant to mobile robots remains an open question. In this paper, we present a framework for learning a probabilistic predictive world model for real-world road environments. We implement the model using a hierarchical VAE (HVAE) capable of predicting a diverse set of fully observed plausible worlds from accumulated sensor observations. While prior HVAE methods require complete states as ground truth for learning, we present a novel sequential training method to allow HVAEs to learn to predict complete states from partially observed states only. We experimentally demonstrate accurate spatial structure prediction of deterministic regions achieving 96.21 IoU, and close the gap to perfect prediction by 62% for stochastic regions using the best prediction. By extending HVAEs to cases where complete ground truth states do not exist, we facilitate continual learning of spatial prediction as a step towards realizing explainable and comprehensive predictive world models for real-world mobile robotics applications. Code is available at https://github.com/robin-karlsson0/predictive-world-models.Comment: Accepted for IEEE MOST 202

arXiv.org e-Print Archive

BEVBert: Multimodal Map Pre-training for Language-guided Navigation

Author: An Dong
Huang Yan
Li Yangguang
Qi Yuankai
Shao Jing
Tan Tieniu
Wang Liang
Publication venue
Publication date: 03/08/2023
Field of study

Large-scale pre-training has shown promising results on the vision-and-language navigation (VLN) task. However, most existing pre-training methods employ discrete panoramas to learn visual-textual associations. This requires the model to implicitly correlate incomplete, duplicate observations within the panoramas, which may impair an agent's spatial understanding. Thus, we propose a new map-based pre-training paradigm that is spatial-aware for use in VLN. Concretely, we build a local metric map to explicitly aggregate incomplete observations and remove duplicates, while modeling navigation dependency in a global topological map. This hybrid design can balance the demand of VLN for both short-term reasoning and long-term planning. Then, based on the hybrid map, we devise a pre-training framework to learn a multimodal map representation, which enhances spatial-aware cross-modal reasoning thereby facilitating the language-guided navigation goal. Extensive experiments demonstrate the effectiveness of the map-based pre-training route for VLN, and the proposed method achieves state-of-the-art on four VLN benchmarks.Comment: ICCV 2023, project page: https://github.com/MarSaKi/VLN-BEVBer

arXiv.org e-Print Archive

Probable Object Location (POLo) Score Estimation for Efficient Object Goal Navigation

Author: Soh Harold
Wang Jiaming
Publication venue
Publication date: 14/11/2023
Field of study

To advance the field of autonomous robotics, particularly in object search tasks within unexplored environments, we introduce a novel framework centered around the Probable Object Location (POLo) score. Utilizing a 3D object probability map, the POLo score allows the agent to make data-driven decisions for efficient object search. We further enhance the framework's practicality by introducing POLoNet, a neural network trained to approximate the computationally intensive POLo score. Our approach addresses critical limitations of both end-to-end reinforcement learning methods, which suffer from memory decay over long-horizon tasks, and traditional map-based methods that neglect visibility constraints. Our experiments, involving the first phase of the OVMM 2023 challenge, demonstrate that an agent equipped with POLoNet significantly outperforms a range of baseline methods, including end-to-end RL techniques and prior map-based strategies. To provide a comprehensive evaluation, we introduce new performance metrics that offer insights into the efficiency and effectiveness of various agents in object goal navigation.Comment: Under revie

arXiv.org e-Print Archive

Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights

Author: Moosavi Sobhan
Parthasarathy Srinivasan
Ramnath Rajiv
Samavatian Mohammad Hossein
Teodorescu Radu
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/09/2019
Field of study

Reducing traffic accidents is an important public safety challenge, therefore, accident analysis and prediction has been a topic of much research over the past few decades. Using small-scale datasets with limited coverage, being dependent on extensive set of data, and being not applicable for real-time purposes are the important shortcomings of the existing studies. To address these challenges, we propose a new solution for real-time traffic accident prediction using easy-to-obtain, but sparse data. Our solution relies on a deep-neural-network model (which we have named DAP, for Deep Accident Prediction); which utilizes a variety of data attributes such as traffic events, weather data, points-of-interest, and time. DAP incorporates multiple components including a recurrent (for time-sensitive data), a fully connected (for time-insensitive data), and a trainable embedding component (to capture spatial heterogeneity). To fill the data gap, we have - through a comprehensive process of data collection, integration, and augmentation - created a large-scale publicly available database of accident information named US-Accidents. By employing the US-Accidents dataset and through an extensive set of experiments across several large cities, we have evaluated our proposal against several baselines. Our analysis and results show significant improvements to predict rare accident events. Further, we have shown the impact of traffic information, time, and points-of-interest data for real-time accident prediction.Comment: In Proceedings of the 27th ACM SIGSPATIAL, International Conference on Advances in Geographic Information Systems (2019). arXiv admin note: substantial text overlap with arXiv:1906.0540

arXiv.org e-Print Archive

Crossref

Model-Based Control Using Koopman Operators

Author: Abraham Ian
De La Torre Gerardo
Murphey Todd D.
Publication venue: 'Robotics: Science and Systems Foundation'
Publication date: 05/09/2017
Field of study

This paper explores the application of Koopman operator theory to the control of robotic systems. The operator is introduced as a method to generate data-driven models that have utility for model-based control methods. We then motivate the use of the Koopman operator towards augmenting model-based control. Specifically, we illustrate how the operator can be used to obtain a linearizable data-driven model for an unknown dynamical process that is useful for model-based control synthesis. Simulated results show that with increasing complexity in the choice of the basis functions, a closed-loop controller is able to invert and stabilize a cart- and VTOL-pendulum systems. Furthermore, the specification of the basis function are shown to be of importance when generating a Koopman operator for specific robotic systems. Experimental results with the Sphero SPRK robot explore the utility of the Koopman operator in a reduced state representation setting where increased complexity in the basis function improve open- and closed-loop controller performance in various terrains, including sand.Comment: 8 page

arXiv.org e-Print Archive

Crossref