
    Sequential Bayesian Optimization for Adaptive Informative Path Planning with Multimodal Sensing

    Adaptive Informative Path Planning with Multimodal Sensing (AIPPMS) considers the problem of an agent equipped with multiple sensors, each with a different sensing accuracy and energy cost. The agent's goal is to explore an unknown, partially observable environment and gather information subject to its resource constraints. Previous work has focused on the less general Adaptive Informative Path Planning (AIPP) problem, which considers only the effect of the agent's movement on received observations. The AIPPMS problem adds complexity by requiring that the agent reason jointly about the effects of sensing and movement while balancing resource constraints with information objectives. We formulate the AIPPMS problem as a belief Markov decision process with Gaussian process beliefs and solve it using a sequential Bayesian optimization approach with online planning. Our approach consistently outperforms previous AIPPMS solutions, more than doubling the average reward received in almost every experiment while also reducing the root-mean-square error in the environment belief by 50%. We fully open-source our implementation to aid further development and comparison.
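    A minimal sketch of the idea described in the abstract: a Gaussian process belief over the environment combined with a Bayesian-optimisation-style acquisition that scores candidate (movement, sensor) pairs against their energy cost. The RBF kernel, the UCB-style acquisition, the cost weight `lam`, and the candidate-action representation are illustrative assumptions, not details taken from the paper.

```python
# Sketch: GP belief + BO-style acquisition for joint movement/sensor selection.
# Assumptions (not from the paper): RBF kernel, UCB acquisition, linear cost penalty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def plan_step(X_obs, y_obs, candidates, beta=2.0, lam=0.1):
    """Pick the next (location, sensor) pair under a UCB-minus-cost score.

    candidates: list of (location (2,) array, sensor_noise, sensor_cost) tuples.
    Returns the index of the chosen candidate.
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
    gp.fit(X_obs, y_obs)

    best_idx, best_score = None, -np.inf
    for i, (loc, noise, cost) in enumerate(candidates):
        mu, sigma = gp.predict(loc.reshape(1, -1), return_std=True)
        # Noisier (typically cheaper) sensors contribute less effective information.
        info = mu[0] + beta * sigma[0] / (1.0 + noise)
        score = info - lam * cost
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```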

    Developing Reactive Distributed Aerial Robotics Platforms for Real-time Contaminant Mapping

    The focus of this research is to design a sensor data aggregation system and a centralized, sensor-driven trajectory planning algorithm for fixed-wing aircraft to optimally assist atmospheric simulators in mapping the local environment in real time. This work is proposed for use in the event of a hazardous contaminant leak into the atmosphere, where a fleet of sensing unmanned aerial vehicles (UAVs) could provide valuable information for evacuation measures. The data aggregation system was designed around a state-of-the-art DigiMesh networking protocol and radio, together with a process/data management system built on the ROS2 DDS. The system was tested to operate consistently within the latencies and distances tolerated by the project while remaining highly extensible to different sensor configurations. The problem of optimal trajectory planning for exploration is modelled as a partially observable Markov decision process (POMDP). Deep reinforcement learning (DRL) is commonly applied to approximate optimal solutions to a POMDP, since exact solutions can be analytically intractable for complex state spaces. This research formulates a POMDP that describes the exploration problem and applies the state-of-the-art soft actor-critic (SAC) reinforcement learning algorithm to learn a policy that produces near-optimal trajectories within this new POMDP. A subset of the spatially relevant input is used in place of the complete state during training, and a turn-taking sequential planner is designed for multiple UAVs to mitigate the scalability problems of multi-UAV coordination. The learned SAC policy outperforms greedy and fixed-trajectory baselines with 1, 2, and 3 UAVs by a 30% margin on average. The turn-taking strategy provides small but repeatable scaling benefits, while the windowed input yields a 50%-60% increase in reward over networks trained without it. The proposed planning algorithm is effective in dynamic map exploration and has the potential to increase UAV effectiveness in atmospheric contaminant leak monitoring as it is extended and integrated on real-world UAVs.
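    A minimal sketch of the turn-taking multi-UAV planning loop described above: UAVs act one at a time, each feeding a local window of the contaminant belief map to a shared, already-trained policy. The window size, the grid-map representation, and the `policy` / `env` interfaces are illustrative assumptions, not the thesis implementation.

```python
# Sketch: windowed observations + turn-taking action selection for multiple UAVs.
# Assumptions (not from the thesis): grid belief map, trained `policy` callable,
# and an `env` object exposing step() and update_belief().
import numpy as np

def local_window(belief_map, pos, half=5):
    """Extract a (2*half+1)^2 window of the belief map centred on a UAV position."""
    padded = np.pad(belief_map, half, mode="edge")
    r, c = pos[0] + half, pos[1] + half
    return padded[r - half:r + half + 1, c - half:c + half + 1]

def turn_taking_step(policy, env, uav_positions, belief_map):
    """One planning round: each UAV in turn picks an action from the shared policy."""
    for i, pos in enumerate(uav_positions):
        obs = local_window(belief_map, pos).ravel()
        action = policy(obs)  # e.g. a heading change for a fixed-wing UAV
        uav_positions[i], measurement = env.step(i, action)
        belief_map = env.update_belief(belief_map, uav_positions[i], measurement)
    return uav_positions, belief_map
```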

    Bayesian Optimisation for Planning And Reinforcement Learning

    This thesis addresses the problem of achieving efficient non-myopic decision making by explicitly balancing exploration and exploitation. Decision making, both in planning and reinforcement learning (RL), enables agents or robots to complete tasks by acting on their environments. Complexity arises when completing objectives requires sacrificing short-term performance in order to achieve better long-term performance. Decision-making algorithms with this characteristic are known as non-myopic and require long sequences of actions to be evaluated, thereby greatly increasing the size of the search space. Optimal behaviours need to balance two key quantities: exploration and exploitation. Exploitation takes advantage of previously acquired information or high-performing solutions, whereas exploration focuses on acquiring more informative data. The balance between these quantities is crucial in both RL and planning. This thesis makes the following contributions. Firstly, a reward function trading off the exploration and exploitation of gradients for sequential planning is proposed; it is based on Bayesian optimisation (BO) and is combined with a non-myopic planner to achieve efficient spatial monitoring. Secondly, the algorithm is extended to continuous action spaces in a method called continuous belief tree search (CBTS), which uses BO to dynamically sample actions within a tree search, balancing high-performing actions and novelty. Finally, the framework is extended to RL, for which a multi-objective methodology for explicit exploration and exploitation balance is proposed. The two objectives are modelled explicitly and balanced at the policy level, as in BO. This allows for online exploration strategies, as well as a data-efficient, model-free RL algorithm that achieves exploration by minimising the uncertainty of Q-values (EMU-Q). The proposed algorithms are evaluated on several simulated and real-world robotics problems and display superior performance in terms of sample efficiency and exploration.
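    A minimal sketch of the policy-level exploration/exploitation balance described above, in the spirit of EMU-Q: actions are scored by a predicted Q-value plus a weighted epistemic-uncertainty term, much like a BO acquisition function. The ensemble-based uncertainty estimate, the trade-off weight `kappa`, and the discretised candidate-action set are illustrative assumptions, not the thesis implementation.

```python
# Sketch: action selection that explicitly trades off exploitation (mean Q)
# against exploration (uncertainty in Q), BO-acquisition style.
# Assumptions (not from the thesis): ensemble of critics as the uncertainty proxy.
import numpy as np

def select_action(q_ensemble, state, actions, kappa=1.0):
    """Pick the action maximising mean Q plus kappa times the ensemble std of Q.

    q_ensemble: list of callables q(state, action) -> float (e.g. bootstrapped critics).
    actions:    iterable of candidate actions (a discretisation for continuous spaces).
    """
    best_a, best_score = None, -np.inf
    for a in actions:
        qs = np.array([q(state, a) for q in q_ensemble])
        score = qs.mean() + kappa * qs.std()  # exploitation + explicit exploration bonus
        if score > best_score:
            best_a, best_score = a, score
    return best_a
```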