5 research outputs found

    Adaptive Informative Path Planning with Multimodal Sensing

    Adaptive Informative Path Planning (AIPP) problems model an agent tasked with obtaining information subject to resource constraints in unknown, partially observable environments. Existing work on AIPP has focused on representing observations about the world as a result of agent movement. We formulate the more general setting where the agent may choose between different sensors at the cost of some energy, in addition to traversing the environment to gather information. We call this problem AIPPMS (MS for Multimodal Sensing). AIPPMS requires reasoning jointly about the effects of sensing and movement in terms of both energy expended and information gained. We frame AIPPMS as a Partially Observable Markov Decision Process (POMDP) and solve it with online planning. Our approach is based on the Partially Observable Monte Carlo Planning framework with modifications to ensure constraint feasibility and a heuristic rollout policy tailored for AIPPMS. We evaluate our method on two domains: a simulated search-and-rescue scenario and a challenging extension to the classic RockSample problem. We find that our approach outperforms a classic AIPP algorithm that is modified for AIPPMS, as well as online planning using a random rollout policy. Comment: First two authors contributed equally; International Conference on Automated Planning and Scheduling (ICAPS) 202

    Convex Hull Monte-Carlo Tree Search

    This work investigates Monte-Carlo planning for agents in stochastic environments with multiple objectives. We propose the Convex Hull Monte-Carlo Tree-Search (CHMCTS) framework, which builds upon Trial Based Heuristic Tree Search and Convex Hull Value Iteration (CHVI), as a solution to multi-objective planning in large environments. Moreover, we consider how to pose the problem of approximating multi-objective planning solutions as a contextual multi-armed bandit problem, giving a principled motivation for how to select actions from the view of contextual regret. This leads us to the use of Contextual Zooming for action selection, yielding Zooming CHMCTS. We evaluate our algorithm using the Generalised Deep Sea Treasure environment, demonstrating that Zooming CHMCTS can achieve a sublinear contextual regret and scales better than CHVI on a given computational budget. Comment: Camera-ready version of paper accepted to ICAPS 2020, along with relevant appendices.

    Efficient Environment Sensing and Learning for Mobile Robots

    Data-driven learning is becoming an integral part of many robotic systems. Robots can be used as mobile sensors to learn about the environment in which they operate. Robots can also seek to learn essential skills, such as navigation, within the environment. A critical challenge in both types of learning is sample efficiency. Acquiring samples with physical robots can be prohibitively time-consuming. As a result, when applying learning techniques in robotics that require physical interaction with the environment, minimizing the number of such interactions becomes key. The key question we seek to answer is: How do we make robots learn efficiently with a minimal amount of physical interaction? We approach this question along two fronts: extrinsic learning and intrinsic learning. In extrinsic learning, we want the robot to learn about the external environment in which it is operating. In intrinsic learning, our focus is on the robot learning a skill, such as navigating in an environment, using reinforcement learning (RL). In this dissertation, we develop algorithms that carefully plan where the robots obtain samples in order to efficiently perform intrinsic and extrinsic learning. In particular, we exploit the structural properties of Gaussian Process (GP) regression to design efficient sampling algorithms. We study two types of problems under extrinsic learning. We start with the problem of efficiently learning a spatially varying field modeled by a GP. Our goal is to ensure that the GP posterior variance, which is also the mean square error between the learned and actual fields, is below a predefined value. By exploiting the underlying properties of GPs, we present a series of constant-factor approximation algorithms for minimizing the number of stationary sensors to place, minimizing the total time taken by a single robot, and minimizing the time taken by a team of robots to learn the field. Here, we assume that the GP hyperparameters are known. We then study a variant where our goal is to identify the hotspot in an environment. Here we do not assume that the hyperparameters are known. For this problem, we present Upper Confidence Bound (UCB) and Monte Carlo Tree Search (MCTS) based algorithms for a single robot and later extend them to decentralized multi-robot teams. We also validate their performance on real-world datasets. For intrinsic learning, our aim is to reduce the number of physical interactions by leveraging simulations, an approach often known as Multi-Fidelity Reinforcement Learning (MFRL). In the MFRL framework, an agent uses multiple simulators of the real environment to perform actions. We present two MFRL framework versions, model-based and model-free, that leverage GPs to learn the optimal policy in a real-world environment. By incorporating GPs in the MFRL framework, we empirically observe a significant reduction in the number of samples for model-based and model-free learning.
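The abstract above relies on a structural property of GP regression: with known hyperparameters, the posterior variance depends only on where samples are taken, not on the measured values, so sample locations can be planned before any data is collected. The following is a minimal sketch of that property, not the dissertation's algorithm. The squared-exponential kernel, the hyperparameter values, the 1-D domain, and the function names `sq_exp_kernel` and `posterior_variance` are all assumptions made for illustration.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two sets of 1-D points (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def posterior_variance(x_train, x_query, length_scale=1.0,
                       signal_var=1.0, noise_var=1e-2):
    """GP posterior variance at x_query. No measured values are needed."""
    k_tt = sq_exp_kernel(x_train, x_train, length_scale, signal_var)
    k_qt = sq_exp_kernel(x_query, x_train, length_scale, signal_var)
    # Solve (K + noise * I) A = K_tq rather than forming an explicit inverse.
    solved = np.linalg.solve(k_tt + noise_var * np.eye(len(x_train)), k_qt.T)
    # Prior variance minus the reduction explained by the sample locations.
    return signal_var - np.sum(k_qt * solved.T, axis=1)

# Hypothetical 1-D field on [0, 10], evaluated on a fine grid.
x_query = np.linspace(0.0, 10.0, 101)
sparse = posterior_variance(np.array([2.0, 8.0]), x_query)
dense = posterior_variance(np.linspace(0.0, 10.0, 11), x_query)

print("worst-case variance, 2 samples: ", sparse.max())
print("worst-case variance, 11 samples:", dense.max())
```

Under these assumptions, a planner could keep adding sample locations until the worst-case variance over the query grid falls below the predefined threshold mentioned in the abstract.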

    A Practical Guide to Multi-Objective Reinforcement Learning and Planning

    Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.
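The claim that a simple linear combination of objectives may oversimplify the problem can be illustrated with a small sketch. The value vectors and the helper names `pareto_front` and `best_under_weights` below are hypothetical and are not taken from the paper. The example assumes two objectives to be maximised and shows a Pareto-optimal trade-off that no linear weighting ever selects.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (maximisation)."""
    def dominates(p, q):
        return (all(a >= b for a, b in zip(p, q))
                and any(a > b for a, b in zip(p, q)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

def best_under_weights(points, weights):
    """Pick the point that maximises the weighted sum of its objectives."""
    return max(points, key=lambda p: sum(w * v for w, v in zip(weights, p)))

# Hypothetical expected returns of four policies on two objectives.
values = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4), (0.3, 0.3)]

front = pareto_front(values)
# Sweep the weight on the first objective from 0 to 1 in steps of 0.01.
found = {best_under_weights(values, (w / 100, 1 - w / 100)) for w in range(101)}

print("Pareto-optimal:", front)
print("Reachable by linear scalarisation:", sorted(found))
```

Here `(0.4, 0.4)` is Pareto-optimal but lies in a concave region of the front: its weighted sum is always 0.4, while one of the two extreme policies always scores at least 0.5. A user who prefers the balanced trade-off would therefore never obtain it from a linear combination.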

    A practical guide to multi-objective reinforcement learning and planning

    Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems. © 2022, The Author(s)