20 research outputs found

    Learning to soar: exploration strategies in reinforcement learning for resource-constrained missions

    An unpowered aerial glider learning to soar in a wind field presents a new manifestation of the exploration-exploitation trade-off. This thesis proposes a directed, adaptive and nonmyopic exploration strategy in a temporal difference reinforcement learning framework for tackling the resource-constrained exploration-exploitation task of this autonomous soaring problem. The complete learning algorithm is developed in a SARSA(λ) framework, which uses a Gaussian process with a squared exponential covariance function to approximate the value function. The three key contributions of this thesis form the proposed exploration-exploitation strategy. Firstly, a new information measure is derived from the change in the variance volume surrounding the Gaussian process estimate. This measure of information gain is used to define the exploration reward of an observation. Secondly, a nonmyopic information value is presented that captures both the immediate exploration reward due to taking an action and the future exploration opportunities that result. Finally, this information value is combined with the state-action value of SARSA(λ) through a dynamic weighting factor to produce an exploration-exploitation management scheme for resource-constrained learning systems. The proposed learning strategy encourages either exploratory or exploitative behaviour depending on the requirements of the learning task and the available resources. The performance of the learning algorithms presented in this thesis is compared against other SARSA(λ) methods. Results show that actively directing exploration to regions of the state-action space with high uncertainty improves the rate of learning, while dynamic management of the exploration-exploitation behaviour according to the available resources produces prudent learning behaviour in resource-constrained systems.
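    The variance-based exploration reward described in this abstract can be illustrated with a minimal sketch. It is not the thesis's derivation: it assumes a one-dimensional input grid, approximates the "variance volume" by summing the Gaussian process posterior variance over that grid, and uses a hypothetical linear blend (`blended_value`) in place of the dynamic weighting factor, whose actual form the abstract does not give. All function names and parameter values are illustrative.

```python
import math

def sq_exp(x1, x2, length=1.0, signal_var=1.0):
    """Squared exponential covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def prior_cov(grid, length=1.0, signal_var=1.0):
    """Prior GP covariance matrix over every pair of grid points."""
    return [[sq_exp(a, b, length, signal_var) for b in grid] for a in grid]

def condition_on(cov, i, noise_var=0.01):
    """Posterior covariance over the grid after one noisy observation at grid[i]."""
    denom = cov[i][i] + noise_var
    n = len(cov)
    return [[cov[r][c] - cov[r][i] * cov[i][c] / denom for c in range(n)]
            for r in range(n)]

def variance_volume(cov, cell_width):
    """Assumed proxy: posterior variance summed over the grid, times cell width."""
    return cell_width * sum(cov[k][k] for k in range(len(cov)))

def exploration_reward(cov, i, cell_width, noise_var=0.01):
    """Reduction in variance volume caused by observing at grid[i]."""
    post = condition_on(cov, i, noise_var)
    gain = variance_volume(cov, cell_width) - variance_volume(post, cell_width)
    return gain, post

def blended_value(q_value, info_value, resource_fraction):
    """Hypothetical weighting: favour exploration when resources are plentiful."""
    w = max(0.0, min(1.0, resource_fraction))
    return (1.0 - w) * q_value + w * info_value

grid = [0.5 * k for k in range(11)]
cov0 = prior_cov(grid)
first_gain, cov1 = exploration_reward(cov0, 5, cell_width=0.5)
second_gain, _ = exploration_reward(cov1, 5, cell_width=0.5)
```

    In this sketch a repeated observation at the same point yields a much smaller reward than the first one, which is the behaviour that steers exploration toward regions of high uncertainty.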

    Long-term Informative Path Planning with Autonomous Soaring

    The ability of UAVs to cover large areas efficiently is valuable for information gathering missions. For long-term information gathering, a UAV may extend its endurance by accessing energy sources present in the atmosphere. Thermals are a favourable source of wind energy and thermal soaring is adopted in this thesis to enable long-term information gathering. This thesis proposes energy-constrained path planning algorithms for a gliding UAV to maximise information gain given a mission time that greatly exceeds the UAV's endurance. This thesis is motivated by the problem of probabilistic target-search performed by an energy-constrained UAV, which is tasked to simultaneously search for a lost ground target and explore for thermals to regain energy. This problem is termed informative soaring (IFS) and combines informative path planning (IPP) with energy constraints. IFS is shown to be NP-hard by showing that it has a similar problem structure to the weight-constrained shortest path problem with replenishments. While an optimal solution may not be computable in polynomial time, this thesis proposes path planning algorithms based on informed tree search to find high quality plans with low computational cost. This thesis addresses complex probabilistic belief maps and three primary contributions are presented:
    • First, IFS is formulated as a graph search problem by observing that any feasible long-term plan must alternate between 1) information gathering between thermals and 2) replenishing energy within thermals. This is a first step to reducing the large search state space.
    • The second contribution is observing that a complex belief map can be viewed as a collection of information clusters and using a divide and conquer approach, cluster tree search (CTS), to efficiently find high-quality plans in the large search state space. In CTS, near-greedy tree search is used to find locally optimal plans and two global planning versions are proposed to combine local plans into a full plan. Monte Carlo simulation studies show that CTS produces similar plans to variations of exhaustive search, but runs 5 to 20 times faster. The more computationally efficient version, CTSDP, uses dynamic programming (DP) to optimally combine local plans. CTSDP is executed in real time on board a UAV to demonstrate computational feasibility.
    • The third contribution is an extension of CTS to unknown drifting thermals. A thermal exploration map is created to detect new thermals that will eventually intercept clusters, and therefore be valuable to the mission. Time windows are computed for known thermals and an optimal cluster visit schedule is formed. A tree search algorithm called CTSDrift combines CTS and thermal exploration. Using 2400 Monte Carlo simulations, CTSDrift is evaluated against a Full Knowledge method that has full knowledge of the thermal field and a Greedy method. On average, CTSDrift outperforms Greedy in one-third of trials, and achieves similar performance to Full Knowledge when environmental conditions are favourable.

    Active End-Effector Pose Selection for Tactile Object Recognition through Monte Carlo Tree Search

    This paper considers the problem of active object recognition using touch only. The focus is on adaptively selecting a sequence of wrist poses that achieves accurate recognition by enclosure grasps. It seeks to minimize the number of touches and maximize recognition confidence. The actions are formulated as wrist poses relative to each other, making the algorithm independent of absolute workspace coordinates. The optimal sequence is approximated by Monte Carlo tree search. We demonstrate results in a physics engine and on a real robot. In the physics engine, most object instances were recognized in at most 16 grasps. On a real robot, our method recognized objects in 2–9 grasps and outperformed a greedy baseline. Comment: Accepted to International Conference on Intelligent Robots and Systems (IROS) 201
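    The abstract does not say which variant of Monte Carlo tree search is used. As a hedged illustration only, the sketch below shows the UCB1 rule that the widely used UCT variant applies when choosing which child action (here, a candidate wrist pose) to simulate next. The function name, the reward bookkeeping and the exploration constant are assumptions, not details from the paper.

```python
import math

def ucb1_choice(visit_counts, total_rewards, c=math.sqrt(2)):
    """Return the index of the action to try next under the UCB1 rule.

    visit_counts[a]  : how many times action a has been simulated
    total_rewards[a] : sum of rewards collected after taking action a
    c                : exploration constant (assumed value)
    """
    # Any action that has never been tried is selected first.
    for action, n in enumerate(visit_counts):
        if n == 0:
            return action
    total = sum(visit_counts)
    scores = [
        total_rewards[a] / visit_counts[a]
        + c * math.sqrt(math.log(total) / visit_counts[a])
        for a in range(len(visit_counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

    The first term favours actions with a high average reward, while the second grows for rarely tried actions, so the search does not commit to a pose sequence before the alternatives have been sampled.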

    Robotic Active Information Gathering for Spatial Field Reconstruction with Rapidly-Exploring Random Trees and Online Learning of Gaussian Processes

    Information gathering (IG) algorithms aim to intelligently select the actions a mobile sensor requires to efficiently obtain an accurate reconstruction of a physical process, such as an occupancy map or a magnetic field. Many recent works have proposed algorithms for IG that employ Gaussian processes (GPs) as the underlying model of the process. However, most algorithms discretize the state space, which makes them computationally intractable for robotic systems with complex dynamics. Moreover, they are not suited for online information gathering tasks as they assume prior knowledge about GP parameters. This paper presents a novel approach that tackles the two aforementioned issues. Specifically, our approach includes two intertwined steps: (i) a Rapidly-Exploring Random Tree (RRT) search that allows a robot to identify unvisited locations and to learn the GP parameters, and (ii) an RRT*-based informative path planning that guides the robot towards those locations by maximizing the information gathered while minimizing path cost. The combination of the two steps allows an online realization of the algorithm, while eliminating the need for discretization. We demonstrate that our proposed algorithm outperforms the state of the art both in simulations and in a lab experiment in which a ground-based robot explores the magnetic field intensity within an indoor environment populated with obstacles.

    Active Information Acquisition With Mobile Robots

    The recent proliferation of sensors and robots has potential to transform fields as diverse as environmental monitoring, security and surveillance, localization and mapping, and structure inspection. One of the great technical challenges in these scenarios is to control the sensors and robots in order to extract accurate information about various physical phenomena autonomously. The goal of this dissertation is to provide a unified approach for active information acquisition with a team of sensing robots. We formulate a decision problem for maximizing relevant information measures, constrained by the motion capabilities and sensing modalities of the robots, and focus on the design of a scalable control strategy for the robot team. The first part of the dissertation studies the active information acquisition problem in the special case of linear Gaussian sensing and mobility models. We show that the classical principle of separation between estimation and control holds in this case. It enables us to reduce the original stochastic optimal control problem to a deterministic version and to provide an optimal centralized solution. Unfortunately, the complexity of obtaining the optimal solution scales exponentially with the length of the planning horizon and the number of robots. We develop approximation algorithms to manage the complexity in both of these factors and provide theoretical performance guarantees. Applications in gas concentration mapping, joint localization and vehicle tracking in sensor networks, and active multi-robot localization and mapping are presented. Coupled with linearization and model predictive control, our algorithms can even generate adaptive control policies for nonlinear sensing and mobility models. 
Linear Gaussian information seeking, however, cannot be applied directly in the presence of sensing nuisances such as missed detections, false alarms, and ambiguous data association, or when some sensor observations are discrete (e.g., object classes, medical alarms) or, even worse, when the sensing and target models are entirely unknown. The second part of the dissertation considers these complications in the context of two applications: active localization from semantic observations (e.g., recognized objects) and radio signal source seeking. The complexity of the target inference problem forces us to resort to greedy planning of the sensor trajectories. Non-greedy closed-loop information acquisition with general discrete models is achieved in the final part of the dissertation via dynamic programming and Monte Carlo tree search algorithms. Applications in active object recognition and pose estimation are presented. The techniques developed in this thesis offer an effective and scalable approach for controlled information acquisition with multiple sensing robots and have broad applications to environmental monitoring, search and rescue, security and surveillance, localization and mapping, precision agriculture, and structure inspection.
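    The reduction to a deterministic problem in the linear Gaussian case can be seen in a scalar Kalman filter, where the posterior variance depends on which sensing action is taken but never on the measured values. The sketch below is an illustration under assumptions, not the dissertation's algorithm: the two sensing modes, their noise variances and energy costs, the budget, and the dynamics parameters are all hypothetical. It searches every action sequence exhaustively, which also shows the exponential growth with the planning horizon that the abstract mentions.

```python
from itertools import product

# Hypothetical sensing modes: name -> (measurement noise variance, energy cost).
MODES = {"near": (0.05, 3), "far": (0.5, 1)}

def predict(p, a=1.0, q=0.1):
    """Variance after one step of the dynamics x' = a*x + w, with Var(w) = q."""
    return a * a * p + q

def update(p, r):
    """Variance after a measurement with noise variance r (scalar Kalman update)."""
    return p * r / (p + r)

def final_variance(p0, plan):
    """Posterior variance after executing a sequence of sensing modes."""
    p = p0
    for mode in plan:
        p = update(predict(p), MODES[mode][0])
    return p

def plan_cost(plan):
    """Total energy cost of a sequence of sensing modes."""
    return sum(MODES[mode][1] for mode in plan)

def best_plan(p0, horizon, budget):
    """Exhaustive search over len(MODES)**horizon plans within the budget."""
    feasible = (plan for plan in product(MODES, repeat=horizon)
                if plan_cost(plan) <= budget)
    return min(feasible, key=lambda plan: final_variance(p0, plan))

plan = best_plan(p0=1.0, horizon=3, budget=5)
```

    Because no measurement value enters `final_variance`, the whole plan can be evaluated offline before any data are collected.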