76 research outputs found

    Accelerating decision making under partial observability using learned action priors

    Get PDF
    Thesis (M.Sc.)--University of the Witwatersrand, Faculty of Science, School of Computer Science and Applied Mathematics, 2017.Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical framework allowing a robot to reason about the consequences of actions and observations with respect to the agent's limited perception of its environment. They allow an agent to plan and act optimally in uncertain environments. Although they have been successfully applied to various robotic tasks, they are infamous for their high computational cost. This thesis demonstrates the use of knowledge transfer, learned from previous experiences, to accelerate the learning of POMDP tasks. We propose that in order for an agent to learn to solve these tasks quicker, it must be able to generalise from past behaviours and transfer knowledge, learned from solving multiple tasks, between di erent circumstances. We present a method for accelerating this learning process by learning the statistics of action choices over the lifetime of an agent, known as action priors. Action priors specify the usefulness of actions in situations and allow us to bias exploration, which in turn improves the performance of the learning process. Using navigation domains, we study the degree to which transferring knowledge between tasks in this way results in a considerable speed up in solution times. This thesis therefore makes the following contributions. We provide an algorithm for learning action priors from a set of approximately optimal value functions and two approaches with which a prior knowledge over actions can be used in a POMDP context. As such, we show that considerable gains in speed can be achieved in learning subsequent tasks using prior knowledge rather than learning from scratch. Learning with action priors can particularly be useful in reducing the cost of exploration in the early stages of the learning process as the priors can act as mechanism that allows the agent to select more useful actions given particular circumstances. Thus, we demonstrate how the initial losses associated with unguided exploration can be alleviated through the use of action priors which allow for safer exploration. Additionally, we illustrate that action priors can also improve the computation speeds of learning feasible policies in a shorter period of time.MT201

    Decision shaping and strategy learning in multi-robot interactions

    Get PDF
    Recent developments in robot technology have contributed to the advancement of autonomous behaviours in human-robot systems; for example, in following instructions received from an interacting human partner. Nevertheless, increasingly many systems are moving towards more seamless forms of interaction, where factors such as implicit trust and persuasion between humans and robots are brought to the fore. In this context, the problem of attaining, through suitable computational models and algorithms, more complex strategic behaviours that can influence human decisions and actions during an interaction, remains largely open. To address this issue, this thesis introduces the problem of decision shaping in strategic interactions between humans and robots, where a robot seeks to lead, without however forcing, an interacting human partner to a particular state. Our approach to this problem is based on a combination of statistical modeling and synthesis of demonstrated behaviours, which enables robots to efficiently adapt to novel interacting agents. We primarily focus on interactions between autonomous and teleoperated (i.e. human-controlled) NAO humanoid robots, using the adversarial soccer penalty shooting game as an illustrative example. We begin by describing the various challenges that a robot operating in such complex interactive environments is likely to face. Then, we introduce a procedure through which composable strategy templates can be learned from provided human demonstrations of interactive behaviours. We subsequently present our primary contribution to the shaping problem, a Bayesian learning framework that empirically models and predicts the responses of an interacting agent, and computes action strategies that are likely to influence that agent towards a desired goal. We then address the related issue of factors affecting human decisions in these interactive strategic environments, such as the availability of perceptual information for the human operator. Finally, we describe an information processing algorithm, based on the Orient motion capture platform, which serves to facilitate direct (as opposed to teleoperation-mediated) strategic interactions between humans and robots. Our experiments introduce and evaluate a wide range of novel autonomous behaviours, where robots are shown to (learn to) influence a variety of interacting agents, ranging from other simple autonomous agents, to robots controlled by experienced human subjects. These results demonstrate the benefits of strategic reasoning in human-robot interaction, and constitute an important step towards realistic, practical applications, where robots are expected to be not just passive agents, but active, influencing participants

    Mapping, planning and exploration with Pose SLAM

    Get PDF
    This thesis reports research on mapping, path planning, and autonomous exploration. These are classical problems in robotics, typically studied independently, and here we link such problems by framing them within a common SLAM approach, adopting Pose SLAM as the basic state estimation machinery. The main contribution of this thesis is an approach that allows a mobile robot to plan a path using the map it builds with Pose SLAM and to select the appropriate actions to autonomously construct this map. Pose SLAM is the variant of SLAM where only the robot trajectory is estimated and where landmarks are only used to produce relative constraints between robot poses. In Pose SLAM, observations come in the form of relative-motion measurements between robot poses. With regards to extending the original Pose SLAM formulation, this thesis studies the computation of such measurements when they are obtained with stereo cameras and develops the appropriate noise propagation models for such case. Furthermore, the initial formulation of Pose SLAM assumes poses in SE(2) and in this thesis we extend this formulation to SE(3), parameterizing rotations either with Euler angles and quaternions. We also introduce a loop closure test that exploits the information from the filter using an independent measure of information content between poses. In the application domain, we present a technique to process the 3D volumetric maps obtained with this SLAM methodology, but with laser range scanning as the sensor modality, to derive traversability maps. Aside from these extensions to Pose SLAM, the core contribution of the thesis is an approach for path planning that exploits the modeled uncertainties in Pose SLAM to search for the path in the pose graph with the lowest accumulated robot pose uncertainty, i.e., the path that allows the robot to navigate to a given goal with the least probability of becoming lost. An added advantage of the proposed path planning approach is that since Pose SLAM is agnostic with respect to the sensor modalities used, it can be used in different environments and with different robots, and since the original pose graph may come from a previous mapping session, the paths stored in the map already satisfy constraints not easy modeled in the robot controller, such as the existence of restricted regions, or the right of way along paths. The proposed path planning methodology has been extensively tested both in simulation and with a real outdoor robot. Our path planning approach is adequate for scenarios where a robot is initially guided during map construction, but autonomous during execution. For other scenarios in which more autonomy is required, the robot should be able to explore the environment without any supervision. The second core contribution of this thesis is an autonomous exploration method that complements the aforementioned path planning strategy. The method selects the appropriate actions to drive the robot so as to maximize coverage and at the same time minimize localization and map uncertainties. An occupancy grid is maintained for the sole purpose of guaranteeing coverage. A significant advantage of the method is that since the grid is only computed to hypothesize entropy reduction of candidate map posteriors, it can be computed at a very coarse resolution since it is not used to maintain neither the robot localization estimate, nor the structure of the environment. Our technique evaluates two types of actions: exploratory actions and place revisiting actions. Action decisions are made based on entropy reduction estimates. By maintaining a Pose SLAM estimate at run time, the technique allows to replan trajectories online should significant change in the Pose SLAM estimate be detected. The proposed exploration strategy was tested in a common publicly available dataset comparing favorably against frontier based exploratio

    Integrated robot task and motion planning in belief space

    Get PDF
    In this paper, we describe an integrated strategy for planning, perception, state-estimation and action in complex mobile manipulation domains. The strategy is based on planning in the belief space of probability distribution over states. Our planning approach is based on hierarchical goal regression (pre-image back-chaining). We develop a vocabulary of fluents that describe sets of belief states, which are goals and subgoals in the planning process. We show that a relatively small set of symbolic operators lead to task-oriented perception in support of the manipulation goals. An implementation of this method is demonstrated in simulation and on a real PR2 robot, showing robust, flexible solution of mobile manipulation problems with multiple objects and substantial uncertainty.This work was supported in part by the NSF under Grant No. 1117325. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. We also gratefully acknowledge support from ONR MURI grant N00014-09-1-1051, from AFOSR grant AOARD-104135 and from the Singapore Ministry of Education under a grant to the Singapore-MIT International Design Center. We thank Willow Garage for the use of the PR2 robot as part of the PR2 Beta Program

    On knowledge representation and decision making under uncertainty

    Get PDF
    Designing systems with the ability to make optimal decisions under uncertainty is one of the goals of artificial intelligence. However, in many applications the design of optimal planners is complicated due to imprecise inputs and uncertain outputs resulting from stochastic dynamics. Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical framework to model these kinds of problems. However, the high computational demand of solution methods for POMDPs is a drawback for applying them in practice.In this thesis, we present a two-fold approach for improving the tractability of POMDP planning. First, we focus on designing good heuristics for POMDP approximation algorithms. We aim to scale up the efficiency of a class of POMDP approximations called point-based planning methods by designing a good planning space. We study the effect of three properties of reachable belief state points that may influence the performance of point-based approximation methods. Second, we investigate approaches to designing good controllers using an alternative representation of systems with partial observability called Predictive State Representation (PSR). This part of the thesis advocates the usefulness and practicality of PSRs in planning under uncertainty. We also attempt to move some useful characteristics of the PSR model, which has a predictive view of the world, to the POMDP model, which has a probabilistic view of the hidden states of the world. We propose a planning algorithm motivated by the connections between the two models

    Learning in a State of Confusion: Employing active perception and reinforcement learning in partially observable worlds

    Get PDF
    Institute of Perception, Action and BehaviourIn applying reinforcement learning to agents acting in the real world we are often faced with tasks that are non-Markovian in nature. Much work has been done using state estimation algorithms to try to uncover Markovian models of tasks in order to allow the learning of optimal solutions using reinforcement learning. Unfortunately these algorithms which attempt to simultaneously learn a Markov model of the world and how to act have proved very brittle. Our focus differs. In considering embodied, embedded and situated agents we have a preference for simple learning algorithms which reliably learn satisficing policies. The learning algorithms we consider do not try to uncover the underlying Markovian states, instead they aim to learn successful deterministic reactive policies such that agents actions are based directly upon the observations provided by their sensors. Existing results have shown that such reactive policies can be arbitrarily worse than a policy that has access to the underlying Markov process and in some cases no satisficing reactive policy can exist. Our first contribution is to show that providing agents with alternative actions and viewpoints on the task through the addition of active perception can provide a practical solution in such circumstances. We demonstrate empirically that: (i) adding arbitrary active perception actions to agents which can only learn deterministic reactive policies can allow the learning of satisficing policies where none were originally possible; (ii) active perception actions allow the learning of better satisficing policies than those that existed previously and (iii) our approach converges more reliably to satisficing solutions than existing state estimation algorithms such as U-Tree and the Lion Algorithm. Our other contributions focus on issues which affect the reliability with which deterministic reactive satisficing policies can be learnt in non-Markovian environments. We show that that greedy action selection may be a necessary condition for the existence of stable deterministic reactive policies on partially observable Markov decision processes (POMDPs). We also set out the concept of Consistent Exploration. This is the idea of estimating state-action values by acting as though the policy has been changed to incorporate the action being explored. We demonstrate that this concept can be used to develop better algorithms for learning reactive policies to POMDPs by presenting a new reinforcement learning algorithm; the Consistent Exploration Q(l) algorithm (CEQ(l)). We demonstrate on a significant number of problems that CEQ(l) is more reliable at learning satisficing solutions than the algorithm currently regarded as the best for learning deterministic reactive policies, that of SARSA(l)

    Mapping, planning and exploration with Pose SLAM

    Get PDF
    This thesis reports research on mapping, path planning, and autonomous exploration. These are classical problems in robotics, typically studied independently, and here we link such problems by framing them within a common SLAM approach, adopting Pose SLAM as the basic state estimation machinery. The main contribution of this thesis is an approach that allows a mobile robot to plan a path using the map it builds with Pose SLAM and to select the appropriate actions to autonomously construct this map. Pose SLAM is the variant of SLAM where only the robot trajectory is estimated and where landmarks are only used to produce relative constraints between robot poses. In Pose SLAM, observations come in the form of relative-motion measurements between robot poses. With regards to extending the original Pose SLAM formulation, this thesis studies the computation of such measurements when they are obtained with stereo cameras and develops the appropriate noise propagation models for such case. Furthermore, the initial formulation of Pose SLAM assumes poses in SE(2) and in this thesis we extend this formulation to SE(3), parameterizing rotations either with Euler angles and quaternions. We also introduce a loop closure test that exploits the information from the filter using an independent measure of information content between poses. In the application domain, we present a technique to process the 3D volumetric maps obtained with this SLAM methodology, but with laser range scanning as the sensor modality, to derive traversability maps. Aside from these extensions to Pose SLAM, the core contribution of the thesis is an approach for path planning that exploits the modeled uncertainties in Pose SLAM to search for the path in the pose graph with the lowest accumulated robot pose uncertainty, i.e., the path that allows the robot to navigate to a given goal with the least probability of becoming lost. An added advantage of the proposed path planning approach is that since Pose SLAM is agnostic with respect to the sensor modalities used, it can be used in different environments and with different robots, and since the original pose graph may come from a previous mapping session, the paths stored in the map already satisfy constraints not easy modeled in the robot controller, such as the existence of restricted regions, or the right of way along paths. The proposed path planning methodology has been extensively tested both in simulation and with a real outdoor robot. Our path planning approach is adequate for scenarios where a robot is initially guided during map construction, but autonomous during execution. For other scenarios in which more autonomy is required, the robot should be able to explore the environment without any supervision. The second core contribution of this thesis is an autonomous exploration method that complements the aforementioned path planning strategy. The method selects the appropriate actions to drive the robot so as to maximize coverage and at the same time minimize localization and map uncertainties. An occupancy grid is maintained for the sole purpose of guaranteeing coverage. A significant advantage of the method is that since the grid is only computed to hypothesize entropy reduction of candidate map posteriors, it can be computed at a very coarse resolution since it is not used to maintain neither the robot localization estimate, nor the structure of the environment. Our technique evaluates two types of actions: exploratory actions and place revisiting actions. Action decisions are made based on entropy reduction estimates. By maintaining a Pose SLAM estimate at run time, the technique allows to replan trajectories online should significant change in the Pose SLAM estimate be detected. The proposed exploration strategy was tested in a common publicly available dataset comparing favorably against frontier based explorationPostprint (published version

    Advances in Reinforcement Learning

    Get PDF
    Reinforcement Learning (RL) is a very dynamic area in terms of theory and application. This book brings together many different aspects of the current research on several fields associated to RL which has been growing rapidly, producing a wide variety of learning algorithms for different applications. Based on 24 Chapters, it covers a very broad variety of topics in RL and their application in autonomous systems. A set of chapters in this book provide a general overview of RL while other chapters focus mostly on the applications of RL paradigms: Game Theory, Multi-Agent Theory, Robotic, Networking Technologies, Vehicular Navigation, Medicine and Industrial Logistic
    corecore