3,614 research outputs found

    Human Apprenticeship Learning via Kernel-based Inverse Reinforcement Learning

    Full text link
    It has been well demonstrated that inverse reinforcement learning (IRL) is an effective technique for teaching machines to perform tasks at human skill levels given human demonstrations (i.e., human to machine apprenticeship learning). This paper seeks to show that a similar application can be demonstrated with human learners. That is, given demonstrations from human experts inverse reinforcement learning techniques can be used to teach other humans to perform at higher skill levels (i.e., human to human apprenticeship learning). To show this two experiments were conducted using a simple, real-time web game where players were asked to touch targets in order to earn as many points as possible. For the experiment player performance was defined as the number of targets a player touched, irrespective of the points that a player actually earned. This allowed for in-game points to be modified and the effect of these alterations on performance measured. At no time were participants told the true performance metric. To determine the point modifications IRL was applied on demonstrations of human experts playing the game. The results of the experiment show with significance that performance improved over the control for select treatment groups. Finally, in addition to the experiment, we also detail the algorithmic challenges we faced when conducting the experiment and the techniques we used to overcome them.Comment: 31 pages, 23 figures, Submitted to Journal of Artificial Intelligence Research, "for source code, see https://github.com/mrucker/kpirl-kla

    Using machine learning to learn from demonstration: application to the AR.Drone quadrotor control

    Get PDF
    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science. December 14, 2015Developing a robot that can operate autonomously is an active area in robotics research. An autonomously operating robot can have a tremendous number of applications such as: surveillance and inspection; search and rescue; and operating in hazardous environments. Reinforcement learning, a branch of machine learning, provides an attractive framework for developing robust control algorithms since it is less demanding in terms of both knowledge and programming effort. Given a reward function, reinforcement learning employs a trial-and-error concept to make an agent learn. It is computationally intractable in practice for an agent to learn “de novo”, thus it is important to provide the learning system with “a priori” knowledge. Such prior knowledge would be in the form of demonstrations performed by the teacher. However, prior knowledge does not necessarily guarantee that the agent will perform well. The performance of the agent usually depends on the reward function, since the reward function describes the formal specification of the control task. However, problems arise with complex reward function that are difficult to specify manually. In order to address these problems, apprenticeship learning via inverse reinforcement learning is used. Apprenticeship learning via inverse reinforcement learning can be used to extract a reward function from the set of demonstrations so that the agent can optimise its performance with respect to that reward function. In this research, a flight controller for the Ar.Drone quadrotor was created using a reinforcement learning algorithm and function approximators with some prior knowledge. The agent was able to perform a manoeuvre that is similar to the one demonstrated by the teacher

    The use of apprenticeship learning via inverse reinforcement learning for musical composition

    Get PDF
    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science. 14 August 2014.Reinforcement learning is a branch of machine learning wherein an agent is given rewards which it uses to guide its learning. These rewards are often manually specified in terms of a reward function. The agent performs a certain action in a particular state and is given a reward accordingly (where a state is a configuration of the environment). A problem arises when the reward function is either difficult or impossible to manually specify. Apprenticeship learning via inverse reinforcement learning can be used in these cases in order to ascertain a reward function, given a set of expert trajectories. The research presented in this document used apprenticeship learning in order to ascertain a reward function in a musical context. The agent then optimized its performance in terms of this reward function. This was accomplished by presenting the learning agents with pieces of music composed by the author. These were the expert trajectories from which the learning agent discovered a reward function. This reward function allowed the agents to attempt to discover an optimal strategy for maximizing its value. Three learning agents were created. Two were drum-beat generating agents and one a melody composing agent. The first two agents were used to recreate expert drum-beats as well as generate new drum-beats. The melody agent was used to generate new melodies given a set of expert melodies. The results show that apprenticeship learning can be used both to recreate expert musical pieces as well as generate new musical pieces which are similar to with the expert musical pieces. Further, the results using the melody agent indicate that the agent has learned to generate new melodies in a given key, without having been given explicit information about key signatures

    Joint Path planning and Power Allocation of a Cellular-Connected UAV using Apprenticeship Learning via Deep Inverse Reinforcement Learning

    Full text link
    This paper investigates an interference-aware joint path planning and power allocation mechanism for a cellular-connected unmanned aerial vehicle (UAV) in a sparse suburban environment. The UAV's goal is to fly from an initial point and reach a destination point by moving along the cells to guarantee the required quality of service (QoS). In particular, the UAV aims to maximize its uplink throughput and minimize the level of interference to the ground user equipment (UEs) connected to the neighbor cellular BSs, considering the shortest path and flight resource limitation. Expert knowledge is used to experience the scenario and define the desired behavior for the sake of the agent (i.e., UAV) training. To solve the problem, an apprenticeship learning method is utilized via inverse reinforcement learning (IRL) based on both Q-learning and deep reinforcement learning (DRL). The performance of this method is compared to learning from a demonstration technique called behavioral cloning (BC) using a supervised learning approach. Simulation and numerical results show that the proposed approach can achieve expert-level performance. We also demonstrate that, unlike the BC technique, the performance of our proposed approach does not degrade in unseen situations

    Safety-Aware Apprenticeship Learning

    Full text link
    Apprenticeship learning (AL) is a kind of Learning from Demonstration techniques where the reward function of a Markov Decision Process (MDP) is unknown to the learning agent and the agent has to derive a good policy by observing an expert's demonstrations. In this paper, we study the problem of how to make AL algorithms inherently safe while still meeting its learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure safety while retaining performance of the learnt policy. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.Comment: Accepted by International Conference on Computer Aided Verification (CAV) 201

    Human-Machine Collaborative Optimization via Apprenticeship Scheduling

    Full text link
    Coordinating agents to complete a set of tasks with intercoupled temporal and resource constraints is computationally challenging, yet human domain experts can solve these difficult scheduling problems using paradigms learned through years of apprenticeship. A process for manually codifying this domain knowledge within a computational framework is necessary to scale beyond the ``single-expert, single-trainee" apprenticeship model. However, human domain experts often have difficulty describing their decision-making processes, causing the codification of this knowledge to become laborious. We propose a new approach for capturing domain-expert heuristics through a pairwise ranking formulation. Our approach is model-free and does not require enumerating or iterating through a large state space. We empirically demonstrate that this approach accurately learns multifaceted heuristics on a synthetic data set incorporating job-shop scheduling and vehicle routing problems, as well as on two real-world data sets consisting of demonstrations of experts solving a weapon-to-target assignment problem and a hospital resource allocation problem. We also demonstrate that policies learned from human scheduling demonstration via apprenticeship learning can substantially improve the efficiency of a branch-and-bound search for an optimal schedule. We employ this human-machine collaborative optimization technique on a variant of the weapon-to-target assignment problem. We demonstrate that this technique generates solutions substantially superior to those produced by human domain experts at a rate up to 9.5 times faster than an optimization approach and can be applied to optimally solve problems twice as complex as those solved by a human demonstrator.Comment: Portions of this paper were published in the Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) in 2016 and in the Proceedings of Robotics: Science and Systems (RSS) in 2016. The paper consists of 50 pages with 11 figures and 4 table
    corecore