177,792 research outputs found

    Interactive Teaching Algorithms for Inverse Reinforcement Learning

    Full text link
    We study the problem of inverse reinforcement learning (IRL) with the added twist that the learner is assisted by a helpful teacher. More formally, we tackle the following algorithmic question: How could a teacher provide an informative sequence of demonstrations to an IRL learner to speed up the learning process? We present an interactive teaching framework where a teacher adaptively chooses the next demonstration based on the learner's current policy. In particular, we design teaching algorithms for two concrete settings: an omniscient setting where the teacher has full knowledge about the learner's dynamics, and a blackbox setting where the teacher has minimal knowledge. We then study a sequential variant of the popular MCE-IRL learner and prove convergence guarantees for our teaching algorithm in the omniscient setting. Extensive experiments with a car-driving simulator environment show that learning progress can be sped up drastically compared to an uninformative teacher. Comment: IJCAI'19 paper (extended version).
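
    The adaptive loop described above can be made concrete with a toy model. The sketch below is illustrative, not the paper's algorithm: trajectories are collapsed into fixed feature vectors, the learner is a MaxEnt distribution over those trajectories with a feature-matching update (a crude stand-in for MCE-IRL), and all names (PHI, w_true, eta) are invented for the example.

        import numpy as np

        rng = np.random.default_rng(0)
        PHI = rng.normal(size=(8, 4))              # feature vectors of 8 candidate demos
        w_true = np.array([1.0, -0.5, 0.2, 0.0])   # true reward weights (teacher-only)

        def mu(w):
            """Expected demo features under the learner's MaxEnt policy."""
            p = np.exp(PHI @ w)
            p /= p.sum()
            return p @ PHI

        def value(w):
            """True value of the learner's current policy."""
            p = np.exp(PHI @ w)
            p /= p.sum()
            return p @ PHI @ w_true

        def update(w, demo, eta=0.5):
            """One feature-matching gradient step on a shown demonstration."""
            return w + eta * (PHI[demo] - mu(w))

        w = np.zeros(4)
        for t in range(20):
            # Omniscient teacher: simulate each candidate demonstration and
            # show the one that improves the learner's true value the most.
            demo = max(range(len(PHI)), key=lambda d: value(update(w, d)))
            w = update(w, demo)

    An uninformative teacher would instead sample demo uniformly; plotting value(w) under both choices illustrates the speed-up an adaptive teacher can buy.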

    Teaching Inverse Reinforcement Learners via Features and Demonstrations

    Get PDF
    Learning near-optimal behaviour from an expert's demonstrations typically relies on the assumption that the learner knows the features on which the true reward function depends. In this paper, we study the problem of learning from demonstrations in the setting where this is not the case, i.e., where there is a mismatch between the worldviews of the learner and the expert. We introduce a natural quantity, the teaching risk, which measures the potential suboptimality of policies that look optimal to the learner in this setting. We show that bounds on the teaching risk guarantee that the learner is able to find a near-optimal policy using standard algorithms based on inverse reinforcement learning. Based on these findings, we suggest a teaching scheme in which the expert can decrease the teaching risk by updating the learner's worldview, and thus ultimately enable her to find a near-optimal policy. Comment: NeurIPS'2018 paper (extended version).
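
    One plausible reading of the teaching risk (hedging, since the abstract does not give the formula) is the size of the component of the true reward weights that lies outside the span of the learner's features: if that component is zero, any policy that looks optimal to the learner is also optimal under the true reward. The sketch below computes this quantity and shows the suggested remedy of updating the learner's worldview; the matrices are invented for illustration.

        import numpy as np

        def teaching_risk(A, w_true):
            """Norm of the part of w_true invisible to the learner, where the
            columns of A span the learner's feature subspace."""
            Q, _ = np.linalg.qr(A)                   # orthonormal basis of span(A)
            return np.linalg.norm(w_true - Q @ (Q.T @ w_true))

        w_true = np.array([0.6, 0.3, 0.8])           # expert's reward weights
        A = np.array([[1.0, 0.0],                    # learner only sees the
                      [0.0, 1.0],                    # first two feature axes
                      [0.0, 0.0]])
        print(teaching_risk(A, w_true))              # 0.8: third feature is hidden

        # Teaching scheme: the expert reveals the missing feature direction,
        # driving the teaching risk (and the learner's suboptimality) to zero.
        A_updated = np.hstack([A, [[0.0], [0.0], [1.0]]])
        print(teaching_risk(A_updated, w_true))      # 0.0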

    Rage Against the Machines: How Subjects Learn to Play Against Computers

    Get PDF
    We use an experiment to explore how subjects learn to play against computers that are programmed to follow one of a number of standard learning algorithms. The learning theories are (unbeknown to subjects) a best-response process, fictitious play, imitation, reinforcement learning, and a trial-and-error process. We test whether subjects try to influence those algorithms to their advantage in a forward-looking way (strategic teaching). We find that strategic teaching occurs frequently and that all learning algorithms are subject to exploitation, with the notable exception of imitation. The experiment was conducted both on the internet and in the usual laboratory setting. We find some systematic differences, which, however, can be traced to the different incentive structures rather than the experimental environment. Keywords: learning; fictitious play; imitation; reinforcement; trial & error; strategic teaching; Cournot duopoly; experiments; internet.
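
    To see how strategic teaching can pay off, consider a toy fictitious-play bot in the paper's Cournot duopoly setting. The sketch is illustrative only: the demand and cost parameters and the human's fixed "teach, then exploit" rule are invented, not taken from the experiment.

        import numpy as np

        A, B, C = 100.0, 1.0, 10.0   # inverse demand p = A - B*(q1 + q2), unit cost C

        def best_response(q_opp):
            """Profit-maximizing quantity against an expected opponent quantity."""
            return max(0.0, (A - C - B * q_opp) / (2 * B))

        def fictitious_play_bot(history):
            """Best-respond to the empirical mean of the opponent's past play
            (profit is linear in the opponent's quantity, so the mean suffices)."""
            if not history:
                return (A - C) / (3 * B)             # start at the Cournot quantity
            return best_response(np.mean(history))

        human_history, bot_history = [], []
        for t in range(30):
            q_bot = fictitious_play_bot(human_history)
            # Forward-looking human: overproduce early to drag the bot's belief
            # up and make it concede output, then exploit its low quantity.
            q_human = 45.0 if t < 10 else best_response(q_bot)
            human_history.append(q_human)
            bot_history.append(q_bot)

    An imitation bot, by contrast, copies the most profitable recent action, so overproducing simply gets copied back; this is one intuition for why imitation resisted exploitation.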

    AI Education: Machine Learning Resources

    Full text link
    In this column, we focus on resources for learning and teaching three broad categories of machine learning (ML): supervised, unsupervised, and reinforcement learning. In our next column, we will focus specifically on deep neural network learning resources, so if you have any resource recommendations, please email them to the address above. [excerpt]

    The Sample Complexity of Teaching-by-Reinforcement on Q-Learning

    Get PDF
    We study the sample complexity of teaching, termed "teaching dimension" (TDim) in the literature, for the teaching-by-reinforcement paradigm, where the teacher guides the student through rewards. This is distinct from the teaching-by-demonstration paradigm motivated by robotics applications, where the teacher teaches by providing demonstrations of state/action trajectories. The teaching-by-reinforcement paradigm applies to a wider range of real-world settings where a demonstration is inconvenient, but it has not been studied systematically. In this paper, we focus on a specific family of reinforcement learning algorithms, Q-learning, characterize the TDim under teachers with varying control power over the environment, and present matching optimal teaching algorithms. Our TDim results provide the minimum number of samples needed for reinforcement learning, and we discuss their connections to standard PAC-style RL sample complexity and teaching-by-demonstration sample complexity results. Our teaching algorithms have the potential to speed up RL agent learning in applications where a helpful teacher is available.
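
    As a concrete (toy) instance of teaching-by-reinforcement, the sketch below gives a teacher full control of the reward signal and counts how many samples it takes to force a greedy tabular Q-learner into a target policy; the MDP, the +1/-1 reward rule, and the uniform dynamics are invented for illustration, not the paper's construction.

        import numpy as np

        n_states, n_actions = 4, 2
        target = np.array([1, 0, 1, 0])   # action the teacher wants in each state
        Q = np.zeros((n_states, n_actions))
        alpha, gamma = 0.5, 0.9
        rng = np.random.default_rng(0)

        steps = 0
        while not np.array_equal(Q.argmax(axis=1), target):
            s = rng.integers(n_states)
            a = Q[s].argmax()                 # student acts greedily on its Q-table
            # Teacher with full reward control: reward the target action only.
            r = 1.0 if a == target[s] else -1.0
            s_next = rng.integers(n_states)   # toy dynamics: uniform next state
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            steps += 1
        print("samples until the target policy is taught:", steps)

    The number of samples this loop needs is an empirical analogue of the teaching dimension for this (very small) setting.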

    COMPARING SELF-DELIVERED TO INSTRUCTOR-DELIVERED REINFORCEMENT DURING VOCATIONAL INSTRUCTION FOR STUDENTS WITH INTELLECTUAL DISABILITY USING VIDEO ACTIVITY SCHEDULES

    Get PDF
    In this study, an adapted alternating-treatments design was used to compare the effectiveness of teaching a vocational task using self-delivered versus instructor-delivered reinforcement during video prompting. Participants were four high school students who had been diagnosed with intellectual disabilities. Results indicated that instructor-delivered reinforcement was slightly more effective at teaching a vocational task for 2 of the 4 participants; for the other 2 participants, both forms of reinforcement delivery were similarly effective.