Interactive Teaching Algorithms for Inverse Reinforcement Learning
We study the problem of inverse reinforcement learning (IRL) with the added
twist that the learner is assisted by a helpful teacher. More formally, we
tackle the following algorithmic question: How could a teacher provide an
informative sequence of demonstrations to an IRL learner to speed up the
learning process? We present an interactive teaching framework where a teacher
adaptively chooses the next demonstration based on the learner's current policy. In
particular, we design teaching algorithms for two concrete settings: an
omniscient setting where a teacher has full knowledge about the learner's
dynamics and a blackbox setting where the teacher has minimal knowledge. Then,
we study a sequential variant of the popular MCE-IRL learner and prove convergence guarantees for our teaching algorithm in the omniscient setting.
Extensive experiments with a car driving simulator environment show that the learning progress can be sped up drastically compared to an uninformative teacher.
Comment: IJCAI'19 paper (extended version)
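The teaching loop described above can be illustrated in a few lines. The following is a minimal sketch, not the paper's code: trajectories are reduced to fixed feature-count vectors, the learner is a softmax (MaxEnt-style) IRL learner, and all names and constants are assumptions made for the illustration.

```python
import numpy as np

# Minimal sketch of interactive teaching for a MaxEnt-style IRL learner.
# Trajectories are reduced to feature-count vectors; the teacher is
# omniscient and greedily shows the demonstration the learner currently
# misfits most. All names and constants are illustrative assumptions.
rng = np.random.default_rng(0)
w_star = rng.normal(size=4)             # expert's (hidden) reward weights
demos = rng.normal(size=(10, 4))        # feature counts of candidate demos

w = np.zeros(4)                         # learner's current estimate
for _ in range(200):
    # Learner's "policy": softmax preference over trajectories under w.
    p = np.exp(demos @ w); p /= p.sum()
    mu = p @ demos                      # expected feature counts under w
    # Teacher: among demos the expert prefers, pick the most misfit one.
    q = np.exp(demos @ w_star); q /= q.sum()
    d = demos[np.argmax(q * np.linalg.norm(demos - mu, axis=1))]
    # MaxEnt gradient step: match demonstrated vs. expected feature counts.
    w += 0.05 * (d - mu)
print("cosine(w, w*):", w @ w_star / (np.linalg.norm(w) * np.linalg.norm(w_star)))
```

An uninformative teacher, by contrast, would sample `d` uniformly from `demos` instead of scoring demonstrations against the learner's current estimate.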
Teaching Inverse Reinforcement Learners via Features and Demonstrations
Learning near-optimal behaviour from an expert's demonstrations typically
relies on the assumption that the learner knows the features that the true
reward function depends on. In this paper, we study the problem of learning
from demonstrations in the setting where this is not the case, i.e., where
there is a mismatch between the worldviews of the learner and the expert. We
introduce a natural quantity, the teaching risk, which measures the potential
suboptimality of policies that look optimal to the learner in this setting. We
show that bounds on the teaching risk guarantee that the learner is able to
find a near-optimal policy using standard algorithms based on inverse
reinforcement learning. Based on these findings, we suggest a teaching scheme
in which the expert can decrease the teaching risk by updating the learner's
worldview, and thus ultimately enable her to find a near-optimal policy.
Comment: NeurIPS'2018 (extended version)
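To make the notion of teaching risk concrete, here is a hedged toy sketch (the variable names and the proxy are mine, not the paper's): the learner only perceives reward features spanned by the rows of a matrix A, so the component of the true weight vector orthogonal to that span is a natural proxy for how suboptimal a learner-optimal policy can look, and the expert can shrink it by revealing one more feature per round.

```python
import numpy as np

# Toy sketch of worldview mismatch: the learner sees only features spanned
# by the rows of A; the mass of w_star outside that span serves as a proxy
# for the teaching risk. Names and the update rule are illustrative.
def risk_proxy(A, w_star):
    """Norm of the component of w_star orthogonal to span(rows of A)."""
    Q, _ = np.linalg.qr(A.T)            # orthonormal basis of learner's span
    residual = w_star - Q @ (Q.T @ w_star)
    return np.linalg.norm(residual), residual

rng = np.random.default_rng(1)
w_star = rng.normal(size=6)             # true reward weights
A = rng.normal(size=(2, 6))             # learner initially sees 2 features

for t in range(5):
    risk, residual = risk_proxy(A, w_star)
    print(f"round {t}: risk proxy = {risk:.3f}")
    if risk < 1e-9:
        break
    # Expert updates the worldview: reveal the most-missing canonical feature.
    i = int(np.argmax(np.abs(residual)))
    e = np.zeros_like(w_star); e[i] = 1.0
    A = np.vstack([A, e])
```

Each revealed feature strictly enlarges the learner's span, so the risk proxy decreases monotonically toward zero, mirroring the teaching scheme sketched in the abstract.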
Rage Against the Machines: How Subjects Learn to Play Against Computers
We use an experiment to explore how subjects learn to play against computers that are programmed to follow one of a number of standard learning algorithms. The learning theories are (unbeknown to subjects) a best-response process, fictitious play, imitation, reinforcement learning, and a trial-and-error process. We test whether subjects try to influence those algorithms to their advantage in a forward-looking way (strategic teaching). We find that strategic teaching occurs frequently and that all learning algorithms are subject to exploitation, with the notable exception of imitation. The experiment was conducted both on the internet and in the usual laboratory setting. We find some systematic differences, which can, however, be traced to the different incentive structures rather than to the experimental environment.
Keywords: learning; fictitious play; imitation; reinforcement; trial & error; strategic teaching; Cournot duopoly; experiments; internet.
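As a concrete illustration of strategic teaching, here is a toy sketch (my own parameterization, not the experiment's): a computer plays fictitious play in a linear Cournot duopoly, best-responding to the running average of the human's past quantities, and a forward-looking human overproduces early to "teach" the computer a low quantity before exploiting it.

```python
# Toy Cournot duopoly with inverse demand p = a - Q and unit cost c.
# The computer plays fictitious play: best response to the running average
# of the human's past quantities. The human "teaches" by overproducing for
# the first 10 rounds, then best-responds. Parameters are illustrative.
a, c = 100.0, 10.0

def best_response(q_opp):
    return max(0.0, (a - c - q_opp) / 2)

def profit(q_own, q_opp):
    return (max(0.0, a - q_own - q_opp) - c) * q_own

history, total = [], 0.0
for t in range(50):
    belief = sum(history) / len(history) if history else 0.0
    q_comp = best_response(belief)                        # fictitious-play computer
    q_human = 45.0 if t < 10 else best_response(q_comp)   # teach, then exploit
    total += profit(q_human, q_comp)
    history.append(q_human)
print("human's total profit with strategic teaching:", round(total, 1))
```

Playing myopic best responses throughout instead converges to the Cournot outcome (30 units each here); a rule like imitation, which copies the more successful quantity rather than forming beliefs, offers no such belief to manipulate, which is consistent with it being the hardest algorithm to exploit.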
AI Education: Machine Learning Resources
In this column, we focus on resources for learning and teaching three broad categories of machine learning (ML): supervised, unsupervised, and reinforcement learning. In our next column, we will focus specifically on deep neural network learning resources, so if you have any resource recommendations, please email them to the address above. [excerpt]
The Sample Complexity of Teaching-by-Reinforcement on Q-Learning
We study the sample complexity of teaching, termed the "teaching dimension"
(TDim) in the literature, for the teaching-by-reinforcement paradigm, where the
teacher guides the student through rewards. This is distinct from the
teaching-by-demonstration paradigm motivated by robotics applications, where
the teacher teaches by providing demonstrations of state/action trajectories.
The teaching-by-reinforcement paradigm applies to a wider range of real-world
settings where a demonstration is inconvenient, but has not been studied
systematically. In this paper, we focus on a specific family of reinforcement
learning algorithms, Q-learning, and characterize the TDim under different
teachers with varying control power over the environment, and present matching
optimal teaching algorithms. Our TDim results provide the minimum number of
samples needed for reinforcement learning, and we discuss their connections to
standard PAC-style RL sample complexity and teaching-by-demonstration sample
complexity results. Our teaching algorithms have the potential to speed up RL
agent learning in applications where a helpful teacher is available.
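As a rough illustration of the teaching-by-reinforcement paradigm, the following toy sketch (the reward rule, dynamics, and constants are mine, not the paper's) has a teacher with full control over rewards steer a myopic tabular Q-learner toward a target policy, counting the samples used:

```python
import numpy as np

# Toy teaching-by-reinforcement: the teacher cannot demonstrate, but picks
# the reward on every step, steering a greedy, myopic (gamma = 0) tabular
# Q-learner toward a target policy. All constants are illustrative.
n_states, n_actions = 5, 3
rng = np.random.default_rng(2)
target = rng.integers(n_actions, size=n_states)  # policy the teacher wants

Q = np.zeros((n_states, n_actions))
alpha, samples, s = 0.5, 0, 0
while not np.array_equal(Q.argmax(axis=1), target):
    a = int(Q[s].argmax())                        # greedy student
    r = 1.0 if a == target[s] else -1.0           # teacher-chosen reward
    Q[s, a] += alpha * (r - Q[s, a])              # myopic Q-update
    s, samples = (s + 1) % n_states, samples + 1  # simple cyclic dynamics
print("samples used to install the target policy:", samples)
```

A teacher with less control, say one constrained to the environment's own dynamics or rewards, would need more samples; the paper's TDim results characterize such trade-offs across teachers of varying power.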
Midbrain Dopamine Neurons Signal Belief in Choice Accuracy during a Perceptual Decision
Central to the organization of behavior is the ability to predict the values of outcomes to guide choices. The accuracy of such predictions is honed by a teaching signal that indicates how incorrect a prediction was (“reward prediction error,” RPE). In several reinforcement learning contexts, such as Pavlovian conditioning and decisions guided by reward history, this RPE signal is provided by midbrain dopamine neurons. In many situations, however, the stimuli predictive of outcomes are perceptually ambiguous. Perceptual uncertainty is known to influence choices, but it has been unclear whether or how dopamine neurons factor it into their teaching signal. To cope with uncertainty, we extended a reinforcement learning model with a belief state about the perceptually ambiguous stimulus; this model generates an estimate of the probability of choice correctness, termed decision confidence. We show that dopamine responses in monkeys performing a perceptually ambiguous decision task comply with the model’s predictions. Consequently, dopamine responses did not simply reflect a stimulus’ average expected reward value but were predictive of the trial-to-trial fluctuations in perceptual accuracy. These confidence-dependent dopamine responses emerged prior to monkeys’ choice initiation, raising the possibility that dopamine impacts impending decisions, in addition to encoding a post-decision teaching signal. Finally, by manipulating reward size, we found that dopamine neurons reflect both the upcoming reward size and the confidence in achieving it. Together, our results show that dopamine responses convey teaching signals that are also appropriate for perceptual decisions.
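The belief-state extension lends itself to a compact toy model. In the sketch below (a hedged illustration with my own parameterization, not the study's model), the agent receives a noisy percept of a binary stimulus, its posterior probability of having chosen correctly plays the role of decision confidence, and the reward prediction error is computed against confidence times reward size rather than the stimulus's average value:

```python
import numpy as np

# Toy belief-state RPE model: the stimulus is +1 or -1, observed with
# Gaussian noise. Confidence is the posterior P(choice correct | percept);
# the RPE compares the outcome to confidence * reward size. Parameters
# are illustrative assumptions.
rng = np.random.default_rng(3)
sigma = 1.0                                   # perceptual noise level

def trial(true_side, reward_size):
    x = true_side + sigma * rng.normal()      # noisy percept
    choice = 1.0 if x >= 0 else -1.0
    # Posterior that the choice is correct (Gaussian likelihood ratio).
    confidence = 1.0 / (1.0 + np.exp(-2.0 * abs(x) / sigma**2))
    reward = reward_size if choice == true_side else 0.0
    rpe = reward - confidence * reward_size   # belief-based prediction error
    return confidence, rpe

for size in (1.0, 2.0):
    conf, rpe = trial(true_side=1.0, reward_size=size)
    print(f"reward size {size}: confidence={conf:.2f}, outcome RPE={rpe:.2f}")
```

On easy trials (large |x|) confidence approaches 1 and a correct outcome produces a small RPE; on ambiguous trials the same outcome is more surprising, in line with the confidence-dependent dopamine responses described above.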
Comparing Self-Delivered to Instructor-Delivered Reinforcement During Vocational Instruction for Students with Intellectual Disability Using Video Activity Schedules
In this study, an adapted alternating treatments design was used to compare the effectiveness of teaching a vocational task using self-delivered versus instructor-delivered reinforcement during video prompting. Participants were four high school students who had been diagnosed with intellectual disabilities. Results indicated that instructor-delivered reinforcement was slightly more effective at teaching the vocational task for 2 of the 4 participants. For the other 2 participants, both forms of reinforcement delivery were similarly effective.