Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners
Successful teaching requires an assumption about how the learner learns: how
the learner uses experiences from the world to update their internal states. We
investigate what expectations people have about a learner when they teach them
in an online manner using rewards and punishment. We focus on a common
reinforcement learning method, Q-learning, and examine what assumptions people
have using a behavioral experiment. To do so, we first establish a normative
standard, by formulating the problem as a machine teaching optimization
problem. To solve the machine teaching optimization problem, we use a deep
learning approximation method which simulates learners in the environment and
learns to predict how feedback affects the learner's internal states. What do
people assume about a learner's learning and discount rates when they teach
them an idealized exploration-exploitation task? In a behavioral experiment, we
find that people can teach the task to Q-learners in a relatively efficient and
effective manner when the learner uses a small value for its discounting rate
and a large value for its learning rate. However, their teaching is still suboptimal. We
also find that providing people with real-time updates of how possible feedback
would affect the Q-learner's internal states weakly helps them teach. Our
results reveal how people teach using evaluative feedback and provide guidance
for how engineers should design machine agents in a manner that is intuitive
for people.
Comment: 21 pages, 4 figures
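To make the roles of the learning rate and discounting rate concrete, here is a minimal sketch of a standard tabular Q-learning update in which the teacher's evaluative feedback plays the role of the reward. It is not the paper's experimental code: the function name `q_update`, the state and action labels, and the values `alpha=0.9` and `gamma=0.1` are illustrative assumptions, chosen only to mirror the "large learning rate, small discounting rate" setting the abstract describes.

```python
from collections import defaultdict


def q_update(q, state, action, feedback, next_state, actions,
             alpha=0.9, gamma=0.1):
    """Apply one tabular Q-learning step.

    `feedback` is the teacher's reward (+) or punishment (-).
    `alpha` is the learning rate and `gamma` the discounting rate;
    both defaults are illustrative assumptions, not values from the paper.
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate feedback plus discounted future value.
    td_target = feedback + gamma * best_next
    # Move the current estimate toward the target by a fraction alpha.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action example: the teacher rewards "right" in state "s0".
q = defaultdict(float)
actions = ["left", "right"]
q_update(q, "s0", "right", +1.0, "s1", actions)
```

With a large `alpha`, a single piece of feedback moves the estimate most of the way to the target; with a small `gamma`, that target is dominated by the immediate feedback rather than by anticipated future value, so the effect of each reward or punishment is easier for a teacher to predict.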
The Sample Complexity of Teaching-by-Reinforcement on Q-Learning
We study the sample complexity of teaching, termed "teaching dimension"
(TDim) in the literature, for the teaching-by-reinforcement paradigm, where the
teacher guides the student through rewards. This is distinct from the
teaching-by-demonstration paradigm motivated by robotics applications, where
the teacher teaches by providing demonstrations of state/action trajectories.
The teaching-by-reinforcement paradigm applies to a wider range of real-world
settings where a demonstration is inconvenient, but has not been studied
systematically. In this paper, we focus on a specific family of reinforcement
learning algorithms, Q-learning, and characterize the TDim under different
teachers with varying control power over the environment, and present matching
optimal teaching algorithms. Our TDim results provide the minimum number of
samples needed for reinforcement learning, and we discuss their connections to
standard PAC-style RL sample complexity and teaching-by-demonstration sample
complexity results. Our teaching algorithms have the potential to speed up RL
agent learning in applications where a helpful teacher is available.