Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners
Successful teaching requires an assumption about how the learner learns: how
the learner uses experiences from the world to update its internal states. We
investigate what expectations people have about a learner when they teach it
online using rewards and punishments. We focus on a common
reinforcement learning method, Q-learning, and examine what assumptions people
have using a behavioral experiment. To do so, we first establish a normative
standard, by formulating the problem as a machine teaching optimization
problem. To solve the machine teaching optimization problem, we use a deep
learning approximation method which simulates learners in the environment and
learns to predict how feedback affects the learner's internal states. What do
people assume about a learner's learning and discount rates when they teach
it an idealized exploration-exploitation task? In a behavioral experiment, we
find that people can teach the task to Q-learners relatively efficiently and
effectively when the learner uses a small value for its discount rate and a
large value for its learning rate; even so, their teaching remains suboptimal.
We also find that providing people with real-time updates of how possible
feedback would affect the Q-learner's internal states helps them teach, but
only weakly. Our
results reveal how people teach using evaluative feedback and provide guidance
for how engineers should design machine agents in a manner that is intuitive
for people.
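For intuition, here is a minimal sketch of the kind of Q-learner a human teacher would be shaping with rewards and punishments. The state/action sizes and the particular learning rate (alpha) and discount rate (gamma) values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy tabular Q-learner taught by online rewards/punishments (a sketch under
# assumed sizes and parameters, not the paper's implementation).
n_states, n_actions = 5, 2
alpha, gamma = 0.9, 0.1  # large learning rate, small discount rate: the
                         # regime the abstract finds easiest for human teachers
Q = np.zeros((n_states, n_actions))

def teacher_step(s, a, s_next, reward):
    """One Q-learning update driven by the teacher's evaluative feedback."""
    td_target = reward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

# The teacher rewards action 1 in state 0 and punishes action 0 there.
teacher_step(s=0, a=1, s_next=1, reward=+1.0)
teacher_step(s=0, a=0, s_next=0, reward=-1.0)
print(Q[0])  # the internal state the teacher is implicitly manipulating
```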
Optimal Attack against Autoregressive Models by Manipulating the Environment
We describe an optimal adversarial attack formulation against autoregressive
time series forecasting using the Linear Quadratic Regulator (LQR). In this threat
model, the environment evolves according to a dynamical system; an
autoregressive model observes the current environment state and predicts its
future values; an attacker has the ability to modify the environment state in
order to manipulate future autoregressive forecasts. The attacker's goal is to
force the autoregressive forecasts to track a target trajectory while
minimizing attack expenditure. In the white-box setting, where the attacker
knows the environment and forecast models, we present the optimal attack using
LQR for linear models, and Model Predictive Control (MPC) for nonlinear models.
In the black-box setting, we combine system identification and MPC. Experiments
demonstrate the effectiveness of our attacks.
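To make the white-box linear case concrete, the sketch below solves a finite-horizon LQR under assumed dynamics: the environment state evolves as x_{t+1} = A x_t + B u_t, the attacker's modification is u_t, and for simplicity the target trajectory is the origin (tracking a nonzero target would add an affine term). The matrices, costs, and horizon are hypothetical, not the paper's exact formulation.

```python
import numpy as np

# White-box linear attack sketch: quadratic costs Q (tracking error toward
# the origin) and R (attack expenditure) yield a finite-horizon LQR, solved
# by a backward Riccati recursion and a forward rollout.
def lqr_attack(A, B, Q, R, x0, T):
    P = Q.copy()                 # terminal cost-to-go
    gains = []
    for _ in range(T):           # backward pass: feedback gains K_t
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()
    xs, us = [x0], []
    for K in gains:              # forward pass: roll out the optimal attack
        u = -K @ xs[-1]
        us.append(u)
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs), np.array(us)

# Toy 1-d autoregressive environment; the attacker steers the state (and
# hence the AR forecast, assumed linear in the state) toward the zero target.
A = np.array([[0.9]])
B = np.eye(1)
xs, us = lqr_attack(A, B, Q=np.eye(1), R=0.1 * np.eye(1),
                    x0=np.array([5.0]), T=10)
print(xs.ravel())  # state driven toward the target trajectory
```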
The Sample Complexity of Teaching-by-Reinforcement on Q-Learning
We study the sample complexity of teaching, termed the "teaching dimension"
(TDim) in the literature, for the teaching-by-reinforcement paradigm, where the
teacher guides the student through rewards. This is distinct from the
teaching-by-demonstration paradigm motivated by robotics applications, where
the teacher teaches by providing demonstrations of state/action trajectories.
The teaching-by-reinforcement paradigm applies to a wider range of real-world
settings in which demonstrations are inconvenient, yet it has not been studied
systematically. In this paper, we focus on a specific family of reinforcement
learning algorithms, Q-learning, and characterize the TDim under different
teachers with varying control power over the environment, and present matching
optimal teaching algorithms. Our TDim results provide the minimum number of
samples needed for reinforcement learning, and we discuss their connections to
standard PAC-style RL sample complexity and teaching-by-demonstration sample
complexity results. Our teaching algorithms have the potential to speed up RL
agent learning in applications where a helpful teacher is available.
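As a hedged illustration of counting teaching samples, the sketch below shows a toy teacher with full control over rewards driving a tabular Q-learner to an arbitrary target policy. With the assumed alpha = 1 and gamma = 0, one teacher-chosen sample per (state, action) pair installs the target policy, giving a naive sample count in the spirit of TDim; the sizes, parameters, and policy are hypothetical, and this is not the paper's matching-optimal algorithm.

```python
import numpy as np

# Toy reward-controlling teacher: with alpha = 1 and gamma = 0 the TD target
# collapses to the teacher's reward, so one decisive sample per (state, action)
# pair suffices to install the target policy.
n_states, n_actions = 4, 3
alpha, gamma = 1.0, 0.0
Q = np.zeros((n_states, n_actions))
target_policy = np.array([0, 2, 1, 0])   # assumed desired action per state

samples = 0
for s in range(n_states):
    for a in range(n_actions):
        r = 1.0 if a == target_policy[s] else -1.0   # teacher-chosen reward
        Q[s, a] += alpha * (r + gamma * Q[s].max() - Q[s, a])
        samples += 1

assert (Q.argmax(axis=1) == target_policy).all()
print(f"target policy taught with {samples} teacher-chosen samples")
```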