Improved Reinforcement Learning with Curriculum
Humans tend to learn complex abstract concepts faster if examples are
presented in a structured manner. For instance, when learning how to play a
board game, usually one of the first concepts learned is how the game ends,
i.e. the actions that lead to a terminal state (win, lose or draw). The
advantage of learning end-games first is that once the actions which lead to a
terminal state are understood, it becomes possible to incrementally learn the
consequences of actions that are further away from a terminal state - we call
this an end-game-first curriculum. Currently, the state-of-the-art machine
learning player for general board games, AlphaZero by Google DeepMind, does not
employ a structured training curriculum; instead, it learns from the entire
game at all times. By employing an end-game-first training curriculum to train
an AlphaZero-inspired player, we empirically show that the rate of learning of
an artificial player can be improved during the early stages of training
compared to a player not using a training curriculum.

Comment: Draft prior to submission to IEEE Trans on Games. Changed paper
slightly.
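To make the end-game-first idea concrete, here is a minimal sketch (the
function, names, and linear schedule are hypothetical illustrations, not the
authors' code) of a sampler that restricts training positions to a window near
the terminal state and widens that window as training progresses:

```python
import random

def endgame_first_sample(trajectory, training_step, total_steps):
    """Pick a training position from one self-play game, biased toward
    positions near the terminal state early in training."""
    progress = training_step / total_steps        # 0.0 -> 1.0 over training
    # The eligible window grows linearly from the final position to the
    # whole game (an assumed schedule, for illustration only).
    horizon = max(1, int(progress * len(trajectory)))
    eligible = trajectory[-horizon:]              # last `horizon` positions
    return random.choice(eligible)

# Usage: early in training only end-game positions are eligible.
trajectory = [f"state_{i}" for i in range(60)]    # stand-in for board states
print(endgame_first_sample(trajectory, 100, 10_000))    # near the end of game
print(endgame_first_sample(trajectory, 9_000, 10_000))  # almost any position
```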
Automatic Curriculum Learning With Over-repetition Penalty for Dialogue Policy Learning
Dialogue policy learning based on reinforcement learning is difficult to
apply to real users when training dialogue agents from scratch because of the
high cost. User simulators, which choose random user goals for the dialogue
agent to train on, have been considered an affordable substitute for real
users. However, this random sampling method ignores how humans actually
learn, making the learned dialogue policy inefficient and unstable. We
propose a novel
framework, Automatic Curriculum Learning-based Deep Q-Network (ACL-DQN), which
replaces the traditional random sampling method with a teacher policy model
to realize automatic curriculum learning for dialogue policy training. The
teacher
model arranges a meaningful ordered curriculum and automatically adjusts it by
monitoring the learning progress of the dialogue agent and the over-repetition
penalty, without any requirement of prior knowledge. The learning progress of
the dialogue agent reflects the relationship between the dialogue agent's
ability and the difficulty of the sampled goals, which is used to improve
sample efficiency. The over-repetition penalty guarantees the diversity of
the sampled goals. Experiments show that ACL-DQN improves the effectiveness
and stability of dialogue tasks by a statistically significant margin.
Furthermore, the framework can be further improved by equipping it with
different curriculum schedules, demonstrating its strong generalizability.
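A minimal sketch of such a teacher policy, assuming a softmax over per-goal
scores (the scoring rule, penalty form, and all names are illustrative
assumptions, not the paper's exact model):

```python
import math
import random
from collections import defaultdict

class TeacherSampler:
    """Teacher policy sketch: score each user goal by recent learning
    progress minus an over-repetition penalty, then sample by softmax."""

    def __init__(self, goals, penalty=0.5, temperature=1.0):
        self.goals = list(goals)
        self.penalty = penalty               # over-repetition penalty weight
        self.temperature = temperature       # softmax temperature
        self.progress = defaultdict(float)   # recent learning progress per goal
        self.counts = defaultdict(int)       # times each goal was sampled

    def _score(self, goal):
        # Favour goals where the agent is still improving; discourage goals
        # that have already been sampled many times.
        return abs(self.progress[goal]) - self.penalty * self.counts[goal]

    def sample(self):
        weights = [math.exp(self._score(g) / self.temperature)
                   for g in self.goals]
        goal = random.choices(self.goals, weights=weights, k=1)[0]
        self.counts[goal] += 1
        return goal

    def update(self, goal, old_return, new_return):
        # Learning progress = change in the agent's return on this goal.
        self.progress[goal] = new_return - old_return

# Usage with three toy dialogue goals.
teacher = TeacherSampler(["book_flight", "find_restaurant", "set_alarm"])
goal = teacher.sample()
teacher.update(goal, old_return=0.1, new_return=0.4)
print(goal, teacher._score(goal))
```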
Guided Curriculum Learning for Walking Over Complex Terrain
Reliable bipedal walking over complex terrain is a challenging problem, and
using a curriculum can help learning. Curriculum learning is the idea of
starting with an achievable version of a task and increasing the difficulty
as a success criterion is met. We propose a 3-stage curriculum to train Deep
Reinforcement
Learning policies for bipedal walking over various challenging terrains. In the
first stage, the agent starts on an easy terrain and the terrain difficulty is
gradually increased, while forces derived from a target policy are applied to
the robot joints and the base. In the second stage, the guiding forces are
gradually reduced to zero. Finally, in the third stage, random perturbations
with increasing magnitude are applied to the robot base to improve the
robustness of the policies. In simulation experiments, we show that our
approach is effective in learning separate walking policies for five terrain
types: flat, hurdles, gaps, stairs, and steps. Moreover, we demonstrate
that in the absence of human demonstrations, a simple hand-designed walking
trajectory is a sufficient prior to learn to traverse complex terrain types. In
ablation studies, we show that taking out any one of the three stages of the
curriculum degrades the learning performance.

Comment: Submitted to Australasian Conference on Robotics and Automation
(ACRA) 202
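As a rough illustration of how such a 3-stage schedule might be driven
(thresholds, step sizes, and all names are assumed values, not the paper's):

```python
from dataclasses import dataclass

@dataclass
class CurriculumState:
    terrain_difficulty: float = 0.0   # 0 = easiest terrain variant
    guide_force_scale: float = 1.0    # scale on target-policy guiding forces
    perturb_magnitude: float = 0.0    # random pushes applied to the base

def advance(state, stage, success_rate, threshold=0.8, step=0.05):
    """Advance the active stage once the success criterion is met."""
    if success_rate < threshold:
        return state                  # keep training at current difficulty
    if stage == 1:
        # Stage 1: harder terrain while guiding forces remain applied.
        state.terrain_difficulty = min(1.0, state.terrain_difficulty + step)
    elif stage == 2:
        # Stage 2: fade the guiding forces to zero.
        state.guide_force_scale = max(0.0, state.guide_force_scale - step)
    else:
        # Stage 3: stronger random perturbations to improve robustness.
        state.perturb_magnitude += step
    return state

# Usage: one successful evaluation in stage 1 raises terrain difficulty.
state = advance(CurriculumState(), stage=1, success_rate=0.9)
print(state)
```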
Automaton-Guided Curriculum Generation for Reinforcement Learning Agents
Despite advances in Reinforcement Learning, many sequential decision making
tasks remain prohibitively expensive and impractical to learn. Recently,
approaches that automatically generate reward functions from logical task
specifications have been proposed to mitigate this issue; however, they scale
poorly on long-horizon tasks (i.e., tasks where the agent needs to perform a
series of correct actions to reach the goal state, considering future
transitions while choosing an action). Employing a curriculum (a sequence of
increasingly complex tasks) further improves the learning speed of the agent by
sequencing intermediate tasks suited to the learning capacity of the agent.
However, generating curricula from the logical specification remains an
unsolved problem. To this end, we propose AGCL, Automaton-guided Curriculum
Learning, a novel method for automatically generating curricula for the target
task in the form of Directed Acyclic Graphs (DAGs). AGCL encodes the
specification in the form of a deterministic finite automaton (DFA), and then
uses the DFA along with the Object-Oriented MDP (OOMDP) representation to
generate a curriculum as a DAG, where the vertices correspond to tasks, and
edges correspond to the direction of knowledge transfer. Experiments in
gridworld and physics-based simulated robotics domains show that the curricula
produced by AGCL achieve improved time-to-threshold performance on a complex
sequential decision-making problem relative to state-of-the-art curriculum
learning (e.g., teacher-student, self-play) and automaton-guided reinforcement
learning baselines (e.g., Q-Learning for Reward Machines). Further, we
demonstrate that AGCL performs well even in the presence of noise in the task's
OOMDP description, and also when distractor objects are present that are not
modeled in the logical specification of the tasks' objectives.

Comment: To be presented at The International Conference on Automated Planning
and Scheduling (ICAPS) 202
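A toy sketch of the DFA-to-DAG step (the example DFA, the task naming, and
the edge rule are illustrative assumptions; AGCL's actual construction also
uses the OOMDP representation of the environment):

```python
# DFA for a toy specification "pick up the key, then open the door":
# (source state, event) -> target state.
DFA_TRANSITIONS = {
    ("start", "got_key"): "has_key",
    ("has_key", "opened_door"): "accept",
}

def curriculum_dag(transitions):
    """One sub-task per DFA transition ('from this DFA state, learn to
    trigger this event'); an edge u -> v transfers knowledge from u to v
    whenever u ends in the DFA state where v begins."""
    tasks = {(src, ev): f"from_{src}_do_{ev}" for (src, ev) in transitions}
    edges = []
    for (src_a, ev_a), dst_a in transitions.items():
        for (src_b, ev_b) in transitions:
            if dst_a == src_b:
                edges.append((tasks[(src_a, ev_a)], tasks[(src_b, ev_b)]))
    return sorted(tasks.values()), edges

vertices, edges = curriculum_dag(DFA_TRANSITIONS)
print(vertices)  # ['from_has_key_do_opened_door', 'from_start_do_got_key']
print(edges)     # [('from_start_do_got_key', 'from_has_key_do_opened_door')]
```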
Understanding Model-Based Reinforcement Learning and its Application in Safe Reinforcement Learning
Model-based reinforcement learning algorithms have been shown to achieve successful results on various continuous control benchmarks, but the understanding of model-based methods remains limited. We try to interpret how model-based methods work through novel experiments on state-of-the-art algorithms, with an emphasis on the model learning component. We evaluate the role of model learning in policy optimization and propose methods to learn a more accurate model. With a better understanding of model-based reinforcement learning, we then apply model-based methods to solve safe reinforcement learning (RL) problems with near-zero violation of hard constraints throughout training. Drawing an analogy with how humans and animals learn to perform safe actions, we break down the safe RL problem into three stages. First, we train agents in a constraint-free environment to learn a performant policy for reaching high rewards, and simultaneously learn a model of the dynamics. Second, we use model-based methods to plan safe actions and train a safeguarding policy from these actions through imitation. Finally, we propose a factored framework to train an overall policy that mixes the performant policy and the safeguarding policy. This three-stage curriculum ensures near-zero violation of safety constraints at all times. As an advantage of model-based methods, the sample complexity required for the second and third stages is significantly lower than that of model-free methods, which can enable online safe learning. We demonstrate the effectiveness of our methods on various continuous control problems and analyze the advantages over state-of-the-art approaches.
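As a rough illustration of the final mixing stage (a minimal sketch; the
linear blending rule and all names are assumptions rather than the thesis's
exact factored framework):

```python
def mixed_action(state, performant, safeguard, risk):
    """Factored-policy sketch: blend the two policies' actions, weighting
    the safeguarding policy more as the model-predicted risk of a
    constraint violation grows."""
    w = min(1.0, max(0.0, risk))      # clamp the risk estimate to [0, 1]
    return (1.0 - w) * performant(state) + w * safeguard(state)

# Toy 1-D example: the performant policy accelerates, the safeguard brakes.
performant = lambda s: 1.0
safeguard = lambda s: -1.0
print(mixed_action(0.0, performant, safeguard, risk=0.2))  # 0.6: mostly go
print(mixed_action(0.0, performant, safeguard, risk=0.9))  # -0.8: mostly brake
```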