When Waiting is not an Option : Learning Options with a Deliberation Cost

Authors: Bacon, Pierre-Luc; Harb, Jean; Klissarov, Martin; Precup, Doina
Publication date: 13/09/2017

Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance. While the problem of "how" to learn options is increasingly well understood, the question of "what" good options should be has remained elusive. We formulate our answer to what "good" options should be in the bounded rationality framework (Simon, 1957) through the notion of deliberation cost. We then derive practical gradient-based learning algorithms to implement this objective. Our results in the Arcade Learning Environment (ALE) show increased performance and interpretability.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

A Hierarchical Reinforcement Learning Method for Persistent Time-Sensitive Tasks

Authors: Belta, Calin; Li, Xiao
Publication date: 01/01/2016

Reinforcement learning has been applied to many interesting problems, such as the famous TD-Gammon and inverted helicopter flight. However, little effort has been put into developing methods to learn policies for complex persistent tasks and tasks that are time-sensitive. In this paper, we take a step towards solving this problem by using signal temporal logic (STL) as the task specification and taking advantage of the temporal abstraction that the options framework provides. We show via simulation that a relatively easy-to-implement algorithm that combines STL and options can learn a satisfactory policy with a small number of training cases.

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

A hierarchical reinforcement learning method for persistent time-sensitive tasks

Authors: Belta, Calin; Li, Xiao
Publication date: 01/01/2016

Boston University Institutional Repository (OpenBU)

Learning with Options that Terminate Off-Policy

Authors: Bacon, Pierre-Luc; Harutyunyan, Anna; Nowe, Ann; Precup, Doina; Vrancx, Peter
Publication date: 02/12/2017

A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option set for the task is not ideal and cannot express the primitive optimal policy exactly, shorter options offer more flexibility and can yield a better solution. Thus, the termination condition puts learning efficiency at odds with solution quality. We propose to resolve this dilemma by decoupling the behavior and target terminations, just as is done with policies in off-policy learning. To this end, we give a new algorithm, Q(\beta), that learns the solution with respect to any termination condition, regardless of how the options actually terminate. We derive Q(\beta) by casting learning with options into a common framework with well-studied multi-step off-policy learning. We validate our algorithm empirically and show that it holds up to its motivating claims.

Comment: AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
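
The abstract above describes an option as a policy paired with a termination condition, where the termination condition roughly sets how long the option runs. A minimal Python sketch of that structure follows. It is not the Q(\beta) algorithm from the paper; the names `Option`, `run_option`, and `step`, the integer state space, and the toy chain environment are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple

State = int   # hypothetical: states are integers on a chain
Action = int  # hypothetical: an action is a signed step along the chain


@dataclass
class Option:
    policy: Callable[[State], Action]      # chooses a primitive action in a state
    termination: Callable[[State], float]  # beta(s): probability of stopping in s


def run_option(
    option: Option,
    state: State,
    step: Callable[[State, Action], State],
    rng: random.Random,
    max_steps: int = 1000,
) -> Tuple[State, int]:
    """Follow the option's policy until its termination condition fires.

    Returns the state where the option stopped and how many steps it took.
    """
    duration = 0
    while duration < max_steps:
        state = step(state, option.policy(state))
        duration += 1
        # Sample termination in the newly reached state.
        if rng.random() < option.termination(state):
            break
    return state, duration


def chain_step(state: State, action: Action) -> State:
    """Toy deterministic environment: move along an integer chain."""
    return state + action


# A "long" option runs until state 5; a "short" one stops after every step.
long_option = Option(policy=lambda s: 1, termination=lambda s: 1.0 if s >= 5 else 0.0)
short_option = Option(policy=lambda s: 1, termination=lambda s: 1.0)
```

With the same policy, the two options differ only in `termination`: `long_option` commits to several steps before control returns to the agent, while `short_option` hands control back after each step. This is the length-versus-flexibility trade-off the abstract refers to.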