When Waiting is not an Option: Learning Options with a Deliberation Cost
Recent work has shown that temporally extended actions (options) can be
learned fully end-to-end as opposed to being specified in advance. While the
problem of "how" to learn options is increasingly well understood, the question
of "what" good options should be has remained elusive. We formulate our answer
to what "good" options should be in the bounded rationality framework (Simon,
1957) through the notion of deliberation cost. We then derive practical
gradient-based learning algorithms to implement this objective. Our results in
the Arcade Learning Environment (ALE) show increased performance and
interpretability.
A Hierarchical Reinforcement Learning Method for Persistent Time-Sensitive Tasks
Reinforcement learning has been applied to many interesting problems such as
the famous TD-Gammon and inverted helicopter flight. However, little effort
has been put into developing methods to learn policies for complex persistent
tasks and tasks that are time-sensitive. In this paper, we take a step towards
solving this problem by using signal temporal logic (STL) as task
specification, and taking advantage of the temporal abstraction feature that
the options framework provides. We show via simulation that a relatively
easy-to-implement algorithm that combines STL and options can learn a
satisfactory policy with a small number of training cases.
Learning with Options that Terminate Off-Policy
A temporally abstract action, or an option, is specified by a policy and a
termination condition: the policy guides option behavior, and the termination
condition roughly determines its length. Generally, learning with longer
options (like learning with multi-step returns) is known to be more efficient.
However, if the option set for the task is not ideal, and cannot express the
primitive optimal policy exactly, shorter options offer more flexibility and
can yield a better solution. Thus, the termination condition puts learning
efficiency at odds with solution quality. We propose to resolve this dilemma by
decoupling the behavior and target terminations, just as is done with
policies in off-policy learning. To this end, we give a new algorithm,
Q(\beta), that learns the solution with respect to any termination condition,
regardless of how the options actually terminate. We derive Q(\beta) by casting
learning with options into a common framework with well-studied multi-step
off-policy learning. We validate our algorithm empirically, and show that it
holds up to its motivating claims. Comment: AAAI 201
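
The definition of an option given in the abstract above (a policy plus a termination condition, where the termination condition roughly determines the option's length) can be illustrated with a minimal Python sketch. This is not the paper's Q(\beta) algorithm. The names `Option`, `run_option`, `chain_step`, and `mean_duration`, the toy chain dynamics, and the constant termination probability are all assumptions made for illustration only.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Option:
    """A temporally abstract action: a policy plus a termination condition."""
    policy: Callable[[int], int]         # state -> primitive action
    termination: Callable[[int], float]  # state -> probability of stopping


def run_option(option: Option, state: int, step: Callable[[int, int], int],
               rng: random.Random, max_steps: int = 1000) -> Tuple[int, int]:
    """Follow the option's policy until it terminates (or max_steps is hit).

    Returns the final state and the number of primitive steps taken.
    """
    duration = 0
    while duration < max_steps:
        state = step(state, option.policy(state))
        duration += 1
        # Termination is sampled in the state the option has just reached.
        if rng.random() < option.termination(state):
            break
    return state, duration


def chain_step(state: int, action: int) -> int:
    """Hypothetical toy dynamics: an unbounded chain, action is a signed move."""
    return state + action


def mean_duration(beta: float, episodes: int = 2000, seed: int = 0) -> float:
    """Average length of an option whose termination probability is constant."""
    rng = random.Random(seed)
    option = Option(policy=lambda s: 1, termination=lambda s: beta)
    total = sum(run_option(option, 0, chain_step, rng)[1] for _ in range(episodes))
    return total / episodes
```

Under the assumed constant termination probability, the duration is geometrically distributed, so `mean_duration(0.5)` should come out near 2 and `mean_duration(0.1)` near 10. Smaller termination probabilities give longer options, which is the knob the abstract describes as trading learning efficiency against flexibility.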