Continuous Episodic Control
Non-parametric episodic memory can be used to quickly latch onto high-reward
experience in reinforcement learning tasks. In contrast to parametric deep
reinforcement learning approaches, these methods only need to discover the
solution once, and may then repeatedly solve the task. However, episodic
control solutions are stored in discrete tables, and this approach has so far
only been applied to discrete action space problems. Therefore, this paper
introduces Continuous Episodic Control (CEC), a novel non-parametric episodic
memory algorithm for sequential decision making in problems with a continuous
action space. Results on several sparse-reward continuous control environments
show that our proposed method learns faster than state-of-the-art model-free RL
and memory-augmented RL algorithms, while also maintaining good long-run
performance. In short, CEC offers a fast way to learn in continuous control
tasks, and can also serve as a useful complement to parametric RL methods in a
hybrid approach.
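To make the idea concrete, here is a minimal sketch of a non-parametric episodic controller for continuous actions. The FIFO buffer, Euclidean k-nearest-neighbor retrieval, and Gaussian exploration noise are illustrative assumptions; CEC's actual storage and retrieval rules may differ.

```python
import numpy as np

class EpisodicController:
    """Sketch of a non-parametric episodic controller for continuous
    actions (assumed simplification, not CEC's exact algorithm)."""

    def __init__(self, action_dim, capacity=10000, k=5, noise=0.1):
        self.action_dim, self.capacity = action_dim, capacity
        self.k, self.noise = k, noise
        self.states, self.actions, self.returns = [], [], []

    def add(self, state, action, episode_return):
        # Store the visited (state, action) pair with its observed return.
        self.states.append(np.asarray(state, dtype=float))
        self.actions.append(np.asarray(action, dtype=float))
        self.returns.append(float(episode_return))
        if len(self.states) > self.capacity:  # FIFO eviction when full
            self.states.pop(0)
            self.actions.pop(0)
            self.returns.pop(0)

    def act(self, state):
        # Before any experience is stored, explore uniformly at random.
        if not self.states:
            return np.random.uniform(-1.0, 1.0, size=self.action_dim)
        # Retrieve the k nearest stored states by Euclidean distance and
        # replay the action whose episode achieved the highest return.
        dists = np.linalg.norm(np.stack(self.states) - np.asarray(state), axis=1)
        nearest = np.argsort(dists)[:self.k]
        best = nearest[np.argmax(np.asarray(self.returns)[nearest])]
        # Gaussian noise keeps the policy exploring around stored actions.
        return self.actions[best] + self.noise * np.random.randn(self.action_dim)
```

Because the memory is non-parametric, a single high-return episode immediately influences action selection; no gradient steps are needed to latch onto it.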
CompILE: Compositional Imitation Learning and Execution
We introduce Compositional Imitation Learning and Execution (CompILE): a
framework for learning reusable, variable-length segments of
hierarchically-structured behavior from demonstration data. CompILE uses a
novel unsupervised, fully-differentiable sequence segmentation module to learn
latent encodings of sequential data that can be re-composed and executed to
perform new tasks. Once trained, our model generalizes to sequences of longer
length and from environment instances not seen during training. We evaluate
CompILE in a challenging 2D multi-task environment and a continuous control
task, and show that it can find correct task boundaries and event encodings in
an unsupervised manner. Latent codes and associated behavior policies
discovered by CompILE can be used by a hierarchical agent, where the high-level
policy selects actions in the latent code space, and the low-level,
task-specific policies are simply the learned decoders. We found that our
CompILE-based agent could learn given only sparse rewards, where agents without
task-specific policies struggle. Comment: ICML 2019
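A minimal sketch of the execution phase described above, where the high-level policy acts in a discrete latent code space and the learned decoder serves as the low-level policy. The network sizes and the categorical latent are illustrative assumptions, not CompILE's actual architecture.

```python
import torch
import torch.nn as nn

class HierarchicalAgent(nn.Module):
    """Sketch of a hierarchical agent: the high-level policy selects a
    latent code, the low-level decoder maps (state, code) to an action."""

    def __init__(self, state_dim, num_codes, action_dim):
        super().__init__()
        # High-level policy: a distribution over discrete latent codes.
        self.high = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                  nn.Linear(64, num_codes))
        # Low-level policy: the decoder, conditioned on state and code.
        self.low = nn.Sequential(nn.Linear(state_dim + num_codes, 64),
                                 nn.ReLU(), nn.Linear(64, action_dim))

    def forward(self, state):
        # Sample a latent code, then let the decoder produce the action.
        code_logits = self.high(state)
        code = torch.distributions.Categorical(logits=code_logits).sample()
        one_hot = nn.functional.one_hot(code, code_logits.shape[-1]).float()
        return self.low(torch.cat([state, one_hot], dim=-1))
```

The key point is the division of labor: exploration with sparse rewards happens in the small latent code space, while the decoders reuse behavior already extracted from demonstrations.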
Learn Continuously, Act Discretely: Hybrid Action-Space Reinforcement Learning For Optimal Execution
Optimal execution is a sequential decision-making problem for cost-saving in
algorithmic trading. Studies have found that reinforcement learning (RL) can
help decide the order-splitting sizes. However, a problem remains unsolved: how
to place limit orders at appropriate limit prices? The key challenge lies in
the "continuous-discrete duality" of the action space. On the one hand, the
continuous action space using percentage changes in prices is preferred for
generalization. On the other hand, the trader eventually needs to choose limit
prices discretely due to the existence of the tick size, which requires
specialization for every single stock with different characteristics (e.g., the
liquidity and the price range). We therefore need continuous control for
generalization and discrete control for specialization. To this end, we propose
a hybrid RL method that combines the advantages of both. We first use a
continuous control agent to scope an action subset, then deploy a fine-grained
agent to choose a specific limit price. Extensive experiments show that our
method has higher sample efficiency and better training stability than existing
RL algorithms and significantly outperforms previous learning-based methods for
order execution.
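A minimal sketch of the two-stage action selection described above, under assumed interfaces: a continuous offset in [-1, 1] scopes a coarse price band around the mid price, and a fine-grained agent's Q-values pick one concrete tick inside it. The band width and candidate window size are illustrative choices.

```python
import numpy as np

def hybrid_limit_price(mid_price, tick_size, coarse_action, q_values):
    """Two-stage limit price selection: a continuous agent scopes a
    price subset, a discrete agent picks the exact tick (sketch)."""
    # Stage 1: the continuous action scopes a coarse target price,
    # expressed as a percentage offset around the mid price.
    max_offset_pct = 0.01  # assumed +/-1% price band
    target = mid_price * (1.0 + max_offset_pct * coarse_action)
    # Snap the target to the tick grid and build the candidate subset.
    center_tick = round(target / tick_size)
    window = len(q_values) // 2
    candidates = [(center_tick + i) * tick_size
                  for i in range(-window, window + 1)]
    # Stage 2: the discrete agent chooses one concrete limit price.
    return candidates[int(np.argmax(q_values))]

# Example: pick among 5 candidate ticks around the coarse target.
price = hybrid_limit_price(100.0, 0.01, coarse_action=-0.2,
                           q_values=np.random.randn(5))
```

The continuous stage generalizes across stocks via percentage offsets, while the discrete stage specializes to each stock's tick grid.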
Soft Decomposed Policy-Critic: Bridging the Gap for Effective Continuous Control with Discrete RL
Discrete reinforcement learning (RL) algorithms have demonstrated exceptional
performance in solving sequential decision tasks with discrete action spaces,
such as Atari games. However, their effectiveness is hindered when applied to
continuous control problems due to the challenge of dimensional explosion. In
this paper, we present the Soft Decomposed Policy-Critic (SDPC) architecture,
which combines soft RL and actor-critic techniques with discrete RL methods to
overcome this limitation. SDPC discretizes each action dimension independently
and employs a shared critic network to maximize the soft Q-function. This
novel approach enables SDPC to support two types of policies: decomposed actors
that lead to the Soft Decomposed Actor-Critic (SDAC) algorithm, and decomposed
Q-networks that generate Boltzmann soft exploration policies, resulting in
the Soft Decomposed-Critic Q (SDCQ) algorithm. Through extensive experiments,
we demonstrate that our proposed approach outperforms state-of-the-art
continuous RL algorithms in a variety of continuous control tasks, including
MuJoCo's Humanoid and Box2D's BipedalWalker. These empirical results validate
the effectiveness of the SDPC architecture in addressing the challenges
associated with continuous control.
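A minimal sketch of the per-dimension discretization behind the SDCQ variant as described above: each action dimension gets its own Q-head over a small set of bins, and a Boltzmann (softmax over Q) policy samples each dimension independently. Layer sizes, bin count, and the temperature are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DecomposedQ(nn.Module):
    """Sketch of decomposed Q-networks with Boltzmann soft exploration
    (illustrative simplification of the SDCQ idea)."""

    def __init__(self, state_dim, action_dims, bins=11, temperature=0.1):
        super().__init__()
        self.bins, self.temperature = bins, temperature
        self.trunk = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        # One Q-head per action dimension avoids the exponential blow-up
        # of jointly discretizing the full continuous action space.
        self.heads = nn.ModuleList(
            [nn.Linear(128, bins) for _ in range(action_dims)])

    def act(self, state):
        h = self.trunk(state)
        actions = []
        for head in self.heads:
            # Boltzmann soft exploration: sample a bin with probability
            # proportional to exp(Q / temperature).
            probs = torch.softmax(head(h) / self.temperature, dim=-1)
            idx = torch.distributions.Categorical(probs).sample()
            # Map the bin index back to a continuous value in [-1, 1].
            actions.append(2.0 * idx.float() / (self.bins - 1) - 1.0)
        return torch.stack(actions, dim=-1)
```

Discretizing each dimension independently keeps the output size at `action_dims * bins` rather than `bins ** action_dims`, which is what makes discrete RL machinery tractable for continuous control.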