Variational Recurrent Models for Solving Partially Observable Control Tasks
In partially observable (PO) environments, deep reinforcement learning (RL)
agents often suffer from unsatisfactory performance, since two problems need to
be tackled together: how to extract information from the raw observations to
solve the task, and how to improve the policy. In this study, we propose an RL
algorithm for solving PO tasks. Our method comprises two parts: a variational
recurrent model (VRM) for modeling the environment, and an RL controller that
has access to both the environment and the VRM. The proposed algorithm was
tested in two types of PO robotic control tasks, those in which either
coordinates or velocities were not observable and those that require long-term
memorization. Our experiments show that the proposed algorithm achieved better
data efficiency and/or learned a better policy than alternative
approaches in tasks in which unobserved states cannot be inferred from raw
observations in a simple manner.
Comment: Published as a conference paper at the Eighth International
Conference on Learning Representations (ICLR 2020)
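The abstract above describes a two-part architecture: a recurrent model that summarizes the observation history, and a controller that sees both the raw observation and the model's state. The sketch below illustrates only that interface with a hand-rolled deterministic recurrent cell; the actual VRM is a stochastic latent-variable RNN trained variationally, and all dimensions and weights here are hypothetical.

```python
import math
import random

random.seed(0)

class RecurrentStateModel:
    """Toy stand-in for a variational recurrent model (VRM): it keeps a
    hidden state summarizing the observation history. The real VRM is a
    stochastic latent-variable RNN trained with an ELBO; this only mirrors
    the interface described in the abstract."""

    def __init__(self, obs_dim, hidden_dim):
        self.hidden_dim = hidden_dim
        # Random fixed weights; a real model would learn these.
        self.w_obs = [[random.gauss(0, 0.3) for _ in range(obs_dim)]
                      for _ in range(hidden_dim)]
        self.w_rec = [[random.gauss(0, 0.3) for _ in range(hidden_dim)]
                      for _ in range(hidden_dim)]
        self.h = [0.0] * hidden_dim

    def step(self, obs):
        """Update the hidden state from one observation and return it."""
        new_h = []
        for i in range(self.hidden_dim):
            pre = sum(w * o for w, o in zip(self.w_obs[i], obs))
            pre += sum(w * h for w, h in zip(self.w_rec[i], self.h))
            new_h.append(math.tanh(pre))
        self.h = new_h
        return self.h

def policy_input(obs, model_state):
    """The controller sees both the raw observation and the model's
    state, as the abstract describes."""
    return list(obs) + list(model_state)

# Velocities hidden: the agent only sees positions, so it needs memory.
model = RecurrentStateModel(obs_dim=2, hidden_dim=4)
positions = [(0.0, 0.0), (0.1, 0.0), (0.3, 0.0)]  # accelerating along x
for obs in positions:
    h = model.step(obs)
x = policy_input(positions[-1], h)
print(len(x))  # 2 raw observation dims + 4 hidden dims = 6
```

Because the hidden state is a function of the whole position history, it can carry velocity-like information that no single observation contains — the setting the abstract's "coordinates or velocities not observable" tasks probe.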
Deep Variational Reinforcement Learning for POMDPs
Many real-world sequential decision making problems are partially observable
by nature, and the environment model is typically unknown. Consequently, there
is a great need for reinforcement learning methods that can tackle such problems
given only a stream of incomplete and noisy observations. In this paper, we
propose deep variational reinforcement learning (DVRL), which introduces an
inductive bias that allows an agent to learn a generative model of the
environment and perform inference in that model to effectively aggregate the
available information. We develop an n-step approximation to the evidence lower
bound (ELBO), allowing the model to be trained jointly with the policy. This
ensures that the latent state representation is suitable for the control task.
In experiments on Mountain Hike and flickering Atari we show that our method
outperforms previous approaches relying on recurrent neural networks to encode
the past
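The n-step objective mentioned above is built from the usual per-step ELBO decomposition: a reconstruction log-likelihood minus a KL between approximate posterior and prior, summed over steps. A minimal numeric sketch of that decomposition follows; all values are hypothetical scalar Gaussians, not the paper's particle-filter-based estimator.

```python
import math

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for scalar Gaussians."""
    return (0.5 * math.log(var_p / var_q)
            + (var_q + (mu_q - mu_p) ** 2) / (2 * var_p) - 0.5)

def gaussian_log_prob(x, mu, var):
    """Log-density of x under N(mu, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def n_step_elbo(recon_terms, kl_terms):
    """Sum over n steps of (reconstruction log-likelihood - KL), the
    per-step ELBO decomposition an n-step objective accumulates."""
    return sum(r - k for r, k in zip(recon_terms, kl_terms))

# Hypothetical 3-step rollout: observations, decoder means, posterior/prior.
obs = [0.1, -0.2, 0.05]
recon = [gaussian_log_prob(o, mu=0.9 * o, var=1.0) for o in obs]
kls = [gaussian_kl(0.2, 0.5, 0.0, 1.0)] * 3   # posterior vs. standard prior
elbo = n_step_elbo(recon, kls)
print(elbo < 0.0)  # a lower bound on log-likelihood; negative here
```

Training the policy on the same latent state that this bound shapes is what, per the abstract, keeps the representation suitable for control rather than for reconstruction alone.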
The Dreaming Variational Autoencoder for Reinforcement Learning Environments
Reinforcement learning has shown great potential in generalizing over raw
sensory data using only a single neural network for value optimization. There
are several challenges in the current state-of-the-art reinforcement learning
algorithms that prevent them from converging towards the global optima. It is
likely that the solution to these problems lies in short- and long-term
planning, exploration and memory management for reinforcement learning
algorithms. Games are often used to benchmark reinforcement learning algorithms
as they provide flexible, reproducible, and easy-to-control environments.
However, few games feature a state-space in which progress in exploration,
memory, and planning is easily perceived. This paper presents The Dreaming
Variational Autoencoder (DVAE), a neural network based generative modeling
architecture for exploration in environments with sparse feedback. We further
present Deep Maze, a novel and flexible maze engine that challenges DVAE in
partial and fully-observable state-spaces, long-horizon tasks, and
deterministic and stochastic problems. We show initial findings and encourage
further work in reinforcement learning driven by generative exploration.
Comment: Best Student Paper Award, Proceedings of the 38th SGAI International
Conference on Artificial Intelligence, Cambridge, UK, 2018, Artificial
Intelligence XXXV, 201
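One common way a generative model can drive exploration in sparse-feedback settings is to reward the agent where the model's predictions are still poor. The toy below uses a running per-state prediction table as a crude stand-in for a learned generative model; it is an illustration of the "generative exploration" idea, not the DVAE architecture itself, and all names are hypothetical.

```python
class PredictionErrorBonus:
    """Toy intrinsic-reward scheme: keep a running prediction of the next
    observation per state and reward the agent where the model is still
    wrong. A rough illustration of generative exploration; the DVAE is a
    learned neural generative model, not this lookup table."""

    def __init__(self, lr=0.5):
        self.lr = lr
        self.pred = {}  # state -> predicted next observation

    def bonus(self, state, next_obs):
        guess = self.pred.get(state, 0.0)
        error = abs(next_obs - guess)
        # Move the prediction toward what was actually observed.
        self.pred[state] = guess + self.lr * (next_obs - guess)
        return error

b = PredictionErrorBonus()
first = b.bonus("room_a", 1.0)                   # never seen: large bonus
later = [b.bonus("room_a", 1.0) for _ in range(5)]
print(first > later[-1])  # bonus decays as the model learns -> True
```

The bonus shrinks exactly where the environment has become predictable, steering the agent toward unexplored, hard-to-model regions even when the task reward is sparse.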
DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs
A major difficulty of solving continuous POMDPs is to infer the multi-modal
distribution of the unobserved true states and to make the planning algorithm
dependent on the perceived uncertainty. We cast POMDP filtering and planning
problems as two closely related Sequential Monte Carlo (SMC) processes, one
over the real states and the other over the future optimal trajectories, and
combine the merits of these two parts in a new model named the DualSMC network.
In particular, we first introduce an adversarial particle filter that leverages
the adversarial relationship between its internal components. Based on the
filtering results, we then propose a planning algorithm that extends the
previous SMC planning approach [Piche et al., 2018] to continuous POMDPs with
an uncertainty-dependent policy. Crucially, not only can DualSMC handle complex
observations such as image input but also it remains highly interpretable. It
is shown to be effective in three continuous POMDP domains: the floor
positioning domain, the 3D light-dark navigation domain, and a modified Reacher
domain.
Comment: IJCAI 202
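The filtering half of the abstract builds on the standard Sequential Monte Carlo recursion: propagate particles through the transition model, reweight them by the observation likelihood, and resample. The sketch below shows only that textbook skeleton on a 1-D toy problem; DualSMC replaces the fixed likelihood with learned, adversarially trained components, and every model here is an assumption for illustration.

```python
import math
import random

random.seed(1)

def smc_filter_step(particles, weights, transition, obs_likelihood, obs):
    """One generic SMC filtering step: propagate, reweight by the
    observation likelihood, then resample to uniform weights."""
    # Propagate each particle through the (stochastic) transition model.
    particles = [transition(p) for p in particles]
    # Reweight by how well each particle explains the new observation.
    weights = [w * obs_likelihood(p, obs) for p, w in zip(particles, weights)]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    # Multinomial resampling back to uniform weights.
    particles = random.choices(particles, weights=weights, k=len(particles))
    return particles, [1.0 / len(particles)] * len(particles)

# 1-D toy: the true state drifts +1 per step; observations are noiseless
# readings of it, scored with a Gaussian kernel.
transition = lambda x: x + 1 + random.gauss(0, 0.1)
likelihood = lambda x, y: math.exp(-0.5 * (x - y) ** 2)

particles = [random.uniform(-5, 5) for _ in range(200)]
weights = [1.0 / 200] * 200
for obs in [1.0, 2.0, 3.0]:
    particles, weights = smc_filter_step(
        particles, weights, transition, likelihood, obs)
estimate = sum(particles) / len(particles)
print(abs(estimate - 3.0) < 0.5)  # posterior mean tracks the true state
```

Because the particle set is a full (possibly multi-modal) posterior rather than a point estimate, a downstream planner can condition on the perceived uncertainty — the coupling the abstract's "uncertainty-dependent policy" refers to.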
ODE-based Recurrent Model-free Reinforcement Learning for POMDPs
Neural ordinary differential equations (ODEs) are widely recognized as a
standard tool for modeling physical mechanisms, which helps to perform
approximate inference in unknown physical or biological environments. In
partially observable (PO) environments, agents must infer unseen information
from raw observations. By using a recurrent policy with a compact context,
context-based reinforcement learning provides a flexible way to
extract unobservable information from historical transitions. To help the agent
extract more dynamics-related information, we present a novel ODE-based
recurrent model combined with a model-free reinforcement learning (RL)
framework to solve partially observable Markov decision processes (POMDPs). We
experimentally demonstrate the efficacy of our methods across various PO
continuous control and meta-RL tasks. Furthermore, our experiments illustrate
that our method is robust against irregular observations, owing to the ability
of ODEs to model irregularly-sampled time series.
Comment: Accepted by NeurIPS 202
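The robustness to irregular observations mentioned above comes from the generic ODE-recurrent pattern: between observations the hidden state evolves continuously under learned dynamics for however much time elapsed, and each arriving observation triggers a discrete update. The sketch below shows that pattern with toy hand-picked dynamics and an Euler integrator; it is not the paper's exact cell, and the dynamics, update rule, and time gaps are all assumptions.

```python
def ode_rnn_step(h, obs, dt, f, update, n_euler=10):
    """One ODE-RNN-style step: evolve the hidden state h continuously for
    the elapsed time dt (Euler-integrating the dynamics f), then apply a
    discrete update with the new observation."""
    step = dt / n_euler
    for _ in range(n_euler):
        h = [hi + step * fi for hi, fi in zip(h, f(h))]
    return update(h, obs)

# Toy dynamics: exponential decay of the hidden state toward zero.
f = lambda h: [-hi for hi in h]
# Toy update: blend the observation into the hidden state.
update = lambda h, o: [0.5 * hi + 0.5 * o for hi in h]

h = [1.0]
# Irregularly sampled observations: gaps of 0.1, 1.0, 0.3 time units.
for dt, obs in [(0.1, 0.8), (1.0, 0.2), (0.3, 0.1)]:
    h = ode_rnn_step(h, obs, dt, f, update)
print(round(h[0], 3))
```

Because `dt` enters the state evolution explicitly, the cell treats a long gap differently from a short one — which is why ODE-based recurrence degrades gracefully on irregularly-sampled observation streams, unlike a discrete-step RNN that ignores elapsed time.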