Memory Augmented Control Networks
Planning problems in partially observable environments cannot be solved
directly with convolutional networks and require some form of memory. However, even
memory networks with sophisticated addressing schemes are unable to learn
intelligent reasoning satisfactorily due to the complexity of simultaneously
learning to access memory and plan. To mitigate these challenges we introduce
the Memory Augmented Control Network (MACN). The proposed network architecture
consists of three main parts. The first part uses convolutions to extract
features and the second part uses a neural network-based planning module to
pre-plan in the environment. The third part uses a network controller that
learns to store those specific instances of past information that are necessary
for planning. The performance of the network is evaluated in discrete grid
world environments for path planning in the presence of simple and complex
obstacles. We show that our network learns to plan and can generalize to new
environments.
Integrating Planning and Learning for Agents Acting in Unknown Environments
An Artificial Intelligence (AI) agent acting in an environment can perceive the environment through sensors and execute actions through actuators. Symbolic planning provides an agent with decision-making capabilities about the actions to execute for accomplishing tasks in the environment. To apply symbolic planning, an agent needs to know its symbolic state and an abstract model of the environment dynamics. However, in the real world, an agent has low-level perceptions of the environment (e.g., its position given by a GPS sensor), rather than symbolic observations representing its current state. Furthermore, in many real-world scenarios, it is not feasible to provide an agent with a complete and correct model of the environment, e.g., when the environment is unknown a priori. The gap between the high-level representations, suitable for symbolic planning, and the low-level sensors and actuators, available in a real-world agent, can be bridged by integrating learning, planning, and acting. First, an agent has to map its continuous perceptions into its current symbolic state, e.g., by detecting the set of objects and their properties from an RGB image provided by an onboard camera. Afterward, the agent has to build a model of the environment by interacting with the environment and observing the effects of the executed actions. Finally, the agent has to plan on the learned environment model and execute the symbolic actions through its actuators. We propose an architecture that integrates learning, planning, and acting. Our approach combines data-driven learning methods for building an environment model online with symbolic planning techniques for reasoning on the learned model. In particular, we focus on learning the environment model, from either continuous or symbolic observations, assuming the agent's perceptual input is the complete and correct state of the environment, and the agent is able to execute symbolic actions in the environment. Afterward, we assume that a partial model of the environment and the capability of mapping perceptions into noisy and incomplete symbolic states are given, and the agent has to exploit the environment model and its perception capabilities to perform tasks in unknown and partially observable environments.
Then, we tackle the problem of learning online the mapping between continuous perceptions and symbolic states, assuming the agent is given a partial model of the environment and is able to execute symbolic actions in the real world. In our approach, we take advantage of learning methods for overcoming some of the simplifying assumptions of symbolic planning, such as the full observability of the environment, or the need for a correct environment model. Similarly, we take advantage of symbolic planning techniques to enable an agent to autonomously gather relevant information online, which is necessary for data-driven learning methods. We experimentally show the effectiveness of our approach in simulated and complex environments, outperforming state-of-the-art methods. Finally, we empirically demonstrate the applicability of our approach in real environments, by conducting experiments on a real robot.
Perseus: Randomized Point-based Value Iteration for POMDPs
Partially observable Markov decision processes (POMDPs) form an attractive
and principled framework for agent planning under uncertainty. Point-based
approximate techniques for POMDPs compute a policy based on a finite set of
points collected in advance from the agent's belief space. We present a
randomized point-based value iteration algorithm called Perseus. The algorithm
performs approximate value backup stages, ensuring that in each backup stage
the value of each point in the belief set is improved; the key observation is
that a single backup may improve the value of many belief points. Contrary to
other point-based methods, Perseus backs up only a (randomly selected) subset
of points in the belief set, sufficient for improving the value of each belief
point in the set. We show how the same idea can be extended to deal with
continuous action spaces. Experimental results show the potential of Perseus in
large-scale POMDP problems.
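The backup stage described in this abstract can be illustrated with a minimal sketch for a small discrete POMDP. This is an illustration under stated assumptions, not the authors' code: the array layout (`T[a, s, s2]` for transitions, `O[a, s2, o]` for observations, `R[a, s]` for rewards), the function names `backup` and `perseus_stage`, and the numerical tolerance are all hypothetical choices made here. The value function is represented as a set of alpha vectors, one row per vector.

```python
import numpy as np

def backup(b, V, T, O, R, gamma):
    """Point-based Bellman backup at belief b; returns one alpha vector.

    Assumed layout (hypothetical): T[a, s, s2] = P(s2 | s, a),
    O[a, s2, o] = P(o | s2, a), R[a, s] = immediate reward,
    V has shape (n_vectors, n_states).
    """
    n_actions, n_obs = T.shape[0], O.shape[2]
    best_val, best_vec = -np.inf, None
    for a in range(n_actions):
        g_a = R[a].astype(float)  # astype returns a fresh copy
        for o in range(n_obs):
            # g[i, s] = sum_{s2} T[a, s, s2] * O[a, s2, o] * V[i, s2]
            g = (V * O[a, :, o]) @ T[a].T
            # keep the back-projected vector that is best at belief b
            g_a += gamma * g[np.argmax(g @ b)]
        val = g_a @ b
        if val > best_val:
            best_val, best_vec = val, g_a
    return best_vec

def perseus_stage(B, V, T, O, R, gamma, rng, tol=1e-12):
    """One randomized backup stage over the belief set B (one belief per row).

    Backs up randomly chosen beliefs until no belief in B has a lower
    value under the new vector set than under the old one.
    """
    old_vals = (B @ V.T).max(axis=1)
    new_V = []
    todo = np.arange(len(B))  # beliefs whose value is not yet restored
    while todo.size > 0:
        i = rng.choice(todo)
        alpha = backup(B[i], V, T, O, R, gamma)
        if alpha @ B[i] < old_vals[i]:
            # backup did not help at this belief: reuse its old best vector
            alpha = V[np.argmax(V @ B[i])]
        new_V.append(alpha)
        new_vals = (B @ np.array(new_V).T).max(axis=1)
        todo = np.flatnonzero(new_vals < old_vals - tol)
    return np.array(new_V)
```

Each pass through the loop restores the value of at least the sampled belief, so a stage ends after at most as many backups as there are beliefs, and often far fewer because one new vector can raise the value of many beliefs at once. A common starting point is a single vector with every entry equal to `R.min() / (1 - gamma)`, which is a lower bound on the true value.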
Deep Variational Reinforcement Learning for POMDPs
Many real-world sequential decision-making problems are partially observable
by nature, and the environment model is typically unknown. Consequently, there
is a great need for reinforcement learning methods that can tackle such problems
given only a stream of incomplete and noisy observations. In this paper, we
propose deep variational reinforcement learning (DVRL), which introduces an
inductive bias that allows an agent to learn a generative model of the
environment and perform inference in that model to effectively aggregate the
available information. We develop an n-step approximation to the evidence lower
bound (ELBO), allowing the model to be trained jointly with the policy. This
ensures that the latent state representation is suitable for the control task.
In experiments on Mountain Hike and flickering Atari we show that our method
outperforms previous approaches relying on recurrent neural networks to encode
the past.
The Dreaming Variational Autoencoder for Reinforcement Learning Environments
Reinforcement learning has shown great potential in generalizing over raw
sensory data using only a single neural network for value optimization. There
are several challenges in the current state-of-the-art reinforcement learning
algorithms that prevent them from converging towards the global optimum. It is
likely that the solution to these problems lies in short- and long-term
planning, exploration and memory management for reinforcement learning
algorithms. Games are often used to benchmark reinforcement learning algorithms
as they provide a flexible, reproducible, and easy to control environment.
However, few games feature a state-space where results in exploration,
memory, and planning are easily perceived. This paper presents The Dreaming
Variational Autoencoder (DVAE), a neural-network-based generative modeling
architecture for exploration in environments with sparse feedback. We further
present Deep Maze, a novel and flexible maze engine that challenges DVAE in
partially and fully observable state-spaces, long-horizon tasks, and
deterministic and stochastic problems. We show initial findings and encourage
further work in reinforcement learning driven by generative exploration.
Comment: Best Student Paper Award, Proceedings of the 38th SGAI International
Conference on Artificial Intelligence, Cambridge, UK, 2018, Artificial
Intelligence XXXV, 201
Closed-loop Bayesian Semantic Data Fusion for Collaborative Human-Autonomy Target Search
In search applications, autonomous unmanned vehicles must be able to
efficiently reacquire and localize mobile targets that can remain out of view
for long periods of time in large spaces. As such, all available information
sources must be actively leveraged -- including imprecise but readily available
semantic observations provided by humans. To achieve this, this work develops
and validates a novel collaborative human-machine sensing solution for dynamic
target search. Our approach uses continuous partially observable Markov
decision process (CPOMDP) planning to generate vehicle trajectories that
optimally exploit imperfect detection data from onboard sensors, as well as
semantic natural language observations that can be specifically requested from
human sensors. The key innovation is a scalable hierarchical Gaussian mixture
model formulation for efficiently solving CPOMDPs with semantic observations in
continuous dynamic state spaces. The approach is demonstrated and validated
with a real human-robot team engaged in dynamic indoor target search and
capture scenarios on a custom testbed.
Comment: Final version accepted and submitted to 2018 FUSION Conference
(Cambridge, UK, July 2018)