Predictive and reactive reinforcement learning from images
Deep Reinforcement Learning (DRL) is a general-purpose approach for solving sequential tasks. It is based on decades of research in control theory combined with the
relatively recent field of Deep Learning (DL), which takes advantage of hierarchical
trainable models with millions, and more recently billions, of parameters. These
together enable successful control policies in high-dimensional, unstructured settings with no domain knowledge. Currently, game-like environments, where agents perceive worlds through images and receive scores, serve as reliable scientific benchmarks; we therefore refer frequently to sequential problems of this kind.
While DRL is responsible for achieving many recent AI milestones, such as superhuman competence at Go, Starcraft, Dota and Atari, the ambitions are greater.
In the future, reinforcement learners are likely to control cars, robots and stock
trading. The majority of successful learners so far have been model-free: agents that discover reflex-like responses to stimuli by relying on expressive deep neural architectures and high volumes of data (up to hundreds of years of experience). Recently there have been attempts to enable predictive behaviour, where agents learn models of their environments and generate plans. While there are successful examples that fit this description, the problem has proved more challenging than initially anticipated.
This thesis comprehensively presents and examines the building blocks of predictive agents, the challenges expected in transitioning from reactive approaches, and the trade-offs between the two classes of agents.
The early part of the thesis provides the theoretical background necessary for
the exploration of predictive agents. This includes reviews of deep learning and
reinforcement learning as well as probabilistic state estimation and prediction. We
then introduce a novel approach for learning the physics of a stochastic process solely
through observation of images. The method is based on a deep recurrent neural
architecture. It exemplifies how agents with no domain knowledge can build forward
models of environments which can then serve for planning and other purposes. In the same section we review approaches that followed ours and reflect on their importance
for building predictive agents.
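The forward-model idea described above can be sketched minimally. The plain-RNN cell, the layer sizes, and the closed-loop rollout below are illustrative assumptions, not the thesis's actual architecture; a real agent would train the weights by minimising prediction error over observed trajectories:

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM = 16 * 16   # flattened 16x16 grayscale frame (assumed size)
HID_DIM = 64        # recurrent state size (assumed)

# Randomly initialised parameters stand in for a trained model here.
W_in = rng.normal(0, 0.1, (HID_DIM, OBS_DIM))
W_rec = rng.normal(0, 0.1, (HID_DIM, HID_DIM))
W_out = rng.normal(0, 0.1, (OBS_DIM, HID_DIM))

def step(h, obs):
    """One RNN step: update the hidden state, emit a predicted next frame."""
    h_next = np.tanh(W_in @ obs + W_rec @ h)
    pred = W_out @ h_next  # predicted next observation
    return h_next, pred

def rollout(first_obs, n_steps):
    """Closed-loop rollout: feed each prediction back in as the next
    observation, i.e. use the learned model as a simulator for planning."""
    h = np.zeros(HID_DIM)
    obs = first_obs
    frames = []
    for _ in range(n_steps):
        h, obs = step(h, obs)
        frames.append(obs)
    return np.stack(frames)

frames = rollout(rng.normal(size=OBS_DIM), n_steps=5)
print(frames.shape)  # (5, 256)
```

The key structural point is the rollout: once the model predicts observations rather than actions, the agent can imagine trajectories without touching the environment, which is what makes planning possible.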
The second part of the thesis pools the insights, advantages and shortcomings of the many variants of reactive and predictive reinforcement learners proposed to date. We distil their fundamental differences and generalise, ultimately offering a novel categorisation of agents. Through this, we identify the key trade-offs and recommend which classes of environments each category suits. We identify one category in particular, termed implicitly predictive agents, as especially promising for benchmarks that combine high complexity and partial observability.
The last part of the thesis presents an evaluation of reinforcement learners on
a task that is normally solved through recursive search (maze navigation). Here,
we experimentally confirm our hypotheses about implicitly predictive agents and
provide novel agent designs. We find that agents relying on recurrent neural architectures may implicitly learn to use recursive computation to solve search problems.
Finally, we explore the implications of our findings for the future of predictive
agents and recommend promising research avenues.
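The claim that a recurrent architecture can perform recursive search can be illustrated with a hand-written analogue. Below, repeatedly applying one fixed local update rule (the kind of computation a recurrent network could learn to apply at each step) solves a shortest-path maze problem via value iteration. The maze layout and sizes are illustrative; the thesis's agents learn such behaviour rather than being given the rule:

```python
import numpy as np

maze = np.array([  # 0 = free cell, 1 = wall (assumed toy layout)
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
])
goal = (3, 3)

# dist[cell] converges to the shortest-path distance to the goal.
dist = np.full(maze.shape, np.inf)
dist[goal] = 0.0

for _ in range(maze.size):  # enough iterations to propagate everywhere
    new = dist.copy()
    for r in range(maze.shape[0]):
        for c in range(maze.shape[1]):
            if maze[r, c] == 1 or (r, c) == goal:
                continue
            neighbours = [(r + dr, c + dc) for dr, dc in
                          ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if 0 <= r + dr < maze.shape[0]
                          and 0 <= c + dc < maze.shape[1]]
            # One local update: my distance is 1 + the best neighbour's.
            new[r, c] = 1.0 + min(dist[n] for n in neighbours)
    dist = new

print(dist[0, 0])  # 6.0 — shortest-path length from the top-left corner
```

Each pass of the loop is the same computation applied to the previous result, which is exactly the shape of an unrolled recurrent network; the recursion depth needed grows with maze size, matching the intuition that search problems reward recurrence.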
Sensor management with regional statistics for the PHD filter
This paper investigates a sensor management scheme that aims at minimising the regional variance in the number of objects present in regions of interest whilst performing multi-target filtering with the PHD filter. The experiments are conducted in a simulated environment with groups of targets moving through a scene in order to inspect the behaviour of the manager. The results demonstrate that computing the variance in the number of objects in different regions provides a viable means of increasing situational awareness where complete coverage is not possible. A discussion follows, highlighting the limitations of the PHD filter and the applicability of the proposed method to alternative approaches in multi-object filtering.
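The regional statistics at the core of the scheme can be sketched from a particle approximation of the PHD intensity. The 1-D state space, region names and manager heuristic below are illustrative assumptions, and the Poisson simplification (regional variance equal to the regional mean) is a textbook PHD property rather than the paper's exact regional-variance computation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Particle approximation of the PHD intensity v(x): weighted samples.
particles = rng.uniform(0.0, 10.0, size=500)  # 1-D positions (assumed)
weights = np.full(500, 3.0 / 500)             # total mass 3 => ~3 expected targets

regions = {"A": (0.0, 4.0), "B": (4.0, 10.0)}  # regions of interest (assumed)

def expected_count(lo, hi):
    """Integral of the intensity over [lo, hi): the expected number
    of objects in the region."""
    mask = (particles >= lo) & (particles < hi)
    return weights[mask].sum()

stats = {}
for name, (lo, hi) in regions.items():
    mu = expected_count(lo, hi)
    # Under the PHD filter's Poisson assumption, the number of objects
    # in a region is Poisson-distributed, so its variance equals its mean.
    stats[name] = {"mean": mu, "var": mu}

# A simple manager heuristic: point the sensor at the region whose object
# count is most uncertain, i.e. the one with the largest variance.
chosen = max(stats, key=lambda r: stats[r]["var"])
print(chosen)
```

The design point is that regional means and variances fall out of the same intensity the filter already maintains, so the manager adds little cost on top of the PHD recursion.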