
    Predictive and reactive reinforcement learning from images

    Deep Reinforcement Learning (DRL) is a general-purpose approach for solving sequential tasks. It builds on decades of research in control theory combined with the relatively recent field of Deep Learning (DL), which takes advantage of hierarchical trainable models with millions, and more recently billions, of parameters. Together these enable successful control policies in high-dimensional, unstructured data settings with zero domain knowledge. Currently, game-like environments, where agents perceive worlds through images and receive scores, serve as reliable scientific benchmarks, so we refer to sequential problems from this category frequently. While DRL is responsible for many recent AI milestones, such as superhuman competence at Go, StarCraft, Dota and Atari, the ambitions are greater: in the future, reinforcement learners are likely to control cars, robots and stock trading. The majority of successful learners so far have been model-free: agents that discover reflex-like responses to stimuli by relying on expressive deep neural architectures and high volumes of data (up to hundreds of years of experience). Recently there have been attempts to enable predictive behaviour, where agents learn models of their environments and generate plans. While there are successful examples that fit this description, the problem has proved more challenging than initially anticipated. This thesis comprehensively presents and examines the building blocks of predictive agents, the challenges expected in transitioning from reactive approaches, and the trade-offs between the two classes of agents. The early part of the thesis provides the theoretical background necessary for the exploration of predictive agents, including reviews of deep learning and reinforcement learning as well as probabilistic state estimation and prediction. We then introduce a novel approach for learning the physics of a stochastic process solely through observation of images.
The method is based on a deep recurrent neural architecture. It exemplifies how agents with no domain knowledge can build forward models of environments, which can then serve for planning and other purposes. In the same section we review approaches that followed ours and reflect on their importance for building predictive agents. The second part of the thesis pools the insights, advantages and shortcomings of the many variants of reactive and predictive reinforcement learners proposed to date. We distil the fundamental differences, generalise, and finally offer a novel categorisation of agents. Through this, we identify the key trade-offs and provide recommendations on suitability for different classes of environments. We identify a particular category, termed implicitly predictive agents, as especially promising for complex, partially observable benchmarks. The last part of the thesis presents an evaluation of reinforcement learners on a task that is normally solved through recursive search (maze navigation). Here, we experimentally confirm our hypotheses about implicitly predictive agents and provide novel agent designs. We find that agents relying on recurrent neural architectures may implicitly learn to use recursive computation to solve search problems. Finally, we explore the implications of our findings for the future of predictive agents and recommend promising research avenues.
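    The forward-model idea described above can be sketched in miniature. The snippet below is an illustrative Python/NumPy sketch, not the thesis's actual architecture: it uses made-up dimensions, randomly initialised weights in place of trained ones, and a bare tanh recurrence in place of a deep recurrent network. It shows only the structural pattern: encode observed frames into a hidden state, then roll the model forward open-loop by feeding its own predicted frames back in, which is what makes the learned model usable for planning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 8x8 grayscale frames, 16-dim hidden state.
FRAME = 8 * 8
LATENT = 16

# Randomly initialised parameters stand in for trained weights.
W_enc = rng.normal(scale=0.1, size=(LATENT, FRAME))   # encoder: frame -> latent
W_rec = rng.normal(scale=0.1, size=(LATENT, LATENT))  # recurrence: latent -> latent
W_dec = rng.normal(scale=0.1, size=(FRAME, LATENT))   # decoder: latent -> frame

def step(h, frame):
    """One recurrent update: fold an observed frame into the hidden state."""
    return np.tanh(W_rec @ h + W_enc @ frame)

def rollout(frames, horizon):
    """Condition on observed frames, then predict `horizon` future frames
    open-loop by feeding the model's own predictions back into itself."""
    h = np.zeros(LATENT)
    for f in frames:              # observation phase: absorb real frames
        h = step(h, f)
    preds = []
    for _ in range(horizon):      # prediction phase: imagine the future
        f_hat = W_dec @ h         # decode a next-frame estimate
        preds.append(f_hat)
        h = step(h, f_hat)        # predicted frame becomes the next input
    return np.stack(preds)

obs = rng.normal(size=(5, FRAME))   # five conditioning frames
future = rollout(obs, horizon=3)
print(future.shape)                 # (3, 64): three imagined frames
```

    In a real predictive agent the encoder, recurrence and decoder would each be deep networks trained on observation sequences; the open-loop prediction phase is the piece that planning algorithms then search over.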

    Sensor management with regional statistics for the PHD filter

    This paper investigates a sensor management scheme that aims to minimise the regional variance in the number of objects present in regions of interest while performing multi-target filtering with the PHD filter. The experiments are conducted in a simulated environment with groups of targets moving through a scene in order to inspect the behaviour of the manager. The results demonstrate that computing the variance in the number of objects in different regions provides a viable means of increasing situational awareness where complete coverage is not possible. A discussion follows, highlighting the limitations of the PHD filter and the applicability of the proposed method to alternative approaches in multi-object filtering.
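    The management principle above can be illustrated with a toy sketch. This is not the paper's formulation: the region layout and intensity values are invented, and the regional variance is computed under the simple Poisson (first-order PHD) approximation, in which the variance of the object count in a region equals the PHD intensity integrated over that region; the paper's regional statistics are more refined. The sketch only shows the selection logic: integrate the intensity over each region of interest, then point the sensor at the region with the largest predicted count variance.

```python
import numpy as np

# Hypothetical setup: a discretised 1-D surveillance strip, where the
# PHD intensity gives the expected object count per cell.
intensity = np.array([0.1, 0.8, 1.5, 0.3, 0.05, 0.9])

# Regions of interest as cell-index slices (illustrative only).
regions = {"A": slice(0, 2), "B": slice(2, 4), "C": slice(4, 6)}

def regional_stats(phd, region):
    """Expected count in a region is the PHD integrated over it; under
    the Poisson approximation the count variance equals that same sum."""
    n_hat = phd[region].sum()
    return n_hat, n_hat  # (mean, variance) under the Poisson assumption

def select_region(phd, regions):
    """Greedy manager: observe the region with the largest predicted
    variance in the number of objects."""
    return max(regions, key=lambda name: regional_stats(phd, regions[name])[1])

best = select_region(intensity, regions)
print(best)  # "B": integrated intensity 1.8 vs 0.9 (A) and 0.95 (C)
```

    A full implementation would recompute the regional statistics after each PHD prediction and update step, so the manager's choice adapts as target groups move through the scene.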