12 research outputs found

    Emergent intentionality in perception-action subsumption hierarchies

    A cognitively-autonomous artificial agent may be defined as one able to modify both its external world-model and the framework by which it represents the world, requiring two simultaneous optimization objectives. This presents deep epistemological issues centered on the question of how a framework for representation (as opposed to the entities it represents) may be objectively validated. In this summary paper, formalizing previous work in this field, it is argued that subsumptive perception-action learning has the capacity to resolve these issues by (a) building the perceptual hierarchy from the bottom up so as to ground all proposed representations, and (b) maintaining a bijective coupling between proposed percepts and projected action possibilities to ensure empirical falsifiability of these grounded representations. In doing so, we will show that such subsumptive perception-action learners intrinsically incorporate a model for how intentionality emerges from randomized exploratory activity in the form of 'motor babbling'. Moreover, such a model of intentionality also naturally translates into a model for human-computer interfacing that makes minimal assumptions as to cognitive states.
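
    The percept-action coupling can be made concrete with a toy sketch. The following Python fragment is illustrative only, not the authors' formalism: the grid world, action set, and babble() routine are invented for the example. A random "motor babbling" phase pairs each observed percept change with the action that produced it, so every admitted percept carries a projected action possibility that can later be tested and falsified.

        import random

        ACTIONS = ["left", "right", "up", "down"]
        MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}

        def perceive(pos):
            # Stand-in for a grounded, bottom-up percept: here just the position.
            return pos

        def babble(steps=200, seed=0):
            rng = random.Random(seed)
            pos = (0, 0)
            coupling = {}  # (percept, percept') -> action that afforded the change
            for _ in range(steps):
                action = rng.choice(ACTIONS)  # randomized exploratory activity
                dx, dy = MOVES[action]
                new_pos = (pos[0] + dx, pos[1] + dy)
                # Couple the observed percept transition to the action that
                # caused it; percepts with no action consequence are never kept.
                coupling[(perceive(pos), perceive(new_pos))] = action
                pos = new_pos
            return coupling

        print(list(babble().items())[:3])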

    Complex Processes from Dynamical Architectures with Time-Scale Hierarchy

    The idea that complex motor, perceptual, and cognitive behaviors are composed of smaller units, which are somehow brought into a meaningful relation, permeates the biological and life sciences. However, no principled framework defining the constituent elementary processes has been developed to date. Consequently, functional configurations (or architectures) relating elementary processes and external influences are mostly piecemeal formulations suitable to particular instances only. Here, we develop a general dynamical framework for distinct functional architectures characterized by the time-scale separation of their constituents and evaluate their efficiency. To that end, we build on the (phase) flow of a system, which prescribes the temporal evolution of its state variables. The phase flow topology allows for the unambiguous classification of qualitatively distinct processes, which we consider to represent the functional units or modes within the dynamical architecture. Using the example of a composite movement, we illustrate how different architectures can be characterized by their degree of time-scale separation between the internal elements of the architecture (i.e., the functional modes) and external interventions. We reveal a tradeoff in the interactions between internal and external influences, which offers a theoretical justification for the efficient composition of complex processes out of non-trivial elementary processes or functional modes.
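
    As a concrete, if standard, illustration of time-scale separation (not the paper's own model), the following Python sketch integrates a FitzHugh-Nagumo-style fast-slow system: the fast variable v evolves at an O(1) rate while the slow variable w evolves at O(eps), so for eps << 1 the phase flow organizes into qualitatively distinct fast episodes linked by slow drifts. All parameter values are illustrative.

        eps, a, b, I = 0.08, 0.7, 0.8, 0.5   # eps << 1 sets the time-scale gap
        dt, T = 0.01, 200.0

        v, w = -1.0, 1.0
        trace = []
        for _ in range(int(T / dt)):
            dv = v - v**3 / 3 - w + I        # fast dynamics: O(1) rate
            dw = eps * (v + a - b * w)       # slow dynamics: O(eps) rate
            v, w = v + dt * dv, w + dt * dw  # forward-Euler step
            trace.append(v)

        print(f"v range over the run: [{min(trace):.2f}, {max(trace):.2f}]")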

    On the utility of dreaming: a general model for how learning in artificial agents can benefit from data hallucination

    We consider the benefits of dream mechanisms – that is, the ability to simulate new experiences based on past ones – in a machine learning context. Specifically, we are interested in learning for artificial agents that act in the world, and operationalize “dreaming” as a mechanism by which such an agent can use its own model of the learning environment to generate new hypotheses and training data. We first show that such a data-hallucination process is not automatically useful, since it can easily lead to a training set dominated by spurious imagined data until an ill-defined convergence point is reached. We then analyse a notably successful implementation of a machine learning-based dreaming mechanism by Ha and Schmidhuber (Ha, D., & Schmidhuber, J. (2018). World models. arXiv e-prints, arXiv:1803.10122). On that basis, we then develop a general framework by which an agent can generate simulated data to learn from in a manner that is beneficial to the agent. This, we argue, then forms a general method for an operationalized dream-like mechanism. We finish by demonstrating the general conditions under which such mechanisms can be useful in machine learning, wherein the implicit simulator inference and extrapolation involved in dreaming act without reinforcing inference error even when inference is incomplete.
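
    The control flow of such a dreaming mechanism can be sketched as follows. This is a schematic reading of the idea rather than the paper's framework; env, agent, and world_model are hypothetical interfaces. The two safeguards mirror the abstract's warning: the model is fitted on real experience only, and the share of imagined data is capped so hallucinated samples cannot dominate the training set.

        def train_with_dreams(env, agent, world_model, epochs=100, dream_ratio=0.25):
            real_data = []
            for _ in range(epochs):
                # 1. Act in the real environment and record grounded transitions.
                real_data.extend(agent.rollout(env))
                # 2. Fit the world model on real experience only, so dreams do
                #    not compound their own inference error.
                world_model.fit(real_data)
                # 3. "Dream": generate imagined transitions from the model,
                #    capped at a fixed fraction of the real data.
                dream_data = world_model.sample_rollouts(
                    n=int(dream_ratio * len(real_data)))
                # 4. Train the agent on the mixture of real and imagined data.
                agent.update(real_data + dream_data)
            return agent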

    Sample efficiency, transfer learning and interpretability for deep reinforcement learning

    Deep learning has revolutionised artificial intelligence, where the application of increased compute to train neural networks on large datasets has resulted in improvements in real-world applications such as object detection, text-to-speech synthesis and machine translation. Deep reinforcement learning (DRL) has similarly shown impressive results in board and video games, but less so in real-world applications such as robotic control. To address this, I have investigated three factors prohibiting further deployment of DRL: sample efficiency, transfer learning, and interpretability. To decrease the amount of data needed to train DRL systems, I have explored various storage strategies and exploration policies for episodic control (EC) algorithms, resulting in the application of online clustering to improve the memory efficiency of EC algorithms, and the maximum entropy mellowmax policy for improving the sample efficiency and final performance of the same EC algorithms. To improve performance during transfer learning, I have shown that a multi-headed neural network architecture trained using hierarchical reinforcement learning can retain the benefits of positive transfer between tasks while mitigating the interference effects of negative transfer. I additionally investigated the use of multi-headed architectures to reduce catastrophic forgetting under the continual learning setting. While the use of multiple heads worked well within a simple environment, it was of limited use within a more complex domain, indicating that this strategy does not scale well. Finally, I applied a wide range of quantitative and qualitative techniques to better interpret trained DRL agents. In particular, I compared the effects of training DRL agents both with and without visual domain randomisation (DR), a popular technique to achieve simulation-to-real transfer, providing a series of tests that can be applied before real-world deployment. One of the major findings is that DR produces more entangled representations within trained DRL agents, indicating quantitatively that they are invariant to nuisance factors associated with the DR process. Additionally, while my environment allowed agents trained without DR to succeed without requiring complex recurrent processing, all agents trained with DR appear to integrate information over time, as evidenced through ablations on the recurrent state.
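
    For the mellowmax component, the operator itself can be sketched directly; the maximum-entropy mellowmax policy follows Asadi and Littman (2017), where a Boltzmann temperature beta is found by root-finding so that the softmax policy's expected value matches the mellowmax value. The omega value and Q-values below are illustrative, and this sketch is not the thesis's implementation.

        import numpy as np
        from scipy.optimize import brentq
        from scipy.special import logsumexp, softmax

        def mellowmax(q, omega=5.0):
            # mm_w(q) = (1/w) * log((1/n) * sum_a exp(w * q_a)), computed stably.
            return (logsumexp(omega * q) - np.log(len(q))) / omega

        def max_entropy_mellowmax_policy(q, omega=5.0):
            # Find beta whose softmax policy has expected value mellowmax(q),
            # then act with that softmax (Asadi & Littman, 2017).
            adv = q - mellowmax(q, omega)
            f = lambda beta: softmax(beta * q) @ adv  # zero at the right beta
            beta = brentq(f, -100.0, 100.0)
            return softmax(beta * q)

        q_values = np.array([1.0, 1.5, 0.2])
        print(max_entropy_mellowmax_policy(q_values))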

    Learning, Prediction and Planning with Approximate Forward Models

    The capacity to build internal representations of the world provides an agent with the opportunity to use them to act in its surroundings more appropriately. These internal representations may capture complex associations and keep track of the state of the agent and the environment. One of the most striking aspects of this phenomenon is that the agent may manipulate these internal representations to consider the distant future, and to formulate plans likely to lead to beneficial outcomes. Our treatment in this thesis considers this particular class of agents, referred to as model-based. The behaviour of these agents is not only contingent upon the current sensory stream and their memory, but also based on hypothetical future sensory streams that are produced from potential sequences of actions. Throughout this thesis, there are two main themes that we explore and that are fundamental for advancing our understanding of model-based agents. The first is the agent's uncertainty about its environment and how it influences its decision-making. There are multiple aspects of this relation one could investigate; we analyse two specific scenarios. The first illustrates how it is possible to harness the agent's uncertainty to devise error-correction schemes. In the second, a probability distribution that defines the agent's current model is used to derive intrinsic utility signals to guide behaviour. The other main theme that permeates this thesis is the question of which aspects of the external world should be stored and represented by an internal model. This question has important consequences for the design of learning objectives. As we will see in this thesis, we start with perhaps the most conceptually intuitive way to frame a learning objective for acquiring a world model: namely, the assumption that the agent must be able to predict its future observations as accurately as possible. From this starting point, we progress towards learning objectives that introduce additional prediction targets or constraints to aim for a compressed and more essential representation of an observation. This theme concludes by trying to gain some perspective on whether it is possible, and even desirable, to attempt to have an internal model that tries to map the external observations, as we start to consider information-theoretic notions of relevance. Our results show that these design choices can have a profound effect on performance, even when the planning machinery is identical, and demonstrate the importance of building world models aligned with the agent's behavioural objectives.
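
    A minimal example of the planning machinery such model-based agents share is a random-shooting planner over a forward model. The sketch below uses an invented one-dimensional dynamics and reward function as stand-ins; any learned approximate forward model with the same signatures could be substituted.

        import numpy as np

        def dynamics(state, action):
            return state + 0.1 * action   # toy point-mass forward model

        def reward(state):
            return -abs(state - 1.0)      # prefer states near the goal at 1.0

        def plan(state, horizon=10, candidates=200, seed=0):
            # Sample candidate action sequences, roll each through the model,
            # and return the first action of the best-scoring sequence.
            rng = np.random.default_rng(seed)
            best_score, best_first = -np.inf, 0.0
            for _ in range(candidates):
                seq = rng.uniform(-1.0, 1.0, size=horizon)
                s, score = state, 0.0
                for a in seq:
                    s = dynamics(s, a)
                    score += reward(s)
                if score > best_score:
                    best_score, best_first = score, seq[0]
            return best_first

        print(plan(state=0.0))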

    Structures for Sophisticated Behaviour: Feudal Hierarchies and World Models

    This thesis explores structured, reward-based behaviour in artificial agents and in animals. In Part I we investigate how reinforcement learning agents can learn to cooperate. Drawing inspiration from the hierarchical organisation of human societies, we propose the framework of Feudal Multi-agent Hierarchies (FMH), in which coordination of many agents is facilitated by a manager agent. We outline the structure of FMH and demonstrate its potential for decentralised learning and control. We show that, given an adequate set of subgoals from which to choose, FMH performs, and particularly scales, substantially better than cooperative approaches that use shared rewards. We next investigate training FMH in simulation to solve a complex information-gathering task. Our approach introduces a ‘Centralised Policy Actor-Critic’ (CPAC) and an alteration to the conventional multi-agent policy gradient, which allows one multi-agent system to advise the training of another. We further exploit this idea for communicating agents with shared rewards and demonstrate its efficacy. In Part II we examine how animals discover and exploit underlying statistical structure in their environments, even when such structure is difficult to learn and use. By analysing behavioural data from an extended experiment with rats, we show that such hidden structure can indeed be learned, but also that subjects suffer from imperfections in their ability to infer their current state. We account for their behaviour using a Hidden Markov Model, in which recent observations are integrated imperfectly with evidence from the past. We find that over the course of training, subjects learn to track their progress through the task more accurately, a change that our model largely attributes to the more reliable integration of past evidence.
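
    The imperfect-integration idea in Part II can be illustrated with a leaky HMM filter: the belief is updated by the usual predict-and-weight steps but then decays towards a flat distribution by a leak factor lam, with lam = 0 recovering exact Bayesian filtering. The two-state transition and emission matrices below are illustrative, not the thesis's fitted model.

        import numpy as np

        A = np.array([[0.9, 0.1], [0.1, 0.9]])  # state transition probabilities
        E = np.array([[0.8, 0.2], [0.3, 0.7]])  # P(observation | state)

        def leaky_filter(observations, lam=0.2):
            belief = np.full(2, 0.5)
            for obs in observations:
                belief = A.T @ belief    # predict the next state
                belief *= E[:, obs]      # weight by the new observation
                belief /= belief.sum()
                # Imperfect integration: decay towards an uninformed belief.
                belief = (1 - lam) * belief + lam * np.full(2, 0.5)
            return belief

        print(leaky_filter([0, 0, 1, 0, 0]))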

    Applications of time-series generative models and inference techniques

    In this dissertation, we apply deep generative modelling, amortised inference and reinforcement learning methods to real-world, practical phenomena, and we ask if these techniques can be used to predict complex system dynamics, model biologically plausible behaviour, and guide decision making. In the past, probabilistic modelling and Bayesian inference techniques have been successfully applied in a wide array of fields, achieving success in financial market prediction, robotics, and the natural sciences. However, the use of generative models in these contexts has usually required a rigid set of linearity constraints or assumptions about the distributions used for modelling. Furthermore, inference in non-linear models can be very difficult to scale to high-dimensional models. In recent years, deep learning has been a key innovation in enabling non-linear function approximation. When applied to probabilistic modelling, deep non-linear models have significantly improved the generative capabilities of computer vision models. While an important step towards general artificial intelligence, there remains a gap between the successes of these early single-time-step deep generative models and the temporal models that will be required to deploy machine learning in the real world. We posit that deep non-linear time-series models and sequential inference are useful in a number of these complex domains. In order to test this hypothesis, we made methodological developments related to model learning and approximate inference. We then present experimental results, which address several questions about the application of deep generative models. First, can we train a deep temporal model of complex dynamics to perform sufficiently accurate inference and predictions at run-time? Here, "sufficient accuracy" means that the predictions and inferences made using our model lead to stronger performance than that given by a heuristic approach on some downstream task performed in real-time. We specifically model large compute-cluster hardware performance using a deep generative model in order to tackle the downstream task of improving the overall throughput of the cluster. Generally, this question is useful to answer for a number of wider applications similar to ours which may use such modelling techniques to intervene in real-time. For example, we may be interested in applying generative modelling and inference to devise better trading algorithms with the goal of increasing returns, or in using a deep generative epidemiology model to determine government policies that help prevent the spread of disease. Simply put, we want to ask: are deep generative models powerful enough to be useful? Next, are deep state-space models important for the generative quality of animal-like behaviour? Given a perceptual dataset of animal behaviour, such as camera views of fruit-fly interactions or collections of human handwriting samples, can a deep generative model capture the latent variability underlying such behaviour? As a step towards artificial intelligence that mirrors humans and other biological organisms, we must assess whether deep generative modelling is a viable approach to capturing what may be one of the most stochastic and challenging phenomena to model. Finally, is inference a useful perspective in decision making and reinforcement learning? If so, can we improve the uncertainty estimation of different quantities used in classic reinforcement learning to further take advantage of an inference perspective? Answering these questions may help us determine if a "Reinforcement Learning as Inference" framework coupled with a distributional estimate of the sum of future rewards can lead to better decision making in the control setting. Although our findings are positive in terms of these questions, they come with caveats. First, deep generative models must be accurate to be useful for downstream tasks. Second, modelling biologically plausible behaviour is difficult without additional partial supervision in the latent space. Third, while we have made orthogonal progress in using the inference perspective for policy learning and leveraging a distributional estimate in reinforcement learning, it remains unclear how to best combine these two approaches. This thesis presents the progress made in tackling these challenges.
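
    For contrast with the deep non-linear models discussed above, the classical linear-Gaussian case of sequential inference is compact enough to state in full: a scalar Kalman filter with fixed, known parameters. Deep state-space models replace these fixed linear maps with learned networks, but the predict/update structure is the same. Parameter values here are illustrative.

        def kalman_1d(observations, a=0.95, q=0.1, r=0.5):
            # Model: x_t = a * x_{t-1} + N(0, q);  y_t = x_t + N(0, r).
            mean, var = 0.0, 1.0
            means = []
            for y in observations:
                mean, var = a * mean, a * a * var + q             # predict
                k = var / (var + r)                               # Kalman gain
                mean, var = mean + k * (y - mean), (1 - k) * var  # update
                means.append(mean)
            return means

        print([round(m, 2) for m in kalman_1d([0.5, 0.7, 1.2, 0.9, 1.1])])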