
    Object-Oriented Dynamics Learning through Multi-Level Abstraction

    Object-based approaches for learning action-conditioned dynamics have demonstrated promise for generalization and interpretability. However, existing approaches suffer from structural limitations and optimization difficulties in common environments with multiple dynamic objects. In this paper, we present a novel self-supervised learning framework, called Multi-level Abstraction Object-oriented Predictor (MAOP), which employs a three-level learning architecture that enables efficient object-based dynamics learning from raw visual observations. We also design a spatial-temporal relational reasoning mechanism for MAOP to support instance-level dynamics learning and handle partial observability. Our results show that MAOP significantly outperforms previous methods in sample efficiency and generalization to novel environments when learning environment models. We also demonstrate that the learned dynamics models enable efficient planning in unseen environments, comparable to planning with true environment models. In addition, MAOP learns semantically and visually interpretable disentangled representations.
    Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020
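    The sketch below illustrates the general flavor of object-based, action-conditioned dynamics prediction that this abstract describes: each object's next state is predicted from its own state, the shared action, and a pooled relational effect from the other objects. It is a minimal illustration under my own assumptions, not the paper's MAOP architecture; the class and layer names (ObjectDynamicsSketch, relation_net, dynamics_net) are invented for this example.

```python
import torch
import torch.nn as nn

class ObjectDynamicsSketch(nn.Module):
    def __init__(self, obj_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.hidden = hidden
        # Pairwise relation network: effect of object j on object i.
        self.relation_net = nn.Sequential(
            nn.Linear(2 * obj_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        # Per-object dynamics: state + action + aggregated relational effect -> state delta.
        self.dynamics_net = nn.Sequential(
            nn.Linear(obj_dim + action_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, obj_dim))

    def forward(self, objects: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # objects: (N, obj_dim); action: (action_dim,) shared by all objects.
        n = objects.shape[0]
        deltas = []
        for i in range(n):
            # Sum relational effects from every other object onto object i.
            effect = torch.zeros(self.hidden)
            for j in range(n):
                if j != i:
                    effect = effect + self.relation_net(torch.cat([objects[i], objects[j]]))
            deltas.append(self.dynamics_net(torch.cat([objects[i], action, effect])))
        # Predicted next object states = current states + learned per-object deltas.
        return objects + torch.stack(deltas)

model = ObjectDynamicsSketch(obj_dim=4, action_dim=2)
pred_next = model(torch.randn(3, 4), torch.randn(2))  # three objects, one shared action
```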

    Learning competitive ensemble of information-constrained primitives

    We want to develop reinforcement learning algorithms that enable the learning agent to obtain a structured decomposition of its behavior. Hierarchical Reinforcement Learning provides a mechanism for doing this by explicitly modularising the policy into two components --- a set of low-level sub-policies (or primitives) and a high-level master policy to coordinate between the primitives. While the primitives have to specialize to only a part of the state space, the master policy has to specialize to the entire state space, as it decides when to activate which primitives. This introduces a ``bottleneck'' where the success of the agent depends on the success of the master policy, thereby making it a single point of failure. We propose to do away with this limitation by using a new mechanism where the sub-policies can decide for themselves in which part of the state space they want to act. This decentralized decision making does away with the need for a parameterized master policy. We use this mechanism to train a policy that is composed of an ensemble of primitives but does not require a master policy to choose between the primitives. We experimentally demonstrate that this policy architecture improves over both flat and hierarchical policies in terms of generalization.
    This work is under review at the NeurIPS 2019 Conference as a paper titled Learning Competitive Ensemble of Information-Constrained Primitives. In Chapter One, I provide background on Reinforcement Learning, Hierarchical Reinforcement Learning, the Information Bottleneck, Compositionality, and Neural Module Networks, and discuss how the proposed work in Chapter Two relates to these ideas. Chapter Two describes the idea of training an ensemble of primitives. I conclude the thesis by discussing some future research directions for the work described in Chapter Two.
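    One way to picture the decentralized competition described above is sketched below: each primitive encodes the state through its own information bottleneck, and the amount of state information it uses (the KL divergence of its latent posterior from a fixed prior) serves as its "bid" for controlling the agent, so no parameterized master policy is needed. This is an illustrative sketch under my own assumptions, not the thesis's code; the class and function names are invented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Primitive(nn.Module):
    def __init__(self, state_dim: int, latent_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Linear(state_dim, 2 * latent_dim)  # outputs mean and log-variance
        self.policy = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, action_dim))

    def forward(self, state: torch.Tensor):
        mu, logvar = self.encoder(state).chunk(2, dim=-1)
        # KL( N(mu, sigma^2) || N(0, 1) ): how much state information this primitive uses.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized latent
        return self.policy(z), kl

def act(primitives, state):
    """Pick a primitive with probability proportional to its (softmaxed) information bid."""
    outputs = [p(state) for p in primitives]
    bids = torch.stack([kl for _, kl in outputs])
    weights = F.softmax(bids, dim=0)          # decentralized competition, no master policy
    choice = torch.multinomial(weights, 1).item()
    action_logits, _ = outputs[choice]
    return choice, action_logits

primitives = nn.ModuleList([Primitive(state_dim=8, latent_dim=4, action_dim=3) for _ in range(4)])
chosen, logits = act(primitives, torch.randn(8))
```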

    Universal Memory Architectures for Autonomous Machines

    We propose a self-organizing memory architecture (UMA) for perceptual experience, provably capable of supporting autonomous learning and goal-directed problem solving in the absence of any prior information about the agent's environment. The architecture is simple enough to ensure (1) a quadratic bound (in the number of available sensors) on space requirements, and (2) a quadratic bound on the time-complexity of the update-execute cycle. At the same time, it is sufficiently complex to provide the agent with an internal representation which is (3) minimal among all representations which account for every sensory equivalence class consistent with the agent's belief state; (4) capable, in principle, of recovering a topological model of the problem space; and (5) learnable with arbitrary precision through a random application of the available actions. These provable properties — both the trainability and the operational efficacy of an effectively trained memory structure — exploit a duality between weak poc sets — a symbolic (discrete) representation of subset nesting relations — and non-positively curved cubical complexes, whose rich convexity theory underlies the planning cycle of the proposed architecture.
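    To make the quadratic space bound above concrete, the sketch below shows one way a memory of pairwise sensor relations can be realized: for every ordered pair of binary sensors, keep a running count of observations where the first fired without the second, and read off approximate implication (nesting) relations from the zero entries. This is only an illustration of the pairwise-counting idea, under my own assumptions, not the UMA update rule from the paper.

```python
import numpy as np

class PairwiseSnapshot:
    def __init__(self, num_sensors: int):
        # counts[i, j] = number of observations where sensor i fired but sensor j did not.
        self.counts = np.zeros((num_sensors, num_sensors), dtype=np.int64)  # O(n^2) space
        self.total = 0

    def update(self, reading: np.ndarray) -> None:
        """reading: boolean vector of sensor activations for one observation."""
        fired = reading.astype(bool)
        # Outer product marks pairs (i fired, j did not fire); O(n^2) time per update.
        self.counts += np.outer(fired, ~fired).astype(np.int64)
        self.total += 1

    def implies(self, i: int, j: int) -> bool:
        """Approximate nesting relation 'sensor i implies sensor j': i never seen without j."""
        return self.total > 0 and self.counts[i, j] == 0

snap = PairwiseSnapshot(num_sensors=4)
snap.update(np.array([1, 1, 0, 0]))
snap.update(np.array([1, 1, 1, 0]))
print(snap.implies(0, 1))  # True: sensor 0 has never fired without sensor 1
```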

    Object Focused Q-Learning for Autonomous Agents

    We present Object Focused Q-learning (OF-Q), a novel reinforcement learning algorithm that can offer exponential speed-ups over classic Q-learning on domains composed of independent objects. An OF-Q agent treats the state space as a collection of objects organized into different object classes. Our key contribution is a control policy that uses non-optimal Q-functions to estimate the risk of ignoring parts of the state space. We compare our algorithm to traditional Q-learning and previous arbitration algorithms in two domains, including a version of Space Invaders.
    The definitive version was published in AAMAS '13: Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems.
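    A hedged tabular sketch of the object-focused idea described above follows: one Q-table per object class, updated from that object's local state, with action selection summing Q-values across the objects currently visible. The `risk` table stands in for the paper's use of non-optimal Q-functions to judge which objects are safe to ignore; the arbitration rule shown here is illustrative, not the published algorithm, and all names are my own.

```python
from collections import defaultdict

class ObjectFocusedQSketch:
    def __init__(self, actions, alpha=0.1, gamma=0.99, ignore_threshold=float("inf")):
        self.actions = list(actions)
        self.alpha, self.gamma = alpha, gamma
        self.ignore_threshold = ignore_threshold
        self.q = defaultdict(float)     # q[(obj_class, local_state, action)]
        self.risk = defaultdict(float)  # pessimistic value estimate per (obj_class, local_state)

    def select_action(self, objects):
        """objects: list of (obj_class, local_state) pairs currently visible."""
        def score(action):
            total = 0.0
            for cls, s in objects:
                # If even a pessimistic estimate says this object costs little, skip it.
                if self.risk[(cls, s)] >= self.ignore_threshold:
                    continue
                total += self.q[(cls, s, action)]
            return total
        return max(self.actions, key=score)

    def update(self, cls, s, action, reward, s_next):
        """Standard Q-learning backup applied per object class on its local state."""
        best_next = max(self.q[(cls, s_next, a)] for a in self.actions)
        td = reward + self.gamma * best_next - self.q[(cls, s, action)]
        self.q[(cls, s, action)] += self.alpha * td

agent = ObjectFocusedQSketch(actions=[0, 1, 2])
a = agent.select_action([("alien", (3, 5)), ("bullet", (3, 2))])
```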