6 research outputs found

    Modulated Policy Hierarchies

    Full text link
    Solving tasks with sparse rewards is a major challenge in reinforcement learning. While hierarchical controllers are an intuitive approach to this problem, current methods often require manual reward shaping, alternating training phases, or manually defined subtasks. We introduce modulated policy hierarchies (MPH), which can learn end-to-end to solve tasks from sparse rewards. To achieve this, we study different modulation signals and exploration strategies for hierarchical controllers. Specifically, we find that communicating via bit-vectors is more efficient than selecting one out of multiple skills, as it enables mixing between them. To facilitate exploration, MPH uses its different time scales for temporally extended intrinsic motivation at each level of the hierarchy. We evaluate MPH on the robotics tasks of pushing and sparse block stacking, where it outperforms recent baselines.
    Comment: 8 pages, 5 figures
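
    The bit-vector modulation the abstract describes can be pictured with a minimal sketch, given below in PyTorch under stated assumptions: the network sizes, module names, and Bernoulli parameterization are illustrative guesses, not the authors' implementation. The point of contrast is that independent bits let the worker mix skills, whereas a Categorical over skills would pick exactly one.

```python
# Hypothetical two-level hierarchy (not the authors' code): the master policy
# emits a k-bit modulation vector instead of selecting a single skill.
import torch
import torch.nn as nn

class MasterPolicy(nn.Module):
    def __init__(self, obs_dim, n_bits=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, n_bits))

    def forward(self, obs):
        # Independent Bernoulli bits allow mixing between skills; replacing
        # this with a Categorical recovers plain one-of-N skill selection.
        return torch.distributions.Bernoulli(logits=self.net(obs))

class WorkerPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, n_bits=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + n_bits, 64), nn.Tanh(),
                                 nn.Linear(64, act_dim))

    def forward(self, obs, modulation):
        # The worker conditions on the observation plus the bit-vector signal.
        return self.net(torch.cat([obs, modulation], dim=-1))

obs = torch.randn(1, 8)
master, worker = MasterPolicy(8), WorkerPolicy(8, 2)
bits = master(obs).sample()       # e.g. tensor([[1., 0., 1., 1.]])
action = worker(obs, bits)
```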

    Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning

    Full text link
    Learning robotic manipulation tasks using reinforcement learning with sparse rewards is currently impractical due to the outrageous data requirements. Many practical tasks require manipulation of multiple objects, and the complexity of such tasks increases with the number of objects. Learning from a curriculum of increasingly complex tasks appears to be a natural solution but, unfortunately, does not work for many scenarios. We hypothesize that the inability of the state-of-the-art algorithms to effectively utilize a task curriculum stems from the absence of inductive biases for transferring knowledge from simpler to complex tasks. We show that graph-based relational architectures overcome this limitation and enable learning of complex tasks when provided with a simple curriculum of tasks with increasing numbers of objects. We demonstrate the utility of our framework on a simulated block stacking task. Starting from scratch, our agent learns to stack six blocks into a tower. Despite using step-wise sparse rewards, our method is orders of magnitude more data-efficient and outperforms the existing state-of-the-art method that utilizes human demonstrations. Furthermore, the learned policy exhibits zero-shot generalization, successfully stacking blocks into taller towers and previously unseen configurations such as pyramids, without any further training.
    Comment: 10 pages, 4 figures, and 1 table in main article; 3 figures and 3 tables in appendix. Supplementary website and videos at https://richardrl.github.io/relational-rl
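
    A rough sketch of the graph-based relational idea, assuming per-object feature vectors and standard self-attention (the paper's exact architecture is not reproduced here): because the attention weights are shared across objects, one trunk handles any number of blocks, which is what a curriculum over object counts and zero-shot transfer to taller towers require.

```python
# Illustrative relational trunk (not the paper's network): self-attention
# over per-object features, invariant to the number of objects.
import torch
import torch.nn as nn

class RelationalTrunk(nn.Module):
    def __init__(self, obj_dim, embed_dim=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(obj_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)

    def forward(self, objects):            # objects: (batch, n_objects, obj_dim)
        x = self.embed(objects)
        x, _ = self.attn(x, x, x)          # each object attends to all others
        return x.mean(dim=1)               # pooled code, object-count invariant

trunk = RelationalTrunk(obj_dim=10)
six_blocks = torch.randn(1, 6, 10)         # train with six objects...
nine_blocks = torch.randn(1, 9, 10)        # ...evaluate zero-shot with nine
assert trunk(six_blocks).shape == trunk(nine_blocks).shape
```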

    Syntactic inductive biases for deep learning methods

    Full text link
    The debate between connectionism and symbolism is one of the major forces driving the development of Artificial Intelligence. Deep learning and theoretical linguistics are the most representative fields of study for the two schools, respectively. While the deep learning method has made impressive breakthroughs and has become the major reason behind the recent AI prosperity in industry and academia, linguistics and symbolism still hold some important ground, including reasoning, interpretability, and reliability. In this thesis, we try to build a connection between the two schools by introducing syntactic inductive biases for deep learning models. We propose two families of inductive biases, one for constituency structure and another for dependency structure. The constituency inductive bias encourages deep learning models to use different units (or neurons) to separately process long-term and short-term information. This separation provides a way for deep learning models to build latent hierarchical representations from sequential inputs, such that a higher-level representation is composed of, and can be decomposed into, a series of lower-level representations. For example, without knowing the ground-truth structure, our proposed model learns to process logical expressions by composing representations of variables and operators into representations of expressions according to their syntactic structure. On the other hand, the dependency inductive bias encourages models to find the latent relations between entities in the input sequence. For natural language, the latent relations are usually modeled as a directed dependency graph, where a word has exactly one parent node and zero or more child nodes. After applying this constraint to a transformer-like model, we find that the model is capable of inducing directed graphs that are close to human expert annotations, and it also outperforms the standard transformer model on different tasks. We believe that these experimental results demonstrate an interesting alternative for the future development of deep learning models.
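
    The dependency constraint lends itself to a small sketch: a single attention head whose normalized scores serve as a soft parent distribution per word, with an argmax reading off a directed graph in which each word has exactly one parent. This is a hypothetical PyTorch illustration of the constraint, not the thesis's model.

```python
# Hypothetical "one parent per word" head (not the thesis code): attention
# scores are read as a soft dependency graph.
import torch
import torch.nn as nn

class ParentHead(nn.Module):
    def __init__(self, d_model=32):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)

    def forward(self, x):                  # x: (batch, seq_len, d_model)
        scores = self.q(x) @ self.k(x).transpose(-2, -1) / x.size(-1) ** 0.5
        # Mask the diagonal so a word cannot attach to itself.
        mask = torch.eye(x.size(1), dtype=torch.bool)
        scores = scores.masked_fill(mask, float("-inf"))
        attn = scores.softmax(dim=-1)      # soft parent distribution per word
        parents = attn.argmax(dim=-1)      # induced graph: one parent per word
        return attn, parents

head = ParentHead()
attn, parents = head(torch.randn(1, 5, 32))
print(parents)                             # parents[0, i] = parent of word i
```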

    my Human Brain Project (mHBP)

    Get PDF
    How can we make an agent that thinks like us humans? An agent that has proprioception and intrinsic motivation, can identify deception, uses small amounts of energy, transfers knowledge between tasks, and evolves? This is the problem this thesis focuses on. Being able to create a piece of software that can perform tasks like a human being is a goal that, if achieved, will allow us to extend our own capabilities to a very high level and have more tasks performed in a predictable fashion. This is one of the motivations for this thesis. To address this problem, we have proposed a modular architecture for Reinforcement Learning computation and developed an implementation to exercise this architecture. This software, which we call mHBP, is created in Python using Webots as an environment for the agent and Neo4J, a graph database, as memory. mHBP takes the sensory data or other inputs and produces, based on the body parts / tools that the agent has available, an output consisting of actions to perform. This thesis involves experimental design with several iterations, exploring a theoretical approach to RL based on graph databases. We conclude, with our work in this thesis, that it is possible to represent episodic data in a graph, and that it is also possible to interconnect Webots, Python, and Neo4J to support a stable architecture for Reinforcement Learning. In this work we also find a way to search for policies using the Neo4J query language, Cypher. Another key conclusion of this work is that state representation needs further research to find a state definition that enables policy search to produce more useful policies. The article “REINFORCEMENT LEARNING: A LITERATURE REVIEW (2020)” at ResearchGate, with DOI 10.13140/RG.2.2.30323.76327, is an outcome of this thesis.
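
    The graph-as-memory conclusion is easy to sketch, assuming a local Neo4J instance and the Python driver (5.x); the labels, properties, and the greedy "best stored action" query below are illustrative guesses, not mHBP's actual schema or Cypher.

```python
# Minimal sketch of episodic transitions in a Neo4J graph (hypothetical
# schema, not mHBP's): states are nodes, actions are reward-weighted edges.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def record_transition(tx, s, a, r, s_next):
    # Each state becomes a node; each action taken becomes a relationship
    # carrying the observed reward.
    tx.run("""
        MERGE (x:State {id: $s})
        MERGE (y:State {id: $s_next})
        MERGE (x)-[t:ACTION {name: $a}]->(y)
        SET t.reward = $r
        """, s=s, a=a, r=r, s_next=s_next)

def best_action(tx, s):
    # A greedy policy lookup in Cypher: the stored action with highest reward.
    record = tx.run("""
        MATCH (:State {id: $s})-[t:ACTION]->()
        RETURN t.name AS action ORDER BY t.reward DESC LIMIT 1
        """, s=s).single()
    return record["action"] if record else None

with driver.session() as session:
    session.execute_write(record_transition, "s0", "push", 1.0, "s1")
    print(session.execute_read(best_action, "s0"))
```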

    Modulated Policy Hierarchies

    No full text
    8 pages, 5 figures. International audience. Solving tasks with sparse rewards is a major challenge in reinforcement learning. While hierarchical controllers are an intuitive approach to this problem, current methods often require manual reward shaping, alternating training phases, or manually defined subtasks. We introduce modulated policy hierarchies (MPH), which can learn end-to-end to solve tasks from sparse rewards. To achieve this, we study different modulation signals and exploration strategies for hierarchical controllers. Specifically, we find that communicating via bit-vectors is more efficient than selecting one out of multiple skills, as it enables mixing between them. To facilitate exploration, MPH uses its different time scales for temporally extended intrinsic motivation at each level of the hierarchy. We evaluate MPH on the robotics tasks of pushing and sparse block stacking, where it outperforms recent baselines.