
    Representation Learning in Deep RL via Discrete Information Bottleneck

    Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering the underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures that utilize variational and discrete information bottlenecks, coined RepDIB, to learn structured factorized representations. Exploiting the expressiveness brought by factorized representations, we introduce a simple yet effective bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.
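A discrete information bottleneck is commonly implemented as vector quantization: each continuous vector is snapped to its nearest entry in a codebook, so only a small integer index passes through. The minimal sketch below shows the forward pass of a factorized variant, assuming one independent codebook per factor. The function name, shapes and sizes are hypothetical, the training machinery (straight-through gradients, codebook and commitment losses) and the variational bottleneck are omitted, and it is not presented as RepDIB's exact layer.

```python
import numpy as np

def factorized_vq(z, codebooks):
    """Quantize each factor of an embedding with its own codebook (forward pass only).

    z:         array of shape (batch, num_factors * factor_dim)
    codebooks: array of shape (num_factors, codebook_size, factor_dim)
    Returns the quantized embedding (same shape as z) and the code indices
    of shape (batch, num_factors).
    """
    num_factors, _, factor_dim = codebooks.shape
    batch = z.shape[0]
    # Split the embedding into independent factors: (batch, num_factors, factor_dim)
    z_f = z.reshape(batch, num_factors, factor_dim)
    # Squared distance from each factor vector to every code of that factor's codebook
    # -> (batch, num_factors, codebook_size)
    dists = ((z_f[:, :, None, :] - codebooks[None, :, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=-1)  # nearest code per factor: (batch, num_factors)
    # Gather the selected codes: (batch, num_factors, factor_dim)
    quantized = codebooks[np.arange(num_factors)[None, :], idx]
    return quantized.reshape(batch, num_factors * factor_dim), idx
```

For example, with four codebooks of 16 entries each, a 32-dimensional embedding is compressed to four integers in `range(16)`; everything that does not change the nearest code is discarded, which is what makes the bottleneck drop small, irrelevant variations.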

    Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration

    Intrinsically motivated goal exploration algorithms enable machines to discover repertoires of policies that produce a diversity of effects in complex environments. These exploration algorithms have been shown to allow real-world robots to acquire skills such as tool use in high-dimensional continuous state and action spaces. However, they have so far assumed that self-generated goals are sampled in a specifically engineered feature space, limiting their autonomy. In this work, we propose to use deep representation learning algorithms to learn an adequate goal space. This is a developmental two-stage approach: first, in a perceptual learning stage, deep learning algorithms use passive raw sensor observations of world changes to learn a corresponding latent space; then, in a second stage, goal exploration happens by sampling goals in this latent space. We present experiments where a simulated robot arm interacts with an object, and we show that exploration algorithms using such learned representations can match the performance obtained using engineered representations.
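The two-stage pipeline can be sketched as follows, under two simplifications that are assumptions of this example: PCA (computed with an SVD) stands in for the deep representation learning algorithm of the first stage, and goals are sampled uniformly inside the bounding box of the latent codes observed during that stage. Function names are hypothetical.

```python
import numpy as np

def learn_latent_space(observations, latent_dim):
    """Stage 1 (stand-in): fit a linear encoder with PCA on passive observations.

    observations: array of shape (n_samples, obs_dim)
    Returns an encode(x) function mapping observations to latent_dim coordinates.
    """
    mean = observations.mean(axis=0)
    # Rows of vt are the principal directions of the centered data
    _, _, vt = np.linalg.svd(observations - mean, full_matrices=False)
    components = vt[:latent_dim]  # (latent_dim, obs_dim)

    def encode(x):
        return (np.asarray(x) - mean) @ components.T

    return encode

def sample_goals(encode, observations, n_goals, rng):
    """Stage 2: sample goals uniformly in the bounding box of the observed latent codes."""
    codes = encode(observations)
    low, high = codes.min(axis=0), codes.max(axis=0)
    return rng.uniform(low, high, size=(n_goals, codes.shape[1]))
```

A usage pattern would be `encode = learn_latent_space(obs, 3)` followed by `goals = sample_goals(encode, obs, 10, np.random.default_rng(0))`; the exploration algorithm then treats each row of `goals` as a target in latent space and compares it with `encode(outcome)` after a rollout.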

    Talk Commonsense to Me! Enriching Language Models with Commonsense Knowledge

    Human cognition is fascinating: it is a mesh of several neural phenomena that drive our ability to constantly reason about and make inferences about the surrounding world. In cognitive computer science, Commonsense Reasoning is the term given to our ability to make inferences about uncertain events and to reason about Cognitive Knowledge. Introducing Commonsense into intelligent systems has been desired for years, but the mechanism for doing so remains a scientific puzzle. Some implicitly believe that language understanding is enough to achieve some level of Commonsense [90]. Taking a different position, others think that enriching language with Knowledge Graphs might be enough for human-like reasoning [63], while still others believe that human-like reasoning can only be truly captured with symbolic rules and logical deduction powered by Knowledge Bases, such as taxonomies and ontologies [50]. We focus on integrating Commonsense Knowledge into Language Models, because we believe this integration is a step towards a beneficial embedding of Commonsense Reasoning in interactive Intelligent Systems, such as conversational assistants. Conversational assistants, such as Amazon's Alexa, are user-driven systems. Thus, a more human-like interaction is strongly desired to truly capture the user's attention and empathy. We believe that such humanistic characteristics can be leveraged by introducing stronger Commonsense Knowledge and Reasoning to engage fruitfully with users. To this end, we introduce a new family of models, the Relation-Aware BART (RA-BART), which combines the language generation abilities of BART [51] with explicit Commonsense Knowledge extracted from Commonsense Knowledge Graphs to further extend the human-like capabilities of these models. We evaluate our model on three different tasks: Abstractive Question Answering, Text Generation conditioned on given concepts, and Multiple-Choice Question Answering.
We find that RA-BART outperforms non-knowledge-enriched models on the generation tasks, but underperforms on the multiple-choice question answering task. Our project is available in our open-source, public GitHub repository (Explicit Commonsense).
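The knowledge-extraction step can be pictured with a toy example. The sketch below is not RA-BART's relation-aware mechanism, which the abstract does not detail: it only shows, under assumed conventions, how relations linking a set of input concepts might be pulled from a commonsense graph of (head, relation, tail) triples and appended to the model input as plain text. The graph contents, the ConceptNet-style relation names, the `|` and `;` separators, and both function names are hypothetical.

```python
# Hypothetical toy commonsense graph of (head, relation, tail) triples.
TOY_GRAPH = {
    ("dog", "CapableOf", "bark"),
    ("dog", "AtLocation", "park"),
    ("dog", "Desires", "play"),
    ("frisbee", "UsedFor", "play"),
    ("park", "UsedFor", "play"),
}

def relations_between(concepts, graph):
    """Return, in sorted order, the triples whose head and tail are both input concepts."""
    wanted = set(concepts)
    return [(h, r, t) for h, r, t in sorted(graph)
            if h in wanted and t in wanted and h != t]

def linearize(concepts, triples):
    """Append the retrieved relations to the concept list as text for a seq2seq encoder."""
    facts = " ; ".join(f"{h} {r} {t}" for h, r, t in triples)
    return " ".join(concepts) + " | " + facts
```

For the concepts `["dog", "park", "play"]`, this keeps the three triples that connect them and ignores `("dog", "CapableOf", "bark")`, because `bark` is not among the inputs.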

    Neural combinatorial optimization as an enabler technology to design real-time virtual network function placement decision systems

    158 p. The Fifth Generation of mobile networks (5G) represents a breakthrough technology for the telecommunications industry. 5G provides a unified infrastructure capable of integrating heterogeneous services with different requirements over the same physical network. This is achieved thanks to recent advances in network virtualization, specifically in Network Function Virtualization (NFV) and Software-Defined Networking (SDN) technologies. This cloud-based architecture not only brings new possibilities to vertical sectors but also entails new challenges that have to be solved accordingly. In particular, it makes it possible to automate operations within the infrastructure and to perform network optimization at operational time (e.g., spectrum optimization, service optimization, traffic optimization). Nevertheless, designing optimization algorithms for this purpose entails some difficulties. Solving the underlying Combinatorial Optimization (CO) problems is usually intractable due to their NP-hard nature. In addition, solutions to these problems are required in close to real time due to the tight time requirements of this dynamic environment. For this reason, hand-crafted heuristic algorithms have been widely used in the literature to obtain fast approximate solutions in this context. However, tailoring heuristics to specific CO problems can be a daunting task that requires expertise. The ability to automate this resolution process would be of utmost importance for achieving intelligent network orchestration. In this sense, Artificial Intelligence (AI) is envisioned as the key technology for autonomously inferring intelligent solutions to these problems. Combining AI with network virtualization can truly transform this industry. In particular, this Thesis aims at using Neural Combinatorial Optimization (NCO) to infer solutions to CO problems.
NCO has been shown to learn near-optimal solutions to classical combinatorial problems (e.g., the Traveling Salesman Problem (TSP), the Bin Packing Problem (BPP), and the Vehicle Routing Problem (VRP)). Specifically, NCO relies on Reinforcement Learning (RL) to estimate a Neural Network (NN) model that describes the relation between the space of problem instances and their solutions. In other words, for a new instance, this model can infer a solution by generalizing from the problem space on which it has been trained. To this end, during the learning process the model takes instances from the learning space and uses the reward obtained from evaluating each solution to improve its accuracy. The work presented here contributes to NCO theory in two main directions. First, it argues that the performance of the sequence-to-sequence models used for NCO in the literature is improved by presenting combinatorial problems as Constrained Markov Decision Processes (CMDPs). This property can be exploited to build a Markovian model that constructs solutions incrementally based on interactions with the problem. Second, this formulation makes it possible to address general constrained combinatorial problems within this framework. In this context, in addition to the reward signal, the model relies on penalty signals generated from constraint violations, which steer it toward a competitive policy even in highly constrained environments. This strategy extends the range of problems that can be addressed with this technology. The presented approach is validated in the scope of intelligent network management, specifically on the Virtual Network Function (VNF) placement problem. This problem consists of efficiently mapping a set of network service requests on top of the physical network infrastructure.
In particular, we seek to obtain the optimal placement for a network service chain considering the state of the virtual environment, so that a specific resource objective is accomplished, in this case the minimization of overall power consumption. The experiments conducted show that the proposal can learn competitive solutions when compared with classical heuristic, metaheuristic, and Constraint Programming (CP) solvers.
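The ingredients described above (incremental construction of a placement, a power-consumption objective, and a penalty signal for constraint violations) can be illustrated on a toy VNF placement instance. This is a minimal sketch, not the thesis's NCO model: there is no neural network, and the `policy` argument of `construct` only marks where a learned policy would choose a host at each step. The linear power model (`p_idle`, `p_max`), the single CPU constraint per host, the fixed penalty weight `lam`, and all function names are assumptions. First-fit plays the role of a hand-crafted heuristic, and exhaustive enumeration stands in for an exact solver such as CP.

```python
from itertools import product

# Toy instance format (hypothetical): demands[i] is the CPU demand of VNF i,
# capacities[h] is the CPU capacity of host h; a placement is a tuple of host indices.

def host_loads(placement, demands, n_hosts):
    load = [0.0] * n_hosts
    for vnf, host in enumerate(placement):
        load[host] += demands[vnf]
    return load

def power(placement, demands, capacities, p_idle=100.0, p_max=250.0):
    """Assumed linear model: each active host pays p_idle plus (p_max - p_idle) * utilization."""
    loads = host_loads(placement, demands, len(capacities))
    return sum(p_idle + (p_max - p_idle) * l / c
               for l, c in zip(loads, capacities) if l > 0)

def violation(placement, demands, capacities):
    """Total capacity overflow; 0.0 means every constraint is satisfied."""
    loads = host_loads(placement, demands, len(capacities))
    return sum(max(0.0, l - c) for l, c in zip(loads, capacities))

def penalized_reward(placement, demands, capacities, lam=1000.0):
    """Reward signal: negative power minus a fixed-weight penalty on violations."""
    return (-power(placement, demands, capacities)
            - lam * violation(placement, demands, capacities))

def construct(demands, policy):
    """Build a placement one VNF at a time; the policy sees the partial solution."""
    placement = []
    for step in range(len(demands)):
        placement.append(policy(tuple(placement), step))
    return tuple(placement)

def first_fit(demands, capacities):
    """Hand-crafted heuristic: each VNF goes to the first host with enough room.
    Raises StopIteration if some VNF fits on no host."""
    load = [0.0] * len(capacities)
    placement = []
    for d in demands:
        host = next(h for h, c in enumerate(capacities) if load[h] + d <= c)
        load[host] += d
        placement.append(host)
    return tuple(placement)

def exhaustive_optimum(demands, capacities):
    """Exact reference by enumerating all |hosts| ** |VNFs| placements (tiny instances only)."""
    feasible = (p for p in product(range(len(capacities)), repeat=len(demands))
                if violation(p, demands, capacities) == 0)
    return min(feasible, key=lambda p: power(p, demands, capacities))
```

On demands `[2, 3, 4]` and capacities `[5, 10, 5]`, first-fit spreads the VNFs over two hosts (410 power units in this toy model), whereas the optimum packs all three onto the large host (235 units). A policy that overloads a host, such as one that always picks host 0, receives a heavily penalized reward, which is the signal that pushes a learned policy back toward feasible placements.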

    Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey

    Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autotelic agents: intrinsically motivated learning agents that can learn to represent, generate, select and solve their own problems. In recent years, the convergence of developmental approaches with deep reinforcement learning (RL) methods has been leading to the emergence of a new field: developmental reinforcement learning. Developmental RL is concerned with the use of deep RL algorithms to tackle a developmental problem: the intrinsically motivated acquisition of open-ended repertoires of skills. The self-generation of goals requires learning compact goal encodings as well as their associated goal-achievement functions. This raises new challenges compared to standard RL algorithms, which were originally designed to tackle pre-defined sets of goals using external reward signals. The present paper introduces developmental RL and proposes a computational framework based on goal-conditioned RL to tackle the intrinsically motivated skill acquisition problem. It then presents a typology of the various goal representations used in the literature, before reviewing existing methods to learn to represent and prioritize goals in autonomous systems. We close the paper by discussing some open challenges in the quest for intrinsically motivated skill acquisition.
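Two of the components mentioned above, a goal-achievement function and goal prioritization, can be made concrete with a short sketch. It assumes goals live in the same vector space as states, uses a distance threshold `eps` as the achievement test, and prioritizes goal buckets by absolute learning progress (the change in success rate between two consecutive windows), mixed with uniform sampling. Learning progress is one common prioritization signal in this literature, not the survey's prescribed method, and the bucket granularity, `window`, `epsilon`, and class name are assumptions.

```python
import numpy as np

def goal_achieved(state, goal, eps=0.05):
    """Goal-achievement function: 1.0 if the state is within eps of the goal, else 0.0."""
    return float(np.linalg.norm(np.asarray(state) - np.asarray(goal)) < eps)

class LearningProgressSampler:
    """Sample goal buckets in proportion to absolute learning progress (LP)."""

    def __init__(self, n_buckets, window=10, epsilon=0.2, seed=0):
        self.history = [[] for _ in range(n_buckets)]  # per-bucket success record
        self.window = window
        self.epsilon = epsilon  # share of uniform exploration
        self.rng = np.random.default_rng(seed)

    def update(self, bucket, success):
        self.history[bucket].append(float(success))

    def learning_progress(self):
        # LP = |success rate over the latest window - success rate over the window before|
        lp = np.zeros(len(self.history))
        for i, h in enumerate(self.history):
            if len(h) >= 2 * self.window:
                recent = np.mean(h[-self.window:])
                older = np.mean(h[-2 * self.window:-self.window])
                lp[i] = abs(recent - older)
        return lp

    def probabilities(self):
        lp = self.learning_progress()
        uniform = np.full(len(lp), 1.0 / len(lp))
        if lp.sum() == 0:
            return uniform
        # Mix LP-proportional sampling with uniform exploration
        return self.epsilon * uniform + (1 - self.epsilon) * lp / lp.sum()

    def sample(self):
        return int(self.rng.choice(len(self.history), p=self.probabilities()))
```

Using the absolute value means the agent also revisits goals whose competence is dropping (forgetting), while goals that are already mastered or still impossible show no progress and are only sampled through the uniform share.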

    Language-Conditioned Imitation Learning with Base Skill Priors under Unstructured Data

    Growing interest in language-conditioned robot manipulation is driven by the goal of developing robots capable of understanding and executing complex tasks, that is, robots that interpret language commands and manipulate objects accordingly. While language-conditioned approaches demonstrate impressive capabilities on tasks in familiar environments, they are limited in adapting to unfamiliar environment settings. In this study, we propose a general-purpose, language-conditioned approach that combines base skill priors and imitation learning under unstructured data to enhance the algorithm's generalization to unfamiliar environments. We assess our model's performance in both simulated and real-world environments using a zero-shot setting. In the simulated environment, the proposed approach surpasses previously reported scores on the CALVIN benchmark, especially in the challenging Zero-Shot Multi-Environment setting. The average completed task length, which indicates the average number of tasks the agent can complete in a row, improves by more than 2.5 times compared to the state-of-the-art method HULC. In addition, we conduct a zero-shot evaluation of our policy in a real-world setting, after training exclusively in simulated environments without additional specific adaptations. In this evaluation, we set up ten tasks, and our approach achieved an average 30% improvement compared to the current state-of-the-art approach, demonstrating strong generalization in both simulated environments and the real world. For further details, including access to our code and videos, please refer to our supplementary materials.
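The average completed task length is a simple statistic over instruction chains. The sketch below assumes CALVIN-style long-horizon evaluation, in which each rollout is a chain of five consecutive language instructions and the chain stops counting at the first failed instruction. The success flags in the usage example are made-up inputs, not results from the paper, and the function names are hypothetical.

```python
from typing import List, Sequence

def completed_length(chain: Sequence[bool]) -> int:
    """Number of consecutive successes from the start of a chain, up to the first failure."""
    n = 0
    for ok in chain:
        if not ok:
            break
        n += 1
    return n

def average_length(rollouts: Sequence[Sequence[bool]]) -> float:
    """Average completed task length over all evaluated chains."""
    return sum(completed_length(r) for r in rollouts) / len(rollouts)

def success_rates(rollouts: Sequence[Sequence[bool]], horizon: int = 5) -> List[float]:
    """Fraction of chains that complete at least k tasks in a row, for k = 1..horizon."""
    lengths = [completed_length(r) for r in rollouts]
    return [sum(l >= k for l in lengths) / len(lengths) for k in range(1, horizon + 1)]
```

For the made-up rollouts `[[True, True, False, True, True], [True] * 5, [False] * 5]`, the completed lengths are 2, 5 and 0, so the average length is 7/3; the later successes in the first chain do not count. The per-k success rates always sum to the average length, which is why the single number summarizes the whole success-rate curve.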