
    Transfer Value Iteration Networks

    Value iteration networks (VINs) have been demonstrated to have a good generalization ability for reinforcement learning tasks across similar domains. However, our experiments show that a policy learned by a VIN still fails to generalize well to a domain whose action and feature spaces are not identical to those of the domain in which it was trained. In this paper, we propose a transfer learning approach on top of VINs, termed Transfer VINs (TVINs), such that a policy learned in a source domain can be generalized to a target domain with only limited training data, even if the source and target domains have domain-specific actions and features. We empirically verify that our proposed TVINs outperform VINs when the source and target domains have similar but not identical action and feature spaces. Furthermore, we show that the performance improvement is consistent across different environments, maze sizes, and dataset sizes, as well as across different values of hyperparameters such as the number of iterations and the kernel size.
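For readers new to VINs, the computation a VIN unrolls is ordinary value iteration: Q-values per action are computed from the reward map and the current value map, and the value map is the max over actions. In a VIN the per-action Q maps come from a learned convolution over [R, V] and the max is a channel-wise max-pool. The sketch below is plain, fixed-kernel value iteration on a grid, not VIN or TVIN themselves; the four deterministic moves, the `ACTIONS`, `value_iteration`, and `greedy_action` names, and the wall-bump rule are all illustrative assumptions. The `iterations` argument plays the role of the number-of-iterations hyperparameter mentioned in the abstract.

```python
from typing import List, Tuple

Grid = List[List[float]]

# Hypothetical action set: four deterministic moves (up, down, left, right).
ACTIONS: List[Tuple[int, int]] = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def value_iteration(reward: Grid, gamma: float = 0.9, iterations: int = 20) -> Grid:
    """Unroll `iterations` steps of value iteration on a 2-D grid.

    Q_a(s) = R(s) + gamma * V(s'), where s' is the neighbour reached by action a,
    and V(s) = max_a Q_a(s). A VIN would compute the Q maps with a learned
    convolution; here the local "kernel" is fixed to deterministic moves.
    Moves that would leave the grid keep the agent in place.
    """
    rows, cols = len(reward), len(reward[0])
    value = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        new_value = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                q_values = []
                for dr, dc in ACTIONS:
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        nr, nc = r, c  # bumped into the border: stay put
                    q_values.append(reward[r][c] + gamma * value[nr][nc])
                new_value[r][c] = max(q_values)  # max over the action "channels"
        value = new_value
    return value


def greedy_action(value: Grid, r: int, c: int) -> Tuple[int, int]:
    """Pick the move whose successor cell has the highest value."""
    rows, cols = len(value), len(value[0])

    def successor_value(move: Tuple[int, int]) -> float:
        nr, nc = r + move[0], c + move[1]
        if not (0 <= nr < rows and 0 <= nc < cols):
            nr, nc = r, c
        return value[nr][nc]

    return max(ACTIONS, key=successor_value)


if __name__ == "__main__":
    # 1x3 corridor with the goal (reward 1) in the rightmost cell.
    v = value_iteration([[0.0, 0.0, 1.0]], gamma=0.9, iterations=100)
    print([round(x, 2) for x in v[0]])
    print(greedy_action(v, 0, 0))
```

Values rise toward the goal, so the greedy move from the leftmost cell points right. A learned kernel, as in a VIN, would replace the hand-coded `ACTIONS` transitions.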

    Improving Deep Reinforcement Learning Using Graph Convolution and Visual Domain Transfer

    Recent developments in Deep Reinforcement Learning (DRL) have brought tremendous progress in robotic control, Atari games, and board games such as Go. However, model-free DRL still has limited use cases due to its poor sample efficiency and poor generalization across a variety of tasks. In this thesis, two particular drawbacks of DRL are investigated: (1) the poor generalization ability of model-free DRL, more specifically, how to generalize an agent's policy to unseen environments and to different data representations (e.g., image-based or graph-based); and (2) the reality gap in DRL, that is, how to effectively transfer a policy learned in a simulator to the real world. This thesis makes several novel contributions to the field of DRL, which are outlined in turn below. Among these contributions is the generalized value iteration network (GVIN) algorithm, an end-to-end neural network planning module that extends the work on Value Iteration Networks (VIN). GVIN emulates the value iteration algorithm using a novel graph convolution operator, which enables GVIN to learn and plan on irregular spatial graphs. Additionally, this thesis proposes three novel differentiable kernels as graph convolution operators and shows that the embedding-based kernel achieves the best performance. Furthermore, the thesis demonstrates an improvement upon traditional n-step Q-learning that stabilizes training for VIN and GVIN. It also outlines the equivalence between GVIN and graph neural networks and shows that GVIN can be further extended to address both control and inference problems. The final graph-domain subject studied in this thesis is graph embeddings. Specifically, this work studies GEM-F, a general graph embedding framework that unifies most previous graph embedding algorithms.
Building on the analysis of GEM-F, the thesis proposes a novel algorithm, WarpMap, which outperforms DeepWalk and node2vec in unsupervised learning settings. The aforementioned reality gap in DRL prevents a significant portion of research from reaching real-world settings. The latter part of this work studies and analyzes domain transfer techniques in an effort to bridge this gap. Typically, domain transfer in RL consists of representation transfer and policy transfer. This work focuses on representation transfer for vision-based applications, specifically on aligning the feature representation of the source domain to that of the target domain in an unsupervised fashion. In this approach, a linear mapping function is used to fuse modules that are trained in different domains. Two improved adversarial learning methods are proposed to enhance the training quality of the mapping function. Finally, the thesis demonstrates the effectiveness of domain alignment across different weather conditions in the CARLA autonomous driving simulator.
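The abstract names traditional n-step Q-learning as the baseline that the thesis improves on, but it does not describe the improvement itself. As a reference point only, here is a minimal sketch of that baseline target computation, using the backward accumulation familiar from asynchronous n-step Q-learning: each step in a rollout segment gets the discounted sum of the remaining rewards plus a bootstrapped max-Q term, which is dropped at episode termination. The function and argument names are hypothetical, and this is not the stabilized variant from the thesis.

```python
from typing import List, Sequence


def n_step_q_targets(rewards: Sequence[float], next_q_values: Sequence[float],
                     gamma: float, done: bool) -> List[float]:
    """Traditional n-step Q-learning targets for one rollout segment.

    rewards:       r_t, ..., r_{t+n-1} observed along the segment
    next_q_values: Q(s_{t+n}, a) for every action a (e.g. from a target network)
    done:          True if the episode terminated at s_{t+n} (then no bootstrap)

    Returns y_t, ..., y_{t+n-1}, where each y_k is the discounted sum of the
    rewards from step k to the end of the segment plus
    gamma^(t+n-k) * max_a Q(s_{t+n}, a).
    """
    ret = 0.0 if done else max(next_q_values)
    targets: List[float] = []
    # Walk backwards so each step reuses the return of the step after it.
    for r in reversed(rewards):
        ret = r + gamma * ret
        targets.append(ret)
    targets.reverse()
    return targets


if __name__ == "__main__":
    print(n_step_q_targets([0.0, 0.0, 1.0], [5.0, 3.0], gamma=0.5, done=False))
    # -> [0.875, 1.75, 3.5]
```

Each target would then be regressed against Q(s_k, a_k) for the action actually taken. The long bootstrapping horizon is one source of the training instability that the thesis reports addressing.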

    Synchronisation effects on the behavioural performance and information dynamics of a simulated minimally cognitive robotic agent

    Oscillatory activity is ubiquitous in nervous systems, with solid evidence that synchronisation mechanisms underpin cognitive processes. Nevertheless, its informational content and its relationship with behaviour are yet to be fully understood. In addition, cognitive systems cannot be properly appreciated without taking into account brain–body–environment interactions. In this paper, we develop a model based on the Kuramoto model of coupled phase oscillators to explore the role of neural synchronisation in the performance of a simulated robotic agent in two different minimally cognitive tasks. We show that there is a statistically significant difference in performance and evolvability depending on the synchronisation regime of the network. In both tasks, a combination of information flow and dynamical analyses shows that networks with a definite, but not too strong, propensity for synchronisation are better able to reconfigure, to organise themselves functionally, and to adapt to different behavioural conditions. The results highlight the asymmetry of information flow and its behavioural correspondence. Importantly, they also show that neural synchronisation dynamics, when suitably flexible and reconfigurable, can generate minimally cognitive embodied behaviour.
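The abstract does not give the paper's network equations, so the following is only a minimal sketch of the standard Kuramoto model it builds on. It shows how the coupling strength K moves a population of phase oscillators from incoherence toward synchronisation, measured by the order parameter r. All-to-all coupling, forward-Euler integration, Gaussian natural frequencies, and the function names and defaults are assumptions for illustration, not the paper's robotic controller.

```python
import cmath
import math
import random
from typing import List


def kuramoto_step(phases: List[float], freqs: List[float],
                  coupling: float, dt: float) -> List[float]:
    """One forward-Euler step of the all-to-all Kuramoto model:
    d(theta_i)/dt = omega_i + (K / N) * sum_j sin(theta_j - theta_i)."""
    n = len(phases)
    new_phases = []
    for i in range(n):
        interaction = sum(math.sin(phases[j] - phases[i]) for j in range(n))
        new_phases.append(phases[i] + dt * (freqs[i] + coupling / n * interaction))
    return new_phases


def order_parameter(phases: List[float]) -> float:
    """r = |(1/N) * sum_j exp(i * theta_j)|: near 0 when incoherent, 1 when fully synchronised."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def simulate(coupling: float, n: int = 20, freq_spread: float = 0.5,
             dt: float = 0.05, steps: int = 1000, seed: int = 0) -> float:
    """Integrate from random initial phases and return the final order parameter."""
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    freqs = [rng.gauss(0.0, freq_spread) for _ in range(n)]  # natural frequencies omega_i
    for _ in range(steps):
        phases = kuramoto_step(phases, freqs, coupling, dt)
    return order_parameter(phases)


if __name__ == "__main__":
    for k in (0.0, 0.5, 1.0, 2.0, 5.0):
        print(f"K = {k:.1f}  r = {simulate(k):.3f}")
```

Sweeping K in this sketch is the simplest analogue of the "synchronisation regime" the paper varies. The finding that an intermediate, not maximal, propensity for synchrony works best concerns the full embodied agent and is not something this toy reproduces.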

    Grounding Language for Transfer in Deep Reinforcement Learning

    In this paper, we explore the use of natural language to drive transfer for reinforcement learning (RL). Despite the widespread application of deep RL techniques, learning generalized policy representations that work across domains remains a challenging problem. We demonstrate that textual descriptions of environments provide a compact intermediate channel to facilitate effective policy transfer. Specifically, by learning to ground the meaning of text to the dynamics of the environment, such as transitions and rewards, an autonomous agent can effectively bootstrap policy learning on a new domain given its description. We employ a model-based RL approach consisting of a differentiable planning module, a model-free component, and a factorized state representation to effectively use entity descriptions. Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments. For instance, we achieve up to 14% and 11.5% absolute improvement over previously existing models in terms of average and initial rewards, respectively. Comment: JAIR 201

    A Gossip Algorithm based Clock Synchronization Scheme for Smart Grid Applications

    The rising interest in multi-agent networked systems, and the large number of applications in the distributed control of the smart grid, lead us to address the problem of time synchronization in the smart grid. Utility companies are looking for new packet-based time synchronization solutions with Global Positioning System (GPS)-level accuracy, beyond traditional packet methods such as the Network Time Protocol (NTP). However, GPS-based solutions suffer from poor reception indoors and in dense urban canyons, and GPS antenna installation can be costly. Some smart grid nodes, such as Phasor Measurement Units (PMUs), fault detection devices, and Wide Area Measurement Systems (WAMS), require synchronization accuracy down to 1 ms, whereas 1 s accuracy is acceptable in the management information domain. Acknowledging this, in this study we introduce a gossip-algorithm-based clock synchronization method among network entities from the decision control and communication point of view. Our method synchronizes clocks within a dense network in a bandwidth-limited environment. Our technique has been tested on different network topologies (complete, star, and random geometric networks) and demonstrated satisfactory performance.
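The abstract does not spell out the update rule, so the following is only a minimal sketch. It assumes the classic randomized pairwise gossip averaging scheme, which is our choice and not confirmed as the paper's method. Each node holds an estimate of its clock offset. In every round one random link wakes up and its two endpoints both adopt the average of their estimates, so only two nodes exchange messages per round, which suits a bandwidth-limited network. The sum of offsets never changes, so on a connected topology all nodes converge to the network-wide mean. Clock skew, message delay, and the random geometric topology are left out, and all names and the millisecond offsets are hypothetical.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def complete_edges(n: int) -> List[Edge]:
    """Complete topology: every pair of nodes can exchange messages."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def star_edges(n: int) -> List[Edge]:
    """Star topology: node 0 is the hub; every other node talks only to it."""
    return [(0, i) for i in range(1, n)]


def gossip_sync(offsets: List[float], edges: List[Edge],
                rounds: int, seed: int = 0) -> List[float]:
    """Randomized pairwise gossip averaging of clock offsets.

    Each round one random edge (i, j) is selected and both endpoints replace
    their offset with the pair's average. The mean offset is invariant, and
    on a connected graph every node converges to it.
    """
    rng = random.Random(seed)
    x = list(offsets)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        x[i] = x[j] = (x[i] + x[j]) / 2.0  # right-hand side is evaluated first
    return x


def spread(x: List[float]) -> float:
    """Worst-case disagreement between any two clocks."""
    return max(x) - min(x)


if __name__ == "__main__":
    # Hypothetical initial clock offsets in milliseconds.
    start = [0.0, 12.5, -7.0, 3.2, 40.0]
    for name, edges in (("complete", complete_edges(5)), ("star", star_edges(5))):
        synced = gossip_sync(start, edges, rounds=200)
        print(f"{name}: spread = {spread(synced):.6f} ms")
```

Comparing the spread after a fixed number of rounds on each topology mirrors the kind of experiment the abstract describes. The star converges more slowly because every exchange must pass through the hub.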