
    Generalized Off-Policy Actor-Critic

    We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting. Compared to the commonly used excursion objective, which can be misleading about the performance of the target policy when deployed, our new objective better predicts such performance. We prove the Generalized Off-Policy Policy Gradient Theorem to compute the policy gradient of the counterfactual objective and use an emphatic approach to get an unbiased sample from this policy gradient, yielding the Generalized Off-Policy Actor-Critic (Geoff-PAC) algorithm. We demonstrate the merits of Geoff-PAC over existing algorithms in MuJoCo robot simulation tasks, the first empirical success of emphatic algorithms in prevailing deep RL benchmarks. Comment: NeurIPS 2019
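
    To make the contrast concrete, here is a hedged reading of the two objectives in standard off-policy actor-critic notation (d_\mu is the stationary state distribution of the behaviour policy \mu, v_\pi the value function of the target policy \pi; the paper's exact definition of d_{\hat{\gamma}}, including any interest weighting, should be taken from the paper itself):

        J_{\text{excursion}}(\pi) \;=\; \sum_{s} d_{\mu}(s)\, v_{\pi}(s),
        \qquad
        J_{\hat{\gamma}}(\pi) \;=\; \sum_{s} d_{\hat{\gamma}}(s)\, v_{\pi}(s),

    where d_{\hat{\gamma}} interpolates between the two regimes: \hat{\gamma} = 0 recovers d_{\mu} and hence the excursion objective, while \hat{\gamma} \to 1 approaches d_{\pi}, the distribution the target policy would actually induce when deployed, which is why the counterfactual objective better predicts deployed performance.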

    A deep reinforcement learning based homeostatic system for unmanned position control

    Deep Reinforcement Learning (DRL) has been proven capable of learning optimal control by minimising the error in dynamic systems. However, in many real-world operations the exact behaviour of the environment is unknown: random changes cause the system to reach different states for the same action. Applying DRL in such unpredictable environments is therefore difficult, as the states of the world cannot be known when the transition and reward functions are non-stationary. In this paper, a mechanism to encapsulate the randomness of the environment is proposed, using a novel bio-inspired homeostatic approach based on a hybrid of the Receptor Density Algorithm (an artificial-immune-system-based anomaly detection technique) and a plastic spiking neuronal model. DRL is then introduced to run in conjunction with this hybrid model. The system is tested on a vehicle that must autonomously re-position itself in an unpredictable environment. Our results show that the DRL-based process control raised the accuracy of the hybrid model by 32%.
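
    As an illustration of the architecture described above, here is a minimal Python sketch in which an anomaly score from a stand-in detector (a crude running-mean proxy for the Receptor Density Algorithm; the class name and interface are hypothetical, not taken from the paper) is appended to the state seen by a tabular Q-learning agent, so the agent can condition on how unpredictable the environment currently is:

    import random

    class AnomalyScore:
        """Stand-in for the Receptor Density Algorithm: scores how far an
        observation sits from its recent running mean (illustrative only)."""
        def __init__(self, window=20):
            self.history = []
            self.window = window

        def score(self, x):
            self.history = (self.history + [x])[-self.window:]
            mean = sum(self.history) / len(self.history)
            return abs(x - mean)

    def run(episodes=200, alpha=0.1, gamma=0.9, eps=0.1):
        actions = (-1, 0, 1)                 # move left, stay, move right
        q = {}                               # (state, action) -> value
        detector = AnomalyScore()
        for _ in range(episodes):
            pos = random.randint(-5, 5)
            for _ in range(50):
                anomaly = min(int(detector.score(pos)), 3)   # coarse anomaly bucket
                state = (pos, anomaly)
                if random.random() < eps:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda act: q.get((state, act), 0.0))
                drift = random.choice((-1, 0, 0, 1))         # unpredictable environment
                pos = max(-10, min(10, pos + a + drift))
                reward = -abs(pos)                           # goal: hold position 0
                nxt = (pos, anomaly)         # reuse current bucket (toy simplification)
                best = max(q.get((nxt, act), 0.0) for act in actions)
                q[(state, a)] = q.get((state, a), 0.0) + alpha * (
                    reward + gamma * best - q.get((state, a), 0.0))
        return q

    if __name__ == "__main__":
        run()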

    All-Optical Reinforcement Learning in Solitonic X-Junctions

    Ethology has shown that animal groups or colonies can perform complex computations by distributing simple decision-making processes across the group members. For example, ant colonies can optimize trajectories towards food by both reinforcing (or cancelling) pheromone traces and switching from one path to another with a stronger pheromone. These ant processes can be implemented in photonic hardware to reproduce stigmergic signal processing. We present innovative, fully integrated X-junctions realized using solitonic waveguides that can provide both of the ants' decision-making processes. The proposed X-junctions can switch from symmetric (50/50) to asymmetric (80/20) behaviour using optical feedback, suppressing unused output channels or reinforcing the used ones.
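
    A toy numerical picture of the stigmergic reinforcement described above (all parameter values are illustrative, and reinforce is a hypothetical helper, not the device physics; the 80/20 clamp mirrors the asymmetry quoted in the abstract):

    def reinforce(split, used, rate=0.05, lo=0.2, hi=0.8):
        """Nudge a two-channel power split toward the channel that carried
        the signal, clamped at the 80/20 asymmetry quoted in the abstract."""
        a, _ = split
        a = min(hi, a + rate) if used == 0 else max(lo, a - rate)
        return (a, 1.0 - a)

    # starting from a symmetric junction, repeated optical feedback on
    # channel 0 drives the split from 50/50 toward roughly 80/20
    split = (0.5, 0.5)
    for _ in range(10):
        split = reinforce(split, used=0)
    print(split)   # roughly (0.8, 0.2)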

    A Shared Task on Bandit Learning for Machine Translation

    We introduce and describe the results of a novel shared task on bandit learning for machine translation. The task was organized jointly by Amazon and Heidelberg University for the first time at the Second Conference on Machine Translation (WMT 2017). The goal of the task is to encourage research on learning machine translation from weak user feedback instead of human references or post-edits. On each of a sequence of rounds, a machine translation system is required to propose a translation for an input, and receives a real-valued estimate of the quality of the proposed translation for learning. This paper describes the shared task's learning and evaluation setup, using services hosted on Amazon Web Services (AWS), the data and evaluation metrics, and the results of various machine translation architectures and learning protocols. Comment: Conference on Machine Translation (WMT) 2017
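
    The round-by-round protocol fits in a few lines; below is a hedged REINFORCE-style sketch of one round (the candidate set, feature vectors, and reward function are hypothetical stand-ins for a real MT system and the task's AWS feedback service):

    import math
    import random

    def softmax(scores):
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def bandit_round(weights, features, reward_fn, lr=0.1):
        """One round of the shared-task loop: score candidates, sample a
        translation, receive a scalar quality estimate, update the policy."""
        scores = [sum(w * f for w, f in zip(weights, phi)) for phi in features]
        probs = softmax(scores)
        k = random.choices(range(len(features)), weights=probs)[0]
        reward = reward_fn(k)                  # real-valued feedback in [0, 1]
        expected = [sum(p * phi[i] for p, phi in zip(probs, features))
                    for i in range(len(weights))]
        for i in range(len(weights)):          # REINFORCE gradient step
            weights[i] += lr * reward * (features[k][i] - expected[i])
        return weights, reward

    # toy usage: two candidate translations described by two features each
    w = [0.0, 0.0]
    feats = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(100):
        w, r = bandit_round(w, feats, reward_fn=lambda k: 1.0 if k == 0 else 0.2)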

    Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

    In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each of the eight teams implemented different modifications of the known algorithms. Comment: 27 pages, 17 figures
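
    As a concrete example of one of those relaxations, here is a minimal gym-style frame-skipping wrapper (the four-field step interface is assumed; this is a generic sketch, not any team's code):

    class FrameSkip:
        """Repeat each chosen action for k environment steps, accumulating
        reward, so the policy acts at a coarser timescale."""
        def __init__(self, env, k=4):
            self.env = env
            self.k = k

        def reset(self):
            return self.env.reset()

        def step(self, action):
            total_reward, done, info = 0.0, False, {}
            for _ in range(self.k):
                obs, reward, done, info = self.env.step(action)
                total_reward += reward
                if done:
                    break
            return obs, total_reward, done, info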