1,059 research outputs found

    A deep reinforcement learning based homeostatic system for unmanned position control

    Get PDF
    Deep Reinforcement Learning (DRL) has been proven to be capable of designing an optimal control theory by minimising the error in dynamic systems. However, in many of the real-world operations, the exact behaviour of the environment is unknown. In such environments, random changes cause the system to reach different states for the same action. Hence, application of DRL for unpredictable environments is difficult as the states of the world cannot be known for non-stationary transition and reward functions. In this paper, a mechanism to encapsulate the randomness of the environment is suggested using a novel bio-inspired homeostatic approach based on a hybrid of Receptor Density Algorithm (an artificial immune system based anomaly detection application) and a Plastic Spiking Neuronal model. DRL is then introduced to run in conjunction with the above hybrid model. The system is tested on a vehicle to autonomously re-position in an unpredictable environment. Our results show that the DRL based process control raised the accuracy of the hybrid model by 32%.N/

    Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey

    Full text link
    Wireless sensor networks (WSNs) consist of autonomous and resource-limited devices. The devices cooperate to monitor one or more physical phenomena within an area of interest. WSNs operate as stochastic systems because of randomness in the monitored environments. For long service time and low maintenance cost, WSNs require adaptive and robust methods to address data exchange, topology formulation, resource and power optimization, sensing coverage and object detection, and security challenges. In these problems, sensor nodes are to make optimized decisions from a set of accessible strategies to achieve design goals. This survey reviews numerous applications of the Markov decision process (MDP) framework, a powerful decision-making tool to develop adaptive algorithms and protocols for WSNs. Furthermore, various solution methods are discussed and compared to serve as a guide for using MDPs in WSNs

    Experimental Analysis of Reinforcement Learning Techniques for Spectrum Sharing Radar

    Full text link
    In this work, we first describe a framework for the application of Reinforcement Learning (RL) control to a radar system that operates in a congested spectral setting. We then compare the utility of several RL algorithms through a discussion of experiments performed on Commercial off-the-shelf (COTS) hardware. Each RL technique is evaluated in terms of convergence, radar detection performance achieved in a congested spectral environment, and the ability to share 100MHz spectrum with an uncooperative communications system. We examine policy iteration, which solves an environment posed as a Markov Decision Process (MDP) by directly solving for a stochastic mapping between environmental states and radar waveforms, as well as Deep RL techniques, which utilize a form of Q-Learning to approximate a parameterized function that is used by the radar to select optimal actions. We show that RL techniques are beneficial over a Sense-and-Avoid (SAA) scheme and discuss the conditions under which each approach is most effective.Comment: Accepted for publication at IEEE Intl. Radar Conference, Washington DC, Apr. 2020. This is the author's version of the wor

    A Survey on Quantum Reinforcement Learning

    Full text link
    Quantum reinforcement learning is an emerging field at the intersection of quantum computing and machine learning. While we intend to provide a broad overview of the literature on quantum reinforcement learning (our interpretation of this term will be clarified below), we put particular emphasis on recent developments. With a focus on already available noisy intermediate-scale quantum devices, these include variational quantum circuits acting as function approximators in an otherwise classical reinforcement learning setting. In addition, we survey quantum reinforcement learning algorithms based on future fault-tolerant hardware, some of which come with a provable quantum advantage. We provide both a birds-eye-view of the field, as well as summaries and reviews for selected parts of the literature.Comment: 62 pages, 16 figure

    Autonomous Reinforcement of Behavioral Sequences in Neural Dynamics

    Full text link
    We introduce a dynamic neural algorithm called Dynamic Neural (DN) SARSA(\lambda) for learning a behavioral sequence from delayed reward. DN-SARSA(\lambda) combines Dynamic Field Theory models of behavioral sequence representation, classical reinforcement learning, and a computational neuroscience model of working memory, called Item and Order working memory, which serves as an eligibility trace. DN-SARSA(\lambda) is implemented on both a simulated and real robot that must learn a specific rewarding sequence of elementary behaviors from exploration. Results show DN-SARSA(\lambda) performs on the level of the discrete SARSA(\lambda), validating the feasibility of general reinforcement learning without compromising neural dynamics.Comment: Sohrob Kazerounian, Matthew Luciw are Joint first author

    Structures for Sophisticated Behaviour: Feudal Hierarchies and World Models

    Get PDF
    This thesis explores structured, reward-based behaviour in artificial agents and in animals. In Part I we investigate how reinforcement learning agents can learn to cooperate. Drawing inspiration from the hierarchical organisation of human societies, we propose the framework of Feudal Multi-agent Hierarchies (FMH), in which coordination of many agents is facilitated by a manager agent. We outline the structure of FMH and demonstrate its potential for decentralised learning and control. We show that, given an adequate set of subgoals from which to choose, FMH performs, and particularly scales, substantially better than cooperative approaches that use shared rewards. We next investigate training FMH in simulation to solve a complex information gathering task. Our approach introduces a ‘Centralised Policy Actor-Critic’ (CPAC) and an alteration to the conventional multi-agent policy gradient, which allows one multi-agent system to advise the training of another. We further exploit this idea for communicating agents with shared rewards and demonstrate its efficacy. In Part II we examine how animals discover and exploit underlying statistical structure in their environments, even when such structure is difficult to learn and use. By analysing behavioural data from an extended experiment with rats, we show that such hidden structure can indeed be learned, but also that subjects suffer from imperfections in their ability to infer their current state. We account for their behaviour using a Hidden Markov Model, in which recent observations are integrated imperfectly with evidence from the past. We find that over the course of training, subjects learn to track their progress through the task more accurately, a change that our model largely attributes to the more reliable integration of past evidenc
    • …
    corecore