3 research outputs found

    Two Timescale Convergent Q-learning for Sleep-Scheduling in Wireless Sensor Networks

    Full text link
    In this paper, we consider an intrusion detection application for Wireless Sensor Networks (WSNs). We study the problem of scheduling the sleep times of the individual sensors to maximize the network lifetime while keeping the tracking error to a minimum. We formulate this problem as a partially observable Markov decision process (POMDP) with continuous state-action spaces, in a manner similar to Fuemmeler and Veeravalli [2008]. However, unlike their formulation, we consider infinite-horizon discounted and average cost objectives as performance criteria. For each criterion, we propose a convergent on-policy Q-learning algorithm that operates on two timescales, while employing function approximation to handle the curse of dimensionality associated with the underlying POMDP. Our proposed algorithm incorporates a policy gradient update using a one-simulation simultaneous perturbation stochastic approximation (SPSA) estimate on the faster timescale, while the Q-value parameter (arising from a linear function approximation of the Q-values) is updated in an on-policy temporal difference (TD)-like fashion on the slower timescale. The feature selection scheme employed in each of our algorithms manages the energy and tracking components in a manner that assists the search for the optimal sleep-scheduling policy. For comparison, in both the discounted and average cost settings, we also develop a function approximation analogue of the Q-learning algorithm; unlike the two-timescale variant, this algorithm does not possess theoretical convergence guarantees. Finally, we adapt our algorithms to include a stochastic iterative estimation scheme for the intruder's mobility model. Our simulation results on a 2-dimensional network setting suggest that our algorithms achieve better tracking accuracy at the cost of only a few additional sensors, in comparison to a recent prior work.
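    As a rough illustration of the two-timescale recursion described above, the Python sketch below couples a one-simulation SPSA policy update on the faster timescale with an on-policy TD-like update of linear Q-value weights on the slower timescale. The feature map, step-size schedules, dimensions, and the way the perturbed-policy cost is obtained are placeholder assumptions, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 8     # dimension of the linear Q-value approximation (assumed)
POL_DIM = 4      # dimension of the policy parameter (assumed)
GAMMA = 0.95     # discount factor (assumed)

def phi(state, action):
    """Placeholder feature map phi(s, a); the paper's feature-selection
    scheme separates energy and tracking components, not modeled here."""
    x = np.concatenate([np.atleast_1d(state), np.atleast_1d(action)]).astype(float)
    return np.tanh(np.resize(x, FEAT_DIM))

def two_timescale_step(n, theta, w, Delta, perturbed_cost,
                       state, action, cost, next_state, next_action):
    """One coupled update. Delta is the +/-1 perturbation direction drawn for
    the current run (e.g. Delta = rng.choice([-1.0, 1.0], POL_DIM)), and
    perturbed_cost is the cost observed while acting under the perturbed
    policy theta + delta * Delta -- assumptions about how the surrounding
    simulation is organised."""
    a_n = 1.0 / (n + 1) ** 0.6       # faster timescale (SPSA policy update)
    b_n = 1.0 / (n + 1)              # slower timescale (Q-value weights)
    delta = 0.1 / (n + 1) ** 0.25    # SPSA perturbation magnitude (assumed)

    # Faster timescale: one-simulation SPSA policy-gradient step (minimising cost).
    theta = theta - a_n * perturbed_cost / (delta * Delta)

    # Slower timescale: on-policy TD-like update of the linear Q-value weights.
    td_error = cost + GAMMA * phi(next_state, next_action) @ w - phi(state, action) @ w
    w = w + b_n * td_error * phi(state, action)
    return theta, w
```

    In this sketch the step sizes satisfy b_n / a_n -> 0, which is what makes the policy recursion the faster of the two timescales.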

    Use of Reinforcement Learning in Soft-Decision Viterbi Channel Decoders

    Get PDF
    Undergraduate thesis (monografia), Universidade de Brasília, Faculdade de Tecnologia, 2014. Machine learning methods are widely used in robotics, control, and automation to solve dynamic decision-making and optimization problems. In communications, however, their use is less widespread. This work presents an introduction to digital communications and information theory, as background for channel decoders based on the Viterbi algorithm, and an introduction to reinforcement learning, a machine learning method. The objective is to build a soft-decision decoder, dynamically controlled by a learning agent based on the Q-learning algorithm, that can change its quantization levels to obtain performance gains. As the results show, with appropriate modeling of the system the agent can learn to accomplish this task, suggesting the potential applicability of reinforcement learning techniques in digital communications.
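    As a minimal sketch of how such an agent might be organised, the snippet below uses tabular Q-learning to pick the number of soft-decision quantization levels. The SNR-bin state space, the action set, the reward (e.g. negative bit-error rate of the last decoded block), and all constants are assumptions rather than the thesis' actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SNR_BINS = 10                    # states: coarse channel-quality estimate (assumed)
QUANT_LEVELS = [2, 4, 8, 16]       # actions: soft-decision quantization levels (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate (assumed)

Q = np.zeros((N_SNR_BINS, len(QUANT_LEVELS)))

def choose_action(state):
    """Epsilon-greedy choice of a quantization setting for the Viterbi decoder."""
    if rng.random() < EPS:
        return int(rng.integers(len(QUANT_LEVELS)))
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state):
    """Standard Q-learning update; the reward could be, say, the negative
    bit-error rate measured over the last decoded block (an assumption)."""
    td_target = reward + GAMMA * np.max(Q[next_state])
    Q[state, action] += ALPHA * (td_target - Q[state, action])
```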

    Adaptive Sleep-Wake Control using Reinforcement Learning in Sensor Networks

    No full text
    The aim of this paper is to allocate the 'sleep time' of the individual sensors in an intrusion detection application so that the sensors' energy consumption is reduced, while keeping the tracking error to a minimum. We propose two novel reinforcement learning (RL)-based algorithms that attempt to minimize a long-run average cost objective. Both algorithms incorporate feature-based representations to handle the curse of dimensionality associated with the underlying partially observable Markov decision process (POMDP). Further, the feature selection scheme used in our algorithms intelligently manages the energy cost and tracking cost factors, which in turn assists the search for the optimal sleeping policy. We also extend these algorithms to a setting where the intruder's mobility model is unknown, by incorporating a stochastic iterative scheme for estimating the mobility model. The simulation results on a synthetic 2-dimensional network setting are encouraging.
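    The mobility-model estimation step could look roughly like the following stochastic iterative update, which nudges a running estimate of the intruder's cell-to-cell transition probabilities toward each observed move; the grid size and the decreasing step size are assumptions, not the authors' exact scheme.

```python
import numpy as np

N_CELLS = 25  # grid cells in the 2-dimensional network (assumed)

# Running estimate of the intruder's transition matrix, P_hat[i, j] ~ Pr(j | i),
# initialised to uniform.
P_hat = np.full((N_CELLS, N_CELLS), 1.0 / N_CELLS)
visits = np.zeros(N_CELLS, dtype=int)

def update_mobility_model(current_cell, next_cell):
    """Move the row for current_cell a small step toward the indicator of the
    observed transition, with step size 1/visits so the estimate averages
    over all transitions seen so far."""
    visits[current_cell] += 1
    step = 1.0 / visits[current_cell]
    indicator = np.zeros(N_CELLS)
    indicator[next_cell] = 1.0
    P_hat[current_cell] = (1.0 - step) * P_hat[current_cell] + step * indicator
```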