3 research outputs found
Two Timescale Convergent Q-learning for Sleep-Scheduling in Wireless Sensor Networks
In this paper, we consider an intrusion detection application for Wireless
Sensor Networks (WSNs). We study the problem of scheduling the sleep times of
the individual sensors to maximize the network lifetime while keeping the
tracking error to a minimum. We formulate this problem as a
partially-observable Markov decision process (POMDP) with continuous
state-action spaces, in a manner similar to Fuemmeler and Veeravalli [2008].
However, unlike their formulation, we consider infinite horizon discounted and
average cost objectives as performance criteria. For each criterion, we propose
a convergent on-policy Q-learning algorithm that operates on two timescales,
while employing function approximation to handle the curse of dimensionality
associated with the underlying POMDP. Our proposed algorithm performs a
policy gradient update on the faster timescale, using a one-simulation
simultaneous perturbation stochastic approximation (SPSA) gradient estimate,
while the Q-value parameter (arising from a linear function approximation of
the Q-values) is updated on the slower timescale in the fashion of an
on-policy temporal difference (TD) algorithm. The feature selection scheme employed in each
of our algorithms manages the energy and tracking components in a manner that
assists the search for the optimal sleep-scheduling policy. For the sake of
comparison, in both discounted and average settings, we also develop a function
approximation analogue of the Q-learning algorithm. This algorithm, unlike the
two-timescale variant, does not possess theoretical convergence guarantees.
Finally, we also adapt our algorithms to include a stochastic iterative
estimation scheme for the intruder's mobility model. Our simulation results on
a 2-dimensional network setting suggest that, in comparison to a recent prior
work, our algorithms achieve better tracking accuracy at the cost of only a few
additional sensors.
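The two-timescale update described in the abstract can be sketched, very roughly, as follows. The environment interface `env_step`, the feature map `phi`, and the step sizes `a_n`, `b_n` are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def spsa_td_step(theta, w, phi, env_step, a_n, b_n, delta=0.1, gamma=0.9):
    """One illustrative two-timescale update (a sketch, not the paper's algorithm).

    theta : policy parameters, updated on the faster timescale via a
            one-simulation SPSA gradient estimate.
    w     : linear Q-value weights, updated TD(0)-style on the slower timescale.
    phi   : feature map, phi(state, action) -> feature vector (assumed given).
    env_step(theta) is assumed to simulate one transition under the
    theta-parameterised policy, returning (s, a, cost, s_next, a_next).
    """
    # faster timescale: one-simulation SPSA estimate of the policy gradient
    d = rng.choice([-1.0, 1.0], size=theta.shape)    # Rademacher perturbation
    s, a, cost, s_next, a_next = env_step(theta + delta * d)
    q_perturbed = w @ phi(s, a)
    theta = theta - a_n * q_perturbed / (delta * d)  # descend the estimated gradient

    # slower timescale: on-policy TD(0) update of the Q-value weights
    td_error = cost + gamma * (w @ phi(s_next, a_next)) - w @ phi(s, a)
    w = w + b_n * td_error * phi(s, a)
    return theta, w
```

In a two-timescale scheme the step-size sequences are chosen so that `b_n / a_n -> 0`, which lets the faster SPSA recursion effectively equilibrate between updates of the slower TD recursion.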
Utilização de aprendizado por reforço em decodificadores de canal por algoritmo de Viterbi de Decisão Soft (Use of reinforcement learning in soft-decision Viterbi channel decoders)
Undergraduate monograph (graduação), Universidade de Brasília, Faculdade de Tecnologia, 2014.
Machine learning methods are widely used in the areas of robotics, control, and
automation to solve dynamic decision-making and optimization problems. In
communications, however, their use is less widespread.
This work presents an introduction to digital communications and information
theory, as background for channel decoders based on the Viterbi algorithm, and
an introduction to reinforcement learning, a machine-learning method.
The objective is to build a soft-decision decoder, dynamically controlled by a
learning agent based on the Q-learning algorithm, that is able to change its
quantization levels to obtain performance gains.
As the results show, with proper system modeling, the agent can learn to
accomplish this task, suggesting the potential application of reinforcement
learning techniques in digital communications.
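A toy illustration of the thesis's core idea, tabular Q-learning choosing the number of soft-decision quantization levels: the SNR-bin state, the reward model, and all parameters below are invented stand-ins for the monograph's decoder simulation.

```python
import random

def q_learning_quantizer(n_snr_bins=3, levels=(2, 4, 8), episodes=5000,
                         alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Sketch: an agent picks the quantizer's level count per SNR bin."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(n_snr_bins) for a in levels}

    def reward(snr_bin, n_levels):
        # stand-in: lower SNR bins are assumed to benefit from finer quantization
        return -abs((n_snr_bins - 1 - snr_bin) - levels.index(n_levels))

    for _ in range(episodes):
        s = rng.randrange(n_snr_bins)
        if rng.random() < eps:                       # epsilon-greedy exploration
            a = rng.choice(levels)
        else:
            a = max(levels, key=lambda l: Q[(s, l)])
        r = reward(s, a)
        s2 = rng.randrange(n_snr_bins)               # channel SNR drifts randomly
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, l)] for l in levels)
                              - Q[(s, a)])
    # greedy policy: chosen quantization level per SNR bin
    return {s: max(levels, key=lambda l: Q[(s, l)]) for s in range(n_snr_bins)}
```

Under this stand-in reward, the learned policy assigns finer quantization to the low-SNR bin and coarser quantization to the high-SNR bin.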
Adaptive Sleep-Wake Control using Reinforcement Learning in Sensor Networks
The aim in this paper is to allocate the "sleep time" of the individual sensors in an intrusion detection application so that the energy consumption of the sensors is reduced, while keeping the tracking error to a minimum. We propose two novel reinforcement learning (RL) based algorithms that attempt to minimize a certain long-run average cost objective. Both our algorithms incorporate feature-based representations to handle the curse of dimensionality associated with the underlying partially observable Markov decision process (POMDP). Further, the feature selection scheme used in our algorithms intelligently manages the energy cost and tracking cost factors, which in turn assists the search for the optimal sleeping policy. We also extend these algorithms to a setting where the intruder's mobility model is not known, by incorporating a stochastic iterative scheme for estimating the mobility model. The simulation results on a synthetic 2-dimensional network setting are encouraging.
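As a rough illustration of the long-run average cost objective, here is a relative-value-iteration (RVI) flavoured tabular Q-learning loop on a toy ring-world. All dynamics, costs, and parameters are invented for illustration; the paper's algorithms use feature-based representations rather than a table.

```python
import random

def rvi_q_learning(n_cells=5, sleep_actions=(1, 2, 4), steps=20000,
                   alpha=0.05, eps=0.1, energy_cost=0.2, miss_cost=1.0, seed=0):
    """Toy average-cost Q-learning for a sleep/track trade-off.
    State: intruder's cell on a ring; action: how many steps the local sensor
    sleeps before sensing again. Dynamics and costs are invented stand-ins."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(n_cells) for a in sleep_actions}
    s = 0
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.choice(sleep_actions)
        else:
            a = min(sleep_actions, key=lambda x: Q[(s, x)])   # minimise cost
        s2 = s
        for _ in range(a):                 # intruder moves while sensor sleeps
            s2 = (s2 + rng.choice((-1, 0, 1))) % n_cells
        # longer sleeps save energy but risk losing the track
        cost = energy_cost / a + (miss_cost if s2 != s else 0.0)
        # RVI-style update: subtract a fixed reference Q-value in place of
        # the (unknown) average cost, keeping the iterates bounded
        target = (cost + min(Q[(s2, x)] for x in sleep_actions)
                  - Q[(0, sleep_actions[0])])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
    return Q
```

The greedy action uses `min` rather than `max` because the objective is a cost to be minimized, matching the average cost criterion in the abstract.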