We present an approach to autonomous sensor control for information
gathering in partially observable, dynamic, and sparsely sampled
environments. We consider the problem of controlling a sensor that makes
partial observations in some space of interest such that it maximizes
information about entities present in that space. We describe our approach for
the task of Radio-Frequency (RF) spectrum monitoring, where the goal is to
search for and track unknown, dynamic signals in the environment. To this end,
we develop and demonstrate enhancements of the Deep Anticipatory Network (DAN)
Reinforcement Learning (RL) framework that uses prediction and information-gain
rewards to learn information-maximization policies in reward-sparse
environments. We also extend the problem to settings in which sampling
the actual RF spectrum/field is limited and expensive, and propose a
model-based version of the original RL algorithm that fine-tunes the controller
using an environment model that is iteratively improved from the limited
samples. Our approach was thoroughly validated by
testing against baseline expert-designed controllers in simulated RF
environments of different complexity, using different reward schemes and
evaluation metrics. The results show that our system outperforms the standard
DAN architecture and is more flexible and robust than several hand-coded
agents. We also show that our approach adapts to non-stationary
environments in which the emitting sources change their behavior over
time.
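As a rough illustration of the two reward signals named in the abstract, the sketch below computes a DAN-style prediction reward (did the agent's predictor guess the hidden signal state correctly?) and an information-gain reward measured as entropy reduction of a belief over spectrum bands. The function names and the toy belief values are our own illustrative assumptions, not quantities taken from the paper.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a discrete belief vector."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def info_gain_reward(prior, posterior):
    """Information-gain reward: entropy reduction after an observation."""
    return entropy(prior) - entropy(posterior)

def prediction_reward(predicted_band, true_band):
    """DAN-style prediction reward: +1 when the agent's predictor
    correctly identifies the hidden signal state, else 0."""
    return 1.0 if predicted_band == true_band else 0.0

# Toy belief over 4 spectrum bands, before and after sensing band 2.
prior = np.full(4, 0.25)                        # uniform: 2 bits of entropy
posterior = np.array([0.05, 0.05, 0.85, 0.05])  # observation sharpens belief

r_info = info_gain_reward(prior, posterior)     # positive: uncertainty dropped
r_pred = prediction_reward(predicted_band=2, true_band=2)
```

In a reward-sparse RF search task, a dense signal like `r_info` can guide exploration even when correct predictions (and hence `r_pred`) are rare.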