1,736 research outputs found
Bayesian Reinforcement Learning via Deep, Sparse Sampling
We address the problem of Bayesian reinforcement learning using efficient
model-based online planning. We propose an optimism-free Bayes-adaptive
algorithm to induce deeper and sparser exploration with a theoretical bound on
its performance relative to the Bayes optimal policy, with a lower
computational complexity. The main novelty is the use of a candidate policy
generator, to generate long-term options in the planning tree (over beliefs),
which allows us to create much sparser and deeper trees. Experimental results
on different environments show that in comparison to the state-of-the-art, our
algorithm is both computationally more efficient, and obtains significantly
higher reward in discrete environments.Comment: Published in AISTATS 202
Deep Reinforcement Learning for Wireless Sensor Scheduling in Cyber-Physical Systems
In many Cyber-Physical Systems, we encounter the problem of remote state
estimation of geographically distributed and remote physical processes. This
paper studies the scheduling of sensor transmissions to estimate the states of
multiple remote, dynamic processes. Information from the different sensors have
to be transmitted to a central gateway over a wireless network for monitoring
purposes, where typically fewer wireless channels are available than there are
processes to be monitored. For effective estimation at the gateway, the sensors
need to be scheduled appropriately, i.e., at each time instant one needs to
decide which sensors have network access and which ones do not. To address this
scheduling problem, we formulate an associated Markov decision process (MDP).
This MDP is then solved using a Deep Q-Network, a recent deep reinforcement
learning algorithm that is at once scalable and model-free. We compare our
scheduling algorithm to popular scheduling algorithms such as round-robin and
reduced-waiting-time, among others. Our algorithm is shown to significantly
outperform these algorithms for many example scenarios
Better Optimism By Bayes: Adaptive Planning with Rich Models
The computational costs of inference and planning have confined Bayesian
model-based reinforcement learning to one of two dismal fates: powerful
Bayes-adaptive planning but only for simplistic models, or powerful, Bayesian
non-parametric models but using simple, myopic planning strategies such as
Thompson sampling. We ask whether it is feasible and truly beneficial to
combine rich probabilistic models with a closer approximation to fully Bayesian
planning. First, we use a collection of counterexamples to show formal problems
with the over-optimism inherent in Thompson sampling. Then we leverage
state-of-the-art techniques in efficient Bayes-adaptive planning and
non-parametric Bayesian methods to perform qualitatively better than both
existing conventional algorithms and Thompson sampling on two contextual
bandit-like problems.Comment: 11 pages, 11 figure
- …