Generalised Entropy MDPs and Minimax Regret
Bayesian methods suffer from the problem of how to specify prior beliefs. One
interesting idea is to consider worst-case priors. This requires solving a
stochastic zero-sum game. In this paper, we extend well-known results from
bandit theory in order to discover minimax-Bayes policies and discuss when they
are practical.
Comment: 7 pages, NIPS workshop "From bad models to good policies
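
As a rough illustration of the worst-case-prior idea in the abstract, the sketch below treats a one-shot decision problem with two candidate models, which is far simpler than the stochastic game the paper addresses. The payoff table `UTILITY`, the function names, and the grid search are all assumptions made for this example and are not taken from the paper. Nature picks the prior that minimises the decision-maker's Bayes-optimal expected utility.

```python
# Toy minimax-Bayes sketch (illustrative assumption, not the paper's method).
# Two candidate models; the prior is summarised by p = P(model 0).

def bayes_value(prior, utility):
    """Expected utility of the best action when P(model 0) = prior."""
    return max(prior * u0 + (1.0 - prior) * u1 for u0, u1 in utility)


def worst_case_prior(utility, grid_size=101):
    """Grid-search the prior that minimises the Bayes-optimal utility."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=lambda p: bayes_value(p, utility))


# Hypothetical payoffs: each row is an action, columns are (model 0, model 1).
UTILITY = [(1.0, 0.0), (0.0, 1.0)]

p_star = worst_case_prior(UTILITY)
v_star = bayes_value(p_star, UTILITY)
print(p_star, v_star)
```

With this symmetric payoff table the Bayes value is max(p, 1 - p), so the least favourable prior is the uniform one. A grid search is used only because the problem is one-dimensional; larger problems would need a proper game solver.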