
    Generalised Entropy MDPs and Minimax Regret

    Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.
    Comment: 7 pages, NIPS workshop "From bad models to good policies"