An Adversarial Interpretation of Information-Theoretic Bounded Rationality
Recently, there has been a growing interest in modeling planning with
information constraints. Accordingly, an agent maximizes a regularized expected
utility known as the free energy, where the regularizer is given by the
information divergence from a prior to a posterior policy. While this approach
can be justified in various ways, including from statistical mechanics and
information theory, it is still unclear how it relates to decision-making
against adversarial environments. This connection has previously been suggested
in work relating the free energy to risk-sensitive control and to extensive
form games. Here, we show that a single-agent free energy optimization is
equivalent to a game between the agent and an imaginary adversary. The
adversary can, by paying an exponential penalty, generate costs that diminish
the decision maker's payoffs. It turns out that the optimal strategy of the
adversary consists in choosing costs so as to render the decision maker
indifferent among its choices, which is a defining property of a Nash
equilibrium, thus tightening the connection between free energy optimization
and game theory. Comment: 7 pages, 4 figures. Proceedings of AAAI-1
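The indifference property described in this abstract can be checked numerically. The following is a minimal sketch, not the paper's own construction: it assumes the free energy has the form sum_a p(a) U(a) - (1/beta) KL(p || q), and it obtains the adversary's costs from the standard convex-conjugate identity x log(x/q) = max_c [c x - q exp(c - 1)], which may differ from the paper's formulation by constants. The function names, the inverse temperature `beta`, and the example numbers are all hypothetical.

```python
import numpy as np

def free_energy_solution(utilities, prior, beta):
    """Maximize sum_a p(a) U(a) - (1/beta) KL(p || prior) in closed form.

    Assumes a strictly positive prior and beta > 0.
    Returns the optimal posterior policy and the optimal free energy.
    """
    logits = np.log(prior) + beta * utilities
    m = logits.max()                                  # shift for numerical stability
    log_z = m + np.log(np.exp(logits - m).sum())
    posterior = np.exp(logits - log_z)                # p*(a) proportional to q(a) exp(beta U(a))
    return posterior, log_z / beta

def adversary_costs(prior, posterior, beta):
    """Costs that attain the minimum in the conjugate form of the KL penalty (assumed form)."""
    return (1.0 + np.log(posterior / prior)) / beta

def game_value(utilities, prior, policy, costs, beta):
    """Agent's payoff net of costs, plus the adversary's exponential penalty."""
    penalty = (prior @ np.exp(beta * costs - 1.0)) / beta
    return policy @ (utilities - costs) + penalty

# Hypothetical three-action example.
U = np.array([1.0, 2.0, 0.5])
q = np.array([0.2, 0.5, 0.3])
beta = 2.0

p_star, F = free_energy_solution(U, q, beta)
C_star = adversary_costs(q, p_star, beta)

print("net payoffs U - C*:", U - C_star)              # identical entries: agent is indifferent
print("free energy:", F)
print("game value:", game_value(U, q, p_star, C_star, beta))
```

Under these assumptions, the net payoffs U(a) - C*(a) come out equal for every action, and the value of the game at (p*, C*) coincides with the optimal free energy (1/beta) log sum_a q(a) exp(beta U(a)).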
Universal Convexification via Risk-Aversion
We develop a framework for convexifying a fairly general class of
optimization problems. Under additional assumptions, we analyze the
suboptimality of the solution to the convexified problem relative to the
original nonconvex problem and prove additive approximation guarantees. We then
develop algorithms based on stochastic gradient methods to solve the resulting
optimization problems and show bounds on convergence rates. We show a simple
application of this framework to supervised learning, where one can perform
integration explicitly and can use standard (non-stochastic) optimization
algorithms with better convergence guarantees. We then extend this framework to
apply to a general class of discrete-time dynamical systems. In this context,
our convexification approach falls under the well-studied paradigm of
risk-sensitive Markov Decision Processes. We derive the first known model-based
and model-free policy gradient optimization algorithms with guaranteed
convergence to the optimal solution. Finally, we present numerical results
validating our formulation in different applications.
Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes
Information-theoretic principles for learning and acting have been proposed
to solve particular classes of Markov Decision Problems. Mathematically, such
approaches are governed by a variational free energy principle and allow
solving MDP planning problems with information-processing constraints expressed
in terms of a Kullback-Leibler divergence with respect to a reference
distribution. Here we consider a generalization of such MDP planners by taking
model uncertainty into account. As model uncertainty can also be formalized as
an information-processing constraint, we can derive a unified solution from a
single generalized variational principle. We provide a generalized value
iteration scheme together with a convergence proof. As limit cases, this
generalized scheme includes standard value iteration with a known model,
Bayesian MDP planning, and robust planning. We demonstrate the benefits of this
approach in a grid world simulation. Comment: 16 pages, 3 figure
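To make the known-model limit case concrete, here is a minimal sketch of value iteration with a Kullback-Leibler penalty on the policy. It is not the generalized scheme of the paper, which also treats model uncertainty. It assumes the recursion V(s) = (1/beta) log sum_a rho(a|s) exp(beta [R(s,a) + gamma sum_s' P(s'|s,a) V(s')]), where the reference policy `prior`, the inverse temperature `beta`, the discount `gamma`, and the function name are hypothetical choices made for illustration.

```python
import numpy as np

def soft_value_iteration(P, R, prior, beta, gamma=0.9, tol=1e-8, max_iter=10_000):
    """KL-regularized value iteration for a known model (illustrative sketch).

    P:     transition probabilities, shape (S, A, S)
    R:     rewards, shape (S, A)
    prior: strictly positive reference policy rho(a|s), shape (S, A)
    beta:  inverse temperature; large beta approaches standard value iteration
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)                       # expected return per (s, a)
        z = np.log(prior) + beta * Q
        m = z.max(axis=1, keepdims=True)              # stabilized log-sum-exp over actions
        V_new = (m[:, 0] + np.log(np.exp(z - m).sum(axis=1))) / beta
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Bounded-rational policy: the prior reweighted by exponentiated action values.
    z = np.log(prior) + beta * (R + gamma * (P @ V))
    z -= z.max(axis=1, keepdims=True)
    policy = np.exp(z)
    policy /= policy.sum(axis=1, keepdims=True)
    return V, policy

# Hypothetical two-state, two-action MDP: action 0 stays, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0
P[0, 1, 1] = P[1, 1, 0] = 1.0
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])                            # reward only for staying in state 1
prior = np.full((2, 2), 0.5)

V, policy = soft_value_iteration(P, R, prior, beta=50.0)
print(V)
print(policy)
```

With a large `beta` the information constraint is nearly inactive and the result approaches ordinary value iteration; with a small `beta` the policy stays close to `prior`.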