A Theory of Regularized Markov Decision Processes
Many recent successful (deep) reinforcement learning algorithms make use of
regularization, generally based on entropy or Kullback-Leibler divergence. We
propose a general theory of regularized Markov Decision Processes that
generalizes these approaches in two directions: we consider a larger class of
regularizers, and we consider the general modified policy iteration approach,
encompassing both policy iteration and value iteration. The core building
blocks of this theory are a notion of regularized Bellman operator and the
Legendre-Fenchel transform, a classical tool of convex optimization. This
approach allows for error propagation analyses of general algorithmic schemes
of which classical algorithms such as Trust Region Policy Optimization, Soft
Q-learning, Stochastic Actor Critic, and Dynamic Policy Programming (or
variants of them) are special cases. It also draws connections to proximal
convex optimization, especially to Mirror Descent.
Comment: ICML 201
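To make the link between the regularized Bellman operator and the Legendre-Fenchel transform concrete, here is a minimal sketch for one specific regularizer, scaled negative entropy. In that case the transform of the regularizer is a scaled log-sum-exp and the maximizing policy is a softmax, which is a standard convex-analysis fact rather than something stated in the abstract. The function names, the temperature `tau`, and the tabular `P`/`R` layout are assumptions made for illustration; this is not the paper's general scheme, which covers a larger class of regularizers and modified policy iteration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def softmax(values):
    """Numerically stable softmax of a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def regularized_backup(q_values, tau):
    """Entropy-regularized maximum over policies for one state.

    Computes max_pi <pi, q> + tau * H(pi), where H is Shannon entropy.
    The maximum equals tau * logsumexp(q / tau), attained at softmax(q / tau).
    """
    scaled = [q / tau for q in q_values]
    return tau * logsumexp(scaled), softmax(scaled)


def soft_value_iteration(P, R, gamma, tau, sweeps=200):
    """Repeatedly apply the entropy-regularized Bellman optimality operator.

    P[s][a] is a list of next-state probabilities, R[s][a] a reward
    (hypothetical tabular layout chosen for this sketch).
    """
    v = [0.0] * len(R)
    for _ in range(sweeps):
        new_v = []
        for s in range(len(R)):
            q = [
                R[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(P[s][a]))
                for a in range(len(R[s]))
            ]
            new_v.append(regularized_backup(q, tau)[0])
        v = new_v
    return v
```

As `tau` shrinks, the backup approaches the ordinary maximum over actions, so the sketch reduces to standard value iteration in the limit.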
Quantum Probabilities as Behavioral Probabilities
We demonstrate that behavioral probabilities of human decision makers share
many common features with quantum probabilities. This does not imply that
humans are quantum objects; it only shows that the mathematics of quantum
theory is applicable to the description of human decision making. The
applicability of quantum rules for describing decision making is connected with
the nontrivial process of making decisions in the case of composite prospects
under uncertainty. Such a process involves deliberations of a decision maker
when making a choice. In addition to the evaluation of the utilities of
considered prospects, real decision makers also appreciate their respective
attractiveness. Therefore, human choice is not based solely on the utility of
prospects, but includes the necessity of resolving the utility-attraction
duality. To justify the claim that human consciousness functions according to
rules similar to those of quantum theory, we develop an approach defining human
behavioral probabilities as the probabilities determined by quantum rules. We
show that quantum behavioral probabilities of humans not only explain
qualitatively how human decisions are made, but also predict quantitative
values of the behavioral probabilities. Analyzing a large set of empirical
data, we find good quantitative agreement between theoretical predictions and
observed experimental data.
Comment: LaTeX file, 32 pages
Theoretical and Practical Advances on Smoothing for Extensive-Form Games
Sparse iterative methods, in particular first-order methods, are known to be
among the most effective in solving large-scale two-player zero-sum
extensive-form games. The convergence rates of these methods depend heavily on
the properties of the distance-generating function that they are based on. We
investigate the acceleration of first-order methods for solving extensive-form
games through better design of the dilated entropy function, a class of
distance-generating functions related to the domains associated with the
extensive-form games. By introducing a new weighting scheme for the dilated
entropy function, we develop the first distance-generating function for the
strategy spaces of sequential games that has no dependence on the branching
factor of the player. This result improves the convergence rate of several
first-order methods by a factor that depends on the branching factor of the
player and the depth of the game tree.
Thus far, counterfactual regret minimization methods have been faster in
practice, and more popular, than first-order methods despite their
theoretically inferior convergence rates. Using our new weighting scheme and
practical tuning we show that, for the first time, the excessive gap technique
can be made faster than the fastest counterfactual regret minimization
algorithm, CFR+, in practice.
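For readers unfamiliar with dilated entropy, the sketch below evaluates one common form of it on a sequence-form strategy: at each information set, the negative entropy of the local behavioral strategy is scaled by the probability of reaching that information set and by a per-information-set weight. The data layout (`infosets` as `(parent, actions)` pairs indexing into `x`) and the weights passed in are assumptions made for illustration; the specific weighting scheme proposed in the paper is not given in the abstract and is not reproduced here.

```python
import math


def dilated_entropy(x, infosets, weights):
    """Weighted dilated negative entropy of a sequence-form strategy.

    x        -- list of sequence probabilities (sequence-form strategy)
    infosets -- list of (parent, actions): parent is the index in x of the
                sequence leading to the information set (None at the root),
                actions are the indices in x of the sequences leaving it
    weights  -- one positive weight per information set (hypothetical values)
    """
    total = 0.0
    for (parent, actions), beta in zip(infosets, weights):
        x_parent = 1.0 if parent is None else x[parent]
        if x_parent == 0.0:
            continue  # unreachable information set contributes nothing
        for a in actions:
            if x[a] > 0.0:  # convention: 0 * log(0) = 0
                # x[a] * log(x[a] / x_parent) is the reach-scaled local term
                total += beta * x[a] * math.log(x[a] / x_parent)
    return total
```

A deterministic strategy yields zero, and a uniformly random one yields the most negative value for the given weights.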