Search CORE

7,021 research outputs found

A Theory of Regularized Markov Decision Processes

Author: Geist Matthieu
Pietquin Olivier
Scherrer Bruno
Publication venue
Publication date: 04/06/2019
Field of study

Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or Kullback-Leibler divergence. We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both policy iteration and value iteration. The core building blocks of this theory are a notion of regularized Bellman operator and the Legendre-Fenchel transform, a classical tool of convex optimization. This approach allows for error propagation analyses of general algorithmic schemes of which (possibly variants of) classical algorithms such as Trust Region Policy Optimization, Soft Q-learning, Stochastic Actor Critic or Dynamic Policy Programming are special cases. This also draws connections to proximal convex optimization, especially to Mirror Descent.Comment: ICML 201

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Quantum Probabilities as Behavioral Probabilities

Author: Sornette D.
Yukalov V. I.
Publication venue: 'MDPI AG'
Publication date: 01/03/2017
Field of study

We demonstrate that behavioral probabilities of human decision makers share many common features with quantum probabilities. This does not imply that humans are some quantum objects, but just shows that the mathematics of quantum theory is applicable to the description of human decision making. The applicability of quantum rules for describing decision making is connected with the nontrivial process of making decisions in the case of composite prospects under uncertainty. Such a process involves deliberations of a decision maker when making a choice. In addition to the evaluation of the utilities of considered prospects, real decision makers also appreciate their respective attractiveness. Therefore, human choice is not based solely on the utility of prospects, but includes the necessity of resolving the utility-attraction duality. In order to justify that human consciousness really functions similarly to the rules of quantum theory, we develop an approach defining human behavioral probabilities as the probabilities determined by quantum rules. We show that quantum behavioral probabilities of humans not merely explain qualitatively how human decisions are made, but they predict quantitative values of the behavioral probabilities. Analyzing a large set of empirical data, we find good quantitative agreement between theoretical predictions and observed experimental data.Comment: Latex file, 32 page

arXiv.org e-Print Archive

Repository for Publications and Research Data

Directory of Open Access Journals

Theoretical and Practical Advances on Smoothing for Extensive-Form Games

Author: Kilinc-Karzan Fatma
Kroer Christian
Sandholm Tuomas
Waugh Kevin
Publication venue
Publication date: 08/05/2017
Field of study

Sparse iterative methods, in particular first-order methods, are known to be among the most effective in solving large-scale two-player zero-sum extensive-form games. The convergence rates of these methods depend heavily on the properties of the distance-generating function that they are based on. We investigate the acceleration of first-order methods for solving extensive-form games through better design of the dilated entropy function---a class of distance-generating functions related to the domains associated with the extensive-form games. By introducing a new weighting scheme for the dilated entropy function, we develop the first distance-generating function for the strategy spaces of sequential games that has no dependence on the branching factor of the player. This result improves the convergence rate of several first-order methods by a factor of

\Omega(b^dd)

, where

b

is the branching factor of the player, and

d

is the depth of the game tree. Thus far, counterfactual regret minimization methods have been faster in practice, and more popular, than first-order methods despite their theoretically inferior convergence rates. Using our new weighting scheme and practical tuning we show that, for the first time, the excessive gap technique can be made faster than the fastest counterfactual regret minimization algorithm, CFR+, in practice

arXiv.org e-Print Archive