
Q-learning and policy iteration algorithms for stochastic shortest path problems

Author: A. F. Veinott Jr.
C. Derman
C. Thiery
D. P. Bertsekas
D. S. Choi
Dimitri P. Bertsekas
E. A. Feinberg
G. J. Gordon
G. M. Baudet
H. Yu
Huizhen Yu
J. H. Eaton
J. N. Tsitsiklis
M. L. Puterman
P. G. Canbolat
P. Whittle
R. S. Sutton
T. S. Jaakkola
U. G. Rothblum
Publication venue: Springer Science and Business Media LLC

The Stochastic Shortest Path Problem: A polyhedral combinatorics perspective

Author: Guillot Matthieu
Stauffer Gautier
Publication date: 10/02/2017

In this paper, we give a new framework for the stochastic shortest path problem in finite state and action spaces. Our framework generalizes both the framework proposed by Bertsekas and Tsitsiklis and the one proposed by Bertsekas and Yu. We prove that the problem is well-defined and (weakly) polynomial when (i) there is a way to reach the target state from any initial state and (ii) there is no transition cycle of negative cost (a generalization of negative-cost cycles). These assumptions generalize the standard assumptions for the deterministic shortest path problem, and our framework encapsulates the latter problem (in contrast with prior works). In this new setting, we show that (a) one can restrict attention to deterministic and stationary policies, (b) the problem is still (weakly) polynomial through linear programming, (c) Value Iteration and Policy Iteration converge, and (d) Dijkstra's algorithm can be extended.
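As a rough illustration of items (a) and (c), the following Python sketch runs plain value iteration on a tiny stochastic shortest path instance and then reads off a greedy deterministic, stationary policy. The dictionary encoding, the helper names (`q_value`, `value_iteration`, `greedy_policy`) and the example states are hypothetical; the instance uses only nonnegative costs, so it does not exercise the negative-cost generality the paper treats, and it is not the authors' code.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: mdp[state][action] = (cost, [(prob, next_state), ...]).
# The target state is absorbing and cost-free, so it is not listed in `mdp`.
Outcomes = List[Tuple[float, str]]
MDP = Dict[str, Dict[str, Tuple[float, Outcomes]]]


def q_value(values: Dict[str, float], cost: float, outcomes: Outcomes) -> float:
    """One-step cost plus expected cost-to-go of the successor state."""
    return cost + sum(p * values[nxt] for p, nxt in outcomes)


def value_iteration(mdp: MDP, target: str, tol: float = 1e-9,
                    max_iter: int = 100_000) -> Dict[str, float]:
    """Approximate optimal expected costs-to-go by repeated Bellman updates."""
    values = {s: 0.0 for s in mdp}
    values[target] = 0.0
    for _ in range(max_iter):
        new_values = {target: 0.0}
        for s, actions in mdp.items():
            # Bellman update: best action minimizes cost + expected cost-to-go.
            new_values[s] = min(q_value(values, c, out) for c, out in actions.values())
        delta = max(abs(new_values[s] - values[s]) for s in mdp)
        values = new_values
        if delta < tol:  # successive iterates have (numerically) stopped changing
            break
    return values


def greedy_policy(mdp: MDP, values: Dict[str, float]) -> Dict[str, str]:
    """Deterministic stationary policy that is greedy with respect to `values`."""
    return {
        s: min(actions, key=lambda a: q_value(values, *actions[a]))
        for s, actions in mdp.items()
    }


# Tiny hypothetical instance: from "a", either pay 4 to reach "t" directly,
# or pay 1 and land in "t" or "b" with equal probability; "b" pays 2 to reach "t".
example: MDP = {
    "a": {"safe": (4.0, [(1.0, "t")]), "risky": (1.0, [(0.5, "t"), (0.5, "b")])},
    "b": {"go": (2.0, [(1.0, "t")])},
}
costs = value_iteration(example, target="t")
policy = greedy_policy(example, costs)
print(costs, policy)
```

In this instance the gamble from "a" costs 1 + 0.5 * 2 = 2 in expectation, which beats the direct cost of 4, so the greedy policy picks "risky" there.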

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint

Author: A. Prashanth L.
Fu Michael
Publication date: 22/10/2018

The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon discounted or long-run average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either as the objective or as a constraint. Various risk measures have been proposed in the literature, such as the mean-variance tradeoff, exponential utility, percentile performance, value at risk, conditional value at risk, prospect theory, and its later enhancement, cumulative prospect theory. In this article, we focus on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that optimizes the usual infinite-horizon discounted or average-cost objective while ensuring that an explicit risk constraint is satisfied. We introduce the risk-constrained RL framework, cover popular risk measures based on variance, conditional value-at-risk and cumulative prospect theory, and present a template for a risk-sensitive RL algorithm. We survey some of our recent work on this topic, covering discounted-cost, average-cost, and stochastic shortest path settings, together with the aforementioned risk measures in a constrained framework. This non-exhaustive survey aims to give a flavor of the challenges involved in solving a risk-sensitive RL problem and to outline some potential future research directions.
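A common way to handle a constraint such as "minimize expected cost subject to CVaR of the cost staying below a bound" is Lagrangian relaxation with a multiplier updated by dual ascent. The Python sketch below is a hypothetical illustration of that idea, not the survey's template: it estimates VaR and CVaR from sampled episode costs (using the Rockafellar-Uryasev form of CVaR) and performs one multiplier update. The function names, the sample, the level `alpha`, the bound and the step size are all assumptions, and the policy-update half of a full actor-critic scheme is omitted.

```python
import math
from typing import Sequence, Tuple


def empirical_var_cvar(costs: Sequence[float], alpha: float) -> Tuple[float, float]:
    """Empirical VaR and CVaR of a cost sample at level alpha (0 < alpha < 1).

    VaR is the alpha-quantile of the sorted sample; CVaR uses the
    Rockafellar-Uryasev form VaR + E[(X - VaR)^+] / (1 - alpha).
    """
    xs = sorted(costs)
    idx = max(0, math.ceil(alpha * len(xs)) - 1)
    var = xs[idx]
    excess = sum(max(x - var, 0.0) for x in xs) / len(xs)
    return var, var + excess / (1.0 - alpha)


def lagrangian_step(costs: Sequence[float], alpha: float, bound: float,
                    lam: float, step: float) -> Tuple[float, float]:
    """One dual-ascent update for: minimize E[cost] s.t. CVaR_alpha(cost) <= bound.

    Returns the Lagrangian value at the current multiplier and the updated
    multiplier, which grows while the constraint is violated and is clipped at 0.
    """
    mean = sum(costs) / len(costs)
    _, cvar = empirical_var_cvar(costs, alpha)
    lagrangian = mean + lam * (cvar - bound)
    new_lam = max(0.0, lam + step * (cvar - bound))
    return lagrangian, new_lam


sample = [float(i) for i in range(1, 11)]  # hypothetical episode costs
var, cvar = empirical_var_cvar(sample, alpha=0.8)
lagrangian, lam = lagrangian_step(sample, alpha=0.8, bound=8.0, lam=0.0, step=0.1)
print(var, cvar, lagrangian, lam)
```

Here the sample's CVaR at level 0.8 is about 9.5, above the assumed bound of 8, so the multiplier rises from 0 and the penalty on risk grows in the next round.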

arXiv.org e-Print Archive

PageRank Optimization by Edge Selection

Author: Avrachenkov
Balázs Csanád Csáji
Berkhin
Bertsekas
Coppersmith
Csáji
De Kerchove
Garey
Gonzaga
Ishii
Langville
Levin
Papadimitriou
Puterman
Raphaël M. Jungers
Sutton
Tseng
Vincent D. Blondel
Publication venue: Elsevier BV
Publication date: 18/01/2012

The importance of a node in a directed graph can be measured by its PageRank. The PageRank of a node is used in a number of application contexts, including ranking websites, and can be interpreted as the average portion of time spent at the node by an infinite random walk. We consider the problem of maximizing the PageRank of a node by selecting some of the edges from a set of edges that are under our control. By applying results from Markov decision theory, we show that an optimal solution to this problem can be found in polynomial time. Our core solution results in a linear programming formulation, but we also provide an alternative greedy algorithm, a variant of policy iteration, which also runs in polynomial time. Finally, we show that under the slight modification in which we are given mutually exclusive pairs of edges, the problem of PageRank optimization becomes NP-hard.

Comment: 30 pages, 3 figures
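To make the optimization problem concrete, here is a hypothetical Python sketch that computes PageRank by power iteration and then picks the best subset of controllable edges by exhaustive search. It assumes the common formulation with damping factor 0.85, uniform teleportation, and dangling nodes redistributing their mass uniformly; the paper's exact model may differ. The exhaustive search is exponential in the number of controllable edges and is not the paper's polynomial-time linear programming or policy iteration method; the function names and the three-node graph are illustrative assumptions.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

Edge = Tuple[int, int]


def pagerank(n: int, edges: Sequence[Edge], damping: float = 0.85,
             tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    """PageRank of nodes 0..n-1 by power iteration (random surfer model)."""
    out: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n  # teleportation mass
        for u in range(n):
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
            else:  # dangling node: spread its mass uniformly
                for v in range(n):
                    new[v] += damping * rank[u] / n
        diff = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if diff < tol:
            break
    return rank


def best_edge_subset(n: int, fixed: Sequence[Edge], controllable: Sequence[Edge],
                     target: int) -> Tuple[FrozenSet[Edge], float]:
    """Exhaustively try every subset of controllable edges (exponential time)."""
    best_set: FrozenSet[Edge] = frozenset()
    best_rank = -1.0
    for k in range(len(controllable) + 1):
        for subset in combinations(controllable, k):
            r = pagerank(n, list(fixed) + list(subset))[target]
            if r > best_rank:
                best_set, best_rank = frozenset(subset), r
    return best_set, best_rank


fixed = [(0, 1), (1, 0), (2, 0)]
controllable = [(1, 2), (0, 2)]
chosen, score = best_edge_subset(3, fixed, controllable, target=2)
print(sorted(chosen), round(score, 4))
```

In this toy graph, adding both controllable edges into node 2 gives it the highest PageRank; in general, adding an edge into the target does not always help, which is part of what makes the selection problem nontrivial.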

arXiv.org e-Print Archive

Crossref

SZTAKI Publication Repository

Repository of the Academy's Library

DIAL UCLouvain