
28,753 research outputs found

Finite-Step Algorithms for Single-Controller and Perfect Information Stochastic Games

Author: A Condon
A Hordijk
AJ Hoffman
AM Fink
AS Nowak
B Eaves
CB Garcia
CE Lemke
D Blackwell
D Gillette
E Solan
F Thuijsman
H Everett
J Waal Van der
J-F Mertens
JA Filar
LCM Kallenberg
LS Shapley
M Bardi
M Breton
M Melekopoglou
M Pollatschek
M Takahashi
OJ Vrieze
RA Howard
RW Cottle
SR Mohan
T Bewley
T Parthasarathy
TA Shultz
TES Raghavan
TM Liggett
U Zwick
V Krishna
VA Gurwich
W Ludwig
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003

Abstract. After a brief survey of iterative algorithms for general stochastic games, we concentrate on finite-step algorithms for two special classes of stochastic games: Single-Controller Stochastic Games and Perfect Information Stochastic Games. In single-controller games, the transition probabilities depend on the actions of the same player in all states. In perfect information stochastic games, one of the players has exactly one action in each state. Single-controller zero-sum games are efficiently solved by linear programming. Non-zero-sum single-controller stochastic games are reducible to linear complementarity problems (LCPs). In the discounted case they can be modified to fit into the so-called LCPs of Eaves' class L. In the undiscounted case the LCPs are reducible to Lemke's copositive-plus class. In either case Lemke's algorithm can be used to find a Nash equilibrium. For discounted zero-sum perfect information stochastic games, a policy improvement algorithm is presented. Many other classes of stochastic games with the orderfield property still await efficient finite-step algorithms.

CiteSeerX

Crossref

A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

Author: Clérot Fabrice
Gajane Pratik
Urvoy Tanguy
Publication date: 01/01/2015

We study the K-armed dueling bandit problem, a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3). We prove a finite-time expected regret upper bound of order O(sqrt(K ln(K) T)) for this algorithm and a general lower bound of order Omega(sqrt(KT)). Finally, we provide experimental results using real data from information retrieval applications.
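For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of the classical EXP3 update, not of REX3 itself. The function name `exp3`, the callback `reward_fn`, and the default `gamma` are illustrative assumptions, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def exp3(num_arms, horizon, reward_fn, gamma=0.1, seed=0):
    """Sketch of classical EXP3 (hypothetical helper, not the REX3 algorithm).

    reward_fn(t, arm) is an assumed callback returning a reward in [0, 1].
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    pulls = [0] * num_arms
    for t in range(horizon):
        total = sum(weights)
        # Mix the exponential-weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / num_arms for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        # Importance-weighted estimate: only the pulled arm is updated.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale so the weights stay bounded; the probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
        pulls[arm] += 1
    return pulls
```

Calling `exp3(3, 2000, lambda t, arm: 1.0 if arm == 2 else 0.0)` runs the sketch on a toy problem in which only the last arm pays off. The dueling setting differs in that the learner observes only a relative comparison between two arms rather than an absolute reward.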

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Reinforcement Learning: A Survey

Author: Kaelbling L. P.
Littman M. L.
Moore A. W.
Publication date: 01/01/1996

This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning. Comment: See http://www.jair.org/ for any accompanying file.

arXiv.org e-Print Archive

CiteSeerX

Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey

Author: Alsheikh Mohammad Abu
Hoang Dinh Thai
Lin Shaowei
Niyato Dusit
Tan Hwee-Pink
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/01/2015

Wireless sensor networks (WSNs) consist of autonomous and resource-limited devices. The devices cooperate to monitor one or more physical phenomena within an area of interest. WSNs operate as stochastic systems because of randomness in the monitored environments. For long service time and low maintenance cost, WSNs require adaptive and robust methods to address data exchange, topology formulation, resource and power optimization, sensing coverage and object detection, and security challenges. In these problems, sensor nodes must make optimized decisions from a set of accessible strategies to achieve design goals. This survey reviews numerous applications of the Markov decision process (MDP) framework, a powerful decision-making tool for developing adaptive algorithms and protocols for WSNs. Furthermore, various solution methods are discussed and compared to serve as a guide for using MDPs in WSNs.
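As an illustration of one standard MDP solution method, here is a minimal value-iteration sketch. The abstract does not say which methods the survey compares, so this choice is an assumption. The function name `value_iteration` and the data layout are also illustrative assumptions.

```python
def value_iteration(transitions, rewards, discount=0.9, tol=1e-8):
    """Sketch of value iteration for a finite discounted MDP.

    Assumed layout (hypothetical, for illustration only):
      transitions[s][a] is a list of (probability, next_state) pairs,
      rewards[s][a] is the immediate reward for action a in state s.
    Returns (values, policy), where policy[s] is a greedy action index.
    """
    num_states = len(transitions)

    def q_value(s, a, values):
        # Immediate reward plus discounted expected value of the next state.
        return rewards[s][a] + discount * sum(
            p * values[s2] for p, s2 in transitions[s][a]
        )

    values = [0.0] * num_states
    while True:
        new_values = [
            max(q_value(s, a, values) for a in range(len(transitions[s])))
            for s in range(num_states)
        ]
        delta = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if delta < tol:
            break

    policy = [
        max(range(len(transitions[s])), key=lambda a: q_value(s, a, values))
        for s in range(num_states)
    ]
    return values, policy
```

In a WSN setting, a state might encode a node's battery level and an action might be "sleep" or "transmit". Any such encoding and its transition probabilities would have to come from the application and are not specified here.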

arXiv.org e-Print Archive

University of Canberra Research Repository