3 research outputs found

Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

Author: Boigelot Bernard
Ernst Damien
Fonteneau Raphael
Louveaux Quentin
Publication date: 01/01/2012

We study the minmax optimization problem introduced in [22] for computing policies for batch mode reinforcement learning in a deterministic setting. First, we show that this problem is NP-hard. In the two-stage case, we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [22].

arXiv.org e-Print Archive

Open Repository and Bibliography - Liège

Minimax search and reinforcement learning for adversarial Tetris

Author: Lagoudakis Michael (http://users.isc.tuc.gr/~lagoudakis)
Rovatsou M.
Publication venue: Springer Verlag

Summary: Game playing has always been considered an intellectual activity requiring a good level of intelligence. This paper focuses on Adversarial Tetris, a variation of the well-known Tetris game, introduced at the 3rd International Reinforcement Learning Competition in 2009. In Adversarial Tetris, the player's mission to complete as many lines as possible is actively hindered by an unknown adversary who selects the falling tetrominoes in ways that make the game harder for the player. In addition, there are boards of different sizes, and learning ability is tested over a variety of boards and adversaries. This paper describes the design and implementation of an agent capable of learning to improve its strategy against any adversary and any board size. The agent employs MiniMax search enhanced with Alpha-Beta pruning for looking ahead within the game tree and a variation of the Least-Squares Temporal Difference Learning (LSTD) algorithm for learning an appropriate state evaluation function over a small set of features. The learned strategies exhibit good performance over a wide range of boards and adversaries. Presented at: 6th Hellenic Conference on Artificial Intelligence

Institutional Repository of the Technical University of Crete
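
The MiniMax search with Alpha-Beta pruning mentioned in the summary above can be illustrated with a minimal, generic sketch. This is not the authors' implementation: the toy tree, the `toy_children` and `toy_evaluate` helpers, and the leaf scores are all hypothetical stand-ins. In the described agent the evaluation function would be learned with an LSTD variant over board features; here it is just a lookup table.

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax value of `node`, skipping branches that cannot change the result."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:  # the player's move: pick the best child
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:  # the minimizer already has a better option elsewhere
                break
        return value
    value = math.inf  # the adversary's move: pick the worst child for the player
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, children, evaluate))
        beta = min(beta, value)
        if alpha >= beta:  # the maximizer already has a better option elsewhere
            break
    return value

# Hypothetical two-level game tree with made-up leaf scores.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
LEAF = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}
VISITED = []  # records which leaves were actually evaluated

def toy_children(node):
    return TREE.get(node, [])

def toy_evaluate(node):
    VISITED.append(node)
    return LEAF.get(node, 0)

best = alphabeta("root", 2, -math.inf, math.inf, True, toy_children, toy_evaluate)
print(best, VISITED)
```

In this toy tree the search never evaluates leaf `b2`: once `b1` shows that branch `b` is worth at most 2 to the player, it cannot beat the value 3 already secured through branch `a`.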

Minimax search and reinforcement learning for adversarial Tetris

Author: Rovatsou Maria
Publication venue: Technical University of Crete, Department of Electronic and Computer Engineering

Abstract: Not available

Institutional Repository of the Technical University of Crete