3 research outputs found

    Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

    Full text link
    We study the minmax optimization problem introduced in [22] for computing policies for batch mode reinforcement learning in a deterministic setting. First, we show that this problem is NP-hard. In the two-stage case, we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [22]

    Minimax search and reinforcement learning for adversarial tetris

    No full text
    Summarization: Game playing has always been considered an intellectual activity requiring a good level of intelligence. This paper focuses on Adversarial Tetris, a variation of the well-known Tetris game, introduced at the 3rd International Reinforcement Learning Competition in 2009. In Adversarial Tetris the mission of the player to complete as many lines as possible is actively hindered by an unknown adversary who selects the falling tetraminoes in ways that make the game harder for the player. In addition, there are boards of different sizes and learning ability is tested over a variety of boards and adversaries. This paper describes the design and implementation of an agent capable of learning to improve his strategy against any adversary and any board size. The agent employs MiniMax search enhanced with Alpha-Beta pruning for looking ahead within the game tree and a variation of the Least-Squares Temporal Difference Learning (LSTD) algorithm for learning an appropriate state evaluation function over a small set of features. The learned strategies exhibit good performance over a wide range of boards and adversaries.Παρουσιάστηκε στο: 6th Hellenic Conference on Artificial Intelligenc

    Minimax search and reinforcement learning for adversarial Tetris

    No full text
    Περίληψη: Μη διαθέσιμ
    corecore