
    Bellman Error Based Feature Generation using Random Projections on Sparse Spaces

    We address the problem of automatic generation of features for value function approximation. Bellman Error Basis Functions (BEBFs) have been shown to improve the error of policy evaluation with function approximation, with a convergence rate similar to that of value iteration. We propose a simple, fast, and robust algorithm based on random projections to generate BEBFs for sparse feature spaces. We provide a finite sample analysis of the proposed method, and prove that projections logarithmic in the dimension of the original space are enough to guarantee contraction in the error. Empirical results demonstrate the strength of this method.
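    A minimal sketch of the idea, assuming a batch of transitions under a fixed policy represented by dense feature matrices Phi (current states), Phi_next (next states) and rewards r; the initial value fit, the projection width, and the regression used to fit the Bellman error are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def add_bebf(Phi, r, Phi_next, gamma=0.99, proj_dim=None, rng=None):
    """Append one Bellman-error basis function (BEBF) to the feature matrices."""
    rng = np.random.default_rng(rng)
    n, d = Phi.shape
    # Number of projections logarithmic in the original dimension (illustrative choice).
    proj_dim = proj_dim or max(1, int(np.ceil(np.log2(d))))

    # Crude value estimate on the existing features, standing in for the
    # current policy-evaluation iterate.
    w, *_ = np.linalg.lstsq(Phi, r, rcond=None)
    bellman_err = r + gamma * Phi_next @ w - Phi @ w  # sample Bellman errors

    # Random projection of the (possibly sparse) features down to proj_dim dimensions.
    R = rng.normal(0.0, 1.0 / np.sqrt(proj_dim), size=(d, proj_dim))

    # Regress the Bellman error in the projected space; the fitted function
    # becomes the new basis function, evaluated on current and next states.
    beta, *_ = np.linalg.lstsq(Phi @ R, bellman_err, rcond=None)
    return (np.column_stack([Phi, Phi @ R @ beta]),
            np.column_stack([Phi_next, Phi_next @ R @ beta]))
```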

    Cover Tree Bayesian Reinforcement Learning

    This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with least squares policy iteration.
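    A minimal sketch of the Thompson-sampling loop described above, with the cover-tree/context-tree Gaussian model replaced by simple Dirichlet transition posteriors and empirical reward means in a small tabular setting, purely to illustrate the sample-model / solve / act cycle; all names, priors and the env_step callback are illustrative assumptions.

```python
import numpy as np

def thompson_episode(env_step, n_states, n_actions, trans_counts, rew_sum, rew_cnt,
                     gamma=0.95, horizon=200, rng=None):
    """Run one episode of Thompson sampling on a tabular MDP posterior."""
    rng = np.random.default_rng(rng)

    # 1) Sample a plausible MDP from the posterior statistics.
    P = np.stack([[rng.dirichlet(trans_counts[s, a] + 1.0)
                   for a in range(n_actions)] for s in range(n_states)])
    visits = np.maximum(rew_cnt, 1)
    R = rew_sum / visits + rng.normal(0.0, 1.0 / np.sqrt(visits))

    # 2) Solve the sampled MDP with value iteration (approximate DP stand-in).
    V = np.zeros(n_states)
    for _ in range(200):
        V = (R + gamma * P @ V).max(axis=1)
    policy = (R + gamma * P @ V).argmax(axis=1)

    # 3) Act greedily with respect to the sampled model and update the posterior.
    s = 0
    for _ in range(horizon):
        a = policy[s]
        s_next, r, done = env_step(s, a)  # hypothetical environment callback
        trans_counts[s, a, s_next] += 1
        rew_sum[s, a] += r
        rew_cnt[s, a] += 1
        s = s_next
        if done:
            break
```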

    The Military Inventory Routing Problem: Utilizing Heuristics within a Least Squares Temporal Differences Algorithm to Solve a Multiclass Stochastic Inventory Routing Problem with Vehicle Loss

    Military commanders currently resupply forward operating bases (FOBs) from a central location within an area of operations, mainly via convoy operations, in a way that closely resembles vendor-managed inventory practices. Commanders must decide when and how much inventory to distribute throughout their area of operations while minimizing soldier risk. Because of the dangers of convoy operations, technology now exists that makes cargo unmanned aerial vehicles (CUAVs) an attractive resupply alternative. However, enemy actions in wartime environments pose a significant risk to a CUAV's ability to safely deliver supplies to a FOB. We develop a Markov decision process (MDP) model to examine this military inventory routing problem (MILIRP). In our first paper we examine the structure of the MILIRP on a small problem instance and prove that the value function is monotone when a sufficient penalty is applied. Moreover, we develop a monotone least squares temporal differences (MLSTD) algorithm that exploits this structure and demonstrate its efficacy for approximately solving this problem class. We compare MLSTD to least squares temporal differences (LSTD), a similar ADP algorithm that does not exploit monotonicity. MLSTD attains a 3.05% optimality gap for a baseline scenario and outperforms LSTD by 31.86% on average in our computational experiments. Our second paper expands the problem with additional FOBs. We develop two new algorithms for the routing portion, Index and Rollout, and implement an LSTD algorithm that utilizes them to produce solutions that are, on average, 22% better than myopically generated solutions. Our third paper greatly increases problem complexity by adding supply classes. We formulate an MDP model to handle the increased complexity and implement LSTD-Index and LSTD-Rollout algorithms that solve this larger problem instance and perform 21% better on average than a myopic policy.
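    Since all three papers build on least squares temporal differences, a minimal batch LSTD(0) sketch may help fix ideas; the monotone projection via isotonic regression is only a rough stand-in for the paper's MLSTD construction, and the matrices Phi and Phi_next, the rewards r, and the inventory_level vector are illustrative assumptions.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def lstd_weights(Phi, r, Phi_next, gamma=0.95, ridge=1e-6):
    """Batch LSTD(0): solve A w = b with A = Phi^T (Phi - gamma Phi'), b = Phi^T r."""
    A = Phi.T @ (Phi - gamma * Phi_next) + ridge * np.eye(Phi.shape[1])
    b = Phi.T @ r
    return np.linalg.solve(A, b)

def monotone_values(Phi, w, inventory_level):
    """Project the fitted values onto functions nondecreasing in inventory level
    (an illustrative monotone projection, not the paper's exact MLSTD step)."""
    v = Phi @ w
    iso = IsotonicRegression(increasing=True)
    return iso.fit_transform(inventory_level, v)
```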

    Policy evaluation with temporal differences: a survey and comparison

    Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, the quality assessment of states for a given policy, which can be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods due to their data efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency and probabilistic treatment of the uncertainty in the estimates have only been tackled recently, which has led to a large number of new approaches. This paper aims at making these new developments accessible in a concise overview, with a focus on the underlying cost functions, the off-policy scenario, and regularization in high-dimensional feature spaces. By presenting the first extensive, systematic comparative evaluation of TD, LSTD, LSPE, FPKF, the residual-gradient algorithm, Bellman residual minimization, GTD, GTD2 and TDC, we shed light on the strengths and weaknesses of the methods. Moreover, we present alternative versions of LSTD and LSPE with drastically improved off-policy performance.
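    For readers unfamiliar with the baseline that the surveyed methods extend, a minimal TD(0) policy-evaluation sketch with linear function approximation is given below; the transition format and the feature map phi are illustrative assumptions.

```python
import numpy as np

def td0_evaluate(transitions, phi, n_features, gamma=0.99, alpha=0.05):
    """TD(0) policy evaluation with linear features.

    transitions: iterable of (state, reward, next_state, done) generated
    under the target policy; phi maps a state to a length-n_features vector.
    """
    w = np.zeros(n_features)
    for s, r, s_next, done in transitions:
        v = phi(s) @ w
        v_next = 0.0 if done else phi(s_next) @ w
        td_error = r + gamma * v_next - v    # temporal-difference error
        w += alpha * td_error * phi(s)       # semi-gradient update
    return w
```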