    Projections for Approximate Policy Iteration Algorithms

    Approximate policy iteration is a class of reinforcement learning (RL) algorithms in which the policy is encoded by a function approximator; this class has been especially prominent in RL with continuous action spaces. In these algorithms, ensuring that the policy return increases during a policy update often requires constraining the change in the action distribution. Several approximations exist in the literature for solving this constrained policy update problem. In this paper, we propose to improve on such solutions by introducing a set of projections that transform the constrained problem into an unconstrained one, which is then solved by standard gradient descent. Using these projections, we empirically demonstrate that our approach can improve both the policy update solution and the control over exploration of existing approximate policy iteration algorithms.
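
    The recipe above can be illustrated with a toy sketch (ours, not the projections derived in the paper): for a one-dimensional Gaussian policy under a KL trust-region constraint, a projection map sends arbitrary parameters to the feasible set by interpolating toward the old parameters, so that composing the objective with this map yields an unconstrained problem for a standard optimizer. All names below are hypothetical.

        import numpy as np

        def kl_gauss(mu_q, sig_q, mu_p, sig_p):
            """KL( N(mu_q, sig_q^2) || N(mu_p, sig_p^2) ) for 1-D Gaussians."""
            return np.log(sig_p / sig_q) + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5

        def project_to_kl_ball(mu, sig, mu_old, sig_old, eps, tol=1e-8):
            """Map arbitrary (mu, sig) to parameters whose KL to the old policy is <= eps,
            by bisecting on an interpolation weight between old and proposed parameters."""
            if kl_gauss(mu, sig, mu_old, sig_old) <= eps:
                return mu, sig
            lo, hi = 0.0, 1.0  # lo = old params (feasible), hi = proposed params (infeasible)
            while hi - lo > tol:
                a = 0.5 * (lo + hi)
                m = (1 - a) * mu_old + a * mu
                s = (1 - a) * sig_old + a * sig
                if kl_gauss(m, s, mu_old, sig_old) <= eps:
                    lo = a
                else:
                    hi = a
            return (1 - lo) * mu_old + lo * mu, (1 - lo) * sig_old + lo * sig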

    Controlled Sequential Monte Carlo

    Sequential Monte Carlo methods, also known as particle methods, are a popular set of techniques for approximating high-dimensional probability distributions and their normalizing constants. These methods have found numerous applications in statistics and related fields, e.g. for inference in non-linear, non-Gaussian state space models and in complex static models. Like many Monte Carlo sampling schemes, they rely on proposal distributions which crucially impact their performance. We introduce here a class of controlled sequential Monte Carlo algorithms, where the proposal distributions are determined by approximating the solution to an associated optimal control problem using an iterative scheme. This method builds upon a number of existing algorithms in econometrics, physics, and statistics for inference in state space models, and generalizes these methods so as to accommodate complex static models. We provide a theoretical analysis of the fluctuation and stability of this methodology, which also gives insight into the properties of related algorithms. We demonstrate significant gains over state-of-the-art methods at a fixed computational complexity on a variety of applications.
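
    For background on the role of the proposal, the following is a minimal bootstrap particle filter sketch (plain SMC, not the controlled variant): particles are propagated through a transition sampler, weighted by the observation likelihood, and resampled, while an estimate of the log normalizing constant is accumulated. In controlled SMC, the prior transition proposal used below would be replaced by a twisted proposal obtained from an approximate optimal-control solve; the function names here are placeholders of ours.

        import numpy as np

        def bootstrap_smc(T, n_particles, init_sample, trans_sample, obs_loglik, rng=None):
            """Minimal bootstrap particle filter.

            init_sample(n, rng)     -> (n, d) initial particles
            trans_sample(x, t, rng) -> particles propagated from time t-1 to t
            obs_loglik(x, t)        -> (n,) log p(y_t | x_t) for each particle

            Returns an estimate of the log normalizing constant log p(y_{0:T-1}).
            """
            rng = np.random.default_rng() if rng is None else rng
            x = init_sample(n_particles, rng)
            log_Z = 0.0
            for t in range(T):
                if t > 0:
                    x = trans_sample(x, t, rng)       # propagate through the model prior
                logw = obs_loglik(x, t)               # weight by the observation density
                m = logw.max()
                w = np.exp(logw - m)
                log_Z += m + np.log(w.mean())         # running log normalizing-constant estimate
                idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
                x = x[idx]                            # multinomial resampling
            return log_Z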

    On Resource Allocation in Fading Multiple Access Channels - An Efficient Approximate Projection Approach

    We consider the problem of rate and power allocation in a multiple-access channel. Our objective is to obtain rate and power allocation policies that maximize a general concave utility function of average transmission rates over the information-theoretic capacity region of the multiple-access channel. Our policies do not require queue-length information. We consider several different scenarios. First, we address the utility maximization problem in a non-fading channel to obtain the optimal operating rates, and present an iterative gradient projection algorithm that uses approximate projection. By exploiting the polymatroid structure of the capacity region, we show that the approximate projection can be implemented in time polynomial in the number of users. Second, we consider resource allocation in a fading channel. Optimal rate and power allocation policies are presented for the case where power control is possible and channel statistics are available. For the case where transmission power is fixed and channel statistics are unknown, we propose a greedy rate allocation policy and provide bounds on the performance difference between this policy and the optimal policy in terms of channel variations and the structure of the utility function. We present numerical results that demonstrate superior convergence-rate performance for the greedy policy compared to queue-length-based policies. In order to reduce the computational complexity of the greedy policy, we present approximate rate allocation policies which track the greedy policy within a certain neighborhood that is characterized in terms of the speed of fading. Comment: 32 pages; submitted to IEEE Transactions on Information Theory.
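
    To make "gradient projection with approximate projection" concrete, here is a generic sketch (a toy illustration of ours; the paper's approximate projection instead exploits the polymatroid structure of the capacity region): a projected gradient ascent loop over an abstract projection operator, instantiated with a deliberately crude projection onto a small two-user rate region.

        import numpy as np

        def projected_gradient_ascent(grad, project, r0, step=0.05, iters=500):
            """Generic projected gradient ascent: r <- project(r + step * grad(r))."""
            r = np.asarray(r0, dtype=float)
            for _ in range(iters):
                r = project(r + step * np.asarray(grad(r)))
            return r

        # Toy two-user region {r >= 0, r[0] <= 1.0, r[1] <= 1.2, r[0] + r[1] <= 1.5},
        # shaped like a simple polymatroid; the projection below is deliberately crude.
        def project_toy(r):
            r = np.clip(r, 0.0, [1.0, 1.2])
            excess = r.sum() - 1.5
            if excess > 0:
                r = np.clip(r - excess / 2, 0.0, None)   # shave the violation equally
            return r

        # Maximize the concave utility sum(log(1 + r)); its gradient is 1 / (1 + r).
        r_opt = projected_gradient_ascent(lambda r: 1.0 / (1.0 + r), project_toy, [0.0, 0.0])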

    Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view

    We investigate projection methods for evaluating a linear approximation of the value function of a policy in a Markov Decision Process context. We consider two popular approaches, the one-step Temporal Difference fix-point computation (TD(0)) and Bellman Residual (BR) minimization. We describe examples where each method outperforms the other. We highlight a simple relation between the objective functions they minimize, and show that while BR enjoys a performance guarantee, TD(0) does not in general. We then propose a unified view in terms of oblique projections of the Bellman equation, which substantially simplifies and extends the characterization of (Schoknecht, 2002) and the recent analysis of (Yu & Bertsekas, 2008). Finally, we describe simulations suggesting that, although the TD(0) solution is usually slightly better than the BR solution, its inherent numerical instability makes it very poor in some cases, and thus worse on average.
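
    For reference, in standard notation (ours, not quoted from the abstract): with a linear approximation $V_\theta = \Phi\theta$ and Bellman operator $T^\pi V = r^\pi + \gamma P^\pi V$, the TD(0) solution satisfies the projected fixed-point equation $\Phi\theta = \Pi T^\pi \Phi\theta$, where $\Pi$ is the orthogonal projection onto the span of $\Phi$, while the BR minimizer solves $\min_\theta \|\Phi\theta - T^\pi \Phi\theta\|^2$; the paper's unified view expresses both solutions through oblique projections of the Bellman equation.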

    On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes

    We consider infinite-horizon stationary $\gamma$-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. Using Value and Policy Iteration with some error $\epsilon$ at each iteration, it is well known that one can compute stationary policies that are $\frac{2\gamma}{(1-\gamma)^2}\epsilon$-optimal. After arguing that this guarantee is tight, we develop variations of Value and Policy Iteration for computing non-stationary policies that can be up to $\frac{2\gamma}{1-\gamma}\epsilon$-optimal, which constitutes a significant improvement in the usual situation when $\gamma$ is close to 1. Surprisingly, this shows that the problem of "computing near-optimal non-stationary policies" is much simpler than that of "computing near-optimal stationary policies".
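
    As a quick arithmetic illustration of the gap (ours, not from the abstract): with $\gamma = 0.99$, the stationary guarantee is $\frac{2\gamma}{(1-\gamma)^2}\epsilon = 19800\,\epsilon$, whereas the non-stationary one is $\frac{2\gamma}{1-\gamma}\epsilon = 198\,\epsilon$, i.e. a factor of $\frac{1}{1-\gamma} = 100$ tighter.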

    Bellman Error Based Feature Generation using Random Projections on Sparse Spaces

    We address the problem of automatic generation of features for value function approximation. Bellman Error Basis Functions (BEBFs) have been shown to improve the error of policy evaluation with function approximation, with a convergence rate similar to that of value iteration. We propose a simple, fast, and robust algorithm based on random projections to generate BEBFs for sparse feature spaces. We provide a finite sample analysis of the proposed method, and prove that projections logarithmic in the dimension of the original space are enough to guarantee contraction in the error. Empirical results demonstrate the strength of this method.
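
    A rough sketch of the kind of step involved (our simplified illustration; the paper's algorithm and its handling of sparse spaces differ): project the high-dimensional features with a random Gaussian matrix and fit the empirical Bellman error by least squares to obtain a new basis function. All names and signatures are hypothetical.

        import numpy as np

        def bebf_via_random_projection(X, X_next, r, gamma, V, k, rng=None):
            """Construct one Bellman-error basis function on randomly projected features.

            X, X_next : (n, D) features of sampled states s and their successors s'
            r         : (n,) observed rewards
            V         : callable returning the current value estimate for a feature matrix
            k         : target dimension of the random projection (k << D)

            Returns (proj, w); the new basis function is f(x) = (x @ proj) @ w.
            """
            rng = np.random.default_rng() if rng is None else rng
            n, D = X.shape
            proj = rng.normal(0.0, 1.0 / np.sqrt(k), size=(D, k))   # random projection matrix
            target = r + gamma * V(X_next) - V(X)                   # empirical Bellman error
            Z = X @ proj                                            # low-dimensional features
            w, *_ = np.linalg.lstsq(Z, target, rcond=None)          # fit the BEBF by least squares
            return proj, w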

    Computing Probabilistic Bisimilarity Distances for Probabilistic Automata

    The probabilistic bisimilarity distance of Deng et al. has been proposed as a robust quantitative generalization of Segala and Lynch's probabilistic bisimilarity for probabilistic automata. In this paper, we present a characterization of the bisimilarity distance as the solution of a simple stochastic game. The characterization gives us an algorithm to compute the distances by applying Condon's simple policy iteration on these games. The correctness of Condon's approach, however, relies on the assumption that the games are stopping. Our games may be non-stopping in general, yet we are able to prove termination for this extended class of games. Other algorithms have already been proposed in the literature to compute these distances, with complexity in $\textbf{UP} \cap \textbf{coUP}$ and $\textbf{PPAD}$. Despite their theoretical relevance, these algorithms are inefficient in practice. To the best of our knowledge, our algorithm is the first practical solution. The characterization of the probabilistic bisimilarity distance mentioned above crucially uses a dual presentation of the Hausdorff distance due to Mémoli. As an additional contribution, in this paper we show that Mémoli's result can also be used to prove that the bisimilarity distance bounds the difference in the maximal (or minimal) probability of two states satisfying arbitrary $\omega$-regular properties, expressed, e.g., as LTL formulas.
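
    As rough background in generic notation of ours (the paper's own contribution is the game characterization): the distance can be presented as the least fixed point of an operator comparing one-step behaviour, roughly $d(s,t) = \max_a \mathcal{H}_{\mathcal{K}(d)}(\delta(s,a), \delta(t,a))$, where $\delta(s,a)$ is the set of distributions reachable from $s$ by an $a$-labelled transition, $\mathcal{K}(d)$ is the Kantorovich lifting of $d$, and $\mathcal{H}$ is the induced Hausdorff distance. Recasting this fixed point as the value of a simple stochastic game is what allows Condon's simple policy iteration to be applied.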