
    Feature Search in the Grassmannian in Online Reinforcement Learning

    We consider the problem of finding the best features for value function approximation in reinforcement learning and develop an online algorithm to optimize the mean square Bellman error objective. For any given feature value, our algorithm performs gradient search in the parameter space via a residual gradient scheme and, on a slower timescale, also performs gradient search in the Grassmann manifold of features. We present a proof of convergence of our algorithm. We show empirical results using our algorithm as well as a similar algorithm that uses temporal difference learning in place of the residual gradient scheme for the faster-timescale updates.
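    To make the two-timescale scheme concrete, the following is a minimal sketch, not the authors' implementation: the value-function weights are updated with a residual-gradient step on the fast timescale, while an orthonormal feature basis Phi (a point on the Grassmann manifold) is updated on the slower timescale with a projected gradient step and a QR retraction. The state descriptor x, the dimensions D and k, and the step sizes alpha and beta are hypothetical placeholders.

```python
# Minimal sketch (not the authors' code): two-timescale feature search on the
# Grassmannian for linear value-function approximation.  The raw state
# descriptor x(s) in R^D is projected onto k features via an orthonormal
# basis Phi, i.e. phi(s) = Phi^T x(s); all names here are assumptions.
import numpy as np

rng = np.random.default_rng(0)
D, k, gamma = 10, 3, 0.95
Phi = np.linalg.qr(rng.standard_normal((D, k)))[0]   # orthonormal feature basis
w = np.zeros(k)                                      # value-function weights

def value(x, Phi, w):
    return (Phi.T @ x) @ w

def step(x, r, x_next, alpha=0.05, beta=0.005):
    """One two-timescale update: fast residual-gradient step in w,
    slow Riemannian gradient step in Phi with a QR retraction."""
    global w, Phi
    delta = r + gamma * value(x_next, Phi, w) - value(x, Phi, w)  # Bellman residual
    # Residual-gradient update: gradient of the squared residual w.r.t. w
    # involves both phi(x) and gamma * phi(x_next).
    grad_w = delta * (gamma * (Phi.T @ x_next) - (Phi.T @ x))
    w -= alpha * grad_w
    # Euclidean gradient of the squared residual w.r.t. Phi.
    grad_Phi = delta * (gamma * np.outer(x_next, w) - np.outer(x, w))
    # Project onto the tangent space of the Grassmannian, then retract via QR.
    tangent = grad_Phi - Phi @ (Phi.T @ grad_Phi)
    Phi = np.linalg.qr(Phi - beta * tangent)[0]

# Toy usage: random transitions from a synthetic chain, purely illustrative.
for _ in range(1000):
    x = rng.standard_normal(D)
    x_next = 0.9 * x + 0.1 * rng.standard_normal(D)
    r = float(x[0])
    step(x, r, x_next)
```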

    Actor-Critic Algorithms with Online Feature Adaptation

    We develop two new online actor-critic control algorithms with adaptive feature tuning for Markov decision processes (MDPs). One of our algorithms is proposed for the long-run average cost objective, while the other works for discounted cost MDPs. Our actor-critic architecture incorporates parameterization of both the policy and the value function. A gradient search in the policy parameters is performed to improve the performance of the actor. Computing this gradient, however, requires an estimate of the value function of the policy corresponding to the current actor parameter. The value function, in turn, is approximated using linear function approximation and obtained from the critic. The error in approximating the value function, however, results in suboptimal policies. In our article, we also update the features by performing gradient descent on the Grassmannian of features to minimize a mean square Bellman error objective, in order to find the best features. The aim is to obtain a good approximation of the value function and thereby ensure convergence of the actor to locally optimal policies. To estimate the gradient of the objective in the case of the average cost criterion, we utilize the policy gradient theorem, while in the case of the discounted cost objective, we utilize the simultaneous perturbation stochastic approximation (SPSA) scheme. We prove that our actor-critic algorithms converge to locally optimal policies. Experiments on two different settings show performance improvements resulting from our feature adaptation scheme.
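    The discounted-cost variant is described as estimating the gradient via simultaneous perturbation stochastic approximation (SPSA). The snippet below is a minimal, generic SPSA gradient estimator under assumed names: J_hat stands in for a noisy performance estimate of the perturbed parameters (e.g. a Monte Carlo return), and c is the perturbation size; it is not the papers' actor-critic code.

```python
# Minimal sketch of a two-sided SPSA gradient estimate (assumed names, not
# the papers' implementation): perturb all coordinates at once with a
# Rademacher vector and form a single finite-difference estimate of grad J.
import numpy as np

rng = np.random.default_rng(1)

def spsa_gradient(theta, J_hat, c=0.1):
    """Two evaluations of J_hat give an estimate of the full gradient."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)     # Rademacher perturbation
    j_plus = J_hat(theta + c * delta)
    j_minus = J_hat(theta - c * delta)
    return (j_plus - j_minus) / (2.0 * c * delta)          # componentwise division

# Toy usage on a noisy quadratic surrogate standing in for the policy value.
def J_hat(theta):
    return -np.sum((theta - 2.0) ** 2) + 0.01 * rng.standard_normal()

theta = np.zeros(4)
for t in range(500):
    a_t = 0.5 / (t + 10)                                   # diminishing step size
    theta += a_t * spsa_gradient(theta, J_hat)             # gradient ascent on J
print(theta)                                               # approaches [2, 2, 2, 2]
```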

    Multi-agent Reinforcement Learning for Traffic Signal Control

    Optimal control of traffic lights at junctions, or traffic signal control (TSC), is essential for reducing the average delay experienced by road users amidst the rapid increase in vehicle usage. In this paper, we formulate the TSC problem as a discounted cost Markov decision process (MDP) and apply multi-agent reinforcement learning (MARL) algorithms to obtain dynamic TSC policies. We model each traffic signal junction as an independent agent. An agent decides the signal duration of its phases in a round-robin (RR) manner using multi-agent Q-learning with either ε-greedy or UCB [3] based exploration strategies. It updates its Q-factors based on the cost feedback signal received from its neighbouring agents. This feedback signal can be easily constructed and is shown to be effective in minimizing the average delay of the vehicles in the network. We show through simulations over VISSIM that our algorithms perform significantly better than both the standard fixed signal timing (FST) algorithm and the saturation balancing (SAT) algorithm [15] over two real road networks.
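    As an illustration of the per-junction learning scheme, here is a minimal sketch under assumed names (JunctionAgent, DURATIONS, neighbour_cost): each agent keeps a Q-table over (state, current phase, green-time choice), explores with ε-greedy, and performs a cost-minimizing Q-learning update using a feedback signal built from its own and its neighbours' queue lengths, advancing its phase in round-robin order. It is not the paper's implementation and omits the UCB alternative and the VISSIM interface.

```python
# Minimal sketch (assumed structure, not the paper's code): one Q-learning
# agent per junction choosing the green time for its current phase, with
# epsilon-greedy exploration and a neighbour-based cost signal.
import random
from collections import defaultdict

DURATIONS = [10, 20, 30]          # candidate green times (seconds), placeholder
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1

class JunctionAgent:
    def __init__(self, n_phases):
        self.q = defaultdict(float)   # Q[(state, phase, duration_index)]
        self.phase = 0
        self.n_phases = n_phases

    def choose(self, state):
        """Epsilon-greedy choice of the green duration for the current phase."""
        if random.random() < EPS:
            return random.randrange(len(DURATIONS))
        return min(range(len(DURATIONS)),
                   key=lambda a: self.q[(state, self.phase, a)])

    def update(self, state, action, cost, next_state):
        """Cost-minimizing Q-learning update, then advance the phase round-robin."""
        next_phase = (self.phase + 1) % self.n_phases
        best_next = min(self.q[(next_state, next_phase, a)]
                        for a in range(len(DURATIONS)))
        key = (state, self.phase, action)
        self.q[key] += ALPHA * (cost + GAMMA * best_next - self.q[key])
        self.phase = next_phase

def neighbour_cost(own_queue, neighbour_queues):
    """Assumed form of the shared feedback: own queue plus weighted neighbours'."""
    return own_queue + 0.5 * sum(neighbour_queues)

# Toy usage with one agent and synthetic queue lengths.
agent = JunctionAgent(n_phases=4)
state = 0
for _ in range(100):
    a = agent.choose(state)
    cost = neighbour_cost(random.randint(0, 20), [random.randint(0, 20)])
    next_state = random.randint(0, 4)
    agent.update(state, a, cost, next_state)
    state = next_state
```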