A Finite Time Analysis of Two Time-Scale Actor Critic Methods
Actor-critic (AC) methods have exhibited great empirical success compared
with other reinforcement learning algorithms, where the actor uses the policy
gradient to improve the learning policy and the critic uses temporal difference
learning to estimate the policy gradient. Under the two time-scale learning
rate schedule, the asymptotic convergence of AC has been well studied in the
literature. However, the non-asymptotic convergence and finite sample
complexity of actor-critic methods are largely open. In this work, we provide a
non-asymptotic analysis for two time-scale actor-critic methods under
non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find
a first-order stationary point (i.e., $\|\nabla J(\theta)\|_2^2 \le \epsilon$) of the non-concave performance
function $J(\theta)$, with $\widetilde{\mathcal{O}}(\epsilon^{-2.5})$ sample
complexity. To the best of our knowledge, this is the first work providing
finite-time analysis and sample complexity bound for two time-scale
actor-critic methods.
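To make the two time-scale schedule concrete, here is a minimal tabular sketch (a hypothetical setup, not the paper's construction): the critic runs TD(0) with a step size that decays more slowly, so it operates on the faster time scale, while the actor takes policy-gradient steps using the TD error as an advantage estimate. The random MDP, the step-size exponents, and the softmax parameterization are all illustrative assumptions.

```python
import numpy as np

# Illustrative two time-scale actor-critic on a random tabular MDP.
# (A sketch under assumed step-size exponents, not the analyzed algorithm.)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

n_states, n_actions, gamma = 5, 3, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a next-state distribution
R = rng.standard_normal((n_states, n_actions))                    # reward table
theta = np.zeros((n_states, n_actions))  # actor parameters (tabular softmax policy)
w = np.zeros(n_states)                   # critic parameters (tabular value estimate)

s = 0
for t in range(1, 50_001):
    alpha = t ** -0.6  # critic step size: decays slower, hence the faster time scale
    beta = t ** -0.9   # actor step size: decays faster, hence the slower time scale
    pi = softmax(theta[s])
    a = rng.choice(n_actions, p=pi)
    s_next = rng.choice(n_states, p=P[s, a])
    # Critic: TD(0) update toward the one-step bootstrap target.
    delta = R[s, a] + gamma * w[s_next] - w[s]
    w[s] += alpha * delta
    # Actor: policy-gradient step with the TD error as the advantage estimate.
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta[s] += beta * delta * grad_log_pi
    s = s_next
```

The point the two time-scale analysis turns on is the separation of the step sizes (here $\alpha_t/\beta_t \to \infty$), which lets the critic be viewed as tracking the slowly moving actor.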
Algorithms for CVaR Optimization in MDPs
In many sequential decision-making problems we may want to manage risk by
minimizing some measure of variability in costs in addition to minimizing a
standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk
measure that addresses some of the shortcomings of the well-known
variance-related risk measures, and because of its computational efficiencies
has gained popularity in finance and operations research. In this paper, we
consider the mean-CVaR optimization problem in MDPs. We first derive a formula
for computing the gradient of this risk-sensitive objective function. We then
devise policy gradient and actor-critic algorithms, each of which uses a specific
method to estimate this gradient and updates the policy parameters in the
descent direction. We establish the convergence of our algorithms to locally
risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our
algorithms in an optimal stopping problem.
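For intuition on the risk measure itself, the sketch below uses the Rockafellar-Uryasev representation $\text{CVaR}_\alpha(C) = \min_\nu \{\nu + \mathbb{E}[(C-\nu)_+]/(1-\alpha)\}$, whose minimizer is the $\alpha$-quantile (the VaR), to estimate CVaR from sampled costs and form a weighted mean-CVaR objective. The trade-off weight `lam`, the confidence level, and the cost distribution are assumptions for the example, not the paper's exact formulation.

```python
import numpy as np

# Empirical CVaR via the Rockafellar-Uryasev representation:
#   CVaR_alpha(C) = min_nu { nu + E[(C - nu)_+] / (1 - alpha) },
# minimized at nu = VaR_alpha(C), the alpha-quantile of the cost.

def cvar(costs, alpha):
    nu = np.quantile(costs, alpha)        # VaR: the alpha-quantile
    excess = np.maximum(costs - nu, 0.0)  # losses beyond the quantile
    return nu + excess.mean() / (1.0 - alpha)

rng = np.random.default_rng(1)
costs = rng.normal(loc=1.0, scale=0.5, size=10_000)  # sampled trajectory costs (assumed)
lam = 0.5                                            # risk trade-off weight (assumed)
objective = costs.mean() + lam * cvar(costs, alpha=0.95)
print(f"mean: {costs.mean():.3f}, CVaR_0.95: {cvar(costs, 0.95):.3f}, objective: {objective:.3f}")
```

This representation is what makes gradient estimation tractable in such methods: the minimization over $\nu$ can be folded into the policy optimization as an extra decision variable.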
Finite Time Analysis of Constrained Actor Critic and Constrained Natural Actor Critic Algorithms
Actor Critic methods have found immense application across a wide range of
Reinforcement Learning tasks, especially when the state-action space is large.
In this paper, we consider actor critic and natural actor critic algorithms
with function approximation for constrained Markov decision processes (C-MDP)
involving inequality constraints and carry out a non-asymptotic analysis for
both of these algorithms in a non-i.i.d. (Markovian) setting. We consider the
long-run average cost criterion where both the objective and the constraint
functions are suitable policy-dependent long-run averages of certain prescribed
cost functions. We handle the inequality constraints using the Lagrange
multiplier method. We prove that these algorithms are guaranteed to find a
first-order stationary point (i.e., $\|\nabla L(\theta, \gamma)\|_2^2 \le \epsilon$) of the performance (Lagrange) function $L(\theta, \gamma)$, with
a sample complexity of $\widetilde{\mathcal{O}}(\epsilon^{-2.5})$ in the case of
both Constrained Actor Critic (C-AC) and Constrained Natural Actor Critic
(C-NAC) algorithms. We also show the results of experiments on a few different
grid world settings and observe good empirical performance using both of these
algorithms. In particular, for large grid sizes, Constrained Natural Actor
Critic shows slightly better results than Constrained Actor Critic while the
latter is slightly better for a small grid size.
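The Lagrange-multiplier machinery can be sketched independently of the actor-critic details. Below, hypothetical stochastic estimators stand in for the actor-critic estimates of the objective and constraint gradients: the policy parameters descend the Lagrangian $L(\theta, \gamma) = J(\theta) + \gamma\,(J_c(\theta) - b)$, while the multiplier ascends on constraint violation on a slower time scale and is projected back to the nonnegative orthant. All names, step sizes, and the threshold `b` are illustrative assumptions.

```python
import numpy as np

# Sketch of the Lagrange-multiplier treatment of an inequality constraint
# J_c(theta) <= b. The estimators below are hypothetical stand-ins for the
# actor-critic estimates of the objective and constraint quantities.

rng = np.random.default_rng(2)

def grad_J(theta):       # noisy gradient estimate of the objective J
    return 2.0 * theta + 0.1 * rng.standard_normal(theta.shape)

def grad_Jc(theta):      # noisy gradient estimate of the constraint cost J_c
    return np.ones_like(theta) + 0.1 * rng.standard_normal(theta.shape)

def Jc_estimate(theta):  # noisy estimate of the constraint value J_c(theta)
    return theta.sum() + 0.1 * rng.standard_normal()

b = 0.0             # constraint threshold: require J_c(theta) <= b
theta = np.ones(3)  # policy parameters
gam = 0.0           # Lagrange multiplier, kept nonnegative by projection

for t in range(1, 10_001):
    beta = t ** -0.9  # parameter step size
    eta = 1.0 / t     # multiplier step size: the slowest time scale
    # Descend L(theta, gam) = J(theta) + gam * (J_c(theta) - b) in theta ...
    theta -= beta * (grad_J(theta) + gam * grad_Jc(theta))
    # ... and ascend it in gam, projecting onto gam >= 0.
    gam = max(0.0, gam + eta * (Jc_estimate(theta) - b))
```

The decreasing multiplier step reflects the usual multi-time-scale ordering in such schemes, in which the multiplier moves slowest.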