Intensive versus non-intensive actor-critic reinforcement learning algorithms
Reinforcement learning algorithms usually use the agent's consecutive actions to construct gradient estimators, which are then used to adjust the agent's policy. The policy is thus the result of some kind of stochastic approximation.
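
The idea of adjusting a policy by stochastic approximation from sampled actions can be illustrated with a minimal sketch. This is not the method of the paper listed above: it is a generic likelihood-ratio (REINFORCE-style) update on a hypothetical two-armed bandit, with no critic. The reward values, step size, and all function names are assumptions made for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=200, step_size=0.1, seed=0):
    """Run a stochastic-approximation policy update on a toy bandit."""
    rng = random.Random(seed)
    rewards = [0.0, 1.0]  # hypothetical: arm 1 always pays 1, arm 0 pays 0
    prefs = [0.0, 0.0]    # policy parameters, one preference per arm
    for _ in range(steps):
        probs = softmax(prefs)
        # Sample one action from the current policy.
        action = 0 if rng.random() < probs[0] else 1
        reward = rewards[action]
        # Gradient estimator: reward * d/d_prefs log pi(action).
        for a in range(2):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += step_size * reward * grad_log
    return softmax(prefs)


final_probs = train()
print(final_probs)
```

Each update uses a single sampled action, so every step is a noisy estimate of the true gradient. The policy improves only through many small steps.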