1 research output found

    Intensive versus non-intensive actor-critic reinforcement learning algorithms

    No full text
    Reinforcement learning algorithms usually use the agent's consecutive actions to construct gradient estimators that adjust the agent's policy. The policy is a result of some kind of stochastic approximation