Abstract

We propose a simple framework for critic-based training of recurrent neural networks and feedback controllers. We call the critics used here primitive adaptive critics, since we represent them with the simplest possible architecture (a bias weight only). We derive this framework from two main premises. The first is a natural similarity between a form of approximate dynamic programming called Dual Heuristic Programming (DHP) and backpropagation through time (BPTT), as we will discuss. The second is our emphasis on developing a truly online critic-based training procedure that is competitive with truncated BPTT in performance and computational cost. Three examples illustrate the main features of the proposed framework.

DHP and BPTT

A family of designs for approximate dynamic programming in continuous domains has been proposed by Werbos [1]. It includes the following steps. First, the well-known Bellman equation of dynamic programming is written a