3 research outputs found

    Adaptive dynamic programming with eligibility traces and complexity reduction of high-dimensional systems

    This dissertation investigates the application of a variety of computational intelligence techniques, particularly clustering and adaptive dynamic programming (ADP) designs, especially heuristic dynamic programming (HDP) and dual heuristic programming (DHP). Moreover, one-step temporal-difference learning (TD(0)) and n-step TD learning (TD(λ)), together with their gradients, are used as learning algorithms to train and adapt the ADP family online. The dissertation is organized into seven papers. The first paper demonstrates the robustness of model order reduction (MOR) for simulating complex dynamical systems. Agglomerative hierarchical clustering based on performance evaluation is introduced for MOR. This method computes the reduced-order denominator of the transfer function by clustering the system poles in a hierarchical dendrogram. Several numerical examples from the literature are used to compare this method with other reduction techniques. In the second paper, HDP is combined with the Dyna algorithm for path planning. The third paper uses DHP with an eligibility trace parameter (λ) to track a reference trajectory under uncertainties for a nonholonomic mobile robot, using a first-order Sugeno fuzzy neural network structure for the critic and actor networks. In the fourth and fifth papers, a stability analysis of a model-free action-dependent HDP(λ) is presented for batch and online learning implementations, respectively. The sixth paper combines two different gradient prediction levels of critic networks and provides convergence proofs. The seventh paper develops two hybrid recurrent fuzzy neural network structures for the critic and actor networks. They use a novel n-step gradient temporal-difference method (the gradient of TD(λ)) within an advanced ADP algorithm called value-gradient learning (VGL(λ)), and convergence proofs are given. Furthermore, the seventh paper is the first to combine the single network adaptive critic with VGL(λ). --Abstract, page iv
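The abstract names TD(λ) with eligibility traces as the learning rule behind the ADP critics. As a generic illustration only, the sketch below shows tabular TD(λ) policy evaluation with accumulating traces; the dissertation applies the idea to neural and fuzzy critics, which this sketch does not reproduce. The chain environment (`chain_episode`), step size `alpha`, and `lam` are hypothetical choices.

```python
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.8):
    """Tabular TD(lambda) policy evaluation with accumulating eligibility traces.

    episodes: list of trajectories, each a list of (s, r, s_next, done) tuples.
    Returns the estimated state-value vector V.
    """
    V = np.zeros(n_states)
    for episode in episodes:
        e = np.zeros(n_states)  # eligibility traces are reset at episode start
        for s, r, s_next, done in episode:
            # One-step TD error; terminal transitions do not bootstrap.
            target = r if done else r + gamma * V[s_next]
            delta = target - V[s]
            e[s] += 1.0               # accumulate trace for the visited state
            V += alpha * delta * e    # every traced state shares this TD error
            e *= gamma * lam          # decay all traces (lam = 0 gives TD(0))
    return V

def chain_episode(n_states):
    """Hypothetical deterministic chain: always move right, reward 1 on exit."""
    episode = []
    for s in range(n_states):
        done = s == n_states - 1
        episode.append((s, 1.0 if done else 0.0, s if done else s + 1, done))
    return episode
```

After a single chain episode with `lam=0.8`, the reward at the final step is credited to every earlier state in proportion to its decayed trace (0.8 raised to the distance from the end). With `lam=0.0`, only the final state moves. That difference in credit assignment is why eligibility traces can speed up critic learning.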

    Autonomous obstacle avoidance and positioning control of mobile robots using fuzzy neural networks

    Navigation and obstacle avoidance are important tasks in the research field of autonomous mobile robots. The challenge tackled in this work is navigating a 4-wheeled, car-type robot to a desired parking position while avoiding obstacles along the way. The approach taken to solve this problem is based on neuro-fuzzy techniques. Earlier work produced a controller that navigates the robot in an obstacle-free environment; this work extends it by considering additional parameters in the training process. The learning method used in this training is dynamic backpropagation. For the obstacle avoidance problem, an additional neuro-fuzzy controller is set up and trained. It modifies the output of the navigation controller to avoid collisions with objects blocking the path. This controller is trained with dynamic backpropagation and a reinforcement learning algorithm called deep deterministic policy gradient (DDPG). (Thesis)
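The abstract does not detail how DDPG is applied to the neuro-fuzzy obstacle-avoidance controller. The sketch below shows only three update rules from the standard DDPG formulation: critic targets computed with target networks, the deterministic policy gradient for an actor, and the soft (Polyak) target update. The actor here is a hypothetical linear map `a = W s` rather than the thesis's neuro-fuzzy network, and the function names are illustrative.

```python
import numpy as np

def critic_targets(rewards, next_states, dones, target_actor, target_critic, gamma=0.99):
    """Bellman targets y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    target_actor and target_critic are the slowly updated target copies of the
    actor mu and critic Q; terminal transitions (done = 1) do not bootstrap.
    """
    next_actions = target_actor(next_states)
    q_next = target_critic(next_states, next_actions)
    return rewards + gamma * (1.0 - dones) * q_next

def linear_actor_gradient(dq_da, states):
    """Deterministic policy gradient for a hypothetical linear actor a = W s.

    dq_da: (N, m) critic gradients dQ/da evaluated at a = mu(s_i).
    states: (N, n) batch of states.
    Returns dJ/dW with shape (m, n): the batch mean of dQ/da_i * s_i^T.
    """
    return dq_da.T @ states / len(states)

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of target parameters: theta' <- tau*theta + (1-tau)*theta'."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]
```

For example, with `gamma=0.5`, rewards `[1, 0]`, `dones` equal to `[0, 1]`, and a target critic that returns 3 for the first next state, the targets are `[2.5, 0.0]`. The terminal transition keeps only its reward. A small `tau` keeps the targets slowly moving, which is the stabilizing device DDPG relies on.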

    Mobile Robot Control Based on Hybrid Neuro-Fuzzy Value Gradient Reinforcement Learning

    No full text
    This paper uses value gradient learning (VGL) to track a reference trajectory under uncertainties by computing the optimal left and right torque values for a nonholonomic mobile robot. VGL is a high-performance algorithm in adaptive dynamic programming (ADP). Here, VGL provides the critic, and a first-order Sugeno fuzzy neural network (FNN) structure is fitted to both the critic and actor networks. Moreover, this work handles the impact of unmodeled bounded disturbances with various friction values. Simulations compare two approaches. The first uses an actor network and confirms the ability of the mobile robot's dynamic model to follow a desired trajectory; this approach significantly improves the robot's capability to absorb unstructured disturbance signals and friction effects. The second uses a critic-based optimal-control approach, calculating the optimal control signal directly for the robot's affine dynamic model. This removes the actor network entirely, reducing computational complexity and giving faster responses.
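A VGL critic outputs the value gradient λ(x) = ∂V/∂x. For a control-affine model, this is what allows the actor network to be removed. The sketch below assumes a discrete-time model x_{k+1} = f(x_k) + g(x_k) u_k, a quadratic control penalty uᵀRu, and a discount γ; the abstract does not give the robot's model or cost. Under these assumptions, setting the derivative of uᵀRu + γV(x_{k+1}) to zero gives u = -(γ/2) R⁻¹ g(x_k)ᵀ λ(x_{k+1}). Because x_{k+1} depends on u, the sketch solves this implicit equation by fixed-point iteration, which is assumed to converge for the chosen R, γ, and critic. `critic_only_control` and `costate` are hypothetical names.

```python
import numpy as np

def critic_only_control(x, f, g, costate, R, gamma=1.0, iters=50):
    """Greedy control from a value-gradient critic for a control-affine model.

    x:       current state, shape (n,)
    f, g:    drift f(x) -> (n,) and input matrix g(x) -> (n, m)
    costate: critic lambda(x) ~ dV/dx -> (n,), e.g. a trained VGL critic
    R:       (m, m) positive-definite control-cost matrix
    Solves u = -(gamma/2) R^{-1} g(x)^T lambda(f(x) + g(x) u) by fixed-point
    iteration (assumed to converge for the given R, gamma and critic).
    """
    R_inv = np.linalg.inv(R)
    G = g(x)
    u = np.zeros(R.shape[0])
    for _ in range(iters):
        x_next = f(x) + G @ u  # predicted next state under the current guess
        u = -0.5 * gamma * R_inv @ G.T @ costate(x_next)
    return u
```

As a sanity check, for a linear system f(x) = Ax, g(x) = B, the exact critic of an LQR problem is λ(x) = 2Px, where P solves the discrete Riccati equation. The control returned above then matches the LQR law u = -(R + BᵀPB)⁻¹BᵀPA x.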