
    Differential Dynamic Programming for time-delayed systems

    Trajectory optimization considers the problem of deciding how to control a dynamical system to move along a trajectory which minimizes some cost function. Differential Dynamic Programming (DDP) is an optimal control method which utilizes a second-order approximation of the problem to find the control. It is fast enough to allow real-time control and has been shown to work well for trajectory optimization in robotic systems. Here we extend classic DDP to systems with multiple time-delays in the state. Being able to find optimal trajectories for time-delayed systems with DDP opens up the possibility to use richer models for system identification and control, including recurrent neural networks with multiple timesteps in the state. We demonstrate the algorithm on a two-tank continuous stirred tank reactor. We also demonstrate the algorithm on a recurrent neural network trained to model an inverted pendulum with position information only. Comment: 7 pages, 6 figures; 2016 IEEE 55th Conference on Decision and Control (CDC).
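
    To make the delay-handling idea concrete, a common workaround is to stack delayed states into an augmented Markovian state and run the classic recursion on it. The sketch below (Python, all names illustrative) does exactly that for a toy scalar system with a single one-step delay, using a first-order (iLQR-style) backward pass rather than the paper's full second-order derivation for multiple delays.

    import numpy as np

    def f(x, x_prev, u):                     # toy dynamics with one state delay
        return x + 0.1 * (x_prev - x) + 0.1 * u

    def f_aug(z, u):                         # augmented, delay-free dynamics:
        x, x_prev = z                        # z[t] = (x[t], x[t-1])
        return np.array([f(x, x_prev, u), x])

    def jacobians(z, u, eps=1e-5):           # finite-difference linearization
        base = f_aug(z, u)
        fz = np.zeros((2, 2))
        for i in range(2):
            dz = np.zeros(2); dz[i] = eps
            fz[:, i] = (f_aug(z + dz, u) - base) / eps
        fu = ((f_aug(z, u + eps) - base) / eps).reshape(2, 1)
        return fz, fu

    Q, R, T = np.eye(2), 0.1, 50             # quadratic cost 0.5 z'Qz + 0.5 R u^2
    zs = [np.array([1.0, 1.0])]              # initial state and its delayed copy
    us = [0.0] * T
    for u in us:                             # nominal forward rollout
        zs.append(f_aug(zs[-1], u))

    Vz, Vzz = Q @ zs[-1], Q.copy()           # terminal value expansion
    gains = []                               # (k, K) pairs, collected backwards
    for t in reversed(range(T)):
        fz, fu = jacobians(zs[t], us[t])
        Qz  = Q @ zs[t] + fz.T @ Vz
        Qu  = np.array([R * us[t]]) + fu.T @ Vz
        Qzz = Q + fz.T @ Vzz @ fz
        Quu = np.array([[R]]) + fu.T @ Vzz @ fu
        Quz = fu.T @ Vzz @ fz
        k = -np.linalg.solve(Quu, Qu)        # feedforward correction
        K = -np.linalg.solve(Quu, Quz)       # feedback gain on augmented state
        gains.append((k, K))
        Vz  = Qz + K.T @ Quu @ k + K.T @ Qu + Quz.T @ k
        Vzz = Qzz + K.T @ Quu @ K + K.T @ Quz + Quz.T @ K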

    Black-Box Data-efficient Policy Search for Robotics

    The most data-efficient algorithms for reinforcement learning (RL) in robotics are based on uncertain dynamical models: after each episode, they first learn a dynamical model of the robot, then they use an optimization algorithm to find a policy that maximizes the expected return given the model and its uncertainties. It is often believed that this optimization can be tractable only if analytical, gradient-based algorithms are used; however, these algorithms require using specific families of reward functions and policies, which greatly limits the flexibility of the overall approach. In this paper, we introduce a novel model-based RL algorithm, called Black-DROPS (Black-box Data-efficient RObot Policy Search) that: (1) does not impose any constraint on the reward function or the policy (they are treated as black-boxes), (2) is as data-efficient as the state-of-the-art algorithm for data-efficient RL in robotics, and (3) is as fast (or faster) than analytical approaches when several cores are available. The key idea is to replace the gradient-based optimization algorithm with a parallel, black-box algorithm that takes into account the model uncertainties. We demonstrate the performance of our new algorithm on two standard control benchmark problems (in simulation) and a low-cost robotic manipulator (with a real robot). Comment: Accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017; Code at http://github.com/resibots/blackdrops; Video at http://youtu.be/kTEyYiIFGP
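
    A hedged sketch of the core loop described above, using pycma as a stand-in for the paper's modified CMA-ES variant: the expected return of a policy is estimated by Monte-Carlo rollouts through a (stubbed) probabilistic dynamics model, and a black-box optimizer searches the policy parameters, so neither the reward nor the policy needs to be differentiable. `policy`, `model_predict`, and `reward` are toy placeholders, not the authors' code.

    import numpy as np
    import cma                                   # pycma, a generic CMA-ES

    rng = np.random.default_rng(0)

    def policy(theta, x):                        # black-box policy (toy linear)
        return float(np.tanh(theta @ x))

    def model_predict(x, u):                     # stand-in for the learned
        mu = x + 0.05 * np.array([x[1], u - x[0]])   # probabilistic model
        return mu, 0.01                          # predictive mean and std

    def reward(x, u):                            # arbitrary black-box reward
        return -float(x @ x)

    def neg_expected_return(theta, n_rollouts=10, horizon=50):
        theta = np.asarray(theta)
        total = 0.0
        for _ in range(n_rollouts):              # Monte-Carlo rollouts that
            x = np.array([1.0, 0.0])             # propagate model uncertainty
            for _ in range(horizon):
                u = policy(theta, x)
                mu, sigma = model_predict(x, u)
                x = mu + sigma * rng.standard_normal(mu.shape)
                total += reward(x, u)
        return -total / n_rollouts               # CMA-ES minimizes

    es = cma.CMAEvolutionStrategy([0.0, 0.0], 0.5)
    for _ in range(20):                          # a few generations; candidates
        thetas = es.ask()                        # could be scored in parallel
        es.tell(thetas, [neg_expected_return(t) for t in thetas])
    best_theta = es.result.xbest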

    Robust and Resilient State Dependent Control of Discrete-Time Nonlinear Systems with General Performance Criteria

    A novel state-dependent control approach for discrete-time nonlinear systems with general performance criteria is presented. The controller is robust to unstructured model uncertainties and resilient against bounded perturbations of the feedback control gains, while achieving quadratic optimality with an inherent asymptotic stability property and quadratic-dissipative disturbance attenuation. An unstructured uncertainty description is assumed for the system model, which incorporates commonly used uncertainty types, such as norm-bounded and positive-real uncertainties, as special cases. By solving a state-dependent linear matrix inequality at each time step, a sufficient condition is obtained for a control solution that satisfies the general performance criteria. The results of this paper unify existing results on the nonlinear quadratic regulator, H∞ control, and positive-real control into a novel robust control design. The effectiveness of the proposed technique is demonstrated by simulation of the control of an inverted pendulum.
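
    To make the "solve a state-dependent LMI at each time step" pattern concrete, here is a minimal sketch using a standard discrete-time stabilizability LMI (find Y ≻ 0 and W such that the Schur-complement block matrix is positive definite, then K = W Y⁻¹) with cvxpy. It omits the paper's general performance criteria, uncertainty description, and resilience constraints; the state-dependent coefficient A(x) is a toy pendulum linearization.

    import numpy as np
    import cvxpy as cp

    def sd_gain(A, B, eps=1e-6):
        # Stabilizing gain from the LMI  [[Y, (A Y + B W)'], [A Y + B W, Y]] > 0
        n, m = B.shape[0], B.shape[1]
        Y = cp.Variable((n, n), symmetric=True)
        W = cp.Variable((m, n))
        M = cp.bmat([[Y, (A @ Y + B @ W).T],
                     [A @ Y + B @ W, Y]])
        prob = cp.Problem(cp.Minimize(0),
                          [Y >> eps * np.eye(n), M >> eps * np.eye(2 * n)])
        prob.solve(solver=cp.SCS)                # SDP solver
        return W.value @ np.linalg.inv(Y.value)  # K = W Y^{-1}

    def A_of(x, dt=0.02, g=9.8, l=1.0):
        s = np.sinc(x[0] / np.pi)                # sin(th)/th, safe at th = 0
        return np.array([[1.0, dt], [dt * g / l * s, 1.0]])

    B = np.array([[0.0], [0.02]])
    x = np.array([0.3, 0.0])
    for _ in range(5):                           # state-dependent control loop
        K = sd_gain(A_of(x), B)                  # re-solve the LMI at this state
        u = K @ x
        x = A_of(x) @ x + (B @ u).ravel()        # apply and step the model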

    OPTIMIZATION OF PID CONTROLLER FOR INVERTED PENDULUM SYSTEM USING GENETIC ALGORITHM

    The proportional-integral-derivative controller, commonly known as the PID Controller, has been widely used in industry since the 1940s and remains the most often used controller today. In this project, the PID Controller of an Inverted Pendulum System is optimized using a Genetic Algorithm (GA) approach. Currently, the Inverted Pendulum System available in the laboratory is controlled by a PID Controller. However, considerable time is required for the pendulum to move from the downward to the upright position and to be stabilized. GA is therefore applied to overcome this problem. The main objective of this project is to find the optimum stable point, that is, the optimum values of KP, KI, and KD of the PID Controller, using the GA approach. The second objective is to reduce the time required for the pendulum to be stabilized. The project proceeds in several stages: problem identification, research on GA, understanding the principles of the PID Controller and the Inverted Pendulum, obtaining the stable region, creating the GA code in MATLAB, and testing on the real Inverted Pendulum System. Before the GA optimization technique can be applied, the stable region of the system must first be obtained; the Nyquist Stability Criterion is utilized for this. Once the stable region is obtained, GA is applied to determine the optimum values of KP, KI, and KD within that region. This project requires the MATLAB Software and the Double Inverted Pendulum Trainer, so an understanding of this software and hardware is vital.
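
    A minimal sketch of the GA-based PID tuning loop on a toy linearized pendulum (not the laboratory's Double Inverted Pendulum Trainer, and in Python rather than MATLAB): the genome is (KP, KI, KD), fitness is a simulated tracking cost, and the selection/crossover/mutation operators are illustrative choices. The Nyquist-based stable-region step is skipped here; gains are simply clipped to be non-negative.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate(gains, dt=0.01, T=500):
        # toy linearized pendulum: th'' = (g/l)*th + u, unstable open loop
        Kp, Ki, Kd = gains
        th, om, integ, prev_err, cost = 0.2, 0.0, 0.0, -0.2, 0.0
        for _ in range(T):
            err = -th                              # drive angle to 0 (upright)
            integ += err * dt
            deriv = (err - prev_err) / dt
            u = Kp * err + Ki * integ + Kd * deriv
            prev_err = err
            om += (9.8 * th + u) * dt              # g/l = 9.8 with l = 1 m
            th += om * dt
            cost += abs(th) * dt + 1e-4 * u * u * dt
            if abs(th) > np.pi:                    # fell over: heavy penalty
                return cost + 100.0
        return cost

    pop = rng.uniform([0, 0, 0], [100, 20, 20], size=(40, 3))
    for gen in range(60):
        fit = np.array([simulate(g) for g in pop])
        elite = pop[np.argsort(fit)[:10]]                  # keep the best 10
        parents = elite[rng.integers(0, 10, size=(30, 2))]
        alpha = rng.random((30, 1))                        # blend crossover
        children = alpha * parents[:, 0] + (1 - alpha) * parents[:, 1]
        children += rng.normal(0, 1.0, children.shape)     # Gaussian mutation
        pop = np.vstack([elite, np.clip(children, 0, None)])
    best = pop[np.argmin([simulate(g) for g in pop])]      # tuned (KP, KI, KD)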

    Learning a Unified Control Policy for Safe Falling

    Being able to fall safely is a necessary motor skill for humanoids performing highly dynamic tasks, such as running and jumping. We propose a new method to learn a policy that minimizes the maximal impulse during the fall. The optimization solves for both a discrete contact planning problem and a continuous optimal control problem. Once trained, the policy can compute the optimal next contacting body part (e.g. left foot, right foot, or hands), contact location and timing, and the required joint actuation. We represent the policy as a mixture of actor-critic neural networks, which consists of n control policies and the corresponding value functions. Each actor-critic pair is associated with one of the n possible contacting body parts. During execution, the policy corresponding to the highest value function is executed, and the associated body part becomes the next contact with the ground. With this mixture-of-actor-critic architecture, the discrete contact sequence planning is solved through the selection of the best critic, while the continuous control problem is solved by the optimization of the actors. We show that our policy can achieve comparable, sometimes even higher, rewards than a recursive search of the action space using dynamic programming, while achieving a 50- to 400-fold speedup during online execution.
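
    A hedged sketch of just the selection rule described above: n actor-critic pairs, one per candidate contacting body part; at each contact decision all critics are evaluated on the current state, and the actor paired with the highest value is executed. The networks are stubbed here with random linear maps; in the paper they are trained neural networks.

    import numpy as np

    rng = np.random.default_rng(1)
    n_parts, state_dim, act_dim = 3, 8, 4     # e.g. left foot, right foot, hands

    actors = [rng.normal(size=(act_dim, state_dim)) for _ in range(n_parts)]
    critics = [rng.normal(size=state_dim) for _ in range(n_parts)]

    def select_and_act(s):
        values = [w @ s for w in critics]     # V_i(s) for each body part
        i = int(np.argmax(values))            # discrete contact choice
        a = np.tanh(actors[i] @ s)            # continuous control from actor i
        return i, a

    s = rng.normal(size=state_dim)            # current robot state (stub)
    part, action = select_and_act(s)          # next contact and joint actuation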

    End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

    Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) on-line learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process. Comment: Published in AAAI 2019.
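
    A minimal sketch of the CBF safety-filter step, assuming known control-affine dynamics x_dot = f(x) + g(x)u and a hand-picked barrier h(x) >= 0: the RL action is minimally modified by a small QP so the barrier condition holds. The paper's GP-based learning of the unknown dynamics and its probabilistic safety margins are omitted; cvxpy solves the QP and all constants are illustrative.

    import numpy as np
    import cvxpy as cp

    theta_max, alpha, tau = 0.5, 5.0, 0.3

    def safe_action(x, u_rl):
        th, om = x
        f = np.array([om, 9.8 * np.sin(th)])   # nominal pendulum drift
        g = np.array([0.0, 1.0])               # control enters angular accel
        h = theta_max - th - tau * om          # barrier: stay within the safe set
        dh = np.array([-1.0, -tau])            # gradient of h
        u = cp.Variable()
        # CBF condition: dh/dx (f + g u) >= -alpha * h(x)
        cons = [dh @ f + (dh @ g) * u >= -alpha * h]
        # minimally modify the RL action subject to the CBF condition
        cp.Problem(cp.Minimize(cp.square(u - u_rl)), cons).solve()
        return float(u.value)

    u = safe_action(np.array([0.4, 0.1]), u_rl=2.0)   # filtered control input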

    Development of a Genetic Fuzzy Controller and Its Application to a Noisy Inverted Double Pendulum

    Get PDF
    Fuzzy logic is used in a variety of applications due to its universal-approximation property and non-linear characteristics. Tuning the parameters of a fuzzy logic system, viz. the membership functions and the rulebase, requires considerable trial and error. This process can be simplified by using a heuristic search algorithm such as a genetic algorithm (GA). In this chapter, we discuss the design of such a genetic fuzzy controller that can control an inverted double pendulum. The GA improves the fuzzy logic controller (FLC) with each generation during the training process to obtain an FLC that can bring the pendulum to its inverted position. After training, the effectiveness of the FLC is tested under different scenarios by varying the initial conditions. We also show that the FLC remains effective even when subjected to noise, and that its performance improves when the controller is tuned in the presence of noise.
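
    A hedged sketch of the tunable piece described above: a three-rule Sugeno-style fuzzy controller with triangular memberships, where the flat parameter vector (membership centers, widths, and crisp rule consequents) is exactly what a GA genome would encode. The GA loop itself would mirror the PID-tuning sketch earlier in this list; all names and constants here are illustrative.

    import numpy as np

    def tri(x, c, w):                      # triangular membership function
        return np.maximum(0.0, 1.0 - np.abs(x - c) / w)

    def flc(err, genome):
        c = genome[0:3]                    # membership centers (neg/zero/pos)
        w = np.abs(genome[3:6]) + 1e-6     # membership widths
        out = genome[6:9]                  # crisp rule consequents
        mu = tri(err, c, w)                # firing strength of each rule
        if mu.sum() == 0.0:                # no rule fires: fail soft to zero
            return 0.0
        return float(mu @ out / mu.sum())  # weighted-average defuzzification

    # one genome a GA would evolve: centers, widths, consequents
    genome = np.array([-1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 5.0, 0.0, -5.0])
    u = flc(0.3 + 0.05 * np.random.randn(), genome)   # control for a noisy error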