Differential Dynamic Programming for time-delayed systems
Trajectory optimization considers the problem of deciding how to control a
dynamical system to move along a trajectory which minimizes some cost function.
Differential Dynamic Programming (DDP) is an optimal control method which
utilizes a second-order approximation of the problem to find the control. It is
fast enough to allow real-time control and has been shown to work well for
trajectory optimization in robotic systems. Here we extend classic DDP to
systems with multiple time-delays in the state. Being able to find optimal
trajectories for time-delayed systems with DDP opens up the possibility to use
richer models for system identification and control, including recurrent neural
networks with multiple timesteps in the state. We demonstrate the algorithm on
a two-tank continuous stirred tank reactor. We also demonstrate the algorithm
on a recurrent neural network trained to model an inverted pendulum with
position information only.
Comment: 7 pages, 6 figures; 2016 IEEE 55th Conference on Decision and Control (CDC)
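The key construction in this abstract, folding delayed states into an augmented delay-free state so that standard DDP applies, can be sketched in a few lines. The scalar dynamics, delay length, and helper name below are illustrative choices of mine, not the authors' code:

```python
import numpy as np

def make_augmented_step(f, delay):
    """Wrap delayed dynamics x_next = f(x_t, x_{t-delay}, u_t) as a
    delay-free step on the stacked state z_t = [x_t, ..., x_{t-delay}]."""
    def step(z, u):
        dim = z.size // (delay + 1)
        x_now, x_old = z[:dim], z[delay * dim:]
        x_next = f(x_now, x_old, u)
        # shift the history window: newest state in front, oldest dropped
        return np.concatenate([x_next, z[:delay * dim]])
    return step

# toy scalar system whose next state depends on the state two steps back
f = lambda x, x_old, u: 0.9 * x + 0.1 * x_old + u
step = make_augmented_step(f, delay=2)

z = np.array([1.0, 0.5, 0.25])   # [x_t, x_{t-1}, x_{t-2}]
z = step(z, np.array([0.0]))     # newest entry is 0.9*1.0 + 0.1*0.25 = 0.925
```

Once the dynamics are in this augmented form, any delay-free trajectory optimizer, including classic DDP, can be run on `z` unchanged; the paper's contribution is exploiting the delay structure rather than treating the stacked state as a black box.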
Black-Box Data-efficient Policy Search for Robotics
The most data-efficient algorithms for reinforcement learning (RL) in
robotics are based on uncertain dynamical models: after each episode, they
first learn a dynamical model of the robot, then they use an optimization
algorithm to find a policy that maximizes the expected return given the model
and its uncertainties. It is often believed that this optimization can be
tractable only if analytical, gradient-based algorithms are used; however,
these algorithms require using specific families of reward functions and
policies, which greatly limits the flexibility of the overall approach. In this
paper, we introduce a novel model-based RL algorithm, called Black-DROPS
(Black-box Data-efficient RObot Policy Search) that: (1) does not impose any
constraint on the reward function or the policy (they are treated as
black-boxes), (2) is as data-efficient as the state-of-the-art algorithm for
data-efficient RL in robotics, and (3) is as fast (or faster) than analytical
approaches when several cores are available. The key idea is to replace the
gradient-based optimization algorithm with a parallel, black-box algorithm that
takes into account the model uncertainties. We demonstrate the performance of
our new algorithm on two standard control benchmark problems (in simulation)
and a low-cost robotic manipulator (with a real robot).
Comment: Accepted at the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS) 2017; Code at
http://github.com/resibots/blackdrops; Video at http://youtu.be/kTEyYiIFGP
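The core loop described here, Monte-Carlo evaluation of the expected return under an uncertain learned model, optimized by a gradient-free black-box algorithm, can be sketched as follows. The linear model, quadratic reward, and plain random search (the paper uses CMA-ES) are all illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_return(theta, horizon=30):
    """One Monte-Carlo rollout: the additive noise stands in for the
    learned dynamical model's uncertainty."""
    x, total = 1.0, 0.0
    for _ in range(horizon):
        u = -theta * x                    # policy (a black box to the optimizer)
        x = 0.95 * x + 0.2 * u + 0.01 * rng.standard_normal()  # "model" + noise
        total -= x ** 2                   # reward: keep x near zero
    return total

def expected_return(theta, n_rollouts=20):
    """Average return over rollouts: the black-box objective."""
    return float(np.mean([rollout_return(theta) for _ in range(n_rollouts)]))

# gradient-free policy search (the paper uses CMA-ES; random search shown here)
candidates = rng.uniform(0.0, 5.0, size=200)
best = max(candidates, key=expected_return)
```

Because the reward and policy appear only inside rollouts, neither needs to be differentiable, which is exactly the flexibility the abstract claims over analytical gradient-based approaches.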
Robust and Resilient State Dependent Control of Discrete-Time Nonlinear Systems with General Performance Criteria
A novel state-dependent control approach for discrete-time nonlinear systems with general performance criteria is presented. The controller is robust to unstructured model uncertainties and resilient against bounded perturbations of the feedback control gain, while achieving quadratic optimality for general performance criteria, an inherent asymptotic stability property, and quadratic-dissipative disturbance reduction. An unstructured uncertainty description is assumed for the system model, which incorporates commonly used types of uncertainty, such as norm-bounded and positive-real uncertainties, as special cases. By solving a state-dependent linear matrix inequality at each time step, a sufficient condition is found for a control solution that satisfies the general performance criteria. The results of this paper unify existing results on the nonlinear quadratic regulator, H∞ control, and positive real control into a novel robust control design. The effectiveness of the proposed technique is demonstrated by simulation of inverted pendulum control.
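A lighter-weight relative of this per-step state-dependent approach can be sketched by refactoring the nonlinearity into a state-dependent A(x) and re-solving a discrete Riccati equation at every step. Note the Riccati recursion below is a stand-in for the paper's state-dependent LMI, and the pendulum model and all parameters are assumptions of mine:

```python
import numpy as np

def dlqr_gain(A, B, Q, R, iters=150):
    """LQR gain via fixed-point iteration of the discrete Riccati
    recursion (a stand-in for the paper's per-step LMI solve)."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

def A_of(x, dt=0.02, g=9.81, l=1.0):
    """State-dependent factorization x_{t+1} = A(x) x + B u of an inverted
    pendulum: the sin(theta) term is written as (sin(theta)/theta) * theta."""
    th = x[0]
    return np.array([[1.0, dt],
                     [dt * (g / l) * np.sinc(th / np.pi), 1.0]])

B = np.array([[0.0], [0.02]])
Q, R = np.eye(2), np.array([[0.1]])

x = np.array([0.5, 0.0])          # pendulum tilted 0.5 rad, at rest
for _ in range(300):              # re-solve for the gain at every step
    K = dlqr_gain(A_of(x), B, Q, R)
    x = A_of(x) @ x + B @ (-K @ x)
```

The per-step re-solve is what makes the controller state-dependent; the LMI formulation in the paper additionally buys the robustness and dissipativity guarantees that a plain Riccati solve does not provide.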
OPTIMIZATION OF PID CONTROLLER FOR INVERTED PENDULUM SYSTEM USING GENETIC ALGORITHM
The proportional-integral-derivative controller, commonly known as the PID
Controller, has been widely used in industry since the 1940s and remains the
most widely used today. In this project, the PID Controller of an Inverted Pendulum
System is optimized using the Genetic Algorithm (GA) approach. Currently, the
Inverted Pendulum System available in the laboratory is controlled by a PID
Controller. However, considerable time is required for the pendulum to move from
the downward to the upright position and stabilize. Therefore, GA will be applied to
overcome this problem. The main objective of this project is to find the optimum
stable point, that is, the optimum values of KP, KI and KD of the PID Controller, using
the GA approach. The second objective of this project is to reduce the time required for
the pendulum to be stabilized. To complete this project, several stages need to
be carried out: problem identification, research on GA, understanding the principles
of the PID Controller and the Inverted Pendulum, obtaining the stable region, creating
the GA code in MATLAB, and conducting tests on the real Inverted Pendulum System.
Before the GA optimization technique can be applied, the stable region for the
desired system must first be obtained. In this project, the Nyquist Stability Criterion is
used to obtain the stable region. Once the stable region is obtained, GA is applied to
determine the optimum values of KP, KI and KD within it. This project requires the
MATLAB software and the Double Inverted Pendulum Trainer, so an understanding of
both the software and the hardware is vital.
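The GA tuning loop this abstract describes can be sketched as follows, with a toy second-order plant standing in for the laboratory Double Inverted Pendulum Trainer and an integral-of-absolute-error cost standing in for the stabilization-time objective (all parameters and the plant itself are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def cost(gains, dt=0.01, steps=500):
    """IAE of a PID loop on a toy second-order plant (a stand-in for
    the laboratory pendulum rig)."""
    kp, ki, kd = gains
    pos, vel, integ, prev_err, total = 0.0, 0.0, 0.0, 1.0, 0.0
    for _ in range(steps):
        err = 1.0 - pos                    # unit-step setpoint
        integ += err * dt
        u = kp * err + ki * integ + kd * (err - prev_err) / dt
        prev_err = err
        acc = u - 2.0 * vel - 5.0 * pos    # damped plant dynamics
        vel += acc * dt
        pos += vel * dt
        total += abs(err) * dt             # integral of absolute error
    return total

# plain generational GA: tournament selection, blend crossover,
# Gaussian mutation, one elite carried over unchanged
pop = rng.uniform(0.0, 50.0, size=(30, 3))    # individuals are (KP, KI, KD)
for _ in range(40):
    fit = np.array([cost(ind) for ind in pop])

    def select():
        i, j = rng.integers(0, len(pop), 2)
        return pop[i] if fit[i] < fit[j] else pop[j]

    new = [pop[fit.argmin()].copy()]          # elitism
    while len(new) < len(pop):
        w = rng.random(3)
        child = w * select() + (1.0 - w) * select()   # blend crossover
        child += rng.normal(0.0, 1.0, 3)              # mutation
        new.append(np.clip(child, 0.0, 50.0))
    pop = np.array(new)

best = pop[np.array([cost(ind) for ind in pop]).argmin()]
```

In the project itself the candidate gains would additionally be constrained to the stable region found via the Nyquist criterion; here the clipping box plays that role.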
Learning a Unified Control Policy for Safe Falling
Being able to fall safely is a necessary motor skill for humanoids performing
highly dynamic tasks, such as running and jumping. We propose a new method to
learn a policy that minimizes the maximal impulse during the fall. The
optimization solves for both a discrete contact planning problem and a
continuous optimal control problem. Once trained, the policy can compute the
optimal next contacting body part (e.g. left foot, right foot, or hands),
contact location and timing, and the required joint actuation. We represent the
policy as a mixture of actor-critic neural networks, consisting of n control
policies and their corresponding value functions. Each actor-critic pair is
associated with one of the n possible contacting body parts. During execution,
the policy corresponding to the highest value function will be executed while
the associated body part will be the next contact with the ground. With this
mixture of actor-critic architecture, the discrete contact sequence planning is
solved through the selection of the best critics while the continuous control
problem is solved by the optimization of actors. We show that our policy can
achieve comparable, sometimes even higher, rewards than a recursive search of
the action space using dynamic programming, while achieving a 50- to 400-fold
speedup during online execution.
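The execution rule, evaluate all n critics on the current state and run the actor whose critic reports the highest value, can be sketched in a few lines. The linear actors and critics below are placeholders for the paper's neural networks:

```python
import numpy as np

# toy stand-ins: each "actor" maps a state to an action and each "critic"
# scores a state; n = 3 mirrors three contact choices (illustrative only)
actors  = [lambda s, g=g: -g * s for g in (0.5, 1.0, 2.0)]
critics = [lambda s, c=c: -abs(s - c) for c in (-1.0, 0.0, 1.0)]

def act(state):
    """Discrete choice by the best critic, continuous action by its actor."""
    values = [v(state) for v in critics]
    k = int(np.argmax(values))        # contact choice = highest value function
    return k, actors[k](state)        # the matching actor supplies the control

choice, u = act(0.9)                  # third critic wins, so the third actor runs
```

The argmax over critics is what turns the discrete contact-planning problem into a selection step, leaving each actor to solve only a continuous control problem.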
End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks
Reinforcement Learning (RL) algorithms have found limited success beyond
simulated applications, and one main reason is the absence of safety guarantees
during the learning process. Real world systems would realistically fail or
break before an optimal controller can be learned. To address this issue, we
propose a controller architecture that combines (1) a model-free RL-based
controller with (2) model-based controllers utilizing control barrier functions
(CBFs) and (3) on-line learning of the unknown system dynamics, in order to
ensure safety during learning. Our general framework leverages the success of
RL algorithms to learn high-performance controllers, while the CBF-based
controllers both guarantee safety and guide the learning process by
constraining the set of explorable policies. We utilize Gaussian Processes (GPs)
to model the system dynamics and its uncertainties.
Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high
probability during the learning process, regardless of the RL algorithm used,
and demonstrates greater policy exploration efficiency. We test our algorithm
on (1) control of an inverted pendulum and (2) autonomous car-following with
wireless vehicle-to-vehicle communication, and show that our algorithm attains
much greater sample efficiency in learning than other state-of-the-art
algorithms and maintains safety during the entire learning process.
Comment: Published in AAAI 201
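The safety-filter idea behind CBF-based controllers can be sketched on a one-dimensional toy system where the projection has a closed form. The system, barrier, and gamma below are assumptions of mine; the paper combines this kind of filter with GP dynamics models and an arbitrary RL policy:

```python
def cbf_filter(x, u_rl, gamma=0.5):
    """Minimally modify the RL action for the toy system x_next = x + u
    with barrier h(x) = 1 - x (safe while x <= 1). The discrete CBF
    condition h(x + u) >= (1 - gamma) * h(x) reduces to an upper bound on u."""
    u_max = gamma * (1.0 - x)     # closed form in this one-dimensional case
    return min(u_rl, u_max)       # smallest change to the RL command

x = 0.9
u = cbf_filter(x, u_rl=0.5)       # aggressive action near the boundary is clipped
x_next = x + u                    # stays inside the safe set (h >= 0)
```

In higher dimensions the same minimal-modification step becomes a small quadratic program solved at every control cycle, with the GP model supplying the dynamics and their uncertainty bounds.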
Development of a Genetic Fuzzy Controller and Its Application to a Noisy Inverted Double Pendulum
Fuzzy logic is used in a variety of applications due to its universal approximator attribute and non-linear characteristics. Tuning the parameters of a fuzzy logic system, viz. the membership functions and the rulebase, requires a lot of trial and error. This process can be simplified by using a heuristic search algorithm such as the genetic algorithm (GA). In this chapter, we discuss the design of such a genetic fuzzy controller that can control an inverted double pendulum. GA improves the fuzzy logic controller (FLC) with each generation during the training process to obtain an FLC that can bring the pendulum to its inverted position. After training, the effectiveness of the FLC is tested under different scenarios by varying the initial conditions. We also show that the FLC remains effective even when subjected to noise, and that its performance improves when the controller is tuned with noise.
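A minimal Sugeno-style FLC of the kind a GA would tune can be sketched as follows; the triangular membership parameters and rule consequents are exactly the quantities the GA would evolve (the values here are illustrative, not from the chapter):

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with feet at a, c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# one-input controller: error -> force, with three rules (negative /
# zero / positive error); these parameters are what the GA would evolve
mfs  = [(-2.0, -1.0, 0.0), (-1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]
outs = np.array([-10.0, 0.0, 10.0])      # Sugeno-style rule consequents

def flc(err):
    """Weighted-average defuzzification over the rule firing strengths."""
    w = np.array([tri(err, *p) for p in mfs])
    return float(w @ outs / w.sum()) if w.sum() > 0 else 0.0

u = flc(0.25)    # fires "zero" at 0.75 and "positive" at 0.25 -> 2.5
```

Encoding `mfs` and `outs` as a flat chromosome and scoring each candidate by simulated pendulum performance turns the trial-and-error tuning described above into the GA search the chapter proposes.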