1,178 research outputs found

    Reinforcement Learning Neural-Network-Based Controller for Nonlinear Discrete-Time Systems with Input Constraints

    Get PDF
    A novel adaptive-critic-based neural network (NN) controller in discrete time is designed to deliver a desired tracking performance for a class of nonlinear systems in the presence of actuator constraints. The constraints of the actuator are treated in the controller design as the saturation nonlinearity. The adaptive critic NN controller architecture based on state feedback includes two NNs: the critic NN is used to approximate the strategic utility function, whereas the action NN is employed to minimize both the strategic utility function and the unknown nonlinear dynamic estimation errors. The critic and action NN weight updates are derived by minimizing certain quadratic performance indexes. Using the Lyapunov approach and with novel weight updates, the uniformly ultimate boundedness of the closed-loop tracking error and weight estimates is shown in the presence of NN approximation errors and bounded unknown disturbances. The proposed NN controller works in the presence of multiple nonlinearities, unlike other schemes that normally approximate one nonlinearity. Moreover, the adaptive critic NN controller does not require an explicit offline training phase, and the NN weights can be initialized at zero or random. Simulation results justify the theoretical analysi

    Domain Adaptation in Unmanned Aerial Vehicles Landing using Reinforcement Learning

    Get PDF
    Landing an unmanned aerial vehicle (UAV) on a moving platform is a challenging task that often requires exact models of the UAV dynamics, platform characteristics, and environmental conditions. In this thesis, we present and investigate three different machine learning approaches with varying levels of domain knowledge: dynamics randomization, universal policy with system identification, and reinforcement learning with no parameter variation. We first train the policies in simulation, then perform experiments both in simulation, making variations of the system dynamics with wind and friction coefficient, then perform experiments in a real robot system with wind variation. We initially expected that providing more information on environmental characteristics with system identification would improve the outcomes, however, we found that transferring a policy learned in simulation with domain randomization to the real robot system achieves the best result in the real robot and simulation. Although in simulation the universal policy with system identification is faster in some cases. In this thesis, we compare the results of multiple deep reinforcement learning approaches trained in simulation and transferred in robot experiments with the presence of external disturbances. We were able to create a policy to control a UAV completely trained in simulation and transfer to a real system with the presence of external disturbances. In doing so, we evaluate the performance of dynamics randomization and universal policy with system identification. Adviser: Carrick Detweile

    Adaptive Critic Neural Network-Based Object Grasping Control using a Three-finger Gripper

    Get PDF
    Grasping of objects has been a challenging task for robots. The complex grasping task can be defined as object contact control and manipulation subtasks. In this paper, object contact control subtask is defined as the ability to follow a trajectory accurately by the fingers of a gripper. The object manipulation subtask is defined in terms of maintaining a predefined applied force by the fingers on the object. A sophisticated controller is necessary since the process of grasping an object without a priori knowledge of the object\u27s size, texture, softness, gripper, and contact dynamics is rather difficult. Moreover, the object has to be secured accurately and considerably fast without damaging it. Since the gripper, contact dynamics, and the object properties are not typically known beforehand, an adaptive critic neural network (NN)-based hybrid position/force control scheme is introduced. The feedforward action generating NN in the adaptive critic NN controller compensates the nonlinear gripper and contact dynamics. The learning of the action generating NN is performed on-line based on a critic NN output signal. The controller ensures that a three-finger gripper tracks a desired trajectory while applying desired forces on the object for manipulation. Novel NN weight tuning updates are derived for the action generating and critic NNs so that Lyapunov-based stability analysis can be shown. Simulation results demonstrate that the proposed scheme successfully allows fingers of a gripper to secure objects without the knowledge of the underlying gripper and contact dynamics of the object compared to conventional schemes

    Neural-Network-Based State Feedback Control of a Nonlinear Discrete-Time System in Nonstrict Feedback Form

    Get PDF
    In this paper, a suite of adaptive neural network (NN) controllers is designed to deliver a desired tracking performance for the control of an unknown, second-order, nonlinear discrete-time system expressed in nonstrict feedback form. In the first approach, two feedforward NNs are employed in the controller with tracking error as the feedback variable whereas in the adaptive critic NN architecture, three feedforward NNs are used. In the adaptive critic architecture, two action NNs produce virtual and actual control inputs, respectively, whereas the third critic NN approximates certain strategic utility function and its output is employed for tuning action NN weights in order to attain the near-optimal control action. Both the NN control methods present a well-defined controller design and the noncausal problem in discrete-time backstepping design is avoided via NN approximation. A comparison between the controller methodologies is highlighted. The stability analysis of the closed-loop control schemes is demonstrated. The NN controller schemes do not require an offline learning phase and the NN weights can be initialized at zero or random. Results show that the performance of the proposed controller schemes is highly satisfactory while meeting the closed-loop stability
    • …
    corecore