
    Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information

    In recent years, reinforcement learning (RL) has gained increasing attention in control engineering; policy gradient methods in particular are widely used. In this work, we improve the tracking performance of proximal policy optimization (PPO) for arbitrary reference signals by incorporating information about future reference values. Two variants of extending the arguments of the actor and the critic to take future reference values into account are presented. In the first variant, global future reference values are added to the argument. In the second variant, a novel kind of residual space with future reference values, applicable to model-free reinforcement learning, is introduced. Our approach is evaluated against a PI controller on a simple drive train model. We expect our method to generalize to arbitrary references better than previous approaches, pointing towards the applicability of RL to the control of real systems.
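The two variants described above can be sketched as observation-augmentation functions. This is a minimal illustration, not the paper's implementation: the function names, the preview `horizon`, and the assumption that the first state component is the tracked output are all hypothetical.

```python
import numpy as np

def augment_observation(state, reference, t, horizon=5):
    """Variant 1 (sketch): append the next `horizon` global reference
    values to the state vector, so actor and critic see a preview of
    the reference trajectory."""
    # Index the preview window, clamping at the end of the trajectory.
    idx = np.minimum(np.arange(t + 1, t + 1 + horizon), len(reference) - 1)
    return np.concatenate([state, reference[idx]])

def residual_observation(state, reference, t, horizon=5):
    """Variant 2 (sketch): a residual-style observation where future
    references are expressed relative to the current tracked output,
    so the policy sees upcoming tracking errors rather than absolute
    setpoints. Assumes state[0] is the tracked output."""
    y = state[0]
    idx = np.minimum(np.arange(t + 1, t + 1 + horizon), len(reference) - 1)
    return np.concatenate([state, reference[idx] - y])
```

Either function would wrap the environment's raw observation before it is fed to the PPO actor and critic; the residual form trades absolute setpoint information for invariance to the operating point.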

    Imitation Learning and Direct Perception for Autonomous Driving

    This thesis presents two learning-based approaches to the autonomous driving problem: end-to-end imitation learning and direct visual perception. Imitation learning uses expert demonstrations to build a policy that maps sensory stimuli to actions. During inference, the policy takes in readings from the vehicle's onboard sensors, such as cameras, radars, and lidars, and converts them to driving signals. Direct perception, on the other hand, uses these sensor readings to predict a set of features that define the system's operational state, or affordances; these affordances are then used by a physics-based controller to drive the vehicle.

    To reflect the context-specific, multimodal nature of the driving task, these models should be aware of the context, which in this case is driver intention. During development of the imitation learning approach, two methods of conditioning the model were trialed: providing the context as an input to the network, and using a branched model in which each branch represents a different context. The branched model showed superior performance, so branching was also used to bring context awareness to the direct perception model. Because no preexisting datasets were available to train the direct perception model, a simulation-based data recorder was built to create training data. By creating new data that included lane change behavior, the first direct perception model with lane change capabilities was trained.

    Lastly, a kinematic and a dynamic controller were developed to complete the direct perception pipeline. Both take advantage of having access to road curvature. The kinematic controller has a hybrid feedforward-feedback structure in which the road curvature is used as a feedforward term and lane deviations are used as feedback terms. The dynamic controller is inspired by model predictive control: it iteratively solves for the optimal steering angle that makes the vehicle follow a path matching the reference curvature, while also being assisted by lane deviation feedback.
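The hybrid feedforward-feedback structure of the kinematic controller can be sketched as follows. This is an illustrative reconstruction under a kinematic bicycle-model assumption; the gains, the `atan(L * kappa)` feedforward form, and the sign conventions are assumptions, not the thesis's exact controller.

```python
import numpy as np

def kinematic_steering(curvature, lateral_error, heading_error,
                       wheelbase=2.7, k_y=0.5, k_psi=1.0):
    """Hybrid feedforward-feedback lateral controller (sketch).

    Feedforward: the steering angle that holds the road curvature
    under a kinematic bicycle model, delta_ff = atan(L * kappa).
    Feedback: proportional corrections on lateral deviation and
    heading deviation from the lane. Gains k_y and k_psi are
    illustrative placeholders.
    """
    delta_ff = np.arctan(wheelbase * curvature)              # follow the road curvature
    delta_fb = -k_y * lateral_error - k_psi * heading_error  # regulate lane deviations
    return delta_ff + delta_fb
```

On a straight road with no deviation the command is zero; a curved road produces a nonzero feedforward term even before any error accumulates, which is the point of exposing curvature to the controller.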