
    Output Feedback Speed Control for a Wankel Rotary Engine via Q-Learning

    This paper develops a dynamic output feedback controller based on continuous-time Q-learning for the engine speed regulation problem. The proposed controller learns the optimal control solution online in finite time using only the measurable outputs. We first present the mean value engine model (MVEM) for a Wankel rotary engine. The regulation of engine speed is then formulated as an optimal control problem that minimises a pre-defined value function by actuating the electronic throttle. By parameterising an action-dependent Q-function, we derive a full-state adaptive optimal feedback controller using the idea of continuous-time Q-learning. The adaptive critic approximates the Q-function with a neural network and directly updates the actor, and convergence is guaranteed by novel finite-time adaptation techniques. We then incorporate the extended Kalman filter (EKF) as an optimal reduced-order state observer, which enables online estimation of the unknown fuel puddle dynamics, to obtain a dynamic output feedback engine speed controller. Simulation results on a benchmark 225CS engine demonstrate that the proposed controller effectively regulates the engine speed to a set point under load disturbances.
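
    The abstract centres on parameterising an action-dependent Q-function and letting an adaptive critic update it online. As a rough illustration only, the sketch below runs a discrete-time analogue of that idea on an invented two-state linear surrogate; the paper's continuous-time formulation, finite-time adaptation laws, MVEM, and EKF observer are not reproduced, and every matrix and gain here is a made-up placeholder.

```python
import numpy as np

# Hypothetical discrete-time surrogate of linearised engine dynamics;
# A, B, and the cost weights are invented for illustration.
A = np.array([[0.95, 0.10], [0.00, 0.90]])
B = np.array([[0.00], [0.05]])
Qc = np.diag([1.0, 0.1])        # state cost on the speed-tracking error
Rc = np.array([[0.01]])         # throttle effort cost
n, m = 2, 1

H = np.eye(n + m)               # Q(x, u) = z' H z with z = [x; u]
gamma, lr = 0.99, 0.01

def greedy_gain(H):
    """Minimising the quadratic Q over u gives u* = -H_uu^{-1} H_ux x."""
    Huu = H[n:, n:] + 1e-6 * np.eye(m)   # small regulariser for safety
    return -np.linalg.solve(Huu, H[n:, :n])

rng = np.random.default_rng(0)
x = np.array([[1.0], [0.0]])
for k in range(5000):
    u = greedy_gain(H) @ x + 0.1 * rng.standard_normal((m, 1))  # explore
    r = float(x.T @ Qc @ x + u.T @ Rc @ u)                      # stage cost
    x_next = A @ x + B @ u
    z = np.vstack([x, u])
    z_next = np.vstack([x_next, greedy_gain(H) @ x_next])
    td = r + gamma * float(z_next.T @ H @ z_next) - float(z.T @ H @ z)
    H += lr * td * (z @ z.T)    # semi-gradient step on the TD error
    H = 0.5 * (H + H.T)         # keep the Q-function kernel symmetric
    x = x_next if float(np.linalg.norm(x_next)) < 10 else rng.standard_normal((n, 1))

print("learned state-feedback gain:", greedy_gain(H))
```

    The greedy policy is read off the learned kernel in closed form, which is the appeal of the action-dependent formulation: no model of A or B is needed at control time.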

    Connections Between Adaptive Control and Optimization in Machine Learning

    This paper demonstrates many immediate connections between adaptive control and optimization methods commonly employed in machine learning. Starting from common output error formulations, similarities in update law modifications are examined. Concepts in stability, performance, and learning that are common to both fields are then discussed. Building on the similarities in update laws and the shared concepts, new intersections and opportunities for improved algorithm analysis are identified. In particular, a specific problem related to higher-order learning is solved through insights obtained from these intersections.
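
    The output-error parallel the abstract starts from can be made concrete in a few lines: a normalized gradient update law from adaptive control and normalized SGD on a squared prediction error produce the same discrete update. The regressor and "plant" parameters below are invented for the example; this is an illustration of the connection, not a result from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = np.array([2.0, -1.0, 0.5])  # unknown plant parameters (made up)

theta_ml = np.zeros(3)   # ML view: SGD on (1/2) e^2
theta_ac = np.zeros(3)   # adaptive-control view: Euler step of theta_dot = -gamma*e*phi
eta = gamma = 0.1        # step size / adaptation gain (unit Euler step)

for t in range(2000):
    phi = rng.standard_normal(3)             # regressor built from measured signals
    y = theta_star @ phi                     # plant output
    e_ml = theta_ml @ phi - y                # output (prediction) error
    theta_ml -= eta * e_ml * phi / (1 + phi @ phi)     # normalized SGD step
    e_ac = theta_ac @ phi - y
    theta_ac += -gamma * e_ac * phi / (1 + phi @ phi)  # normalized update law

print(np.allclose(theta_ml, theta_ac))       # True: the two updates coincide
```

    The interesting differences arise once the update laws are modified, e.g. with momentum on the ML side or with projection and normalization variants on the adaptive-control side, which is where the paper's analysis begins.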

    Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems

    Many modern nonlinear control methods aim to endow systems with guaranteed properties, such as stability or safety, and have been successfully applied to the domain of robotics. However, model uncertainty remains a persistent challenge, weakening theoretical guarantees and causing implementation failures on physical systems. This paper develops a machine learning framework centered around Control Lyapunov Functions (CLFs) to adapt to parametric uncertainty and unmodeled dynamics in general robotic systems. The proposed method iteratively updates estimates of Lyapunov function derivatives and improves the controller, ultimately yielding a stabilizing model-based controller implemented as a quadratic program. We validate the approach on a planar Segway simulation, demonstrating substantial performance improvements obtained by iteratively refining a base model-free controller.
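
    The controller the abstract refers to is a CLF-constrained quadratic program. For a single input and a single CLF constraint, that QP has a closed-form pointwise min-norm solution, shown below on a toy pendulum with an invented Lyapunov weight matrix; this is a generic CLF-QP sketch, not the paper's Segway model or its learned derivative estimates.

```python
import numpy as np

g_over_l, lam, dt = 9.8, 2.0, 0.01
P = np.array([[2.0, 0.5], [0.5, 1.0]])  # hypothetical Lyapunov weight matrix

def clf_qp(x):
    """min ||u||^2  s.t.  Vdot = LfV + LgV*u <= -lam*V  (single constraint)."""
    f = np.array([x[1], g_over_l * np.sin(x[0])])  # drift dynamics f(x)
    g = np.array([0.0, 1.0])                       # input vector field g(x)
    V = x @ P @ x
    LfV = 2 * x @ P @ f                            # Lie derivative along f
    LgV = 2 * x @ P @ g                            # Lie derivative along g
    a = LfV + lam * V
    if a <= 0:
        return 0.0                                 # constraint inactive: u = 0
    return -a * LgV / (LgV * LgV + 1e-9)           # min-norm feasible input

x = np.array([1.0, 0.0])                           # angle, rate: start off upright
for _ in range(500):
    u = clf_qp(x)
    f = np.array([x[1], g_over_l * np.sin(x[0])])
    x = x + dt * (f + np.array([0.0, 1.0]) * u)    # Euler integration step
print("final state:", x)                           # decays toward the origin
```

    When the constraint is active, the controller enforces Vdot = -lam*V exactly, so V decays exponentially; the paper's contribution is learning accurate estimates of LfV and LgV under model uncertainty so that this guarantee survives on hardware.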

    Competitive function approximation for reinforcement learning

    The application of reinforcement learning to problems with continuous domains requires representing the value function by means of function approximation. We identify two aspects of reinforcement learning that make the function approximation process hard: non-stationarity of the target function and biased sampling. Non-stationarity results from the bootstrapping nature of dynamic programming, where the value function is estimated using its own current approximation. Biased sampling occurs when some regions of the state space are visited too often, causing repeated updates with similar values that wash out the occasional updates of infrequently sampled regions. We propose a competitive approach to function approximation in which many different local approximators are available at a given input, and the one expected to approximate best is selected by means of a relevance function. The local nature of the approximators allows fast adaptation to non-stationary changes and mitigates the biased sampling problem. The coexistence of multiple approximators, updated and tried in parallel, yields a good estimate much faster than a single approximator could. Experiments on several benchmark problems show that the competitive strategy provides faster and more stable learning than non-competitive approaches.
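
    The competitive mechanism can be sketched in one dimension: several local approximators cover each input, a relevance score picks the one expected to fit best, and only the winner adapts, so stale regions are not overwritten by repeated updates elsewhere. The relevance definition below (proximity discounted by a running local error) is an illustrative choice, not the paper's exact function, and the target and sampling scheme are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
centers = np.linspace(0.0, 1.0, 8)  # locations of the local approximators
w = np.zeros(8)                     # one constant local model per region
err = np.ones(8)                    # running estimate of each model's error

def relevance(x):
    prox = np.exp(-((x - centers) ** 2) / 0.02)
    return prox / (1e-3 + err)      # close and historically accurate wins

def predict(x):
    return w[np.argmax(relevance(x))]

target = lambda x: np.sin(6 * x)    # stand-in target function
for t in range(5000):
    x = rng.random() ** 2           # biased sampling, concentrated near 0
    k = int(np.argmax(relevance(x)))
    e = target(x) - w[k]
    w[k] += 0.2 * e                        # only the selected model adapts
    err[k] = 0.9 * err[k] + 0.1 * abs(e)   # track its recent local accuracy

print(predict(0.1), target(0.1))
```

    Because adaptation is local and winner-take-all, the heavily sampled region near zero does not drag the models that cover rarely visited inputs, which is the sketch's version of the biased-sampling mitigation described above.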