10,889 research outputs found
Output Feedback Speed Control for a Wankel Rotary Engine via Q-Learning
This paper develops a dynamic output feedback controller based on continuous-time Q-learning for the engine speed regulation problem. The proposed controller is able to learn the optimal control solution online in a finite time using only the measurable outputs. We first present the mean value engine model (MVEM) for a Wankel rotary engine. The regulation of engine speed can be formulated as an optimal control problem that minimises a pre-defined value function by actuating the electronic throttle. By parameterising an action-dependent Q-function, we derive a full-state adaptive optimal feedback controller using the idea of continuous-time Q-learning. The adaptive critic approximates the Q-function as a neural network and directly updates the actor, where the convergence is guaranteed by employing novel finite-time adaptation techniques. Then, we incorporate the extended Kalman filter (EKF) as an optimal reduced-order state observer, which enables the online estimation of the unknown fuel puddle dynamics, to achieve a dynamic output feedback engine speed controller. The simulation results of a benchmark 225CS engine demonstrate that the proposed controller can effectively regulate the engine speed to a set point under certain load disturbances.
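The controller above learns a throttle policy from a Q-function rather than from an explicit engine model. As a rough illustration of that idea only, here is a minimal discrete-time, tabular Q-learning sketch for set-point speed regulation; the speed dynamics, discretisation, actions, and gains are invented for the example, and the paper's actual method is continuous-time, uses a neural-network critic with finite-time adaptation, and works from measured outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

SET_POINT = 3000.0                      # target engine speed [rpm] (assumed)
SPEEDS = np.arange(2000, 4001, 100)     # discretised speed states (assumed)
ACTIONS = np.array([-5.0, 0.0, 5.0])    # throttle increments [%] (assumed)

def step(speed, throttle_delta):
    """Hypothetical first-order speed response to a throttle change."""
    new_speed = speed + 8.0 * throttle_delta + rng.normal(0.0, 10.0)
    return float(np.clip(new_speed, SPEEDS[0], SPEEDS[-1]))

def s_idx(speed):
    """Map a continuous speed to the nearest discrete state index."""
    return int(np.argmin(np.abs(SPEEDS - speed)))

Q = np.zeros((len(SPEEDS), len(ACTIONS)))
alpha, gamma, eps = 0.2, 0.95, 0.2      # step size, discount, exploration

for episode in range(300):
    speed = float(rng.choice(SPEEDS))   # random initial speed
    for _ in range(40):
        s = s_idx(speed)
        # epsilon-greedy action selection
        a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(np.argmax(Q[s]))
        nxt = step(speed, ACTIONS[a])
        r = -((nxt - SET_POINT) / 100.0) ** 2   # quadratic regulation cost
        Q[s, a] += alpha * (r + gamma * Q[s_idx(nxt)].max() - Q[s, a])
        speed = nxt

# Greedy policy: open the throttle below the set point, close it above.
print(ACTIONS[np.argmax(Q[s_idx(2200)])], ACTIONS[np.argmax(Q[s_idx(3800)])])
```

The learned greedy policy pushes the speed toward the set point from either side, which is the qualitative behaviour the paper's controller achieves with much stronger guarantees.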
Connections Between Adaptive Control and Optimization in Machine Learning
This paper demonstrates many immediate connections between adaptive control and optimization methods commonly employed in machine learning. Starting from common output error formulations, similarities in update law modifications are examined. Concepts in stability, performance, and learning common to both fields are then discussed. Building on the similarities in update laws and common concepts, new intersections and opportunities for improved algorithm analysis are provided. In particular, a specific problem related to higher-order learning is solved through insights obtained from these intersections.
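The "common output error formulation" can be made concrete in a few lines: for a scalar linear plant, stochastic gradient descent on a squared prediction loss (the machine-learning view) and a gradient-type adaptive law of the MIT-rule form (the adaptive-control view) produce the identical parameter update, minus gain times error times regressor. The plant, gain, and data below are hypothetical.

```python
import numpy as np

theta_star = 2.5          # unknown true plant parameter (assumed)
theta_ml, theta_ac = 0.0, 0.0
lr = 0.05                 # learning rate / adaptation gain (assumed)

rng = np.random.default_rng(1)
for _ in range(500):
    u = rng.uniform(-1.0, 1.0)    # input sample / feature
    y = theta_star * u            # plant output / label
    e_ml = theta_ml * u - y       # prediction error (ML view)
    e_ac = theta_ac * u - y       # output error (adaptive-control view)
    theta_ml -= lr * e_ml * u     # SGD step on the squared loss
    theta_ac -= lr * e_ac * u     # MIT-rule step: -gain * error * regressor

print(theta_ml, theta_ac)
```

On the same data the two trajectories coincide step for step, which is the kind of immediate correspondence the paper builds on before examining where the two fields' modifications of this basic law diverge.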
Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems
Many modern nonlinear control methods aim to endow systems with guaranteed properties, such as stability or safety, and have been successfully applied to the domain of robotics. However, model uncertainty remains a persistent challenge, weakening theoretical guarantees and causing implementation failures on physical systems. This paper develops a machine learning framework centered around Control Lyapunov Functions (CLFs) to adapt to parametric uncertainty and unmodeled dynamics in general robotic systems. Our proposed method proceeds by iteratively updating estimates of Lyapunov function derivatives and improving controllers, ultimately yielding a stabilizing quadratic program model-based controller. We validate our approach on a planar Segway simulation, demonstrating substantial performance improvements by iteratively refining on a base model-free controller.
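The quadratic-program controller such methods arrive at can be sketched in miniature: for a scalar system xdot = f(x) + g(x)u with V(x) = x²/2, the pointwise min-norm CLF-QP (minimise u² subject to the Lyapunov decrease condition) has a closed-form solution. The dynamics, CLF, and decay rate below are invented for illustration; the episodic learning of the Lyapunov derivative estimates, which is the paper's contribution, is not modelled here.

```python
def f(x):
    """Hypothetical unstable drift term (assumed)."""
    return 0.8 * x

def g(x):
    """Hypothetical control-input gain (assumed)."""
    return 1.0

def clf_qp(x, c=2.0):
    """Min-norm CLF controller: min u^2 s.t. LfV + LgV*u <= -c*V."""
    a = x * f(x) + c * 0.5 * x**2   # L_f V + c V
    b = x * g(x)                    # L_g V
    if a <= 0.0 or b == 0.0:
        return 0.0                  # constraint already satisfied; do nothing
    return -a * b / (b * b)         # active-constraint, minimum-norm solution

# Forward-Euler simulation: V = x^2/2 decays despite the unstable drift.
x, dt = 1.0, 0.01
for _ in range(1000):
    x += dt * (f(x) + g(x) * clf_qp(x))
print(abs(x))
```

In higher dimensions the closed form is replaced by an actual QP solve at each time step, but the structure (quadratic cost, one affine CLF constraint) is the same.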
Competitive function approximation for reinforcement learning
The application of reinforcement learning to problems with continuous domains requires representing the value function by means of function approximation. We identify two aspects of reinforcement learning that make the function approximation process hard: non-stationarity of the target function and biased sampling. Non-stationarity is the result of the bootstrapping nature of dynamic programming, where the value function is estimated using its current approximation. Biased sampling occurs when some regions of the state space are visited too often, causing repeated updating with similar values that fades out the occasional updates of infrequently sampled regions.
We propose a competitive approach for function approximation where many different local approximators are available at a given input and the one with expectedly best approximation is selected by means of a relevance function. The local nature of the approximators allows their fast adaptation to non-stationary changes and mitigates the biased sampling problem. The coexistence of multiple approximators updated and tried in parallel permits obtaining a good estimation much faster than would be possible with a single approximator. Experiments in different benchmark problems show that the competitive strategy provides a faster and more stable learning than non-competitive approaches.
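A minimal sketch of the competitive scheme, with invented model and relevance choices: each local approximator is a running average around a centre, each sample trains only the nearest model (so biased sampling cannot overwrite distant models), and a relevance score (proximity to the centre, discounted when the model has seen little data) selects which model answers a query.

```python
import numpy as np

class LocalAverager:
    """Constant local model: exponential running mean of nearby targets."""
    def __init__(self, centre, width=0.5):
        self.centre, self.width = centre, width
        self.value, self.count = 0.0, 0

    def relevance(self, x):
        # Expected approximation quality: proximity to the centre,
        # discounted while the model has seen little training data.
        return np.exp(-((x - self.centre) / self.width) ** 2) * min(self.count, 10)

    def update(self, target, lr=0.2):
        self.value += lr * (target - self.value)  # fast local adaptation
        self.count += 1

models = [LocalAverager(c) for c in np.linspace(0.0, 1.0, 5)]

def learn(x, target):
    # Route the sample to the nearest local model only, so heavily
    # sampled regions cannot fade out distant, rarely updated models.
    min(models, key=lambda m: abs(x - m.centre)).update(target)

def predict(x):
    # Competition: the most relevant local model answers the query.
    return max(models, key=lambda m: m.relevance(x)).value

for x in np.linspace(0.0, 1.0, 200):   # fit the toy target y = x
    learn(x, x)
print(predict(0.0), predict(1.0))
```

Because each model averages only over its own neighbourhood, it tracks a non-stationary target there quickly, and the relevance gate keeps untrained or distant models from contaminating predictions.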