145 research outputs found
Reinforcement Learning for Mixed-Integer Problems Based on MPC
Model Predictive Control has recently been proposed as a policy approximation
for Reinforcement Learning, offering a path towards safe and explainable
Reinforcement Learning. This approach has been investigated for Q-learning and
actor-critic methods, both in the context of nominal Economic MPC and Robust
(N)MPC, showing very promising results. In that context, actor-critic methods
seem to be the most reliable approach. Many applications include a mixture of
continuous and integer inputs, for which the classical actor-critic methods
need to be adapted. In this paper, we present a policy approximation based on
mixed-integer MPC schemes, and propose a computationally inexpensive technique
to generate exploration in the mixed-integer input space that ensures
satisfaction of the constraints. We then propose a simple compatible advantage
function approximation for the proposed policy, which allows one to build the
gradient of the mixed-integer MPC-based policy.
Comment: Accepted at IFAC 202
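A minimal sketch of the general idea of constraint-respecting exploration over a mixed-integer input space, on a toy one-step problem. This is not the paper's MPC scheme: the cost, the constraint, the input bounds, and the epsilon-greedy choice among feasible integer candidates below are all illustrative assumptions.

```python
# Sketch: epsilon-greedy exploration over the integer input of a toy one-step
# "MPC", restricted to integer candidates that still admit a feasible
# continuous input. Model, constraint, and bounds are assumptions.
import numpy as np
from scipy.optimize import minimize_scalar

U_BOUNDS = (-1.0, 1.0)          # continuous input bounds
I_CANDIDATES = (0, 1, 2)        # integer input candidates
EPS = 0.2                       # exploration probability

def stage_cost(x, u, i):
    return (x + u + i) ** 2 + 0.1 * u ** 2

def feasible(x, u, i):
    return abs(x + u + i) <= 2.0          # toy state constraint

def best_u(x, i):
    """Optimal continuous input for a fixed integer input (toy problem)."""
    res = minimize_scalar(lambda u: stage_cost(x, u, i),
                          bounds=U_BOUNDS, method="bounded")
    return res.x, res.fun

def mixed_integer_policy(x, rng):
    # Keep only integer candidates whose optimal continuous input is feasible.
    candidates = []
    for i in I_CANDIDATES:
        u, cost = best_u(x, i)
        if feasible(x, u, i):
            candidates.append((cost, i, u))
    candidates.sort()
    if rng.random() < EPS and len(candidates) > 1:
        # Explore: pick a random *feasible* integer candidate instead of the best.
        _, i, u = candidates[rng.integers(1, len(candidates))]
    else:
        _, i, u = candidates[0]
    return u, i

rng = np.random.default_rng(0)
print(mixed_integer_policy(1.5, rng))
```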
Recursive Feasibility of Stochastic Model Predictive Control with Mission-Wide Probabilistic Constraints
This paper is concerned with solving chance-constrained finite-horizon
optimal control problems, with a particular focus on the recursive feasibility
issue of stochastic model predictive control (SMPC) in terms of mission-wide
probability of safety (MWPS). MWPS assesses the probability that the entire
state trajectory lies within the constraint set, and the objective of the SMPC
controller is to ensure that it is no less than a threshold value. This differs
from classic SMPC where the probability that the state lies in the constraint
set is enforced independently at each time instant. Unlike robust MPC, where
strict recursive feasibility is satisfied by assuming that the uncertainty is
supported by a compact set, the proposed concept of recursive feasibility for
MWPS is based on the notion of the remaining MWPS, which is conserved in the
expected value sense. We demonstrate the idea of mission-wide SMPC in the
linear SMPC case by deploying a scenario-based algorithm.
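As a rough illustration of the mission-wide notion (not the paper's SMPC controller), the Monte Carlo sketch below estimates the probability that an entire trajectory of a toy closed-loop scalar system stays inside the constraint set; the dynamics, noise level, constraint, and horizon are assumptions.

```python
# Sketch: Monte Carlo estimate of the mission-wide probability of safety (MWPS),
# i.e. the probability that the *entire* trajectory remains in the constraint set.
import numpy as np

A, B, K = 1.0, 1.0, -0.6          # toy closed-loop dynamics x+ = (A + B*K) x + w
X_MAX = 2.0                       # constraint set: |x| <= X_MAX
N, HORIZON = 10_000, 20           # number of scenarios and mission length

rng = np.random.default_rng(1)
x = np.full(N, 1.0)               # common initial state for all scenarios
safe = np.ones(N, dtype=bool)     # whether each scenario has stayed safe so far

for _ in range(HORIZON):
    w = 0.2 * rng.standard_normal(N)
    x = (A + B * K) * x + w
    safe &= np.abs(x) <= X_MAX    # mission-wide: one violation means unsafe forever

print(f"estimated mission-wide probability of safety: {safe.mean():.3f}")
```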
Computing the power profiles for an Airborne Wind Energy system based on large-scale wind data
Airborne Wind Energy (AWE) is a new power technology that harvests wind energy at high altitudes using tethered wings. Studying the power potential of the system at a given location requires evaluating the local power production profile of the AWE system. As the optimal operational altitude of an AWE system depends on complex trade-offs, a commonly used technique is to formulate the power production computation as an Optimal Control Problem (OCP). In order to obtain an annual power production profile, this OCP has to be solved sequentially for the wind data at each time point. This can be computationally costly due to the highly nonlinear and complex AWE system model. This paper proposes a method to reduce the computational effort of using an OCP for power computations over large-scale wind data. The method is based on homotopy-path-following strategies, which exploit the similarities between successively solved OCPs. Additionally, different machine learning regression models are evaluated to accurately predict the power production in the case of very large data sets. The methods are illustrated by computing a three-month power profile for an AWE drag-mode system. A significant reduction in computation time is observed while good accuracy is maintained.
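A hedged sketch of the surrogate-regression idea mentioned above: fitting a regression model so the OCP need not be re-solved for every wind sample. The wind-profile features, the synthetic "power" target, and the model choice are assumptions for illustration only, not the paper's setup.

```python
# Sketch: regression surrogate mapping wind-profile features to power output,
# standing in for repeated OCP solves. Data here are synthetic assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(2)
n_samples = 2_000
# Assumed features: wind speed at a few reference altitudes (m/s).
wind_profiles = rng.uniform(3.0, 25.0, size=(n_samples, 4))
# Assumed target: "OCP-optimal" power, here a smooth synthetic stand-in (kW).
power = 0.5 * wind_profiles.mean(axis=1) ** 3 / 10.0 + rng.normal(0, 5, n_samples)

X_train, X_test, y_train, y_test = train_test_split(wind_profiles, power,
                                                    random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("MAE [kW]:", mean_absolute_error(y_test, model.predict(X_test)))
```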
Numerical Strategies for Mixed-Integer Optimization of Power-Split and Gear Selection in Hybrid Electric Vehicles
This paper presents numerical strategies for a computationally efficient energy management system that co-optimizes the power split and gear selection of a hybrid electric vehicle (HEV). We formulate a mixed-integer optimal control problem (MIOCP) that is transcribed using multiple shooting into a mixed-integer nonlinear program (MINLP) and then solved by nonlinear model predictive control. We present two different numerical strategies: a Selective Relaxation Approach (SRA), which decomposes the MINLP into several subproblems, and a Round-n-Search Approach (RSA), which is an enhancement of the known ‘relax-n-round’ strategy. Subsequently, the algorithmic performance and solution optimality of the proposed strategies are analyzed against two benchmark strategies: one using rule-based gear selection, which is typically used in production vehicles, and the other using dynamic programming (DP), which provides a global optimum of a quantized version of the MINLP. The results show that both SRA and RSA enable about 3.6% cost reduction compared to the rule-based strategy, while still being within 1% of the DP solution. Moreover, for the case studied, RSA takes about 35% less mean computation time than SRA, while both SRA and RSA are about 99 times faster than DP. Furthermore, both SRA and RSA were able to overcome the infeasibilities encountered by a typical rounding strategy under different drive cycles. The results show the computational benefit of the proposed strategies, as well as the energy-saving potential of co-optimization strategies in which actuator dynamics are explicitly included.
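To illustrate the generic relax-then-round-and-search idea that RSA builds on (this is not the paper's SRA/RSA formulation), the sketch below solves a toy relaxation, rounds the integer "gear" variable, and then searches neighbouring integer candidates, keeping the best feasible one. The problem data are assumptions.

```python
# Sketch: relax -> round -> local search on a toy mixed-integer problem with one
# continuous "power split" u and one integer "gear" g. All data are illustrative.
import numpy as np
from scipy.optimize import minimize

def solve_relaxed():
    """Relaxation: both u and g treated as continuous variables."""
    obj = lambda z: (z[0] - 0.7) ** 2 + 0.5 * (z[1] - 2.4) ** 2   # toy cost
    res = minimize(obj, x0=[0.0, 1.0], bounds=[(0.0, 1.0), (1, 5)])
    return res.x

def solve_with_gear_fixed(g):
    """Re-optimize the continuous variable with the integer gear g fixed."""
    obj = lambda u: (u[0] - 0.7) ** 2 + 0.5 * (g - 2.4) ** 2
    res = minimize(obj, x0=[0.5], bounds=[(0.0, 1.0)])
    return res.x[0], res.fun

u_rel, g_rel = solve_relaxed()
# Integer candidates: the rounded relaxed gear and its immediate neighbours.
candidates = {int(np.floor(g_rel)), int(np.ceil(g_rel))} | \
             {int(round(g_rel)) + d for d in (-1, 1)}

best = None
for g in candidates:
    if not 1 <= g <= 5:
        continue                       # candidate outside the admissible gear range
    u, cost = solve_with_gear_fixed(g)
    if best is None or cost < best[0]:
        best = (cost, g, u)

print("cost, gear, power split:", best)
```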
Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve Optimality?
For all its successes, Reinforcement Learning (RL) still struggles to deliver
formal guarantees on the closed-loop behavior of the learned policy. Among
other things, guaranteeing the safety of RL with respect to safety-critical
systems is a very active research topic. Some recent contributions propose to
rely on projections of the inputs delivered by the learned policy into a safe
set, ensuring that the system safety is never jeopardized. Unfortunately, it is
unclear whether this operation can be performed without disrupting the learning
process. This paper addresses this issue. The problem is analysed in the
context of Q-learning and policy gradient techniques. We show that the
projection approach is generally disruptive in the context of Q-learning,
though a simple alternative solves the issue, while simple corrections can be
used in the context of policy gradient methods in order to ensure that the
policy gradients are unbiased. The proposed results extend to safe projections
based on robust MPC techniques.
Comment: Accepted at IFAC 202
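A minimal sketch of the safe-set projection discussed above: the action proposed by a learned policy is projected onto a polytopic safe set by a small QP before being applied. The safe set and the proposed action below are illustrative assumptions; the paper's analysis concerns how such a projection interacts with Q-learning and policy-gradient updates, which this sketch does not reproduce.

```python
# Sketch: project a proposed action onto a polytopic safe set {u : A u <= b}
# via a small QP, so the applied input is always safe. A, b, and the proposed
# action are assumptions for illustration.
import cvxpy as cp
import numpy as np

A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.0, 1.0, 1.5])   # a box cut by one extra halfspace

def project_to_safe_set(u_policy):
    u = cp.Variable(2)
    problem = cp.Problem(cp.Minimize(cp.sum_squares(u - u_policy)), [A @ u <= b])
    problem.solve()
    return u.value

u_unsafe = np.array([1.4, 0.9])            # action proposed by the learned policy
print("applied safe action:", project_to_safe_set(u_unsafe))
```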