    Reinforcement Learning for Mixed-Integer Problems Based on MPC

    Model Predictive Control (MPC) has recently been proposed as a policy approximation for Reinforcement Learning, offering a path towards safe and explainable Reinforcement Learning. This approach has been investigated for Q-learning and actor-critic methods, both in the context of nominal Economic MPC and Robust (N)MPC, with very promising results. In that context, actor-critic methods appear to be the most reliable approach. Many applications involve a mixture of continuous and integer inputs, for which the classical actor-critic methods need to be adapted. In this paper, we present a policy approximation based on mixed-integer MPC schemes, and propose a computationally inexpensive technique to generate exploration in the mixed-integer input space that ensures constraint satisfaction. We then propose a simple compatible advantage-function approximation for the proposed policy, which allows one to build the gradient of the mixed-integer MPC-based policy.
    Comment: Accepted at IFAC 2020
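
    For context, the compatible function approximation that such an advantage model builds on takes the following classical actor-critic form; the notation below is generic, not taken from the paper, whose contribution is the adaptation of this construction to a mixed-integer MPC-based policy.

```latex
% Compatible advantage approximation (classical actor-critic form):
% an advantage model linear in the policy score keeps the policy
% gradient unbiased when the weights w are fitted by least squares.
A_w(s,a) = \nabla_\theta \log \pi_\theta(a \mid s)^{\top} w ,
\qquad
\nabla_\theta J(\theta) = \mathbb{E}\big[ \nabla_\theta \log \pi_\theta(a \mid s)\, A_w(s,a) \big]
```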

    On the Similarity Between Two Popular Tube MPC Formulations

    Recursive Feasibility of Stochastic Model Predictive Control with Mission-Wide Probabilistic Constraints

    This paper is concerned with solving chance-constrained finite-horizon optimal control problems, with a particular focus on the recursive feasibility of stochastic model predictive control (SMPC) in terms of the mission-wide probability of safety (MWPS). The MWPS assesses the probability that the entire state trajectory lies within the constraint set, and the objective of the SMPC controller is to ensure that it is no less than a threshold value. This differs from classic SMPC, where the probability that the state lies in the constraint set is enforced independently at each time instant. Unlike robust MPC, where strict recursive feasibility is satisfied by assuming that the uncertainty is supported on a compact set, the proposed concept of recursive feasibility for the MWPS is based on the notion of the remaining MWPS, which is conserved in the expected-value sense. We demonstrate the idea of mission-wide SMPC in the linear SMPC case by deploying a scenario-based algorithm.
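
    In symbols, the contrast the abstract draws is the following (notation illustrative, not taken from the paper):

```latex
% Mission-wide: a single probability over the whole state trajectory
\mathbb{P}\big[\, x_k \in \mathbb{X} \ \ \forall k \in \{0,\dots,N\} \,\big] \;\geq\; 1 - \varepsilon
% Stage-wise (classic SMPC): one probability per time instant
\mathbb{P}\big[\, x_k \in \mathbb{X} \,\big] \;\geq\; 1 - \varepsilon_k , \qquad k = 0,\dots,N
```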

    Computing the power profiles for an Airborne Wind Energy system based on large-scale wind data

    Airborne Wind Energy (AWE) is a new power technology that harvests wind energy at high altitudes using tethered wings. Studying the power potential of the system at a given location requires evaluating the local power-production profile of the AWE system. As the optimal operating altitude of an AWE system depends on complex trade-offs, a commonly used technique is to formulate the power-production computation as an Optimal Control Problem (OCP). In order to obtain an annual power-production profile, this OCP has to be solved sequentially for the wind data at each time point. This can be computationally costly due to the highly nonlinear and complex AWE system model. This paper proposes a method to reduce the computational effort when using an OCP for power computations over large-scale wind data. The method is based on homotopy-path-following strategies, which exploit the similarities between successively solved OCPs. Additionally, different machine-learning regression models are evaluated to accurately predict the power production in the case of very large data sets. The methods are illustrated by computing a three-month power profile for an AWE drag-mode system. A significant reduction in computation time is observed, while good accuracy is maintained.
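
    The core computational idea is reusing information between successive, similar OCPs. Below is a minimal sketch of that idea, reduced to plain warm starting of a toy parametric NLP in CasADi; the one-variable model and all values are invented for illustration and are not the authors' AWE formulation.

```python
import casadi as ca

# Toy parametric "power OCP": one decision variable u (stand-in for the
# full AWE trajectory) and one parameter w (wind condition at a time point).
u = ca.MX.sym('u')
w = ca.MX.sym('w')
objective = -(w * u - 0.1 * u**3)          # negated power: nlpsol minimizes
nlp = {'x': u, 'p': w, 'f': objective, 'g': u}
solver = ca.nlpsol('solver', 'ipopt', nlp)

wind_series = [8.0, 8.2, 8.1, 9.0, 9.3]    # consecutive wind data points
u_guess, profile = 1.0, []
for w_k in wind_series:
    # Initialize each solve at the previous optimum: consecutive wind
    # conditions are similar, so the solver needs only a few iterations.
    sol = solver(x0=u_guess, p=w_k, lbg=0.0, ubg=10.0)
    u_guess = float(sol['x'])
    profile.append(-float(sol['f']))       # recover the power value
print(profile)
```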

    Numerical Strategies for Mixed-Integer Optimization of Power-Split and Gear Selection in Hybrid Electric Vehicles

    This paper presents numerical strategies for a computationally efficient energy management system that co-optimizes the power split and gear selection of a hybrid electric vehicle (HEV). We formulate a mixed-integer optimal control problem (MIOCP) that is transcribed using multiple shooting into a mixed-integer nonlinear program (MINLP) and then solved by nonlinear model predictive control. We present two different numerical strategies: a Selective Relaxation Approach (SRA), which decomposes the MINLP into several subproblems, and a Round-n-Search Approach (RSA), which is an enhancement of the known ‘relax-n-round’ strategy. Subsequently, the algorithmic performance and solution optimality of the proposed strategies are analyzed against two benchmark strategies: one using rule-based gear selection, which is typically used in production vehicles, and the other using dynamic programming (DP), which provides a global optimum of a quantized version of the MINLP. The results show that both SRA and RSA enable about 3.6% cost reduction compared to the rule-based strategy, while still being within 1% of the DP solution. Moreover, for the case studied, RSA takes about 35% less mean computation time than SRA, while both SRA and RSA are about 99 times faster than DP. Furthermore, both SRA and RSA were able to overcome the infeasibilities encountered by a typical rounding strategy under different drive cycles. The results show the computational benefit of the proposed strategies, as well as the possible energy savings of co-optimization strategies in which actuator dynamics are explicitly included.
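
    As a rough sketch of the generic relax-round-and-search pattern that ‘relax-n-round’ names and RSA refines (the toy objective, bounds, and gear range below are invented for illustration, not taken from the paper):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy stand-in for the power-split/gear problem: continuous power split p,
# integer gear g in {1,...,5}.
def cost(p, g):
    return (p - 2.0)**2 + 0.3 * (g - 2.6)**2 + 0.01 * p * g

def best_split(g):
    # Continuous subproblem: optimal power split for a fixed gear.
    res = minimize_scalar(lambda p: cost(p, g), bounds=(0.0, 5.0), method='bounded')
    return res.x, res.fun

# 1) Relax the gear to a continuous variable and solve the resulting NLP.
relaxed = minimize_scalar(lambda g: best_split(g)[1], bounds=(1.0, 5.0), method='bounded')

# 2) Round the relaxed gear, then 3) search its integer neighbourhood,
#    re-solving the continuous subproblem for each candidate gear.
candidates = {int(np.clip(round(relaxed.x) + d, 1, 5)) for d in (-1, 0, 1)}
g_best = min(candidates, key=lambda g: best_split(g)[1])
p_best, c_best = best_split(g_best)
print(g_best, p_best, c_best)
```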

    Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve Optimality?

    For all its successes, Reinforcement Learning (RL) still struggles to deliver formal guarantees on the closed-loop behavior of the learned policy. Among other things, guaranteeing the safety of RL with respect to safety-critical systems is a very active research topic. Some recent contributions propose to rely on projections of the inputs delivered by the learned policy onto a safe set, ensuring that the system safety is never jeopardized. Unfortunately, it is unclear whether this operation can be performed without disrupting the learning process. This paper addresses this issue. The problem is analysed in the context of Q-learning and policy gradient techniques. We show that the projection approach is generally disruptive in the context of Q-learning, though a simple alternative solves the issue, while simple corrections can be used in the context of policy gradient methods in order to ensure that the policy gradients are unbiased. The proposed results extend to safe projections based on robust MPC techniques.
    Comment: Accepted at IFAC 2020
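
    A minimal sketch of the projection step itself, i.e. the operation whose interaction with learning the paper analyzes; the polytopic safe set and all names are illustrative assumptions, not the paper's robust-MPC construction.

```python
import numpy as np
from scipy.optimize import minimize

def project_to_safe_set(a_rl, A, b):
    """Project the policy's proposed input a_rl onto the polytopic
    safe set {a : A a <= b} by solving min ||a - a_rl||^2."""
    res = minimize(lambda a: float(np.sum((a - a_rl) ** 2)),
                   x0=a_rl,
                   constraints={'type': 'ineq',
                                'fun': lambda a: b - A @ a})  # >= 0 iff safe
    return res.x

# The projected (safe) input is what is actually applied to the system:
A = np.array([[1.0], [-1.0]])     # toy 1-D box |a| <= 2 as two halfspaces
b = np.array([2.0, 2.0])
print(project_to_safe_set(np.array([3.5]), A, b))   # -> approx. [2.0]
```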