24,062 research outputs found

Reinforcement Learning Based on Real-Time Iteration NMPC

Author: Gros Sébastien
Kungurtsev Vyacheslav
Zanon Mario
Publication date: 01/01/2020

Reinforcement Learning (RL) has demonstrated a striking ability to learn optimal policies from data without any prior knowledge of the process. The main drawback of RL is that it is typically very difficult to guarantee stability and safety. On the other hand, Nonlinear Model Predictive Control (NMPC) is an advanced model-based control technique which does guarantee safety and stability, but only yields optimality for the nominal model. Therefore, it has recently been proposed to use NMPC as a function approximator within RL. While the ability of this approach to yield good performance has been demonstrated, the main drawback hindering its applicability is the computational burden of NMPC, which has to be solved to full convergence. In practice, however, computationally efficient algorithms such as the Real-Time Iteration (RTI) scheme are deployed in order to return an approximate NMPC solution in a very short time. In this paper we bridge this gap by extending the existing theoretical framework to also cover RL based on RTI NMPC. We demonstrate the effectiveness of this new RL approach with a nontrivial example modeling a challenging nonlinear system subject to stochastic perturbations with the objective of optimizing an economic cost.

Comment: accepted for the IFAC World Congress 202

arXiv.org e-Print Archive

Archivio della ricerca della Scuola IMT Alti Studi Lucca

Monetary policy and rejections of the expectations hypothesis

Author: Ravenna Federico
Seppälä Juha

We study the rejection of the expectations hypothesis within a New Keynesian business cycle model. Earlier research has shown that the Lucas general equilibrium asset pricing model can account for neither the sign nor the magnitude of average risk premia in forward prices, and is unable to explain rejection of the expectations hypothesis. We show that a New Keynesian model with habit-formation preferences and a monetary policy feedback rule produces an upward-sloping average term structure of interest rates, procyclical interest rates, and countercyclical term spreads. In the model, as in U.S. data, an inverted term structure predicts recessions. Most importantly, a New Keynesian model is able to account for rejections of the expectations hypothesis. Contrary to earlier work, we identify systematic monetary policy as a key factor behind this result. Rejection of the expectations hypothesis can be entirely explained by the volatility of just two real shocks which affect technology and preferences.

Keywords: term structure of interest rates; monetary policy; sticky prices; habit formation; expectations hypothesis

Research Papers in Economics

Data-driven Economic NMPC using Reinforcement Learning

Author: Gros Sébastien
Zanon Mario
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/04/2019

Reinforcement Learning (RL) is a powerful tool to perform data-driven optimal control without relying on a model of the system. However, RL struggles to provide hard guarantees on the behavior of the resulting control scheme. In contrast, Nonlinear Model Predictive Control (NMPC) and Economic NMPC (ENMPC) are standard tools for the closed-loop optimal control of complex systems with constraints and limitations, and benefit from a rich theory to assess their closed-loop behavior. Unfortunately, the performance of (E)NMPC hinges on the quality of the model underlying the control scheme. In this paper, we show that an (E)NMPC scheme can be tuned to deliver the optimal policy of the real system even when using a wrong model. This result also holds for real systems having stochastic dynamics. This entails that ENMPC can be used as a new type of function approximator within RL. Furthermore, we investigate our results in the context of ENMPC and formally connect them to the concept of dissipativity, which is central to ENMPC stability. Finally, we detail how these results can be used to deploy classic RL tools for tuning (E)NMPC schemes. We apply these tools to both a classical linear MPC setting and a standard nonlinear example from the ENMPC literature.

arXiv.org e-Print Archive

Archivio della ricerca della Scuola IMT Alti Studi Lucca

Causally Regularized Learning with Agnostic Data Selection Bias

Author: Csurka Gabriella
Dos Reis Virgile Landeiro
Lechner Michael
Li Da
Long Mingsheng
Pearl Judea
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/08/2018

Most previous machine learning algorithms are proposed based on the i.i.d. hypothesis. However, this ideal assumption is often violated in real applications, where selection bias may arise between the training and testing processes. Moreover, in many scenarios, the testing data is not even available during the training process, which makes traditional methods like transfer learning infeasible because they need prior knowledge of the test distribution. Therefore, how to address agnostic selection bias for robust model learning is of paramount importance for both academic research and real applications. In this paper, under the assumption that causal relationships among variables are robust across domains, we incorporate causal techniques into predictive modeling and propose a novel Causally Regularized Logistic Regression (CRLR) algorithm that jointly optimizes global confounder balancing and weighted logistic regression. Global confounder balancing helps to identify causal features, whose causal effects on the outcome are stable across domains; performing logistic regression on those causal features then constructs a predictive model that is robust against the agnostic bias. To validate the effectiveness of our CRLR algorithm, we conduct comprehensive experiments on both synthetic and real-world datasets. Experimental results clearly demonstrate that our CRLR algorithm outperforms the state-of-the-art methods, and the interpretability of our method can be fully depicted by the feature visualization.

Comment: Oral paper of 2018 ACM Multimedia Conference (MM'18

arXiv.org e-Print Archive

Crossref