Non-Stationary Markov Decision Processes a Worst-Case Approach using Model-Based Reinforcement Learning
This work tackles the problem of robust zero-shot planning in non-stationary stochastic environments. We study Markov Decision Processes (MDPs) evolving over time and consider Model-Based Reinforcement Learning algorithms in this setting. We make two hypotheses: 1) the environment evolves continuously with a bounded evolution rate; 2) a current model is known at each decision epoch but not its evolution. Our contribution is fourfold: 1) we define a specific class of MDPs that we call Non-Stationary MDPs (NSMDPs) and introduce the notion of regular evolution by making a hypothesis of Lipschitz continuity of the transition and reward functions w.r.t. time; 2) we consider a planning agent using the current model of the environment but unaware of its future evolution, which leads us to a worst-case method where the environment is seen as an adversarial agent; 3) following this approach, we propose the Risk-Averse Tree-Search (RATS) algorithm, a zero-shot Model-Based method similar to minimax search; 4) we empirically illustrate the benefits brought by RATS and compare its performance with reference Model-Based algorithms.
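As a rough illustration of the worst-case planning idea, the Python sketch below performs a minimax-style tree evaluation in which the agent maximizes over actions while an adversarial environment picks the least favourable model among those compatible with the bounded evolution rate. The interface (`worst_case_models`, `terminal_value`) is an illustrative assumption, not the authors' implementation of RATS.

```python
def rats_value(state, depth, horizon, actions, worst_case_models, terminal_value):
    """Worst-case (minimax) value of `state`, planning `horizon - depth` steps ahead."""
    if depth == horizon:
        return terminal_value(state)

    best = float("-inf")
    for action in actions:
        # Adversarial environment: among all models whose transition/reward
        # functions stay within the Lipschitz evolution bound of the current
        # snapshot, pick the one minimizing the agent's return.
        worst = float("inf")
        for reward, transitions in worst_case_models(state, action, depth):
            value = reward + sum(
                prob * rats_value(next_state, depth + 1, horizon, actions,
                                  worst_case_models, terminal_value)
                for next_state, prob in transitions
            )
            worst = min(worst, value)
        best = max(best, worst)
    return best
```

The outer max/inner min structure is what makes the recommendation risk-averse: the agent only commits to an action whose return is acceptable under the least favourable admissible evolution of the model.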
Empirical evaluation of a Q-Learning Algorithm for Model-free Autonomous Soaring
Autonomous unpowered flight is a challenge for control and guidance systems: all the energy the aircraft might use during flight has to be harvested directly from the atmosphere. We investigate the design of an algorithm that optimizes the closed-loop control of a glider's bank and sideslip angles while flying in the lower convective layer of the atmosphere, in order to increase its mission endurance. Using a Reinforcement Learning approach, we demonstrate the possibility of real-time adaptation of the glider's behaviour to the time-varying and noisy conditions associated with thermal soaring flight. Our approach is online, data-based and model-free, hence it avoids the pitfalls of aerological and aircraft modelling and allows us to deal with uncertainties and non-stationarity. Additionally, we put particular emphasis on keeping computational requirements low in order to make on-board execution feasible. This article presents the stochastic, time-dependent aerological model used for simulation, together with a standard aircraft model. We then introduce an adaptation of a Q-learning algorithm and demonstrate its ability to control the aircraft and improve its endurance by exploiting updrafts in non-stationary scenarios.
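For readers unfamiliar with the underlying learning rule, the minimal tabular Q-learning sketch below shows the kind of low-cost, model-free update such an approach relies on. The state/action discretization (e.g. bank-angle increments) and the hyperparameter values are illustrative assumptions, not the paper's exact design.

```python
import random
from collections import defaultdict

class QLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.q = defaultdict(float)   # Q(s, a), defaults to 0
        self.actions = actions        # e.g. discretized bank-angle commands (assumption)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy exploration keeps the policy adapting to
        # non-stationary, noisy updraft conditions.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning target; cheap enough for on-board execution.
        target = reward + self.gamma * max(self.q[(next_state, a)] for a in self.actions)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```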
Lipschitz Lifelong Reinforcement Learning
We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the task space. These theoretical results lead us to a value-transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with an improved convergence rate. Further, we show the method to experience no negative transfer with high probability. We illustrate the benefits of the method in Lifelong RL experiments. (In proceedings of the 35th AAAI Conference on Artificial Intelligence, AAAI 2021; 21 pages, 11 figures.)
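A minimal sketch of the value-transfer idea, assuming a Lipschitz constant and a (pseudo-)distance between tasks are available: Q-values learned in past tasks, inflated by the Lipschitz constant times the task distance, yield an optimistic initialization for the new task that is never looser than the trivial bound. Names and the distance estimates are assumptions for illustration, not the paper's algorithm.

```python
def transferred_upper_bound(q_values_per_task, task_distances, lipschitz_constant, q_max):
    """Optimistic Q-value initialization for a new task.

    q_values_per_task: list of dicts mapping (state, action) -> learned Q in past tasks
    task_distances:    list of distances between each past task and the new task
    q_max:             trivial upper bound on any Q-value (e.g. r_max / (1 - gamma))
    """
    bound = {}
    keys = set().union(*(q.keys() for q in q_values_per_task))
    for sa in keys:
        # The trivial bound always holds, so the transferred bound can never
        # be worse than learning from scratch (no negative transfer).
        candidates = [q_max]
        for q, dist in zip(q_values_per_task, task_distances):
            if sa in q:
                candidates.append(q[sa] + lipschitz_constant * dist)
        bound[sa] = min(candidates)
    return bound
```

Using the tightest of these bounds as the initial value in an optimistic PAC-MDP learner is what shrinks the amount of exploration needed in tasks close to previously solved ones.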
Open Loop Execution of Tree-Search Algorithms
In the context of tree-search stochastic planning algorithms where a generative model is available, we consider on-line planning algorithms that build trees in order to recommend an action. We investigate the question of avoiding re-planning in subsequent decision steps by directly using sub-trees as action recommenders. Firstly, we propose a method for open loop control via a new algorithm that decides whether or not to re-plan at each time step, based on an analysis of the sub-tree's statistics. Secondly, we show that the probability of selecting a suboptimal action at any depth of the tree can be upper bounded and that this bound converges towards zero. Moreover, this upper bound decays logarithmically between subsequent depths. This leads to a distinction between node-wise optimality and state-wise optimality. Finally, we empirically demonstrate that our method achieves a compromise between loss of performance and computational gain.
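The following Python sketch illustrates the re-plan-or-not decision under stated assumptions: the threshold rule on visit counts and on the gap between the two best action estimates is an illustrative stand-in for the paper's actual criterion on sub-tree statistics, and the tree/planner interface is hypothetical.

```python
def next_action(subtree, planner, state, min_visits=200, min_gap=0.05):
    """Reuse the sub-tree's recommendation when it looks trustworthy; otherwise re-plan."""
    if subtree is not None and subtree.visit_count >= min_visits:
        ranked = sorted(subtree.children, key=lambda c: c.mean_value, reverse=True)
        # Require a clear margin between the best and second-best action estimates
        # before trusting the sub-tree (open-loop step, no re-planning).
        if ranked and (len(ranked) == 1
                       or ranked[0].mean_value - ranked[1].mean_value >= min_gap):
            return ranked[0].action, ranked[0]
    # Closed-loop step: rebuild the tree from the current state.
    tree = planner.search(state)
    best = max(tree.children, key=lambda c: c.mean_value)
    return best.action, best
```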
Reinforcement Learning in Non-Stationary Environments (Apprentissage par renforcement en environnement non stationnaire)
How should an agent act in the face of uncertainty about the evolution of its environment? In this dissertation, we give a Reinforcement Learning perspective on the resolution of non-stationary problems. The question is addressed from three different angles. First, we study the planning vs. re-planning trade-off of tree-search algorithms in stationary Markov Decision Processes; we propose a method to lower the computational requirements of such algorithms while keeping theoretical guarantees on performance. Secondly, we study the case of environments evolving gradually over time; this hypothesis is expressed through a mathematical framework called Lipschitz Non-Stationary Markov Decision Processes, in which we derive a risk-averse planning algorithm provably converging to the minimax policy. Thirdly, we consider abrupt temporal evolution in the setting of lifelong Reinforcement Learning; we propose a non-negative transfer method based on the theoretical study of the optimal Q-function's Lipschitz continuity with respect to the task space, which accelerates learning in new tasks. Overall, this dissertation proposes answers to the question of solving Non-Stationary Markov Decision Processes under three different settings.
Desmosomal gene analysis in arrhythmogenic right ventricular dysplasia/cardiomyopathy: spectrum of mutations and clinical impact in practice.
AIMS: Five desmosomal genes have recently been implicated in arrhythmogenic right ventricular dysplasia/cardiomyopathy (ARVD/C), but the clinical impact of genetics remains poorly understood. We wanted to address the potential impact of genotyping. METHODS AND RESULTS: Direct sequencing of the five genes (JUP, DSP, PKP2, DSG2, and DSC2) was performed in 135 unrelated patients with ARVD/C. We identified 41 different disease-causing mutations, including 28 novel ones, in 62 patients (46%). In addition, a genetic variant of unknown significance was identified in nine additional patients (7%). The distribution across genes was 31% (PKP2), 10% (DSG2), 4.5% (DSP), 1.5% (DSC2), and 0% (JUP). The presence of desmosomal mutations was not associated with familial context, but was associated with young age, symptoms, electrical substrate, and extensive structural damage. When compared with other genes, DSG2 mutations were associated with more frequent left ventricular involvement (P = 0.006). Finally, a complex genetic status with multiple mutations was identified in 4% of patients and was associated with more frequent sudden death (P = 0.047). CONCLUSION: This study supports the use of genetic testing as a new diagnostic tool in ARVD/C and also suggests a prognostic impact, as the severity of the disease appears to differ according to the underlying gene or the presence of multiple mutations.