Large state spaces and self-supervision in reinforcement learning

Author: Touati Ahmed
Publication date: 01/08/2021

Reinforcement learning (RL) is an agent-oriented learning paradigm concerned with learning by interacting with an uncertain environment. Combined with deep neural networks as function approximators, deep reinforcement learning (Deep RL) has recently made it possible to tackle highly complex tasks, enabling artificial agents to master classic games like Go, play video games from pixels, and solve robotic control tasks. However, a closer look at these remarkable empirical successes reveals some fundamental limitations. First, it has been challenging to combine desirable features of RL algorithms, such as off-policy and multi-step learning, with function approximation in a way that leads to both stable and efficient algorithms in large state spaces. Moreover, Deep RL algorithms tend to be very sample inefficient because of the rudimentary exploration-exploitation strategies they employ. Finally, they require an enormous amount of supervised data and end up producing a narrow agent able to solve only the task it was trained on.
In this thesis, we propose novel solutions to the problems of off-policy learning and the exploration-exploitation dilemma in large state spaces, as well as self-supervision in RL. On the topic of off-policy learning, we provide two contributions. First, for the problem of policy evaluation, we show that combining popular off-policy and multi-step learning methods with linear value function parameterization can lead to undesirable instability, and we derive a provably convergent variant of these methods. Second, for policy optimization, we propose to stabilize the policy improvement step through an off-policy divergence regularization that constrains the discounted state-action visitation distributions induced by consecutive policies to be close to one another.
Next, we study online learning in large state spaces and focus on two structural assumptions that make the problem tractable: smooth and linear environments. For smooth environments, we propose an efficient online algorithm that actively learns an adaptive partitioning of the joint space by zooming in on more promising and frequently visited regions. For linear environments, we study a more realistic setting, where the environment is allowed to evolve dynamically and even adversarially over time, but the total change is still bounded. To address this setting, we propose an efficient online algorithm based on weighted least squares value iteration. It uses exponential weights to smoothly forget data that are far in the past, which drives the agent to keep exploring to discover changes.
Finally, beyond the classical RL setting, we consider an agent interacting with its environment without a reward signal. We propose to learn a pair of representations that map state-action pairs to some latent space. During the unsupervised phase, these representations are trained using reward-free interactions to encode long-range relationships between states and actions, via a predictive occupancy map. At test time, once a reward function is revealed, we show that the optimal policy for that reward is directly obtained from these representations, with no planning. This is a step towards building fully controllable agents.
A common theme in the thesis is the design of provable RL algorithms that generalize. In the first and second parts, we deal with generalization in large state spaces, either by linear function approximation or by state aggregation. In the last part, we focus on generalization over reward functions and propose a task-agnostic representation learning framework that is provably able to optimize any reward function.
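The abstract mentions exponential weights that smoothly forget old data. The following is only a minimal sketch of that forgetting idea for a one-dimensional least squares fit, not the thesis's value iteration algorithm. The function name `discounted_least_squares`, the forgetting factor `gamma`, and the ridge term `lam` are assumptions introduced here for illustration.

```python
def discounted_least_squares(xs, ys, gamma, lam=1e-6):
    """Fit y ~ theta * x with exponentially decaying sample weights.

    The most recent sample gets weight 1, the one before it gamma,
    then gamma**2, and so on. lam is a small ridge term that keeps
    the denominator strictly positive (an illustrative assumption).
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1]")
    n = len(xs)
    num = 0.0
    den = lam
    for t, (x, y) in enumerate(zip(xs, ys)):
        w = gamma ** (n - 1 - t)  # older samples get smaller weights
        num += w * x * y
        den += w * x * x
    return num / den


# Toy data: the true parameter jumps from 0 to 1 halfway through.
xs = [1.0] * 100
ys = [0.0] * 50 + [1.0] * 50

theta_forgetting = discounted_least_squares(xs, ys, gamma=0.5)
theta_uniform = discounted_least_squares(xs, ys, gamma=1.0)
```

In this toy run the discounted estimate follows the recent regime, while the unweighted estimate (`gamma=1.0`) averages over both regimes.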

Dépôt Institutionnel Numérique

Algorithmes stochastiques d'optimisation sous incertitude sur des structures complexes. Convergence et applications

Author: Gavra Ioana Alexandra
Publication date: 05/10/2017

The main topics of this thesis are the development of stochastic algorithms for optimization under uncertainty, the study of their theoretical properties, and their applications. The proposed algorithms are modified versions of simulated annealing that use only unbiased estimators of the cost function. We study their convergence using tools developed in the theory of Markov processes: we use properties of infinitesimal generators and functional inequalities to measure the distance between their probability law and a target one.
The first part is concerned with quantum graphs endowed with a probability measure on their vertex set. Quantum graphs are continuous versions of undirected weighted graphs. The starting point of the present work was the question of finding Fréchet means on such a graph. The Fréchet mean is an extension of the Euclidean mean to general metric spaces and is defined as an element that minimizes the sum of weighted squared distances to all vertices. Our method relies on a Langevin formulation of a noisy simulated annealing, handled using a homogenization technique. To establish the convergence in probability of the process, we study the evolution of the relative entropy of its law with respect to a well-chosen Gibbs measure. Using functional inequalities (Poincaré and Sobolev) and Gronwall's lemma, we then show that the relative entropy goes to zero. We test our method on real data sets and propose a heuristic method to adapt the algorithm to very large graphs, using a preliminary clustering. In the same framework, we introduce a definition of principal component analysis for quantum graphs. This leads, once more, to a stochastic optimization problem, this time on the space of the graph's geodesics. We present an algorithm for finding the first principal component and conjecture the convergence of the associated Markov process to the target set.
In the second part, we propose a modified version of the simulated annealing algorithm for solving a stochastic global optimization problem on a finite state space. Our approach is inspired by the general field of Monte Carlo methods and relies on a Markov chain whose transition probability at each step is defined with the help of mini-batches of increasing (random) size. We prove the algorithm's convergence in probability towards the optimal set, give its convergence rate, and provide an optimized choice of parameters that ensures a minimal number of evaluations for a given accuracy and a confidence level close to 1. This work is completed by a set of numerical experiments that illustrate the practical performance of the algorithm on both benchmark test functions and real-world examples.
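The definition of the Fréchet mean given in the abstract can be illustrated with a small brute-force sketch. It is a simplification and not the thesis's simulated annealing method: it searches only over the vertices of a connected weighted graph, whereas on a quantum graph the minimizer may lie in the interior of an edge. The helper names `shortest_paths` and `frechet_mean_vertex`, and the toy graph, are assumptions introduced for illustration.

```python
import math


def shortest_paths(n, edges):
    """All-pairs shortest path lengths (Floyd-Warshall) on an undirected graph."""
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, length in edges:
        dist[u][v] = min(dist[u][v], length)
        dist[v][u] = min(dist[v][u], length)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def frechet_mean_vertex(n, edges, probs):
    """Vertex minimizing the probability-weighted sum of squared distances.

    Assumes the graph is connected and restricts the search to vertices.
    """
    dist = shortest_paths(n, edges)

    def cost(x):
        return sum(p * dist[x][v] ** 2 for v, p in enumerate(probs))

    return min(range(n), key=cost)


# Toy example: path graph 0 - 1 - 2 - 3 with unit edge lengths,
# and most of the probability mass on vertex 3.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
probs = [0.1, 0.1, 0.1, 0.7]
best = frechet_mean_vertex(4, edges, probs)
```

With these weights the costs at vertices 0 to 3 are 6.8, 3.0, 1.2 and 1.4, so the minimizing vertex is 2 rather than the heaviest vertex 3.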

Thèses en ligne de l'Université Toulouse III - Paul Sabatier

The IPBES regional assessment report on biodiversity and ecosystem services for Europe and Central Asia

Publication venue: Secretariat of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services
Publication date: 01/01/2018

The Regional Assessment Report on Biodiversity and Ecosystem Services for Europe and Central Asia, produced by the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES), provides a critical analysis of the state of knowledge regarding the importance, status, and trends of biodiversity and nature’s contributions to people. The assessment analyses the direct and underlying causes of the observed changes in biodiversity and in nature’s contributions to people, and the impact that these changes have on people's quality of life. Finally, the assessment identifies a mix of governance options, policies, and management practices that are currently available to reduce the loss of biodiversity and of nature’s contributions to people in the region. The assessment addresses terrestrial, freshwater, and coastal biodiversity and covers current status and trends, going back several decades, as well as future projections, with a focus on the 2020-2050 period.

Bern Open Repository and Information System (BORIS)

Kant in English: An Index

Author: Hosik Sohn (3983207)
Yong Ju Jung (3983210)
Yong Man Ro (3983213)
Publication date: 01/01/2017

Kant in English: An Index / By Daniel Fidel Ferrer. ©Daniel Fidel Ferrer, 2017. Pages 1 to 2675. Includes bibliographical references. Index.
1. Ontology. 2. Metaphysics. 3. Philosophy, German. 4. Thought and thinking. 5. Kant, Immanuel, 1724-1804. 6. Practice (Philosophy). 7. Philosophy and civilization. 8. Kant, Immanuel, 1724-1804 -- Wörterbuch. 9. Kant, Immanuel, 1724-1804 -- Concordances. 10. Kant, Immanuel, 1724-1804 -- 1889-1976 -- Indexes. I. Ferrer, Daniel Fidel, 1952-.
MOTTO
As a famous motto calls us back to Kant, Otto Liebmann writes (Kant and His Epigones, 1865): “Also muss auf Kant zurückgegangen werden.” (“Therefore, we must return to Kant.”)
Table of Contents
1. Preface and Introduction.
2. Background on Kant’s Philosophy (hermeneutical historical situation).
3. Main Index (pages 25 to 2676).
Preface and Introduction
Total words indexed: 58,928. The 12 volumes indexed in the Main Index cover pages 1 to 7321. This monograph by Daniel Fidel Ferrer is 2676 pages in total. The following is a machine index of 12 volumes written by Immanuel Kant and translated from German into English. Everything is indexed, including the text, title pages, preface, notes, editorials, glossary, indexes, biographical notes, and even some typos. No stop words or other words were removed from this index. There are some German words in the text, bibliographies, and glossaries (also included in the Main Index).
Titles in English of Kant’s writings for this index (pages 1 to 7321):
Anthropology, History, and Education (starts on page 1)
Correspondence (starts on page 313)
Critique of Pure Reason (starts on page 971)
Critique of the Power of Judgment (starts on page 1771)
Lectures on Logic (starts on page 2247)
Lectures on Metaphysics (starts on page 2991)
Notes and Fragments (starts on page 3670)
Opus Postumum (starts on page 4374)
Practical Philosophy (starts on page 4741)
Religion and Rational Theology (starts on page 5446)
Theoretical Philosophy after 1781 (starts on page 5990)
Theoretical Philosophy, 1755-1770 (starts on page 6541)
Universal Natural History and Theory of the Heavens or An Essay on the Constitution and the Mechanical Origin of the Entire Structure of the Universe Based on Newtonian Principles (starts on page 7162)
The single file that includes all of these books ends on page 7321. The actual texts of these books by Kant are not included here because of copyright. This is only an index of these 7321 pages by Immanuel Kant.
Searching this Main Index. Please note that German words starting with umlauts are at the end of the index because of machine sorting of the words. These begin with the German word “ße” on page 2674 of this book (see the Main Index). Use the FIND FUNCTION for all examples of the words or names you are searching for.
Examples from the Main Index:
mendacium, 5171, 5329, 5389
mendation, 220
mendax, 2702, 2800
mended, 360
Mendel, 416, 925, 965
Mendelian, 2212
Mendels, 345, 363, 417, 458, 560, 572, 588, 926, 928, 929
MENDELSSOHN, 925
Mendelssohn, 8, 9, 19, 98, 99, 100, 101

PhilPapers

FigShare