9 research outputs found

    Generic Reinforcement Learning Beyond Small MDPs

    Feature reinforcement learning (FRL) is a framework within which an agent can automatically reduce a complex environment to a Markov Decision Process (MDP) by finding a map which aggregates similar histories into the states of an MDP. The primary motivation behind this thesis is to build FRL agents that work in practice, both for larger environments and larger classes of environments. We focus on empirical work targeted at practitioners in the field of general reinforcement learning, with theoretical results wherever necessary. The current state of the art in FRL uses suffix trees, which have issues with large observation spaces and long-term dependencies. We start by addressing the issue of long-term dependencies using a class of maps known as looping suffix trees, which have previously been used to represent deterministic POMDPs. We show the best existing results on the TMaze domain and good results on larger domains that require long-term memory. We introduce a new value-based cost function that can be evaluated model-free. The value-based cost allows for smaller representations, and its model-free nature allows for its extension to the function approximation setting, which has computational and representational advantages for large state spaces. We evaluate the performance of this new cost in both the tabular and function approximation settings on a variety of domains, and show performance better than the state-of-the-art algorithm MC-AIXI-CTW on the domain POCMAN. When the environment is very large, an FRL agent needs to explore systematically in order to find a good representation. However, it needs a good representation in order to perform this systematic exploration. We decouple the two by considering a different setting, one where the agent has access to the value of any state-action pair from an oracle during a training phase. The agent must learn an approximate representation of the optimal value function. We formulate a regression-based solution based on online learning methods to build such an agent. We test this agent on the Arcade Learning Environment using a simple class of linear function approximators. While we made progress on the issue of scalability, two major issues with the FRL framework remain: the need for a stochastic search method to minimise the objective function and the need to store an uncompressed history, both of which can be very computationally demanding.
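    The central object in this abstract is the map that aggregates histories into MDP states. The sketch below is a minimal illustration of that idea using the simplest member of the suffix-tree family, a fixed-depth suffix map, combined with a tabular Q-learning update on the induced states; the names and constants (phi_suffix, q_update, DEPTH) are assumptions made for illustration, not the thesis's actual algorithms.
```python
# Minimal sketch of the FRL idea: a map phi that aggregates histories into
# MDP states (here, the last DEPTH observations), plus a tabular Q-learning
# update on the induced MDP.  Illustrative only; not the thesis's method.
from collections import defaultdict

DEPTH = 2               # how much recent history the map keeps (hypothetical choice)
ALPHA, GAMMA = 0.1, 0.95

def phi_suffix(history):
    """Map a history, a list of (observation, action, reward) tuples,
    to an MDP state: the tuple of the last DEPTH observations."""
    return tuple(o for (o, _, _) in history[-DEPTH:])

Q = defaultdict(float)  # Q-values over (aggregated state, action) pairs

def q_update(history, action, reward, next_history, actions):
    """One tabular Q-learning step on the MDP induced by phi_suffix."""
    s, s_next = phi_suffix(history), phi_suffix(next_history)
    best_next = max(Q[(s_next, a)] for a in actions)
    Q[(s, action)] += ALPHA * (reward + GAMMA * best_next - Q[(s, action)])
```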

    Aprendizaje por refuerzo para la toma de decisiones seguras en dominios con espacios de estados y acciones continuos

    Decision problems are among the most fertile fields for the application of Artificial Intelligence (AI) techniques. Among these, Reinforcement Learning has emerged as a useful framework for learning decision-making behaviour policies from the experience generated in dynamic and complex environments. In Reinforcement Learning, the agent interacts with the environment, and a reward function indicates whether it is performing the task it is learning well or badly. Much of Reinforcement Learning is founded on value functions, which provide information about the utility of being in a given state during a decision-making process, or about the utility of taking a given action in a state. When facing problems in which the state and action spaces are very large or even continuous, the traditional tabular representation of the value function is impractical because of the high cost of storing and computing it. In these cases, generalisation techniques are needed that yield more compact representations of both the state space and the action space, so that Reinforcement Learning techniques can be applied efficiently. Beyond continuous state and action spaces, another important problem Reinforcement Learning must address is minimising the amount of damage (from collisions or falls) that the agent or the system may suffer during the learning process (e.g., in a task where the goal is to learn to fly a helicopter, the helicopter may end up crashing; when teaching a robot to walk, the robot may fall over). This thesis pursues two main objectives. The first is how to tackle problems whose state and action spaces are continuous in nature (and therefore infinite) and high-dimensional. One option centres on generalisation techniques based on discretisation; this thesis develops algorithms that successfully combine function approximation with discretisation, seeking to exploit the advantages of both techniques. The second objective is to minimise the amount of damage the agent or the system suffers during the learning process in fully continuous, high-dimensional problems. The thesis gives a new definition of the concept of risk that makes it possible to identify states in which the agent is more prone to suffering some kind of damage. Achieving these objectives also involves investigating the use of suboptimal baseline or expert behaviours that provide knowledge about the task to be learned, which is necessary when tackling complex, high-dimensional problems in which the agent can also come to harm.
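    As a rough illustration of the two themes in the abstract, combining discretisation with function approximation and flagging risky states, the sketch below uses a linear Q-function over a coarsely discretised action set together with a simple distance-based risk label. The feature choice, action grid, and risk radius are assumptions made purely for illustration, not the thesis's algorithms.
```python
# Illustrative sketch: (1) a continuous state handled by linear function
# approximation while the continuous action is coarsely discretised, and
# (2) a simple "risk" signal flagging states close to known damaging ones.
import numpy as np

ACTIONS = np.linspace(-1.0, 1.0, 5)   # coarse grid over a continuous action
w = np.zeros((len(ACTIONS), 3))        # one linear weight vector per discrete action
ALPHA, GAMMA = 0.05, 0.99

def features(state):
    """Hand-crafted features of a continuous 2-D state (illustrative)."""
    x, v = state
    return np.array([1.0, x, v])

def q(state, a_idx):
    return w[a_idx] @ features(state)

def td_update(state, a_idx, reward, next_state):
    """Semi-gradient TD(0) update of the linear Q approximation."""
    target = reward + GAMMA * max(q(next_state, i) for i in range(len(ACTIONS)))
    w[a_idx] += ALPHA * (target - q(state, a_idx)) * features(state)

def risky(state, damaging_states, radius=0.1):
    """Label a state risky if it lies close to any previously damaging state."""
    return any(np.linalg.norm(np.asarray(state) - np.asarray(d)) < radius
               for d in damaging_states)
```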

    Policy Search Based Relational Reinforcement Learning using the Cross-Entropy Method

    Relational Reinforcement Learning (RRL) is a subfield of machine learning in which a learning agent seeks to maximise a numerical reward within an environment, represented as collections of objects and relations, by performing actions that interact with the environment. The relational representation allows more dynamic environment states than an attribute-based representation of reinforcement learning, but this flexibility also creates new problems such as a potentially infinite number of states. This thesis describes an RRL algorithm named Cerrla that creates policies directly from a set of learned relational “condition-action” rules, using the Cross-Entropy Method (CEM) to control policy creation. The CEM assigns each rule a sampling probability and gradually modifies these probabilities so that randomly sampled policies consist of ‘better’ rules and thus receive larger rewards. Rule creation is guided by an inferred partial model of the environment that defines: the minimal conditions needed to take an action, the possible specialisation conditions per rule, and a set of simplification rules to remove redundant and illegal rule conditions, resulting in compact, efficient, and comprehensible policies. Cerrla is evaluated on four separate environments, where each environment has several different goals. Results show that, compared to existing RRL algorithms, Cerrla is able to learn equal or better behaviour in less time on the standard RRL environment. On other larger, more complex environments, it can learn behaviour that is competitive with specialised approaches. The simplified rules and the CEM's bias towards compact policies result in comprehensive and effective relational policies created in a relatively short amount of time.
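    The following sketch shows the Cross-Entropy Method in the form the abstract describes: per-rule sampling probabilities, sampled policies scored by the environment, and probabilities shifted towards the rules used by the elite policies. evaluate_policy, the rule set, and all constants are placeholders rather than Cerrla's actual implementation.
```python
# Minimal Cross-Entropy Method over rule sampling probabilities (illustrative).
import random

def cem_rule_search(rules, evaluate_policy, iterations=50,
                    samples=30, elite_frac=0.2, step=0.6):
    probs = {r: 0.5 for r in rules}                  # initial sampling probabilities
    for _ in range(iterations):
        population = []
        for _ in range(samples):
            policy = [r for r in rules if random.random() < probs[r]]
            population.append((evaluate_policy(policy), policy))
        population.sort(key=lambda sp: sp[0], reverse=True)
        elite = [p for _, p in population[:max(1, int(elite_frac * samples))]]
        for r in rules:                              # move probabilities towards elite usage
            freq = sum(r in p for p in elite) / len(elite)
            probs[r] = (1 - step) * probs[r] + step * freq
    return probs
```
    In Cerrla the rule set itself is created and specialised during learning, guided by the inferred partial model; in this sketch it is treated as fixed for simplicity.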

    Kernel-Based Online NEAT for Keepaway Soccer


    Lernbeiträge im Rahmen einer kognitiven Architektur für die intelligente Prozessführung

    This thesis examines important aspects of a cognitive architecture for learning control tasks, focusing on feature extraction, reinforcement learning and learning management within the perception-action cycle. The contributions to feature extraction use information-theoretic measures such as mutual information to formulate new hybrid feature extraction algorithms; the focus is on finding features that are explicitly linked to the errors made by a learning system, and this residual-based approach is shown to be superior to classical methods. Which estimators of mutual information are suitable for feature extraction is also investigated. As the decision-making component of the overall architecture, state-of-the-art reinforcement learning methods are examined for their suitability for complex applications. The thesis also addresses learning-management issues such as the exploration-exploitation dilemma in continuous action spaces, the stability-plasticity dilemma and the reward decomposition problem; new contributions are made in the form of a diffusion-tree-based reinforcement learning algorithm and the SMILE approach. In addition, an architectural extension built around a process map as its core is proposed to organise the learning processes, and the individual components are integrated into a working overall architecture. Experimental evidence that the proposed system can learn solutions to real problems is provided in the challenging scenario of intelligent combustion control: the overall system is used to control a coal-fired power plant and achieves results that clearly surpass existing systems as well as human experts.
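    A minimal sketch of the residual-based feature selection idea highlighted in the abstract: candidate features are ranked by their mutual information with the residuals of the current model, so the next feature chosen is the one most informative about what the model still gets wrong. The histogram-based MI estimator and all names here are illustrative assumptions, not the thesis's estimators.
```python
# Illustrative residual-based feature selection via mutual information.
import numpy as np

def mutual_information(x, y, bins=16):
    """Simple histogram-based estimate of I(X; Y) for 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def select_feature_by_residual(candidates, residuals):
    """Pick the candidate feature (name -> 1-D array) most informative
    about the current model's residuals."""
    scores = {name: mutual_information(col, residuals)
              for name, col in candidates.items()}
    return max(scores, key=scores.get), scores
```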

    Waking to Mourning Doves: A Memoir

    Preface: Waking to Mourning Doves
    Waking to the soulful call of a mourning dove shortly after my mother’s death gave me the inspiration for my book title. Their plaintive cooing evokes nostalgic reminders of my life growing up on rural South Dakota prairies in the 1940s and 1950s, a memory of the simpler times of my youth. Soft spring mornings, gentle breezes stirring the curtains and a cacophony of birdsong and farm animal sounds define contentment to me. Their cooing still evokes the same feeling that all is right with the world. Mourning doves are unassuming, not flashy and don’t push and shove or bully other birds out of their space. They have adapted to blend into their surroundings, just as the women who settled the Great Plains did. Their nests and homes aren’t fancy, but they have survived and thrived, just like my immigrant ancestral families. Furthermore, they symbolize much of the happenings of my life and the experiences that have molded it, for they are like the soft-spoken ladies that surrounded me in my childhood: women who helped form my character and outlook on the world. These ladies were strong women and full partners in life, but with the polite, gentle demeanor of mourning doves. My book is built around excerpts from my personal diary entries from ages ten to twenty-one, adult journaling and my memories. Dated diary entries and family journals are printed verbatim and expanded upon. It is obvious from this book that I have an interest in preserving family history. My husband and I have published seven family-history books that describe our ancestors as far back as we can find records. Many go back hundreds of years and dozens of generations to our northern European roots. In addition to the history of our ancestors, I wish to pass on to my descendants my own personal history. It is my hope that this memoir will inspire readers to record their own life experiences.