Generic Reinforcement Learning Beyond Small MDPs
Feature reinforcement learning (FRL) is a framework within which an agent can automatically reduce a complex environment to a Markov Decision Process (MDP) by finding a map which aggregates similar histories into the states of an MDP. The primary motivation behind this thesis is to build FRL agents that work in practice, both for larger environments and larger classes of environments. We focus on empirical work targeted at practitioners in the field of general reinforcement learning, with theoretical results wherever necessary.
The current state of the art in FRL uses suffix trees, which have issues with large observation spaces and long-term dependencies. We start by addressing the issue of long-term dependencies using a class of maps known as looping suffix trees, which have previously been used to represent deterministic POMDPs. We obtain the best results to date on the TMaze domain and good results on larger domains that require long-term memory.
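The core FRL idea of aggregating histories into states can be illustrated with a plain (non-looping) suffix map; the function and the state set below are hypothetical illustrations, not the thesis code, and looping suffix trees extend this idea with loop edges so that a finite tree can encode arbitrarily old events:

```python
def suffix_state(history, suffix_set):
    """Map a history (a sequence of observations) to the longest
    suffix found in suffix_set; the empty suffix () is the fallback state."""
    for k in range(len(history), 0, -1):
        s = tuple(history[-k:])
        if s in suffix_set:
            return s
    return ()

# Hypothetical aggregation map: three suffixes play the role of MDP states.
states = {("a",), ("b",), ("b", "a")}

# Histories that end the same way aggregate to the same MDP state.
s1 = suffix_state(["x", "b", "a"], states)
s2 = suffix_state(["y", "y", "b", "a"], states)
```

Both histories map to the state `("b", "a")`: once the map is fixed, standard MDP algorithms can be run on the aggregated states.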
We introduce a new value-based cost function that can be evaluated model-free. The value-based cost allows for smaller representations, and its model-free nature allows for its extension to the function approximation setting, which has computational and representational advantages for large state spaces. We evaluate the performance of this new cost in both the tabular and function approximation settings on a variety of domains, and show performance better than the state-of-the-art algorithm MC-AIXI-CTW on the POCMAN domain.
When the environment is very large, an FRL agent needs to explore systematically in order to find a good representation; however, it needs a good representation in order to perform this systematic exploration. We decouple the two by considering a different setting, one where the agent has access to the value of any state-action pair from an oracle in a training phase. The agent must learn an approximate representation of the optimal value function. We formulate a regression-based solution based on online learning methods to build such an agent. We test this agent on the Arcade Learning Environment using a simple class of linear function approximators.
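The regression idea can be sketched as follows, under made-up data: fit a linear value approximator to oracle-supplied values one sample at a time with online stochastic gradient descent (the feature vectors, learning rate, and epoch count here are illustrative assumptions, not the thesis setup):

```python
import numpy as np

def online_value_regression(features, oracle_values, lr=0.1, epochs=50):
    """Online least-squares: after each (phi, v) sample, nudge the weights
    so that w . phi moves toward the oracle value v."""
    w = np.zeros(features.shape[1])
    for _ in range(epochs):
        for phi, v in zip(features, oracle_values):
            w += lr * (v - w @ phi) * phi  # SGD step on the squared error
    return w

# Hypothetical state-action feature vectors and the oracle's value for each.
phis = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = np.array([2.0, 3.0, 5.0])
w = online_value_regression(phis, vals)
```

Because the updates are per-sample, this style of learner never needs the full dataset in memory, which is what makes the online formulation attractive at Arcade Learning Environment scale.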
While we have made progress on the issue of scalability, two major issues with the FRL framework remain: the need for a stochastic search method to minimise the objective function, and the need to store an uncompressed history, both of which can be very computationally demanding.
Reinforcement Learning for Safe Decision-Making in Domains with Continuous State and Action Spaces
Decision problems are among the most fertile fields for the application of Artificial Intelligence (AI) techniques. Among these, Reinforcement Learning has emerged as a useful framework for learning behaviour policies for decision-making from experience generated in dynamic, complex environments. In Reinforcement Learning, the agent interacts with the environment, and a reward function indicates how well or badly it is performing the task it is learning. Much of Reinforcement Learning rests on value functions, which provide information about the utility of being in a state during a decision-making process, or about the utility of taking an action in a state. When facing problems where the state and action spaces are very large or even continuous, the traditional tabular representation of the value function is impractical because of the high cost of storing and computing it. In these cases, generalisation techniques are needed to obtain more compact representations of both the state space and the action space, so that Reinforcement Learning techniques can be applied efficiently. Beyond continuous state and action spaces, another important problem Reinforcement Learning must face is minimising the damage (from collisions, falls) that can be inflicted on the agent or the system during the learning process (e.g., in a task where the goal is to learn to fly a helicopter, the helicopter may end up crashing; when teaching a robot to walk, the robot may fall). This thesis pursues two major goals. The first is how to tackle problems where the state and action spaces are continuous in nature (and therefore infinite) and high-dimensional.
One option centres on generalisation techniques based on discretisation. This thesis develops algorithms that successfully combine function approximation and discretisation techniques, seeking to exploit the advantages of both. The second goal of this thesis is to minimise the damage suffered by the agent or the system during the learning process in fully continuous, high-dimensional problems. This thesis gives a new definition of the concept of risk, which makes it possible to identify states where the agent is more prone to suffer some kind of damage. Achieving these goals also involves investigating the use of baseline behaviours or suboptimal experts that provide knowledge about the task to be learned, which is necessary when tackling complex, high-dimensional problems in which, moreover, the agent can suffer damage.
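The combination of discretisation with tabular methods described above can be sketched as follows; this is a minimal illustration with made-up ranges and bin counts, not the algorithms developed in the thesis:

```python
def discretize(x, low, high, bins):
    """Map a continuous value in [low, high] to one of `bins` integer cells."""
    if x <= low:
        return 0
    if x >= high:
        return bins - 1
    return int((x - low) / (high - low) * bins)

def discretize_state(state, lows, highs, bins):
    """Turn a continuous state vector into a tuple of cell indices,
    usable as a key into a tabular value function or Q-table."""
    return tuple(discretize(x, lo, hi, bins)
                 for x, lo, hi in zip(state, lows, highs))

# Hypothetical 2-D continuous state (e.g. position, velocity), 4 bins per axis.
cell = discretize_state((0.0, -0.3), lows=(-1.0, -1.0), highs=(1.0, 1.0), bins=4)
```

The coarseness of the grid is the trade-off discretisation introduces: fewer bins mean a smaller table but a cruder value function, which is why hybrid schemes pair discretisation with function approximation.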
Policy Search Based Relational Reinforcement Learning using the Cross-Entropy Method
Relational Reinforcement Learning (RRL) is a subfield of machine learning in which a learning agent seeks to maximise a numerical reward within an environment, represented as collections of objects and relations, by performing actions that interact with the environment. The relational representation allows more dynamic environment states than an attribute-based representation of reinforcement learning, but this flexibility also creates new problems such as a potentially infinite number of states.
This thesis describes an RRL algorithm named Cerrla that creates policies directly from a set of learned relational “condition-action” rules, using the Cross-Entropy Method (CEM) to control policy creation. The CEM assigns each rule a sampling probability and gradually modifies these probabilities so that the randomly sampled policies consist of ‘better’ rules and therefore receive larger rewards. Rule creation is guided by an inferred partial model of the environment that defines the minimal conditions needed to take an action, the possible specialisation conditions per rule, and a set of simplification rules for removing redundant and illegal rule conditions, resulting in compact, efficient, and comprehensible policies.
Cerrla is evaluated on four separate environments, where each environment has several different goals. Results show that, compared to existing RRL algorithms, Cerrla learns equal or better behaviour in less time on the standard RRL environment. On other larger, more complex environments, it learns behaviour competitive with specialised approaches. The simplified rules and the CEM’s bias towards compact policies result in comprehensible and effective relational policies created in a relatively short amount of time.
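The CEM update described above can be sketched generically; the rule names and scoring function below are toy assumptions (the real algorithm scores sampled policies by the reward the environment returns):

```python
import random

def cem_rule_probs(rules, score_fn, iters=30, samples=40,
                   elite_frac=0.25, alpha=0.6):
    """Cross-Entropy Method over rule-inclusion probabilities: sample
    policies as subsets of rules, keep the best ('elite') fraction, and
    move each rule's sampling probability toward its frequency among
    the elites."""
    p = {r: 0.5 for r in rules}
    for _ in range(iters):
        policies = [[r for r in rules if random.random() < p[r]]
                    for _ in range(samples)]
        policies.sort(key=score_fn, reverse=True)
        elite = policies[:max(1, int(elite_frac * samples))]
        for r in rules:
            freq = sum(r in pol for pol in elite) / len(elite)
            p[r] = (1 - alpha) * p[r] + alpha * freq
    return p

# Toy score: 'good' rules help, 'bad' rules hurt.
random.seed(0)
probs = cem_rule_probs(
    ["good1", "good2", "bad1"],
    lambda pol: sum(1 if r.startswith("good") else -1 for r in pol))
```

After a few dozen iterations the probabilities of helpful rules approach 1 and those of harmful rules approach 0, which is the mechanism that biases sampling toward compact, high-reward policies.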
Learning Contributions within a Cognitive Architecture for Intelligent Process Control
In this thesis, important aspects of a cognitive architecture for learning
control tasks are discussed. Highlighted are the topics of feature extraction, reinforcement learning, and learning management in the context of the perception-action cycle. The contributions in the field of feature extraction use information-theoretic measures such as mutual information to formulate new hybrid feature extraction algorithms. The focus is on finding features that are explicitly linked with the errors made by a learning system; it is shown that this residual-based approach is superior to classical methods. Another topic of interest is the estimation of mutual information in the context of feature extraction. State-of-the-art reinforcement learning methods are investigated for their suitability for challenging applications. This work addresses issues of learning management, such as the exploration-exploitation dilemma, the plasticity-stability dilemma, and the reward decomposition problem. New contributions are made in the form of the diffusion-tree-based reinforcement learning algorithm and the SMILE approach. Likewise, an architectural extension, built around a process map as its core piece, is proposed to organise the learning process. Experimental evidence that the proposed system can learn the solution to real problems is presented in the challenging scenario of intelligent combustion control. The system is used to learn a control strategy in a coal-fired power plant, achieving results that clearly surpass existing systems and human experts.
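The residual-based, information-theoretic feature selection described in this abstract can be illustrated with a plug-in mutual-information estimate; the selection routine and the data below are illustrative sketches, not the thesis algorithms: score each candidate feature by its mutual information with the learner's residuals and keep the highest-scoring one.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Discrete plug-in estimate of I(X; Y) in bits from paired samples."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Illustrative residual-based selection: pick the feature most informative
# about the sign of the current model's errors.
def best_feature_for_residuals(feature_columns, residual_signs):
    return max(feature_columns,
               key=lambda f: mutual_information(feature_columns[f],
                                                residual_signs))

residuals = [1, 1, -1, -1]
features = {"f1": [1, 1, -1, -1],   # perfectly tracks the residuals
            "f2": [0, 0, 0, 0]}     # carries no information
chosen = best_feature_for_residuals(features, residuals)
```

Linking the score to residuals rather than to the target itself is what distinguishes the residual-based approach: a feature is valuable if it explains what the current learner still gets wrong.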
Waking to Mourning Doves: A Memoir
Preface: Waking to Mourning Doves
Waking to the soulful call of a mourning dove shortly after my mother’s death gave me the inspiration for my book title. Their plaintive cooing evokes nostalgic reminders of my life growing up on rural South Dakota prairies in the 1940s and 1950s, a memory of the simpler times of my youth. Soft spring mornings, gentle breezes stirring the curtains and a cacophony of birdsong and farm animal sounds define contentment to me. Their cooing still evokes the same feeling that all is right with the world.
Mourning doves are unassuming, not flashy, and don’t push and shove or bully other birds out of their space. They have adapted to blend into their surroundings, just as the women who settled the Great Plains did. Their nests and homes aren’t fancy, but they have survived and thrived, just like my immigrant ancestral families. Furthermore, they symbolize much of the happenings of my life and the experiences that have molded it, for they are like the soft-spoken ladies that surrounded me in my childhood: women who helped form my character and outlook on the world. These ladies were strong women and full partners in life, but with the polite, gentle demeanor of mourning doves.
My book is built around excerpts from my personal diary entries from ages ten to twenty-one, adult journaling and my memories. Dated diary entries and family journals are printed verbatim and expanded upon. It is obvious from this book that I have an interest in preserving family history. My husband and I have published seven family-history books that describe our ancestors as far back as we can find records. Many go back hundreds of years and dozens of generations to our northern European roots. In addition to the history of our ancestors, I wish to pass on to my descendants my own personal history. It is my hope that this memoir will inspire readers to record their own life experiences.