65 research outputs found
Learning object relationships which determine the outcome of actions
Peer reviewed | Publisher PDF
my Human Brain Project (mHBP)
How can we make an agent that thinks like us humans? An agent with proprioception
and intrinsic motivation, able to identify deception, use small amounts of energy, transfer
knowledge between tasks, and evolve? This is the problem this thesis focuses on.
Being able to create a piece of software that performs tasks like a human being is
a goal that, if achieved, would allow us to extend our own capabilities considerably and
have more tasks performed in a predictable fashion. This is one of the motivations for this
thesis.
To address this problem, we propose a modular architecture for Reinforcement
Learning computation and develop an implementation to exercise this architecture. This
software, which we call mHBP, is written in Python, using Webots as the environment for
the agent and Neo4J, a graph database, as its memory. mHBP takes sensory data or other
inputs and produces, based on the body parts and tools the agent has available, an output
consisting of actions to perform.
This thesis involves experimental design with several iterations, exploring a
theoretical approach to RL based on graph databases. We conclude from this work that it
is possible to represent episodic data in a graph, and that it is also possible to
interconnect Webots, Python, and Neo4J to support a stable architecture for Reinforcement
Learning. In this work we also find a way to search for policies using Neo4J's query
language, Cypher. Another key conclusion is that state representation needs further
research to find a state definition that enables policy search to produce more useful
policies.
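As a concrete illustration of the episodic-graph idea, the sketch below stores one-step
transitions as Neo4J relationships and retrieves a greedy action with a Cypher query. This
is a minimal sketch under our own assumptions, not the thesis's actual schema: the State
label, ACTION relationship, reward property, and connection details are all hypothetical.

# Minimal sketch (hypothetical schema, not the thesis's actual code):
# episodic transitions as Neo4J relationships, greedy lookup via Cypher.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def record_transition(tx, state, action, next_state, reward):
    # One episode step stored as (:State)-[:ACTION {name, reward}]->(:State).
    tx.run(
        "MERGE (s:State {id: $state}) "
        "MERGE (t:State {id: $next_state}) "
        "MERGE (s)-[a:ACTION {name: $action}]->(t) "
        "SET a.reward = $reward",
        state=state, next_state=next_state, action=action, reward=reward,
    )

def best_action(tx, state):
    # Greedy one-step "policy": the highest-reward action recorded so far.
    record = tx.run(
        "MATCH (s:State {id: $state})-[a:ACTION]->() "
        "RETURN a.name AS action ORDER BY a.reward DESC LIMIT 1",
        state=state,
    ).single()
    return record["action"] if record else None

with driver.session() as session:
    session.execute_write(record_transition, "s0", "move_forward", "s1", 1.0)
    print(session.execute_read(best_action, "s0"))
driver.close()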
The article “REINFORCEMENT LEARNING: A LITERATURE REVIEW (2020)”, available on
ResearchGate with DOI 10.13140/RG.2.2.30323.76327, is an outcome of this thesis.
Intrinsic Rewards for Maintenance, Approach, Avoidance and Achievement Goal Types
In reinforcement learning, reward is used to guide the learning process. The reward is often designed to be task-dependent, and it may require significant domain knowledge to design a good reward function. This paper proposes general reward functions for maintenance, approach, avoidance, and achievement goal types. These reward functions exploit the inherent property of each type of goal and are thus task-independent. We also propose metrics to measure an agent's performance when learning each type of goal. We evaluate the intrinsic reward functions in a framework that can autonomously generate goals and learn solutions to those goals using a standard reinforcement learning algorithm. We show empirically how the proposed reward functions lead to learning in a mobile robot application. Finally, using the proposed reward functions as building blocks, we demonstrate how compound reward functions, i.e. reward functions that generate sequences of tasks, can be created, allowing the mobile robot to learn more complex behaviors.
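One plausible reading of the four goal types is sketched below over a generic distance
d(s, g) >= 0 between the current state and the goal condition. These definitions are our
own illustrative assumptions, not the paper's formulas, and the threshold eps is
hypothetical.

# Hedged sketch: task-independent intrinsic rewards for the four goal
# types, phrased over a generic distance d >= 0 from the goal condition.

def maintenance_reward(d: float, eps: float = 0.1) -> float:
    # Maintenance: reward every step the goal condition keeps holding.
    return 1.0 if d <= eps else 0.0

def approach_reward(d_prev: float, d: float) -> float:
    # Approach: reward progress, i.e. any reduction in distance to the goal.
    return d_prev - d

def avoidance_reward(d_prev: float, d: float) -> float:
    # Avoidance: the mirror image of approach; reward moving away.
    return d - d_prev

def achievement_reward(d: float, achieved: bool, eps: float = 0.1):
    # Achievement: a one-off reward the first time the goal is reached.
    if d <= eps and not achieved:
        return 1.0, True
    return 0.0, achieved

# Compound rewards can chain achievements: pay out for subgoal i only
# once subgoals 0..i-1 are done, yielding a sequence of tasks.
def compound_reward(distances, achieved_flags, eps=0.1):
    for i, (d, done) in enumerate(zip(distances, achieved_flags)):
        if not done:
            r, achieved_flags[i] = achievement_reward(d, done, eps)
            return r, achieved_flags
    return 0.0, achieved_flags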
Combining reinforcement learning and optimal control for the control of nonlinear dynamical systems
This thesis presents a novel hierarchical learning framework, Reinforcement Learning Optimal Control,
for controlling nonlinear dynamical systems with continuous states and actions. The approach
mimics the neural computations that allow our brain to bridge the divide between symbolic
action selection and low-level actuation control by operating at two levels of abstraction. First, current
findings demonstrate that, at the level of limb coordination, human behaviour is explained by linear
optimal feedback control theory, where cost functions match the energy and timing constraints of tasks.
Second, humans learn cognitive tasks involving symbolic-level action selection in ways described by
both model-free and model-based reinforcement learning algorithms. We postulate that the ease with
which humans learn complex nonlinear tasks arises from combining these two levels of abstraction.
The Reinforcement Learning Optimal Control framework learns the local task dynamics from naive
experience, using an expectation-maximization algorithm for the estimation of linear dynamical systems,
and forms locally optimal Linear Quadratic Regulators, producing continuous low-level control. A
high-level reinforcement learning agent uses these available controllers as actions and learns how to
combine them in state space while maximizing a long-term reward. The optimal control costs form
training signals for the high-level symbolic learner. The algorithm demonstrates that a small number of
locally optimal linear controllers can be combined in a smart way to solve global nonlinear control
problems, and forms a proof of principle for how the brain may bridge the divide between low-level
continuous control and high-level symbolic action selection. It competes in terms of computational
cost and solution quality with state-of-the-art control methods, as illustrated with solutions to benchmark
problems.
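The two-level idea can be sketched in a few lines: derive an LQR gain from each local
linear model, then let a high-level learner treat those controllers as its discrete
actions. This is a minimal sketch under our own assumptions (the tabular discretization,
reward shaping, and hyperparameters are hypothetical), not the thesis's implementation.

import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    # Infinite-horizon discrete LQR: u = -K x minimizes sum(x'Qx + u'Ru)
    # for the local linear model x' = A x + B u (e.g. fitted by EM).
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

class HighLevelAgent:
    # Tabular Q-learner over a discretized state space whose discrete
    # actions are indices of the available low-level LQR controllers.
    def __init__(self, n_states, n_controllers, alpha=0.1, gamma=0.95):
        self.Q = np.zeros((n_states, n_controllers))
        self.alpha, self.gamma = alpha, gamma

    def act(self, s, eps=0.1):
        if np.random.rand() < eps:
            return np.random.randint(self.Q.shape[1])
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, r, s_next):
        # r would include the (negative) optimal-control cost incurred
        # while running controller a, tying the two levels together.
        td = r + self.gamma * self.Q[s_next].max() - self.Q[s, a]
        self.Q[s, a] += self.alpha * td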
Slowness learning for curiosity-driven agents
In the absence of external guidance, how can a robot learn to map the many raw pixels of high-dimensional visual inputs to useful action sequences? I study methods that achieve this by making robots self-motivated (curious) to continually build compact representations of sensory inputs that encode different aspects of the changing environment. Previous curiosity-based agents acquired skills by associating intrinsic rewards with world-model improvements, and used reinforcement learning (RL) to learn how to obtain these intrinsic rewards. Unlike previous implementations, however, I consider streams of high-dimensional visual inputs, where the world model is a set of compact low-dimensional representations of the high-dimensional inputs. To learn these representations, I use the slowness learning principle, which states that the underlying causes of the changing sensory inputs vary on a much slower time scale than the observed sensory inputs. The representations learned through the slowness learning principle are called slow features (SFs). Slow features have been shown to be useful for RL, since they capture the underlying transition process by extracting spatio-temporal regularities in the raw sensory inputs. However, existing techniques that learn slow features are not readily applicable to curiosity-driven online learning agents, as they estimate computationally expensive covariance matrices from the data via batch processing.
The first contribution, called incremental SFA (IncSFA), is a low-complexity online algorithm that extracts slow features without storing any input data or estimating costly covariance matrices, making it suitable for several online learning applications. However, IncSFA gradually forgets previously learned representations whenever the statistics of the input change. In open-ended online learning, it becomes essential to store learned representations to avoid re-learning previously learned inputs. The second contribution is an online, active, modular IncSFA algorithm called curiosity-driven modular incremental slow feature analysis (Curious Dr. MISFA). Curious Dr. MISFA addresses the forgetting problem faced by IncSFA and learns expert slow feature abstractions in order from least to most costly, with theoretical guarantees. The third contribution uses the Curious Dr. MISFA algorithm in a continual curiosity-driven skill acquisition framework that enables robots to acquire, store, and re-use both abstractions and skills in an online and continual manner.
I (a) provide a formal analysis of the working of the proposed algorithms; (b) compare them to existing methods; and (c) use the iCub humanoid robot to demonstrate their application in real-world environments. Together, these contributions demonstrate that online implementations of slowness learning make it possible for an open-ended curiosity-driven RL agent to acquire a repertoire of skills that map the many raw pixels of high-dimensional images to multiple sets of action sequences.
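For orientation, the slowness objective itself fits in a few lines of batch code: after
whitening the input, the slowest features are the directions that minimize the variance of
the temporal difference signal. This batch sketch is our own illustration of the principle,
not the thesis's IncSFA, whose point is precisely to avoid the covariance estimates used
below.

import numpy as np

def batch_sfa(X, n_features):
    # X: (T, d) time series. Returns a projection W mapping inputs to
    # the n_features slowest-varying output signals.
    X = X - X.mean(axis=0)
    # Whiten the input so every direction has unit variance.
    cov = X.T @ X / len(X)
    evals, evecs = np.linalg.eigh(cov)
    S = evecs / np.sqrt(evals)          # whitening matrix (assumes full rank)
    Z = X @ S
    # Slow directions = smallest eigenvectors of the derivative covariance.
    dZ = np.diff(Z, axis=0)
    dcov = dZ.T @ dZ / len(dZ)
    _, devecs = np.linalg.eigh(dcov)
    return S @ devecs[:, :n_features]   # eigh sorts ascending: slowest first

# Example: recover a slowly varying latent mixed into a fast one.
t = np.linspace(0, 4 * np.pi, 2000)
slow, fast = np.sin(t), np.sin(37 * t)
X = np.column_stack([slow + 0.1 * fast, fast + 0.1 * slow])
W = batch_sfa(X, 1)
y = (X - X.mean(axis=0)) @ W            # y should track the slow sinusoid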
Spartan Daily, October 12, 1977
Volume 69, Issue 28. https://scholarworks.sjsu.edu/spartandaily/6249/thumbnail.jp