my Human Brain Project (mHBP)
How can we make an agent that thinks like us humans? An agent with proprioception and intrinsic motivation, one that can identify deception, use small amounts of energy, transfer knowledge between tasks, and evolve? This is the problem on which this thesis focuses.
Being able to create a piece of software that performs tasks like a human being is a goal that, if achieved, would allow us to extend our own capabilities considerably and have more tasks performed in a predictable fashion. This is one of the motivations for this thesis.
To address this problem, we have proposed a modular architecture for Reinforcement Learning computation and developed an implementation to exercise it. This software, which we call mHBP, is written in Python, using Webots as the agent's environment and Neo4J, a graph database, as memory. mHBP takes sensory data or other inputs and, based on the body parts / tools available to the agent, produces an output consisting of actions to perform.
This thesis involves experimental design with several iterations, exploring a theoretical approach to Reinforcement Learning based on graph databases. We conclude from this work that it is possible to represent episodic data in a graph, and that Webots, Python and Neo4J can be interconnected to support a stable architecture for Reinforcement Learning. We also find a way to search for policies using Neo4J's query language, Cypher. Another key conclusion is that state representation needs further research: a state definition is required that enables policy search to produce more useful policies.
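As an illustration of what an episodic graph memory and Cypher-based policy search can look like, the following sketch builds the kind of Cypher statements such an architecture could issue. The node label, relationship type and properties are hypothetical, not the thesis's actual schema.

```python
# Hypothetical Cypher statements for an episodic graph memory.
# Labels (:State), relationship types (:ACTION) and properties are
# illustrative only, not the schema used in the thesis.

def store_transition(state_id, action, next_state_id, reward):
    """Build a Cypher MERGE statement recording one (s, a, s', r) step."""
    return (
        f"MERGE (s:State {{id: {state_id}}}) "
        f"MERGE (t:State {{id: {next_state_id}}}) "
        f"MERGE (s)-[:ACTION {{name: '{action}', reward: {reward}}}]->(t)"
    )

def best_path_query(start_id, goal_id, max_len=10):
    """Cypher query for the highest-reward path between two states,
    a crude form of policy search over stored episodes."""
    return (
        f"MATCH p = (s:State {{id: {start_id}}})-[:ACTION*1..{max_len}]->"
        f"(g:State {{id: {goal_id}}}) "
        "RETURN p, reduce(r = 0.0, e IN relationships(p) | r + e.reward) AS total "
        "ORDER BY total DESC LIMIT 1"
    )
```

In a running system these strings would be executed through the Neo4J Python driver against a live database; here they are only constructed, to show the shape of the queries.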
The article “REINFORCEMENT LEARNING: A LITERATURE REVIEW (2020)”, available on ResearchGate with DOI 10.13140/RG.2.2.30323.76327, is an outcome of this thesis.
Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning
Many problems in sequential decision making and stochastic control have
natural multiscale structure: sub-tasks are assembled together to accomplish
complex goals. Systematically inferring and leveraging hierarchical structure,
particularly beyond a single level of abstraction, has remained a longstanding
challenge. We describe a fast multiscale procedure for repeatedly compressing,
or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of
sub-problems at different scales is automatically determined. Coarsened MDPs
are themselves independent, deterministic MDPs, and may be solved using
existing algorithms. The multiscale representation delivered by this procedure
decouples sub-tasks from each other and can lead to substantial improvements in
convergence rates both locally within sub-problems and globally across
sub-problems, yielding significant computational savings. A second fundamental
aspect of this work is that these multiscale decompositions yield new transfer
opportunities across different problems, where solutions of sub-tasks at
different levels of the hierarchy may be amenable to transfer to new problems.
Localized transfer of policies and potential operators at arbitrary scales is
emphasized. Finally, we demonstrate compression and transfer in a collection of
illustrative domains, including examples involving discrete and continuous
state spaces.
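As a rough illustration of coarsening, the following sketch groups the states of a small MDP into clusters, averages transitions and rewards over each cluster (a much cruder scheme than the paper's homogenization procedure), and solves the resulting coarse MDP with standard value iteration:

```python
import numpy as np

# Illustrative sketch only: a crude one-level "compression" of an MDP by
# grouping states into clusters with uniform averaging. The paper's actual
# multiscale homogenization procedure is considerably more involved.

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve an MDP. P: (A, S, S) transition tensor, R: (S,) state rewards."""
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * (P @ V)        # (A, S) action-values
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < tol:
            return V_new
        V = V_new

def coarsen(P, R, clusters):
    """Build a coarse MDP whose states are clusters of original states,
    averaging transition mass and rewards uniformly within each cluster."""
    k, A = len(clusters), P.shape[0]
    Pc, Rc = np.zeros((A, k, k)), np.zeros(k)
    for i, ci in enumerate(clusters):
        Rc[i] = R[ci].mean()
        for j, cj in enumerate(clusters):
            # total mass from each state in ci into cj, averaged over ci
            Pc[:, i, j] = P[:, ci][:, :, cj].sum(axis=2).mean(axis=1)
    return Pc, Rc
```

The coarse MDP returned by `coarsen` can be handed straight to `value_iteration`, mirroring the paper's point that coarsened MDPs are ordinary MDPs solvable by existing algorithms.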
Hierarchies of reward machines
Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine whose edges encode subgoals of the task using high-level events. The structure of RMs enables the decomposition of a task into simpler and independently solvable subtasks that help tackle long-horizon and/or sparse-reward tasks. We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs, thus composing a hierarchy of RMs (HRM). We exploit HRMs by treating each call to an RM as an independently solvable subtask using the options framework, and describe a curriculum-based method to learn HRMs from traces observed by the agent. Our experiments reveal that exploiting a handcrafted HRM leads to faster convergence than with a flat HRM, and that learning an HRM is feasible in cases where its equivalent flat representation is not.
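A minimal sketch of a flat reward machine, with illustrative event names and a toy two-subgoal task that is not taken from the paper:

```python
# Minimal reward machine sketch: machine states, plus transitions keyed by
# high-level events, each edge carrying a reward. The events ("at_A",
# "at_B") and the task are invented for illustration.

class RewardMachine:
    def __init__(self, transitions, initial, terminal):
        # transitions: {(u, event): (u_next, reward)}
        self.transitions = transitions
        self.u = initial
        self.terminal = terminal

    def step(self, event):
        """Advance on an observed event; unlisted events self-loop with 0 reward."""
        self.u, reward = self.transitions.get((self.u, event), (self.u, 0.0))
        return reward

    def done(self):
        return self.u in self.terminal

# Task: observe "at_A" and then "at_B" for a sparse final reward.
rm = RewardMachine(
    transitions={("u0", "at_A"): ("u1", 0.0), ("u1", "at_B"): ("u2", 1.0)},
    initial="u0",
    terminal={"u2"},
)
rewards = [rm.step(e) for e in ["noop", "at_A", "noop", "at_B"]]
# rewards == [0.0, 0.0, 0.0, 1.0]; rm.done() is now True
```

In the HRM extension described above, an edge could additionally trigger a call to another reward machine rather than a single event; this flat version only shows the base formalism.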
Temporal abstraction and generalisation in reinforcement learning
The ability of agents to generalise, that is, to perform well when presented with previously unseen situations and data, is deeply important to the reliability, autonomy, and functionality of artificial intelligence systems. The generalisation test examines an agent's ability to reason over the world in an abstract manner. In reinforcement learning problem settings, where an agent interacts continually with the environment, multiple notions of abstraction are possible. State-based abstraction allows for generalised behaviour across different observations in the environment that share similar properties. On the other hand, temporal abstraction is concerned with generalisation over an agent's own behaviour. This form of abstraction allows an agent to reason in a unified manner over different sequences of actions that may lead to similar outcomes. Data abstraction refers to the fact that agents may need to make use of information gleaned using data from one sampling distribution, while being evaluated on a different sampling distribution.
This thesis develops algorithmic, theoretical, and empirical results on the questions of state abstraction, temporal abstraction, and finite-data generalisation performance for reinforcement learning algorithms. To focus on data abstraction, we explore an imitation learning setting. We provide a novel algorithm for completely offline imitation learning, as well as an empirical evaluation pipeline for offline reinforcement learning algorithms, encouraging honest and principled data complexity results and discouraging overfitting of algorithm hyperparameters to the environment on which test scores are reported. In order to more deeply explore state abstraction, we provide finite-sample analysis of target network performance, a key architectural element of deep reinforcement learning. By conducting our analysis in the fully nonlinear setting, we are able to help explain the strong performance of nonlinear neural-network based function approximation. Finally, we consider the question of temporal abstraction, providing an algorithm for semi-supervised (partially reward-free) learning of skills. This algorithm improves on the variational option discovery framework (solving a key under-specification problem in the domain) by defining skills which are specified in terms of a learned, reward-dependent state abstraction.
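Temporal abstraction is commonly formalised via the options framework, in which an option bundles an internal policy with a termination condition so that a whole action sequence can be treated as one abstract step. A minimal sketch, with a toy corridor environment invented for illustration:

```python
# Minimal options-framework sketch (illustrative, not the thesis's algorithm):
# executing an option runs its internal policy until termination, returning
# the final state and the accumulated reward of the whole abstract step.

def run_option(env_step, state, policy, terminate, max_steps=100):
    """Execute an option until it terminates; return (final_state, total_reward)."""
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward = env_step(state, action)
        total += reward
        if terminate(state):
            break
    return state, total

# Toy corridor: states 0..5, action +1 moves right, reward 1.0 on reaching 5.
def env_step(state, action):
    nxt = min(state + action, 5)
    return nxt, 1.0 if nxt == 5 else 0.0

# "Go to the end" option: always move right, terminate at state 5.
final, ret = run_option(env_step, 0, policy=lambda s: 1, terminate=lambda s: s == 5)
# final == 5, ret == 1.0
```

A higher-level learner would then choose among such options rather than primitive actions, which is what allows generalisation over different action sequences with similar outcomes.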
Multi-Reward Learning and Sparse Rewards
Reinforcement learning has made impressive strides in solving problems in challenging domains, but problems are increasingly being described with sparse rewards. Sparse rewards directly reduce the rate at which useful feedback is provided to the learner and make it difficult to distinguish which specific actions led to the reception of a reward. This greatly slows learning or thwarts it entirely. Some approaches combat the difficulty of learning under sparsity by using multi-reward schemes. These schemes utilize more rewards than just the true system evaluation, for example by providing exploration incentives or abstracting away a hierarchy of policies, each with different rewards. There are also further techniques that do not rely on multiple rewards, such as reward shaping or transfer learning. A key insight is that these techniques are orthogonal: multi-reward schemes can receive further benefits by applying other techniques. This project explores various multi-reward strategies and alternative solutions to sparse rewards to find intelligent ways to combine these methods. We provide three specific examples combining intrinsic rewards and transfer learning, imitation learning and policy combination, and hierarchical reinforcement learning and reward shaping in ways that extend the current state of the art. To demonstrate practical usage of these techniques, we describe their application to a sparsely rewarded underwater manipulation problem.
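Of the techniques mentioned above, reward shaping is often done in potential-based form, where the shaped reward r + gamma*phi(s') - phi(s) provably preserves optimal policies while densifying feedback. A minimal sketch, with a toy distance-based potential and goal invented for illustration:

```python
# Potential-based reward shaping sketch. The potential function phi and the
# goal position are toy choices for illustration, not from the project.

def shaped_reward(reward, phi_s, phi_next, gamma=0.99):
    """r' = r + gamma * phi(s') - phi(s): the potential-based shaping form."""
    return reward + gamma * phi_next - phi_s

# Toy 1-D task with the goal at x = 10: potential is negative distance to goal.
phi = lambda x: -abs(10 - x)

# Moving from x=3 to x=4 (toward the goal) earns a positive shaping bonus
# even though the true, sparse reward for this step is 0.
bonus = shaped_reward(0.0, phi(3), phi(4), gamma=1.0)
# bonus == 1.0
```

Because the shaping term telescopes along any trajectory, it changes how quickly an agent learns without changing which policies are optimal, which is why it composes safely with the multi-reward schemes discussed above.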
Hierarchical structure discovery and transfer in sequential decision problems
Acting intelligently to efficiently solve sequential decision problems requires the ability to extract hierarchical structure from the underlying domain dynamics, exploit it for optimal or near-optimal decision-making, and transfer it to related problems instead of solving every problem in isolation. This dissertation makes three contributions toward this goal.
The first contribution is the introduction of two frameworks for the transfer of hierarchical structure in sequential decision problems. The MASH framework facilitates transfer among multiple agents coordinating within a domain. The VRHRL framework allows an agent to transfer its knowledge across a family of domains that share the same transition dynamics but have differing reward dynamics. Both MASH and VRHRL are validated empirically in large domains and the results demonstrate significant speedup in the solutions due to transfer.
The second contribution is a new approach to the discovery of hierarchical structure in sequential decision problems. HI-MAT leverages action models to analyze the relevant dependencies in a hierarchically-generated trajectory and it discovers hierarchical structure that transfers to all problems whose actions share the same relevant dependencies as the single source problem. HierGen advances HI-MAT by learning simple action models, leveraging these models to analyze non-hierarchically-generated trajectories from multiple source problems in a robust causal fashion, and discovering hierarchical structure that transfers to all problems whose actions share the same causal dependencies as those in the source problems. Empirical evaluations in multiple domains demonstrate that the discovered hierarchical structures are comparable to manually-designed structures in quality and performance.
Action models are essential to hierarchical structure discovery and other aspects of intelligent behavior. The third contribution of this dissertation is the introduction of two general frameworks for learning action models in sequential decision problems. In the MBP framework, learning is user-driven; in the PLEX framework, the learner generates its own problems. The frameworks are formally analyzed and reduced to concept learning with one-sided error. A general action-modeling language is shown to be efficiently learnable in both frameworks
Effective reinforcement learning for collaborative multi-agent domains
Ph.D. (Doctor of Philosophy)
A Reinforcement Learning Technique For Enhancing Human Behavior Models In A Context-based Architecture
A reinforcement-learning technique for enhancing human behavior models in a context-based learning architecture is presented. Prior to the introduction of this technique, human models built and developed in a Context-Based Reasoning framework lacked learning capabilities. As such, their performance and quality of behavior were always limited by what the subject matter expert whose knowledge is modeled was able to articulate or demonstrate. Results from experiments performed show that subject matter experts are prone to making errors and at times lack information on situations that is inherently necessary for the human models to behave appropriately and optimally in those situations. The benefits of the technique presented are twofold: 1) it shows how human models built in a context-based framework can be modified to correctly reflect the knowledge learnt in a simulator; and 2) it presents a way for subject matter experts to verify and validate the knowledge they share. The results obtained from this research show that behavior models built in a context-based framework can be enhanced by learning and reflecting the constraints in the environment. From the results obtained, it was shown that after the models were enhanced, the agents performed better on the metrics evaluated. Furthermore, after learning, the agent was shown to recognize and behave appropriately in previously unknown situations. The overall performance and quality of behavior of the agent improved significantly.
Saturated fatty acids, linseed components and high amylose wheat in attenuation of diet-induced metabolic syndrome
Metabolic syndrome is characterised by central obesity, dyslipidaemia, hypertension, fatty liver disease and insulin resistance, which ultimately raise the risk of heart disease, diabetes, stroke, cancers and osteoarthritis. In combating metabolic syndrome, lifestyle changes are considered the most important initial steps, which include a healthy, well-balanced diet and increased physical activity. Enrichment of beneficial fatty acids and incorporation of functional foods and bioactive nutrients are part of healthy dietary regimes in treating metabolic syndrome. These strategies provide options other than drug therapies that may cause adverse effects. Nevertheless, the effectiveness of these foods or bioactive nutrients in treating metabolic syndrome has yet to be fully explored. Therefore, in this thesis, I examined the physiological effects of individual saturated fatty acids (lauric, myristic, palmitic and stearic acid), linseed components (lignans, raw linseed and defatted linseed) and high amylose wheat (5% and 20%) using a validated diet-induced rat model of cardiovascular, liver and metabolic changes mimicking most of the changes in the human metabolic syndrome.
Male Wistar rats fed for 16 weeks with diets containing 20% lauric, myristic, palmitic or stearic acid, or a corn-starch or high-carbohydrate, high-fat diet, showed that longer-chain saturated fatty acids (myristic, palmitic and stearic) and the mixture of stearic and trans fats in beef tallow produced obesity, in contrast to rats treated with lauric acid, which exhibited low total fat mass, abdominal circumference and visceral adiposity index. Lauric acid-supplemented rats also showed normal cardiovascular and hepatic structure compared to the other saturated fatty acids. This study suggests that replacing beef tallow with stearic and palmitic acids would show small improvements, but replacement with lauric and possibly myristic acids in human diets would markedly attenuate the development of metabolic syndrome.
Linseed is a rich source of plant lignans such as secoisolariciresinol diglucoside, as well as dietary fibre. Supplementation of lignan (0.03%) and defatted linseed (3%) in a high-carbohydrate, high-fat diet for eight weeks lowered body weight gain and total fat mass, improved cardiovascular function, reduced hepatic steatosis and altered metabolic profiles, which can be regarded as beneficial to health, whereas raw linseed (5%) exacerbated adiposity with no changes in other metabolic biomarkers except for reduced systolic blood pressure. This study suggests that lignan and dietary fibre in defatted linseed could reduce the symptoms of metabolic syndrome. In contrast, the positive physiological effects of raw linseed diminish, possibly because raw linseed may pass through the intestine undigested, meaning its nutritional benefits cannot be realised. Another functional food described in this thesis is high amylose wheat flour.
In this study, two dosages (5% and 20%) of high amylose wheat flour were supplemented in a high-carbohydrate, high-fat diet for eight weeks. Rats fed 5% high amylose wheat flour showed no changes in the metabolic parameters. However, high-carbohydrate, high-fat diet-fed rats given 20% high amylose wheat flour showed reduced body fat mass and increased lean mass despite no change in body weight. The addition of 20% high amylose wheat to the diet was also associated with better glycaemic control and decreased insulin and leptin concentrations, with cardioprotective and hepatoprotective effects. These effects are probably due to the increased resistant starch content in high amylose wheat, thus ameliorating the risk of developing metabolic syndrome.
The studies in this thesis provide evidence that not all saturated fatty acids are equal, with lauric acid producing fewer pathophysiological changes in most parameters than other saturated fatty acids in this model of diet-induced metabolic syndrome. The studies on linseed components and high amylose wheat clearly indicate that these foods or food components have the potential to reverse most of the risk factors associated with metabolic syndrome. The most likely mechanism of these foods or food components is through the cardioprotective and hepatoprotective effects produced by anti-inflammatory responses.