13 research outputs found

    my Human Brain Project (mHBP)

How can we make an agent that thinks like us humans? An agent that can have proprioception and intrinsic motivation, identify deception, use small amounts of energy, transfer knowledge between tasks, and evolve? This is the problem this thesis focuses on. Creating a piece of software that can perform tasks like a human being is a goal that, if achieved, will allow us to extend our own capabilities to a very high level and have more tasks performed in a predictable fashion. This is one of the motivations for this thesis. To address this problem, we propose a modular architecture for Reinforcement Learning (RL) computation and develop an implementation to exercise it. This software, which we call mHBP, is written in Python, using Webots as the environment for the agent and Neo4J, a graph database, as memory. mHBP takes sensory data or other inputs and produces, based on the body parts or tools that the agent has available, an output consisting of actions to perform. The thesis involves experimental design over several iterations, exploring a theoretical approach to RL based on graph databases. We conclude that it is possible to represent episodic data in a graph, and that Webots, Python and Neo4J can be interconnected to support a stable architecture for Reinforcement Learning. We also find a way to search for policies using Cypher, the Neo4J query language. Another key conclusion is that state representation needs further research to find a state definition that enables policy search to produce more useful policies. The article “REINFORCEMENT LEARNING: A LITERATURE REVIEW (2020)” on ResearchGate, with DOI 10.13140/RG.2.2.30323.76327, is an outcome of this thesis.
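As a rough illustration of what representing episodic data in a graph and searching it for a policy can look like, here is a minimal sketch. It is not the thesis implementation: it replaces Neo4J and Cypher with a plain Python dictionary, and the state names, actions, rewards and the helper names `add_transition` and `find_plan` are all hypothetical.

```python
from collections import deque


def add_transition(graph, state, action, next_state, reward):
    """Store one observed step as a directed edge state -> next_state."""
    graph.setdefault(state, []).append((action, next_state, reward))


def find_plan(graph, start, min_reward=1.0):
    """Breadth-first search for the shortest remembered action sequence
    that ends with a transition whose reward is at least min_reward."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        for action, next_state, reward in graph.get(state, []):
            plan = actions + [action]
            if reward >= min_reward:
                return plan
            if next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, plan))
    return None  # no rewarded transition is reachable from start


# Hypothetical episode memory (assumed states, actions and rewards).
memory = {}
add_transition(memory, "s0", "forward", "s1", 0.0)
add_transition(memory, "s0", "left", "s4", 0.0)
add_transition(memory, "s1", "left", "s2", 0.0)
add_transition(memory, "s1", "forward", "s3", 1.0)

plan_from_s0 = find_plan(memory, "s0")
plan_from_s4 = find_plan(memory, "s4")
```

In a graph database the same idea would be expressed as a path query over stored state and action nodes; how useful the resulting plans are depends heavily on how states are defined, which is the open question the abstract points to.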

    Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning

Many problems in sequential decision making and stochastic control have natural multiscale structure: sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure, particularly beyond a single level of abstraction, has remained a longstanding challenge. We describe a fast multiscale procedure for repeatedly compressing, or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of sub-problems at different scales is automatically determined. Coarsened MDPs are themselves independent, deterministic MDPs, and may be solved using existing algorithms. The multiscale representation delivered by this procedure decouples sub-tasks from each other and can lead to substantial improvements in convergence rates both locally within sub-problems and globally across sub-problems, yielding significant computational savings. A second fundamental aspect of this work is that these multiscale decompositions yield new transfer opportunities across different problems, where solutions of sub-tasks at different levels of the hierarchy may be amenable to transfer to new problems. Localized transfer of policies and potential operators at arbitrary scales is emphasized. Finally, we demonstrate compression and transfer in a collection of illustrative domains, including examples involving discrete and continuous state spaces. Comment: 86 pages, 15 figures.

    Hierarchies of reward machines

Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine whose edges encode subgoals of the task using high-level events. The structure of RMs enables the decomposition of a task into simpler and independently solvable subtasks that help tackle long-horizon and/or sparse reward tasks. We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs, thus composing a hierarchy of RMs (HRM). We exploit HRMs by treating each call to an RM as an independently solvable subtask using the options framework, and describe a curriculum-based method to learn HRMs from traces observed by the agent. Our experiments reveal that exploiting a handcrafted HRM leads to faster convergence than with a flat HRM, and that learning an HRM is feasible in cases where its equivalent flat representation is not.
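To make the reward machine idea concrete, the following is a minimal sketch of a flat RM as a finite-state machine driven by high-level events. It does not reproduce the paper's formalism or its hierarchy of calls between RMs; the class name, the state labels (`u0`, `u1`, `u_acc`), the events (`key`, `door`) and the reward values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine whose edges are labelled with high-level events."""
    initial: str
    accepting: str
    # transitions[(state, event)] = (next_state, reward)
    transitions: dict = field(default_factory=dict)

    def run(self, events):
        """Feed a trace of events; return (total_reward, task_completed)."""
        state, total = self.initial, 0.0
        for event in events:
            if state == self.accepting:
                break  # the task is already finished
            # An event with no matching edge leaves the state unchanged.
            next_state, reward = self.transitions.get(
                (state, event), (state, 0.0)
            )
            state, total = next_state, total + reward
        return total, state == self.accepting


# Hypothetical task: pick up the key, then open the door.
rm = RewardMachine(
    initial="u0",
    accepting="u_acc",
    transitions={
        ("u0", "key"): ("u1", 0.0),
        ("u1", "door"): ("u_acc", 1.0),
    },
)

completed_trace = rm.run(["door", "key", "door"])
incomplete_trace = rm.run(["door", "door"])
```

Each edge corresponds to a subgoal, so a learner can treat "reach the key" and "reach the door" as separate subtasks; a hierarchical variant would additionally let an edge invoke another machine instead of waiting for a single event.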

    Temporal abstraction and generalisation in reinforcement learning

The ability of agents to generalise, that is, to perform well when presented with previously unseen situations and data, is deeply important to the reliability, autonomy, and functionality of artificial intelligence systems. The generalisation test examines an agent's ability to reason over the world in an abstract manner. In reinforcement learning problem settings, where an agent interacts continually with the environment, multiple notions of abstraction are possible. State-based abstraction allows for generalised behaviour across different observations in the environment that share similar properties. On the other hand, temporal abstraction is concerned with generalisation over an agent's own behaviour. This form of abstraction allows an agent to reason in a unified manner over different sequences of actions that may lead to similar outcomes. Data abstraction refers to the fact that agents may need to make use of information gleaned using data from one sampling distribution, while being evaluated on a different sampling distribution. This thesis develops algorithmic, theoretical, and empirical results on the questions of state abstraction, temporal abstraction, and finite-data generalisation performance for reinforcement learning algorithms. To focus on data abstraction, we explore an imitation learning setting. We provide a novel algorithm for completely offline imitation learning, as well as an empirical evaluation pipeline for offline reinforcement learning algorithms, encouraging honest and principled data complexity results and discouraging overfitting of algorithm hyperparameters to the environment on which test scores are reported. In order to more deeply explore state abstraction, we provide finite-sample analysis of target network performance, a key architectural element of deep reinforcement learning. 
By conducting our analysis in the fully nonlinear setting, we are able to help explain the strong performance of nonlinear neural-network based function approximation. Finally, we consider the question of temporal abstraction, providing an algorithm for semi-supervised (partially reward-free) learning of skills. This algorithm improves on the variational option discovery framework, solving a key under-specification problem in the domain, by defining skills which are specified in terms of a learned, reward-dependent state abstraction.

    Effective reinforcement learning for collaborative multi-agent domains

Ph.D., Doctor of Philosophy

    A Reinforcement Learning Technique For Enhancing Human Behavior Models In A Context-based Architecture

A reinforcement-learning technique for enhancing human behavior models in a context-based learning architecture is presented. Prior to the introduction of this technique, human models built and developed in a Context-Based reasoning framework lacked learning capabilities. As such, their performance and quality of behavior were always limited by what the subject matter expert whose knowledge is modeled was able to articulate or demonstrate. Results from the experiments show that subject matter experts are prone to making errors, and that at times they lack information that the human models need in order to behave appropriately and optimally in a given situation. The benefits of the technique are twofold: (1) it shows how human models built in a context-based framework can be modified to correctly reflect the knowledge learnt in a simulator; and (2) it presents a way for subject matter experts to verify and validate the knowledge they share. The results obtained from this research show that behavior models built in a context-based framework can be enhanced by learning and reflecting the constraints in the environment. After the models were enhanced, the agents performed better on the metrics evaluated. Furthermore, after learning, the agent was shown to recognize previously unknown situations and behave appropriately in them. The overall performance and quality of behavior of the agent improved significantly.

    Saturated fatty acids, linseed components and high amylose wheat in attenuation of diet-induced metabolic syndrome

Metabolic syndrome is characterised by central obesity, dyslipidaemia, hypertension, fatty liver disease and insulin resistance, and ultimately raises the risk of heart disease, diabetes, stroke, cancers and osteoarthritis. In combating metabolic syndrome, lifestyle changes, including a healthy, well-balanced diet and increased physical activity, are considered the most important initial steps. Enrichment of beneficial fatty acids and incorporation of functional foods and bioactive nutrients are part of healthy dietary regimes in treating metabolic syndrome. These strategies provide options other than drug therapies that may cause adverse effects. Nevertheless, the effectiveness of these foods or bioactive nutrients in treating metabolic syndrome has yet to be fully explored. Therefore, in this thesis, I examined the physiological effects of individual saturated fatty acids (lauric, myristic, palmitic and stearic acid), linseed components (lignans, raw linseed and defatted linseed) and high amylose wheat (5% and 20%) using a validated diet-induced rat model of cardiovascular, liver and metabolic changes mimicking most of the changes in the human metabolic syndrome. Male Wistar rats were fed for 16 weeks with a diet containing 20% lauric, myristic, palmitic or stearic acid, or with a corn-starch or high-carbohydrate, high-fat diet. Longer-chain saturated fatty acids (myristic, palmitic and stearic) and the mixture of stearic and trans fats in beef tallow produced obesity, in contrast to rats treated with lauric acid, which exhibited low total fat mass, abdominal circumference and visceral adiposity index. Lauric acid-supplemented rats also showed normal cardiovascular and hepatic structure compared to rats given other saturated fatty acids. 
This study suggests that replacing beef tallow with stearic and palmitic acids would show small improvements, but replacement with lauric and possibly myristic acids in human diets would markedly attenuate the development of metabolic syndrome. Linseed is a rich source of plant lignans such as secoisolariciresinol diglucoside, as well as dietary fibre. Supplementation of lignan (0.03%) and defatted linseed (3%) in a high-carbohydrate, high-fat diet for eight weeks lowered body weight gain and total fat mass, improved cardiovascular function, reduced hepatic steatosis and altered metabolic profiles, which can be regarded as beneficial to health, whereas raw linseed (5%) exacerbated adiposity with no changes in other metabolic biomarkers except for reduced systolic blood pressure. This study suggests that lignan and dietary fibre in defatted linseed could reduce the symptoms of metabolic syndrome. In contrast, the positive physiological effects of raw linseed may be diminished because raw linseed can pass through the intestine undigested, so that its nutritional benefits are not realised. Another functional food described in this thesis is high amylose wheat flour. In this study, two dosages (5% and 20%) of high amylose wheat flour were supplemented in a high-carbohydrate, high-fat diet for 8 weeks. Rats fed with 5% high amylose wheat flour showed no changes in the metabolic parameters. However, rats on the high-carbohydrate, high-fat diet given 20% high amylose wheat flour showed reduced body fat mass and increased lean mass despite no change in body weight. The addition of 20% high amylose wheat in the diet was also associated with better glycaemic control, decreased insulin and leptin concentrations, and cardioprotective and hepatoprotective effects. These effects are probably due to the increased resistant starch content in high amylose wheat, thus ameliorating the risk of developing metabolic syndrome. 
The studies in this thesis provided evidence that not all saturated fatty acids are equal, with lauric acid producing fewer pathophysiological changes in most parameters than other saturated fatty acids in this model of diet-induced metabolic syndrome. The studies of linseed components and high amylose wheat indicate that these foods or food components have the potential to reverse most of the risk factors associated with metabolic syndrome. The most likely mechanisms of these foods or food components are the cardioprotective and hepatoprotective effects produced by anti-inflammatory responses.

    Catalog 1987-1989
