1,916 research outputs found

    my Human Brain Project (mHBP)

    How can we make an agent that thinks like us humans? An agent that can have proprioception and intrinsic motivation, identify deception, use small amounts of energy, transfer knowledge between tasks, and evolve? This is the problem this thesis focuses on. Creating a piece of software that can perform tasks like a human being is a goal that, if achieved, will allow us to extend our own capabilities to a very high level and have more tasks performed in a predictable fashion. This is one of the motivations for this thesis. To address this problem, we propose a modular architecture for Reinforcement Learning (RL) computation and develop an implementation to exercise it. This software, which we call mHBP, is written in Python, using Webots as the environment for the agent and Neo4J, a graph database, as memory. mHBP takes sensory data or other inputs and produces, based on the body parts and tools the agent has available, an output consisting of actions to perform. The thesis involves experimental design over several iterations, exploring a theoretical approach to RL based on graph databases. We conclude that it is possible to represent episodic data in a graph, and that it is also possible to interconnect Webots, Python and Neo4J to support a stable architecture for Reinforcement Learning. We also find a way to search for policies using the Neo4J query language, Cypher. Another key conclusion is that state representation needs further research, to find a state definition that enables policy search to produce more useful policies. The article “REINFORCEMENT LEARNING: A LITERATURE REVIEW (2020)”, available on ResearchGate (DOI 10.13140/RG.2.2.30323.76327), is an outcome of this thesis.

    A study on a hybrid machine learning method using a simulator and a real machine (Shimyureta to jikki o mochiita haiburiddo-gata kikai gakushuho ni kansuru kenkyu)

    System: new ; Report number: Kō No. 2816 ; Degree type: Doctor of Engineering ; Date conferred: 2009/2/25 ; Waseda University diploma number: Shin 503

    Combining intention and emotional state inference in a dynamic neural field architecture for human-robot joint action

    We report on our approach towards creating socially intelligent robots, which is heavily inspired by recent experimental findings about the neurocognitive mechanisms underlying action and emotion understanding in humans. Our approach uses neuro-dynamics as a theoretical language to model cognition, emotional states, decision making and action. The control architecture is formalized by a coupled system of dynamic neural fields representing a distributed network of local but connected neural populations. Different pools of neurons encode relevant information in the form of self-sustained activation patterns, which are triggered by input from connected populations and evolve continuously in time. The architecture implements a dynamic and flexible context-dependent mapping from observed hand and facial actions of the human onto adequate complementary behaviors of the robot that take into account the inferred goal and inferred emotional state of the co-actor. The dynamic control architecture was validated in multiple scenarios in which an anthropomorphic robot and a human operator assemble a toy object from its components. The scenarios focus on the robot’s capacity to understand the human’s actions and emotional states, detect errors, and adapt its behavior accordingly by adjusting its decisions and movements during the execution of the task.
    Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: this work was made possible in part by research grants from the Portuguese Foundation for Science and Technology (grant numbers SFRH/BD/48527/2008, SFRH/BPD/71874/2010, SFRH/BD/81334/2011), and by funding from the FP6-IST2 EU-IP Project JAST (project number 003747) and the FP7 Marie Curie ITN Neural Engineering Transformative Technologies NETT (project number 289146).
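    As a rough illustration of what a "self-sustained activation pattern" in a dynamic neural field means, the sketch below Euler-integrates a single one-dimensional Amari-type field, tau * du/dt = -u + h + sum_x' w(x - x') f(u(x')) + S(x, t). It is not the architecture from the paper: the kernel shape, the sigmoid, the function name `simulate_field` and every parameter value are assumptions chosen only so that a transient input leaves a persistent bump of activation.

```python
import numpy as np


def simulate_field(stim_amplitude=4.0, steps_on=200, steps_off=200,
                   n=101, dt=1.0, tau=10.0, h=-2.0):
    """Euler-integrate a 1D Amari-type field; return positions and final activation.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    x = np.arange(n, dtype=float)
    # Interaction kernel: local Gaussian excitation minus constant global inhibition.
    dist = x[:, None] - x[None, :]
    w = 3.0 * np.exp(-0.5 * (dist / 4.0) ** 2) - 0.5
    # Transient localized input centred on position 50.
    stimulus = stim_amplitude * np.exp(-0.5 * ((x - 50.0) / 3.0) ** 2)
    u = np.full(n, h)  # start at the resting level h
    for step in range(steps_on + steps_off):
        # The input is present for the first steps_on steps, then switched off.
        s = stimulus if step < steps_on else np.zeros(n)
        f = 1.0 / (1.0 + np.exp(-4.0 * u))  # sigmoidal firing rate
        u = u + (dt / tau) * (-u + h + w @ f + s)
    return x, u


if __name__ == "__main__":
    x, u = simulate_field()
    print("activation at the stimulated site after input removal:", u[50])
    print("activation far from the stimulated site:", u[5])
```

    With these assumed values the activation around position 50 stays above zero after the input is removed, while a field that never receives input remains at its resting level; this kind of memory-like behavior is what the abstract attributes to the neural populations.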

    Off-line simulation inspires insight: a neurodynamics approach to efficient robot task learning

    There is currently an increasing demand for robots able to acquire the sequential organization of tasks from social learning interactions with ordinary people. Interactive learning-by-demonstration and communication is a promising topic in current robotics research. However, the efficient acquisition of generalized task representations that allow the robot to adapt to different users and contexts is a major challenge. In this paper, we present a dynamic neural field (DNF) model that is inspired by the hypothesis that the nervous system uses the off-line re-activation of initial memory traces to incrementally incorporate new information into structured knowledge. To achieve this, the model combines fast activation-based learning, which robustly represents sequential information from single task demonstrations, with slower weight-based learning during internal simulations, which establishes longer-term associations between neural populations representing individual subtasks. The efficiency of the learning process is tested in an assembly paradigm in which the humanoid robot ARoS learns to construct a toy vehicle from its parts. User demonstrations with different serial orders, together with the correction of initial prediction errors, allow the robot to acquire generalized task knowledge about possible serial orders and the longer-term dependencies between subgoals in very few social learning interactions. This success is shown in a joint action scenario in which ARoS uses the newly acquired assembly plan to construct the toy together with a human partner.
    Funding: The work was funded by FCT - Fundacao para a Ciencia e Tecnologia, through the PhD Grants SFRH/BD/48529/2008 and SFRH/BD/41179/2007, Project NETT: Neural Engineering Transformative Technologies, EU-FP7 ITN (nr. 289146), and the FCT-Research Center CMAT (PEst-OE/MAT/UI0013/2014).

    Autonomous Driving with Deep Reinforcement Learning

    The researcher developed an autonomous driving simulation by training an end-to-end policy model using deep reinforcement learning algorithms in the Gym-duckietown virtual environment. The control strategy of the model was designed for the lane-following task. Several reinforcement learning algorithms were implemented, and the SAC algorithm was chosen to train two models: a non-end-to-end model that takes information provided by the environment, such as speed, as input, and an end-to-end model that takes images captured by the agent's front camera as input. In this paper, the researcher compared the advantages and disadvantages of the two models using kinetic parameters in the environment, and conducted a series of experiments on the control strategy of the end-to-end model to explore the effects of different environmental parameters or reward functions on the models.
    Contents:
    Chapter 1 Introduction
        1.1 Autonomous Driving Overview
        1.2 Research Questions and Methods: 1.2.1 Research Questions; 1.2.2 Research Methods
        1.3 Paper Structure
    Chapter 2 Research Background
        2.1 Research Status
        2.2 Theoretical Basis: 2.2.1 Machine Learning; 2.2.2 Deep Learning; 2.2.3 Reinforcement Learning; 2.2.4 Deep Reinforcement Learning
    Chapter 3 Method
        3.1 Simulation Platform
        3.2 Control Task
        3.3 Observation Space: 3.3.1 Information as Observation (Non-end-to-end); 3.3.2 Images as Observation (End-to-end)
        3.4 Action Space
        3.5 Algorithm: 3.5.1 Mathematical Foundations; 3.5.2 Policy Iteration
        3.6 Policy Architecture: 3.6.1 Network Architecture for Non-end-to-end Model; 3.6.2 Network Architecture for End-to-end Model
        3.7 Reward Shaping: 3.7.1 Calculation of Speed-based Reward Function; 3.7.2 Calculation of the Reward Function Based on the Position of the Agent Relative to the Right Lane
    Chapter 4 Training Process
        4.1 Training Process of Non-end-to-end Model
        4.2 Training Process of End-to-end Model
    Chapter 5 Result
    Chapter 6 Test and Evaluation
        6.1 Evaluation of End-to-end Model: 6.1.1 Speed Tests in Two Scenarios; 6.1.2 Lateral Deviation between the Agent and the Right Lane’s Centerline; 6.1.3 Orientation Deviation between the Agent and the Right Lane’s Centerline
        6.2 Comparison of the End-to-end Model to Two Baselines in Simulation: 6.2.1 Comparison with Non-end-to-end Baseline; 6.2.2 Comparison with PD Baseline
        6.3 Test the Effect of Different Weights Assignments on the End-to-end Model
    Chapter 7 Conclusion

    Computer based laboratory simulation in maritime education


    Machine learning for optical fiber communication systems: An introduction and overview

    Optical networks generate a vast amount of diagnostic, control and performance monitoring data. When information is extracted from this data, reconfigurable network elements and reconfigurable transceivers allow the network to adapt both to changes in the physical infrastructure and to changing traffic conditions. Machine learning is emerging as a disruptive technology for extracting useful information from this raw data to enable enhanced planning, monitoring and dynamic control. We provide a survey of the recent literature and highlight numerous promising avenues for machine learning applied to optical networks, including explainable machine learning, digital twins, and approaches in which we embed our knowledge into the machine learning, such as physics-informed machine learning for the physical layer and graph-based machine learning for the networking layer.

    A comprehensive survey on reinforcement-learning-based computation offloading techniques in Edge Computing Systems

    In recent years, the number of embedded computing devices connected to the Internet has increased exponentially. At the same time, new applications are becoming more complex and computationally demanding, which can be a problem for devices, especially when they are battery powered. In this context, the concepts of computation offloading and edge computing, which allow applications to be fully or partially offloaded and executed on servers close to the devices in the network, have arisen and received increasing attention. The design of algorithms that decide which applications or tasks should be offloaded, and where to execute them, is therefore crucial. One option that has been gaining momentum lately is the use of Reinforcement Learning (RL) and, in particular, Deep Reinforcement Learning (DRL), which enables learning optimal or near-optimal offloading policies adapted to each particular scenario. Although the use of RL techniques to solve the computation offloading problem in edge systems has been covered by some surveys, the coverage has been limited. For example, some surveys have analysed the use of RL to solve various networking problems, with computation offloading being one of them but not the primary focus. Other surveys have reviewed techniques to solve the computation offloading problem, with RL being just one of the approaches considered. To the best of our knowledge, this is the first survey that specifically focuses on the use of RL and DRL techniques for computation offloading in edge computing systems. We present a comprehensive and detailed survey in which we analyse and classify the research papers in terms of use cases, network and edge computing architectures, objectives, RL algorithms, decision-making approaches, and time-varying characteristics considered in the analysed scenarios. In particular, we include a series of tables to help researchers identify relevant papers based on specific features, and analyse which scenarios and techniques are most frequently considered in the literature. Finally, this survey identifies a number of research challenges, future directions and areas for further study.
    Funding: Consejería de Educación de la Junta de Castilla y León and FEDER (VA231P20); Ministerio de Ciencia e Innovación and Agencia Estatal de Investigación (projects PID2020-112675RB-C42, PID2021-124463OBI00 and RED2018-102585-T, funded by MCIN/AEI/10.13039/501100011033).