
    Improving Deep Reinforcement Learning in Minecraft with Action Advice

    Training deep reinforcement learning agents to perform complex behaviors in 3D virtual environments requires significant computational resources. This is especially true in environments with high degrees of aliasing, where many states share nearly identical visual features; Minecraft is an exemplar of such an environment. We hypothesize that interactive machine learning (IML), wherein human teachers play a direct role in training through demonstrations, critique, or action advice, may alleviate agent susceptibility to aliasing. However, interactive machine learning is only practical when the number of human interactions is limited, requiring a balance between human teacher effort and agent performance. We conduct experiments with two reinforcement learning algorithms that enable human teachers to give action advice, Feedback Arbitration and Newtonian Action Advice, under visual aliasing conditions. To assess the potential cognitive load per advice type, we vary the accuracy and frequency of the human action advice. We examine training efficiency, robustness against infrequent and inaccurate advisor input, and sensitivity to aliasing.
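    A minimal sketch of the action-advice idea, under a simplified reading of Newtonian Action Advice in which a teacher-supplied action persists for a few steps before the agent's own policy resumes; the class, its parameters, and the persistence value are illustrative, not the paper's implementation:

```python
import random

class NewtonianActionAdvice:
    """Simplified sketch: a teacher-advised action persists for a fixed number of
    steps (its 'momentum') before control returns to the agent's own policy."""

    def __init__(self, base_policy, persistence=5):
        self.base_policy = base_policy   # callable: state -> action
        self.persistence = persistence   # steps an advised action carries forward
        self._advised_action = None
        self._steps_left = 0

    def give_advice(self, action):
        # Human teacher supplies an action; it overrides the policy for a while.
        self._advised_action = action
        self._steps_left = self.persistence

    def act(self, state):
        if self._steps_left > 0:
            self._steps_left -= 1
            return self._advised_action
        return self.base_policy(state)

# Toy usage: a random base policy, with one piece of advice given up front.
agent = NewtonianActionAdvice(lambda s: random.choice(["forward", "left", "right"]))
agent.give_advice("forward")
print([agent.act(state=None) for _ in range(8)])  # first 5 actions follow the advice
```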

    Improving Action Branching Architectures for Multi-Dimensional Hybrid Action Spaces in Deep Reinforcement Learning

    Degree type: Master's. University of Tokyo (東京大学)
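    As a rough illustration of the action-branching idea the title refers to, the sketch below builds a shared trunk with one Q-value head per action dimension, so a joint action is assembled one choice per branch; the observation size and branch sizes are assumptions for the example, not details from the thesis:

```python
import torch
import torch.nn as nn

class BranchingQNetwork(nn.Module):
    """Shared trunk with one small Q-value branch per action dimension."""

    def __init__(self, obs_dim, branch_sizes, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.branches = nn.ModuleList(nn.Linear(hidden, n) for n in branch_sizes)

    def forward(self, obs):
        h = self.trunk(obs)
        return [branch(h) for branch in self.branches]  # one Q-vector per dimension

net = BranchingQNetwork(obs_dim=16, branch_sizes=[3, 3, 2])   # e.g. move, turn, jump
q_per_branch = net(torch.randn(1, 16))
action = [q.argmax(dim=-1).item() for q in q_per_branch]      # one choice per branch
print(action)
```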

    Game-based learning - teaching artificial intelligence to play Minecraft: a systematic literature review

    Artificial Intelligence (AI) refers to machines designed to think and behave as humans would. When AI is placed in a virtual world it becomes an AI agent, which uses the knowledge gained from training to perform tasks in that world. To date, AI agents in virtual worlds have only been able to perform a narrow set of tasks with specialised models in environments of limited complexity and diversity; a rich world that requires an agent to continuously learn, adapt to a wide variety of open-ended tasks, and use previously gained knowledge to determine its next course of action leaves such agents incapable. To investigate the AI teaching methods applied to instruct an agent to perform basic tasks in Minecraft, and to identify which of these methods yield the best results, a systematic literature review was conducted: 57 papers were extracted and themes and sub-themes relating to AI agent training methods and functions were identified. The aim was to discover what training methods can enable an agent to perform tasks in a complex and rich world, contributing to game-based learning. The study found that a well-integrated Reinforcement Learning (RL) method with an effective reward system equipped the agent with the knowledge needed to perform tasks at a more complex level. RL was combined with a range of methods, such as Newtonian Action Advice (NAA), Behavioural Cloning (BC), Video PreTraining (VPT), human demonstrations, and natural language commands, to achieve particular goals. This means AI agents can be taught to perform open-ended tasks in a complex environment by setting up a well-thought-out framework for teaching the agent in various areas, opening the possibility of carrying those teachings into the real world through game-based learning.
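    A toy illustration of pairing RL with an explicit reward design, in the spirit of the review's finding: a tabular Q-learning update with a small shaping bonus. The environment interface (a step method returning a progress flag) and the bonus value are assumptions made only for this sketch:

```python
import random
from collections import defaultdict

# Tabular Q-learning with a lightly shaped reward; the env API is a made-up
# stand-in for a Minecraft-like task.
Q = defaultdict(float)
ACTIONS = ["forward", "left", "right", "mine"]
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1

def shaped_reward(task_reward, made_progress):
    # Sparse task reward plus a small bonus whenever measurable progress is made.
    return task_reward + (0.01 if made_progress else 0.0)

def q_learning_step(env, state):
    if random.random() < EPSILON:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    next_state, task_reward, made_progress = env.step(action)   # assumed env API
    reward = shaped_reward(task_reward, made_progress)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    return next_state
```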

    Programming an intelligent agent capable of surviving in a 3D "sandbox" video game

    The purpose of this project is the investigation and subsequent programming and deployment of an intelligent agent capable of playing a survival video game. It consists of two main parts. The first focuses on the field of spontaneous, supervised, and unsupervised artificial intelligence: understanding the common methods and extracting insights with the intent of implementing them, where possible, in our environment. The second consists in using the gathered knowledge to make the agent as adapted as possible to the environment it is placed in. We do not aim for a perfect optimization of tasks, as that discipline has already been explored in depth and doing so would limit the project to following an already established path. Instead, we try to make the agent evolve as autonomously as possible and expect it to come up with original, not necessarily optimal, solutions to the problems we present it with. The agent should be capable of analyzing its surroundings and recognizing other entities or agents in their respective locations. From those observations, together with previously gathered experience, it must decide which actions to take using only conventional controls. We intend it to "mimic" the learning process of living beings rather than necessarily follow purely mathematical approaches. The final goal is thus to enter the field of intelligent agents in 3D virtual environments with the aim of making one that is indistinguishable from humans.

    Efficient Deep Reinforcement Learning via Planning, Generalization, and Improved Exploration

    Reinforcement learning (RL) is a general-purpose machine learning framework, which considers an agent that makes sequential decisions in an environment to maximize its reward. Deep reinforcement learning (DRL) approaches use deep neural networks as non-linear function approximators that parameterize policies or value functions directly from raw observations. Although DRL approaches have been shown to be successful on many challenging RL benchmarks, much of the prior work has focused on learning a single task in a model-free setting, which is often sample-inefficient. Humans, on the other hand, can acquire knowledge by learning a model of the world in an unsupervised fashion, use such knowledge to plan ahead for decision making, transfer knowledge between many tasks, and generalize to previously unseen circumstances from pre-learned knowledge. Developing such abilities is among the fundamental challenges for building RL agents that can learn as efficiently as humans. As a step toward these capabilities, this thesis develops new DRL techniques to address three important challenges in RL: 1) planning via prediction, 2) rapidly generalizing to new environments and tasks, and 3) efficient exploration in complex environments. The first part of the thesis discusses how to learn a dynamics model of the environment using deep neural networks and how to use such a model for planning in complex domains where observations are high-dimensional. Specifically, we present neural network architectures for action-conditional video prediction and demonstrate improved exploration in RL. In addition, we present a neural network architecture that performs lookahead planning by predicting the future only in terms of rewards and values, without predicting observations, and discuss why this approach is beneficial compared to conventional model-based planning approaches. The second part of the thesis considers generalization to unseen environments and tasks. We first introduce a set of cognitive tasks in a 3D environment and present memory-based DRL architectures that generalize to previously unseen 3D environments better than existing baselines. We also introduce a new multi-task RL problem in which the agent must execute different tasks depending on given instructions and generalize to new instructions in a zero-shot fashion, and we present a hierarchical DRL architecture that learns to generalize over previously unseen task descriptions with minimal prior knowledge. The third part of the thesis discusses how exploiting past experiences can indirectly drive deep exploration and improve sample efficiency. In particular, we propose a new off-policy learning algorithm, called self-imitation learning, which learns a policy to reproduce past good experiences. We empirically show that self-imitation learning indirectly encourages the agent to explore reasonably good state spaces and thus significantly improves sample efficiency on RL domains where exploration is challenging. Overall, the main contributions of this thesis are to explore several fundamental challenges in RL in the context of DRL and to develop new DRL architectures and algorithms that address them, helping us understand how deep learning can be used to improve sample efficiency and come closer to human-like learning abilities.
PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145829/1/junhyuk_1.pd
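    A minimal sketch of the self-imitation learning objective described above, under the usual reading that only past transitions whose observed return exceeds the current value estimate drive the update; the tensor shapes and the value coefficient are illustrative, not the thesis's code:

```python
import torch
import torch.nn.functional as F

def self_imitation_loss(logits, values, actions, returns, value_coef=0.5):
    """Only transitions whose return exceeds the current value estimate
    contribute, so the policy imitates its own past good experiences."""
    advantage = (returns - values.squeeze(-1)).clamp(min=0.0)
    log_prob = F.log_softmax(logits, dim=-1).gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(log_prob * advantage.detach()).mean()
    value_loss = 0.5 * advantage.pow(2).mean()   # pulls V(s) up toward good returns
    return policy_loss + value_coef * value_loss

# Toy call on random tensors just to show the expected shapes.
loss = self_imitation_loss(
    logits=torch.randn(4, 6), values=torch.randn(4, 1),
    actions=torch.randint(0, 6, (4,)), returns=torch.randn(4),
)
```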

    Towards Goal-Oriented Agents for Evolving Problems Observed via Conversation

    The objective of this work is to train a chatbot capable of solving evolving problems by conversing with a user about a problem the chatbot cannot directly observe. The system consists of a virtual problem (in this case a simple game), a simulated user that can observe and act on the problem and answer natural language questions about it, and a Deep Q-Network (DQN)-based chatbot architecture. The chatbot is trained with reinforcement learning, with the goal of solving the problem through dialogue with the simulated user. The contributions of this paper are as follows: a proposed architecture for applying a conversational DQN-based agent to evolving problems, an exploration of how training methods such as curriculum learning affect model performance, and a study of the effect of modified reward functions as environment complexity increases.
Comment: 15 pages, 7 figures
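    A minimal sketch of the one-step DQN update such a conversational agent would rely on; the dialogue-state encoding, action-set size, and network sizes are assumptions made for illustration, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Dialogue states are assumed to be fixed-size encodings of the conversation so far,
# and actions are template questions/commands; all sizes here are illustrative.
OBS_DIM, N_ACTIONS, GAMMA = 32, 10, 0.95
q_net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())

def td_loss(state, action, reward, next_state, done):
    # Standard one-step DQN target: r + gamma * max_a' Q_target(s', a') on non-terminal turns.
    q = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = reward + GAMMA * (1.0 - done) * target_net(next_state).max(dim=1).values
    return nn.functional.smooth_l1_loss(q, target)

# Toy batch of 4 dialogue turns.
loss = td_loss(torch.randn(4, OBS_DIM), torch.randint(0, N_ACTIONS, (4,)),
               torch.zeros(4), torch.randn(4, OBS_DIM), torch.zeros(4))
```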

    Discovering logical knowledge in non-symbolic domains

    Deep learning and symbolic artificial intelligence remain the two main paradigms in Artificial Intelligence (AI), each presenting its own strengths and weaknesses. Artificial agents should integrate both aspects of AI in order to show general intelligence and solve complex problems in real-world scenarios, much as humans use both the analytical left side and the intuitive right side of their brain. One of the main obstacles hindering this integration, however, is the Symbol Grounding Problem [144]: the capacity to map physical-world observations to a set of symbols. In this thesis, we combine symbolic reasoning and deep learning in order to better represent and reason with abstract knowledge. In particular, we focus on solving Reinforcement Learning environments with non-symbolic states by means of a symbolic logical domain. We consider different configurations: (i) both the symbol grounding function and the symbolic logical domain are unknown, (ii) the symbol grounding function is unknown and the domain is known a priori, (iii) the symbol grounding function is known imperfectly and the domain is unknown. We develop algorithms and neural network architectures that are general enough to be applied to different kinds of environments, which we test on both continuous-state control problems and image-based environments. Specifically, we develop two kinds of architectures: one for Markovian RL tasks and one for non-Markovian RL domains. The first is based on model-based RL and representation learning, is inspired by the substantial prior work on state abstraction for RL [115], and extracts a symbolic STRIPS-like abstraction for control problems. The second is mainly based on recurrent neural networks and continuous relaxations of temporal logic domains; here we explore connections between recurrent neural networks and finite state machines, and we define Visual Reward Machines, an extension to non-symbolic domains of Reward Machines [27], a popular approach to non-Markovian RL tasks.
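    A toy sketch of a reward machine of the kind extended here: a small automaton over grounded high-level events that emits reward on its transitions. The particular events, states, and rewards below are invented for illustration only:

```python
class RewardMachine:
    """A tiny automaton over high-level events that issues reward on its
    transitions; the two-event task (get a key, then open a door) is made up."""

    def __init__(self):
        self.state = "u0"
        # (machine state, detected event) -> (next machine state, reward)
        self.delta = {
            ("u0", "got_key"):     ("u1", 0.0),
            ("u1", "opened_door"): ("u_acc", 1.0),
        }

    def step(self, events):
        # 'events' would come from a symbol grounding function applied to raw observations.
        for event in events:
            if (self.state, event) in self.delta:
                self.state, reward = self.delta[(self.state, event)]
                return reward
        return 0.0

rm = RewardMachine()
print(rm.step({"got_key"}), rm.step({"opened_door"}), rm.state)   # 0.0 1.0 u_acc
```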