432 research outputs found
Improving Deep Reinforcement Learning in Minecraft with Action Advice
Training deep reinforcement learning agents to perform complex behaviors in 3D virtual
environments requires significant computational resources. This is especially
true in environments with high degrees of aliasing, where many states share
nearly identical visual features. Minecraft is an exemplar of such an
environment. We hypothesize that interactive machine learning (IML), wherein
human teachers play a direct role in training through demonstrations, critique,
or action advice, may alleviate agent susceptibility to aliasing. However,
interactive machine learning is only practical when the number of human
interactions is limited, requiring a balance between human teacher effort and
agent performance. We conduct experiments with two reinforcement learning
algorithms which enable human teachers to give action advice, Feedback
Arbitration and Newtonian Action Advice, under visual aliasing conditions. To
assess potential cognitive load per advice type, we vary the accuracy and
frequency of various human action advice techniques. Training efficiency,
robustness against infrequent and inaccurate advisor input, and sensitivity to
aliasing are examined.
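To make the action-advice idea concrete, here is a minimal sketch in the spirit of Newtonian Action Advice: a fresh piece of human advice is followed immediately and then "coasts" for a few steps before the agent falls back to its own epsilon-greedy policy. The function name, signature, and persistence window are illustrative assumptions, not the published algorithm's exact update.

```python
import random

def naa_action(q_values, advice, last_advice, timer, persistence=5, epsilon=0.1):
    """One action-selection step, Newtonian-Action-Advice style (sketch).

    Fresh human advice is taken immediately and persists for `persistence`
    steps; otherwise the agent acts epsilon-greedily on its own Q-values.
    Returns (action, last_advice, remaining_timer).
    """
    if advice is not None:                       # new advice arrived: follow it
        return advice, advice, persistence
    if timer > 0 and last_advice is not None:    # old advice still "coasting"
        return last_advice, last_advice, timer - 1
    # no active advice: ordinary epsilon-greedy over the Q-values
    if random.random() < epsilon:
        action = random.randrange(len(q_values))
    else:
        action = max(range(len(q_values)), key=lambda i: q_values[i])
    return action, None, 0
```

Varying `persistence` and how often `advice` is non-None is one simple way to model the advice-frequency and advice-accuracy conditions the abstract describes.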
Improving Action Branching Architectures for Multi-Dimensional Hybrid Action Spaces in Deep Reinforcement Learning
Degree type: Master's thesis, University of Tokyo (東京大学)
Game-based learning - teaching artificial intelligence to play Minecraft : a systematic literature review
Artificial Intelligence (AI) refers to machines designed to think and behave as humans would. Placed into a virtual world, an AI becomes an AI agent that uses the knowledge gained from training to perform tasks in that world. To date, AI agents in virtual worlds have only been able to perform a narrow set of tasks, using specialised models in environments of limited complexity and diversity. A rich world that requires an agent to continuously learn from and adapt to a wide variety of open-ended tasks, and to use previously gained knowledge to determine the next course of action, leaves such agents incapable. To investigate the AI teaching methods applied to instruct an agent to perform basic tasks in Minecraft, and to identify which methods yield the best results, a systematic literature review was conducted: 57 papers were extracted, and themes and sub-themes suited to AI agent training methods and functions were identified. The aim was to discover what AI training methods can be implemented to enable an agent to perform tasks in a complex and rich world, contributing to game-based learning. The study found that a well-integrated Reinforcement Learning (RL) method with an effective reward system equipped the agent with the necessary knowledge to perform
tasks at a more complex level. Unique methods were integrated with RL, such as Newtonian Action Advice (NAA), Behavioural Cloning (BC), Video PreTraining (VPT), human demonstrations, and natural language commands, to achieve a certain goal. This means that AI agents can be taught to perform open-ended tasks in a complex environment by setting up a well-thought-out framework for teaching the agent in various areas, opening the possibility of incorporating those teachings into the real world through game-based learning.
https://easychair.org/publications/EPiC/Computingam2024InformaticsSDG-09: Industry, innovation and infrastructur
Programming an intelligent agent capable of surviving in a 3-dimensional "sandbox" video game
The purpose of this project is the investigation and subsequent programming and
deployment of an intelligent agent capable of playing a survival video game.
It consists of two main parts. The first focuses on the field of spontaneous, supervised, and unsupervised Artificial Intelligence: understanding the common methods and extracting insights with the intent of implementing them, where possible, in our environment. The second consists in using the gathered knowledge to make the agent as adapted as possible to the environment it is put into.
We will not aim for perfect task optimization, as this discipline has already been explored in depth and the project would be limited to following an already established path. Instead, we will try to make the agent evolve as autonomously as possible and expect it to come up with unforeseen, not necessarily optimal, solutions to the problems we present it with.
The agent should be capable of analyzing its surroundings and recognizing other
entities or agents in their respective locations. From those and together with the
experience gathered previously, it will have to decide which actions to take by using
conventional controls. We intend it to “mimic” the learning process of living beings
and not necessarily follow strictly mathematical approaches.
Hence, the final goal is to enter the field of intelligent agents in 3D virtual environments with the aim of making one that comes close to being indistinguishable from humans.
Efficient Deep Reinforcement Learning via Planning, Generalization, and Improved Exploration
Reinforcement learning (RL) is a general-purpose machine learning framework, which considers an agent that makes sequential decisions in an environment to maximize its reward. Deep reinforcement learning (DRL) approaches use deep neural networks as non-linear function approximators that parameterize policies or value functions directly from raw observations in RL.
Although DRL approaches have been shown to be successful on many challenging RL benchmarks, much of the prior work has mainly focused on learning a single task in a model-free setting, which is often sample-inefficient. On the other hand, humans can acquire knowledge by learning a model of the world in an unsupervised fashion, use such knowledge to plan ahead for decision making, transfer knowledge between many tasks, and generalize to previously unseen circumstances from the pre-learned knowledge. Developing such abilities is one of the fundamental challenges in building RL agents that can learn as efficiently as humans.
As a step towards developing the aforementioned capabilities in RL, this thesis develops new DRL techniques to address three important challenges in RL: 1) planning via prediction, 2) rapidly generalizing to new environments and tasks, and 3) efficient exploration in complex environments.
The first part of the thesis discusses how to learn a dynamics model of the environment using deep neural networks and how to use such a model for planning in complex domains where observations are high-dimensional. Specifically, we present neural network architectures for action-conditional video prediction and demonstrate improved exploration in RL. In addition, we present a neural network architecture that performs lookahead planning by predicting the future only in terms of rewards and values without predicting observations. We then discuss why this approach is beneficial compared to conventional model-based planning approaches.
The second part of the thesis considers generalization to unseen environments and tasks. We first introduce a set of cognitive tasks in a 3D environment and present memory-based DRL architectures that generalize better to previously unseen 3D environments compared to existing baselines. In addition, we introduce a new multi-task RL problem where the agent should learn to execute different tasks depending on given instructions and generalize to new instructions in a zero-shot fashion. We present a new hierarchical DRL architecture that learns to generalize over previously unseen task descriptions with minimal prior knowledge.
The third part of the thesis discusses how exploiting past experiences can indirectly drive deep exploration and improve sample-efficiency. In particular, we propose a new off-policy learning algorithm, called self-imitation learning, which learns a policy to reproduce past good experiences. We empirically show that self-imitation learning indirectly encourages the agent to explore reasonably good state spaces and thus significantly improves sample-efficiency on RL domains where exploration is challenging.
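The core of self-imitation learning as described above can be sketched as a per-transition policy loss: the agent imitates its own past action only when the observed return R exceeded the current value estimate V(s). The function below is a minimal illustrative sketch of that weighting; the full algorithm also adds a value loss and prioritized replay of good episodes, which are omitted here.

```python
def sil_policy_loss(log_prob, returns, value):
    """Self-imitation policy loss for one stored transition (sketch).

    Weights -log pi(a|s) by the clipped advantage max(R - V(s), 0),
    so only past actions that turned out better than expected are imitated.
    """
    advantage = max(returns - value, 0.0)
    return -log_prob * advantage
```

Transitions whose return did not beat the value estimate contribute zero loss, which is what keeps the imitation restricted to "past good experiences."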
Overall, the main contributions of this thesis are to explore several fundamental challenges in RL in the context of DRL and to develop new DRL architectures and algorithms that address them. This allows us to understand how deep learning can be used to improve sample efficiency, and thus come closer to human-like learning abilities.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/145829/1/junhyuk_1.pd
Towards Goal-Oriented Agents for Evolving Problems Observed via Conversation
The objective of this work is to train a chatbot capable of solving evolving
problems through conversing with a user about a problem the chatbot cannot
directly observe. The system consists of a virtual problem (in this case a
simple game), a simulated user capable of answering natural language questions
that can observe and perform actions on the problem, and a Deep Q-Network
(DQN)-based chatbot architecture. The chatbot is trained with the goal of
solving the problem through dialogue with the simulated user using
reinforcement learning. The contributions of this paper are as follows: a proposed architecture for applying a conversational DQN-based agent to evolving problems, an exploration of the effect of training methods such as curriculum learning on model performance, and an analysis of modified reward functions under increasing environment complexity.
Comment: 15 pages, 7 figures
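Since the chatbot above is trained with a DQN-style agent, the Bellman target at the heart of that training can be shown in a few lines. This is a generic sketch of the standard DQN target, not the paper's specific architecture or reward function.

```python
def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Bellman target for one transition in DQN-style training (sketch).

    Target is r + gamma * max_a' Q(s', a'), with bootstrapping cut off
    when the episode (here, the dialogue) has ended.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

The network is then regressed toward this target for the action actually taken, e.g. with a squared-error loss between `Q(s, a)` and `dqn_target(...)`.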
Discovering logical knowledge in non-symbolic domains
Deep learning and symbolic artificial intelligence remain the two main paradigms in Artificial Intelligence (AI), each presenting its own strengths and weaknesses. Artificial agents should integrate both of these aspects of AI in order to show general intelligence and solve complex problems in real-world scenarios, similarly to how humans use both the analytical left side and the intuitive right side of their brain. However, one of the main obstacles hindering this integration is the Symbol Grounding Problem [144], i.e. the capacity to map physical-world observations to a set of symbols. In this thesis, we combine symbolic reasoning and deep learning in order to better represent and reason with abstract knowledge. In particular, we focus on solving non-symbolic-state Reinforcement Learning environments using a symbolic logical domain. We consider different configurations: (i) no knowledge of either the symbol grounding function or the symbolic logical domain, (ii) no knowledge of the symbol grounding function but prior knowledge of the domain, and (iii) imperfect knowledge of the symbol grounding function and no knowledge of the domain. We develop algorithms and neural network architectures that are general enough to be applied to different kinds of environments, which we test on both continuous-state control problems and image-based environments. Specifically, we develop two kinds of architectures: one for Markovian RL tasks and one for non-Markovian RL domains. The first is based on model-based RL and representation learning, and is inspired by the substantial prior work on state abstraction for RL [115]. The second is mainly based on recurrent neural networks and continuous relaxations of temporal logic domains. In particular, the first approach extracts a symbolic STRIPS-like abstraction for control problems.
For the second approach, we explore connections between recurrent neural networks and finite state machines, and we define Visual Reward Machines, an extension to non-symbolic domains of Reward Machines [27], a popular approach to non-Markovian RL tasks.
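A reward machine, as referenced above, is essentially a finite-state machine over high-level events (propositional symbols) that emits rewards on transitions. The sketch below uses made-up states and events ("coffee", "office") purely for illustration; the thesis's Visual Reward Machines additionally ground such events in raw observations instead of assuming a perfect symbol detector.

```python
def run_reward_machine(transitions, rewards, start, events):
    """Run a tiny reward machine over a sequence of high-level events (sketch).

    `transitions` maps (state, event) -> next state; `rewards` maps
    (state, event) -> reward. Unknown events leave the state unchanged
    and yield zero reward. Returns (final_state, total_reward).
    """
    state, total = start, 0.0
    for event in events:
        next_state = transitions.get((state, event), state)
        total += rewards.get((state, event), 0.0)
        state = next_state
    return state, total
```

For example, a machine encoding the non-Markovian task "get coffee, then go to the office" rewards the agent only on the final transition, once both subgoals have happened in order.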