Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks
We study building a multi-task agent in Minecraft. Without human
demonstrations, solving long-horizon tasks in this open-ended environment with
reinforcement learning (RL) is extremely sample inefficient. To tackle the
challenge, we decompose solving Minecraft tasks into learning basic skills and
planning over the skills. We propose three types of fine-grained basic skills
in Minecraft, and use RL with intrinsic rewards to accomplish basic skills with
high success rates. For skill planning, we use Large Language Models to find
the relationships between skills and build a skill graph in advance. When the
agent is solving a task, our skill search algorithm walks on the skill graph
and generates the proper skill plans for the agent. In experiments, our method
accomplishes 24 diverse Minecraft tasks, many of which require sequentially
executing more than 10 skills. Our method outperforms baselines on most
tasks by a large margin. The project's website and code can be found at
https://sites.google.com/view/plan4mc.
Comment: 19 pages
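The skill-search step described above can be sketched as an ordinary graph traversal: collect the target skill's transitive prerequisites, then order them so every prerequisite precedes its dependents. This is a minimal sketch under our own assumptions; the skill names and prerequisite edges below are illustrative, not Plan4MC's actual LLM-built skill graph.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite graph: each skill maps to the skills it depends on.
# The abstract says such relationships are proposed by an LLM in advance;
# these particular entries are illustrative only.
SKILL_GRAPH = {
    "harvest_log": set(),
    "craft_planks": {"harvest_log"},
    "craft_crafting_table": {"craft_planks"},
    "craft_stick": {"craft_planks"},
    "craft_wooden_pickaxe": {"craft_crafting_table", "craft_stick"},
}

def plan_for(target: str, graph: dict[str, set[str]]) -> list[str]:
    """Collect the target skill and its transitive prerequisites,
    then topologically order them into an executable skill plan."""
    needed: set[str] = set()
    stack = [target]
    while stack:
        skill = stack.pop()
        if skill not in needed:
            needed.add(skill)
            stack.extend(graph[skill])
    # Restrict the graph to the needed skills and order it.
    sub = {s: graph[s] & needed for s in needed}
    return list(TopologicalSorter(sub).static_order())

plan = plan_for("craft_wooden_pickaxe", SKILL_GRAPH)
```

Any valid topological order is an acceptable plan here; ties (e.g. stick vs. crafting table) can be sequenced either way.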
Hierarchical Reinforcement Learning in Minecraft
ENGLISH ABSTRACT: Humans have the remarkable ability to perform actions at various levels of abstraction. In
addition to this, humans are also able to learn new skills by applying relevant knowledge,
observing experts and refining through experience. Many current reinforcement learning
(RL) algorithms rely on a lengthy trial-and-error training process, making it infeasible
to train them in the real world. In this thesis, to address sparse, hierarchical problems
we propose the following: (1) an RL algorithm, Branched Rainbow from Demonstrations
(BRfD), which combines several improvements to the Deep Q-Networks (DQN) algorithm,
and is capable of learning from human demonstrations; (2) a hierarchically structured RL
algorithm using BRfD to solve a set of sub-tasks in order to reach a goal. We evaluate both
of these algorithms in the 2019 MineRL challenge environments. The MineRL competition
challenged participants to find a diamond in Minecraft, a 3D, open-world, procedurally
generated game. We analyse the efficiency of several improvements implemented in the
BRfD algorithm through an extensive ablation study. For this study, the agents are tasked
with collecting 64 logs in a Minecraft forest environment. We show that our algorithm
outperforms the overall winner of the MineRL challenge in the TreeChop environment.
Additionally, we show that nearly all of the improvements benefit performance, either in
terms of learning speed or rewards received. For the hierarchical algorithm, we segment the
demonstrations into the respective sub-tasks. The algorithm then trains a version of BRfD
on these demonstrations before learning from its own experiences in the environment. We
then evaluate the algorithm by inspecting the proportion of episodes in which certain items
were obtained. While our algorithm is able to obtain iron ore, the current state-of-the-art
algorithms are capable of obtaining a diamond.
AFRIKAANSE OPSOMMING (translated from the Afrikaans): Humans have the exceptional
ability to perform various tasks at different levels of abstraction. Furthermore, new
skills can be learned by applying relevant knowledge, observing experts, and refining
through experience. Many existing reinforcement learning algorithms rely on cumbersome
trial-and-error training processes, which makes them impractical in the real world. In
this thesis, to address sparse, hierarchical problems, we propose the following: (1) a
reinforcement learning algorithm, Branched Rainbow from Demonstrations (BRfD), which
combines several improvements to the Deep Q-Networks (DQN) algorithm and learns from
human demonstrations; (2) a hierarchically structured reinforcement learning algorithm
that solves various sub-tasks by means of BRfD. We evaluate both of the above
algorithms in the 2019 MineRL environment. The MineRL competition challenged
participants to find a diamond in Minecraft, a three-dimensional, open-world,
procedurally generated computer game. Several improvements applied in the BRfD
algorithm are analysed by means of an extensive ablation study. For this study, the
agents were tasked with collecting 64 logs in a Minecraft forest environment. We show
that this algorithm beats the overall winner of the 2019 MineRL challenge in the
TreeChop environment. Furthermore, we show that nearly all of the improvements have a
positive impact in terms of learning speed or rewards received. For the hierarchical
algorithm, the demonstrations are broken up into their various sub-tasks. The
algorithm then trains a version of BRfD on these demonstrations before learning from
its own experience in the environment. We then evaluate the algorithms by examining
the proportion of episodes in which certain items were obtained. Our algorithm could
only find iron ore, in contrast to the current state-of-the-art algorithms, which
find a diamond.
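The demonstration-segmentation step is described only at a high level in the abstract. One plausible scheme (our assumption, not necessarily the thesis's exact method) is to cut each demonstration trajectory at the first timestep a milestone item appears in the agent's inventory:

```python
# Hypothetical milestone list for the ObtainDiamond item chain; the names
# and the segmentation rule are illustrative assumptions, not the thesis's.
MILESTONES = ["log", "planks", "crafting_table", "wooden_pickaxe"]

def segment(trajectory: list[dict]) -> dict[str, list[dict]]:
    """Map each milestone to the slice of timesteps ending when the
    milestone item first appears in the inventory."""
    segments: dict[str, list[dict]] = {}
    start = 0
    for goal in MILESTONES:
        for t in range(start, len(trajectory)):
            if trajectory[t]["inventory"].get(goal, 0) > 0:
                segments[goal] = trajectory[start : t + 1]
                start = t + 1
                break
    return segments

# Toy three-step demonstration: the agent obtains a log, then planks.
demo = [
    {"inventory": {}},
    {"inventory": {"log": 1}},
    {"inventory": {"log": 1, "planks": 4}},
]
parts = segment(demo)
```

Each resulting segment could then serve as the demonstration set for one sub-task policy, as the abstract describes.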
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
The captivating realm of Minecraft has attracted substantial research
interest in recent years, serving as a rich platform for developing intelligent
agents capable of functioning in open-world environments. However, the current
research landscape predominantly focuses on specific objectives, such as the
popular "ObtainDiamond" task, and has not yet shown effective generalization to
a broader spectrum of tasks. Furthermore, the current leading success rate for
the "ObtainDiamond" task stands at around 20%, highlighting the limitations of
Reinforcement Learning (RL) based controllers used in existing methods. To
tackle these challenges, we introduce Ghost in the Minecraft (GITM), a novel
framework that integrates Large Language Models (LLMs) with text-based knowledge and
memory, aiming to create Generally Capable Agents (GCAs) in Minecraft. These
agents, equipped with the logic and common sense capabilities of LLMs, can
skillfully navigate complex, sparse-reward environments with text-based
interactions. We develop a set of structured actions and leverage LLMs to
generate action plans for the agents to execute. The resulting LLM-based agent
markedly surpasses previous methods, achieving a remarkable improvement of
+47.5% in success rate on the "ObtainDiamond" task, demonstrating superior
robustness compared to traditional RL-based controllers. Notably, our agent is
the first to procure all items in the Minecraft Overworld technology tree,
demonstrating its extensive capabilities. GITM does not need any GPU for
training; a single CPU node with 32 CPU cores is enough. This research
shows the potential of LLMs in developing capable agents for handling
long-horizon, complex tasks and adapting to uncertainties in open-world
environments. See the project website at https://github.com/OpenGVLab/GITM
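The "structured actions" idea can be illustrated with a small parser that turns a planner's textual output into executable action records. Everything here, including the action syntax and names, is a hypothetical sketch rather than GITM's actual interface:

```python
from dataclasses import dataclass

@dataclass
class Action:
    """A structured action with a name and keyword arguments."""
    name: str
    args: dict

def parse_plan(text: str) -> list[Action]:
    """Parse lines like 'mine(block=log, count=3)' into Action records.
    In GITM the plan text would come from an LLM; here it is hard-coded."""
    actions = []
    for line in text.strip().splitlines():
        name, _, rest = line.partition("(")
        args = {}
        for pair in rest.rstrip(")").split(","):
            if pair.strip():
                key, _, value = pair.partition("=")
                args[key.strip()] = value.strip()
        actions.append(Action(name.strip(), args))
    return actions

# Illustrative plan text standing in for LLM output.
plan_text = """
mine(block=log, count=3)
craft(item=planks)
craft(item=crafting_table)
"""
actions = parse_plan(plan_text)
```

The point of the structured layer is that each parsed record maps to a reliable low-level controller, so the LLM never has to emit raw keyboard or mouse commands.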
Adaptive Agent Architecture for Real-time Human-Agent Teaming
Teamwork is a set of interrelated reasoning, actions and behaviors of team
members that facilitate common objectives. Teamwork theory and experiments have
resulted in a set of states and processes for team effectiveness in both
human-human and agent-agent teams. However, human-agent teaming is less well
studied because it is relatively new and involves asymmetries in policy and intent
not present in human-human teams. To optimize team performance in human-agent teaming, it
is critical that agents infer human intent and adapt their policies for smooth
coordination. Most literature in human-agent teaming builds agents referencing
a learned human model. Though these agents are guaranteed to perform well with
the learned model, they place heavy assumptions on human policy, such as
optimality and consistency, which are unlikely to hold in many real-world scenarios. In
this paper, we propose a novel adaptive agent architecture in a human-model-free
setting on a two-player cooperative game, namely Team Space Fortress (TSF).
Previous human-human team research has shown complementary policies in the TSF
game and diversity in human players' skills, which encourages us to relax the
assumptions on human policy. Therefore, we discard learning human models from
human data, and instead use an adaptation strategy on a pre-trained library of
exemplar policies composed of RL algorithms or rule-based methods with minimal
assumptions of human behavior. The adaptation strategy relies on a novel
similarity metric to infer human policy and then selects the most complementary
policy in our library to maximize the team performance. The adaptive agent
architecture can be deployed in real-time and generalize to any off-the-shelf
static agents. We conducted human-agent experiments to evaluate the proposed
adaptive agent framework, and demonstrated the suboptimality, diversity, and
adaptability of human policies in human-agent teams.
Comment: The first three authors contributed equally. In AAAI 2021 Workshop on
Plan, Activity, and Intent Recognition.
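The similarity-based adaptation loop can be sketched as follows. The KL-divergence metric, the exemplar action distributions, and the complementarity table are all our own illustrative assumptions, not the paper's exact design:

```python
import math

def kl(p: list[float], q: list[float], eps: float = 1e-9) -> float:
    """KL divergence D(p || q) between discrete action distributions,
    used here as the similarity metric for inferring the human's policy."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical exemplar policies summarised by action frequencies
# (attack / evade / idle), and a pre-tabulated complementary partner
# for each exemplar. Both tables are illustrative assumptions.
EXEMPLARS = {
    "aggressive": [0.7, 0.2, 0.1],
    "defensive":  [0.1, 0.7, 0.2],
}
COMPLEMENT = {"aggressive": "support_policy", "defensive": "attack_policy"}

def choose_partner(observed: list[float]) -> str:
    """Match the observed human action frequencies to the nearest
    exemplar, then deploy the policy tabulated as most complementary."""
    nearest = min(EXEMPLARS, key=lambda name: kl(observed, EXEMPLARS[name]))
    return COMPLEMENT[nearest]

# A human who mostly attacks is matched to the aggressive exemplar.
partner = choose_partner([0.6, 0.3, 0.1])
```

Because the selection only reads off action statistics, it can run in real time and pair with any off-the-shelf static agent, which is the property the abstract emphasises.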
xxAI - Beyond Explainable AI
This is an open access book.

Statistical machine learning (ML) has triggered a renaissance of artificial intelligence (AI). While the most successful ML models, including Deep Neural Networks (DNN), have developed better predictivity, they have become increasingly complex, at the expense of human interpretability (correlation vs. causality). The field of explainable AI (xAI) has emerged with the goal of creating tools and models that are both predictive and interpretable and understandable for humans.

Explainable AI is receiving huge interest in the machine learning and AI research communities, across academia, industry, and government, and there is now an excellent opportunity to push towards successful explainable AI applications. This volume will help the research community to accelerate this process, to promote a more systematic use of explainable AI to improve models in diverse applications, and ultimately to better understand how current explainable AI methods need to be improved and what kind of theory of explainable AI is needed.

After overviews of current methods and challenges, the editors include chapters that describe new developments in explainable AI. The contributions are from leading researchers in the field, drawn from both academia and industry, and many of the chapters take a clear interdisciplinary approach to problem-solving. The concepts discussed include explainability, causability, and AI interfaces with humans, and the applications include image processing, natural language, law, fairness, and climate science.