Grounding Language for Transfer in Deep Reinforcement Learning
In this paper, we explore the utilization of natural language to drive
transfer for reinforcement learning (RL). Despite the widespread application
of deep RL techniques, learning generalized policy representations that work
across domains remains a challenging problem. We demonstrate that textual
descriptions of environments provide a compact intermediate channel to
facilitate effective policy transfer. Specifically, by learning to ground the
meaning of text to the dynamics of the environment such as transitions and
rewards, an autonomous agent can effectively bootstrap policy learning on a new
domain given its description. We employ a model-based RL approach consisting of
a differentiable planning module, a model-free component and a factorized state
representation to effectively use entity descriptions. Our model outperforms
prior work on both transfer and multi-task scenarios in a variety of different
environments. For instance, we achieve up to 14% and 11.5% absolute improvement
over previously existing models in terms of average and initial rewards,
respectively.
Comment: JAIR 201
Comparative evaluation of approaches in T.4.1-4.3 and working definition of adaptive module
The goal of this deliverable is two-fold: (1) to present and compare different approaches towards learning and encoding movements using dynamical systems that have been developed by the AMARSi partners (in the past during the first 6 months of the project), and (2) to analyze their suitability to be used as adaptive modules, i.e. as building blocks for the complete architecture that will be developed in the project. The document presents a total of eight approaches, in two groups: modules for discrete movements (i.e. with a clear goal where the movement stops) and for rhythmic movements (i.e. which exhibit periodicity). The basic formulation of each approach is presented together with some illustrative simulation results. Key characteristics such as the type of dynamical behavior, learning algorithm, generalization properties, stability analysis are then discussed for each approach. We then make a comparative analysis of the different approaches by comparing these characteristics and discussing their suitability for the AMARSi project.
Voyager: An Open-Ended Embodied Agent with Large Language Models
We introduce Voyager, the first LLM-powered embodied lifelong learning agent
in Minecraft that continuously explores the world, acquires diverse skills, and
makes novel discoveries without human intervention. Voyager consists of three
key components: 1) an automatic curriculum that maximizes exploration, 2) an
ever-growing skill library of executable code for storing and retrieving
complex behaviors, and 3) a new iterative prompting mechanism that incorporates
environment feedback, execution errors, and self-verification for program
improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses
the need for model parameter fine-tuning. The skills developed by Voyager are
temporally extended, interpretable, and compositional, which compounds the
agent's abilities rapidly and alleviates catastrophic forgetting. Empirically,
Voyager shows strong in-context lifelong learning capability and exhibits
exceptional proficiency in playing Minecraft. It obtains 3.3x more unique
items, travels 2.3x longer distances, and unlocks key tech tree milestones up
to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill
library in a new Minecraft world to solve novel tasks from scratch, while other
techniques struggle to generalize. We open-source our full codebase and prompts
at https://voyager.minedojo.org/.
Comment: Project website and open-source codebase: https://voyager.minedojo.org
Large Language Models for Robotics: A Survey
The human ability to learn, generalize, and control complex manipulation
tasks through multi-modality feedback suggests a unique capability, which we
refer to as dexterity intelligence. Understanding and assessing this
intelligence is a complex task. Amidst the swift progress and extensive
proliferation of large language models (LLMs), their applications in the field
of robotics have garnered increasing attention. LLMs possess the ability to
process and generate natural language, facilitating efficient interaction and
collaboration with robots. Researchers and engineers in the field of robotics
have recognized the immense potential of LLMs in enhancing robot intelligence,
human-robot interaction, and autonomy. Therefore, this comprehensive review
aims to summarize the applications of LLMs in robotics, delving into their
impact and contributions to key areas such as robot control, perception,
decision-making, and path planning. We first provide an overview of the
background and development of LLMs for robotics, followed by a description of
the benefits of LLMs for robotics and recent advancements in robotics models
based on LLMs. We then delve into the various techniques used in the model,
including those employed in perception, decision-making, control, and
interaction. Finally, we explore the applications of LLMs in robotics and some
potential challenges they may face in the near future. Embodied intelligence is
the future of intelligent science, and LLM-based robotics is one of the
promising but challenging paths to achieve this.
Comment: Preprint. 4 figures, 3 tables
Using Multi-Relational Embeddings as Knowledge Graph Representations for Robotics Applications
User demonstrations of robot tasks in everyday environments, such as households, can be brittle due in part to the dynamic, diverse, and complex properties of those environments. Humans can find solutions in ambiguous or unfamiliar situations by using a wealth of common-sense knowledge about their domains to make informed generalizations, for example about likely locations for food in a novel household. Prior work has shown that robots can benefit from reasoning about this type of semantic knowledge, which can be modeled as a knowledge graph of interrelated facts that define whether a relationship exists between two entities. Semantic reasoning about domain knowledge using knowledge graph representations has improved the robustness and usability of end-user robots by enabling more fault-tolerant task execution. Knowledge graph representations define the underlying representation of facts, how facts are organized, and implement semantic reasoning by defining the possible computations over facts (e.g. association, fact-prediction).
This thesis examines the use of multi-relational embeddings as knowledge graph representations within the context of robust task execution and develops methods to explain the inferences of and sequentially train multi-relational embeddings. This thesis contributes: (i) a survey of knowledge graph representations that model semantic domain knowledge in robotics, (ii) the development and evaluation of our knowledge graph representation based on multi-relational embeddings, (iii) the integration of our knowledge graph representation into a robot architecture to improve robust task execution, (iv) the development and evaluation of methods to sequentially update multi-relational embeddings, and (v) the development and evaluation of an inference reconciliation framework for multi-relational embeddings.
Ph.D.
On Building Generalizable Learning Agents
It has been a long-standing goal in Artificial Intelligence (AI) to build machines that can solve the tasks that humans can. Thanks to the recent rapid progress in data-driven methods, which train agents to solve tasks by learning from massive training data, there have been many successes in applying such learning approaches to handle and even solve a number of extremely challenging tasks, including image classification, language generation, robotics control, and several multi-player games. The key factor for all these data-driven successes is that the trained agents can generalize to test scenarios that are unseen during training. This generalization capability is the foundation for building any practical AI system. This thesis studies generalization, the fundamental challenge in AI, and proposes solutions to improve the generalization performance of learning agents in a variety of problems. We start by providing a formal formulation of the generalization problem in the context of reinforcement learning and proposing 4 principles within this formulation to guide the design of training techniques for improved generalization. We validate the effectiveness of our proposed principles by considering 4 different domains, from simple to complex, and developing domain-specific techniques following these principles. Particularly, we begin with the simplest domain, i.e., path-finding on graphs (Part I), and then consider visual navigation in a 3D world (Part II) and competition in complex multi-agent games (Part III), and lastly tackle some natural language processing tasks (Part IV). Empirical evidence demonstrates that the proposed principles can generally lead to much improved generalization performance in a wide range of problems.
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
Large language models (LLMs) are a special class of pretrained language
models obtained by scaling model size, pretraining corpus and computation.
LLMs, because of their large size and pretraining on large volumes of text
data, exhibit special abilities which allow them to achieve remarkable
performances without any task-specific training in many of the natural language
processing tasks. The era of LLMs started with OpenAI's GPT-3 model, and the
popularity of LLMs is increasing exponentially after the introduction of models
like ChatGPT and GPT-4. We refer to GPT-3 and its successor OpenAI models,
including ChatGPT and GPT-4, as GPT-3 family large language models (GLLMs). With
the ever-rising popularity of GLLMs, especially in the research community,
there is a strong need for a comprehensive survey which summarizes the recent
research progress in multiple dimensions and can guide the research community
with insightful future research directions. We start the survey paper with
foundation concepts like transformers, transfer learning, self-supervised
learning, pretrained language models and large language models. We then present
a brief overview of GLLMs and discuss the performances of GLLMs in various
downstream tasks, specific domains and multiple languages. We also discuss the
data labelling and data augmentation abilities of GLLMs, the robustness of
GLLMs, the effectiveness of GLLMs as evaluators, and finally, conclude with
multiple insightful future research directions. To summarize, this
comprehensive survey paper will serve as a good resource for both academic and
industry people to stay updated with the latest research related to GPT-3
family large language models.
Comment: Preprint under review, 58 pages