54 research outputs found
AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
We introduce AMAGO, an in-context Reinforcement Learning (RL) agent that uses
sequence models to tackle the challenges of generalization, long-term memory,
and meta-learning. Recent works have shown that off-policy learning can make
in-context RL with recurrent policies viable. Nonetheless, these approaches
require extensive tuning and limit scalability by creating key bottlenecks in
agents' memory capacity, planning horizon, and model size. AMAGO revisits and
redesigns the off-policy in-context approach to successfully train
long-sequence Transformers over entire rollouts in parallel with end-to-end RL.
Our agent is scalable and applicable to a wide range of problems, and we
demonstrate its strong performance empirically in meta-RL and long-term memory
domains. AMAGO's focus on sparse rewards and off-policy data also allows
in-context learning to extend to goal-conditioned problems with challenging
exploration. When combined with a multi-goal hindsight relabeling scheme, AMAGO
can solve a previously difficult category of open-world domains, where agents
complete many possible instructions in procedurally generated environments.
Comment: ICLR 202
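The multi-goal hindsight relabeling idea can be illustrated with a minimal sketch: a finished rollout is relabeled with goals it actually achieved, and sparse rewards are assigned to the timesteps that completed them. All names and the data layout below are illustrative assumptions, not AMAGO's actual API.

```python
import random

def hindsight_relabel(trajectory, k=2, seed=0):
    """Relabel a rollout with a set of goals it actually achieved.

    `trajectory` is a list of (observation, achieved_goal) pairs, where
    achieved_goal is None if nothing was completed at that step. The
    relabeled rollout rewards each timestep that completes one of the
    k sampled goals. This is a sketch of the idea, not AMAGO's scheme.
    """
    rng = random.Random(seed)
    achieved = [g for _, g in trajectory if g is not None]
    goals = rng.sample(achieved, min(k, len(achieved)))  # multi-goal set
    remaining = set(goals)
    relabeled = []
    for obs, g in trajectory:
        reward = 1.0 if g in remaining else 0.0  # sparse reward on completion
        remaining.discard(g)
        relabeled.append((obs, tuple(goals), reward))
    return relabeled
```

Relabeling in hindsight turns a failed rollout into a successful demonstration for the goals it happened to reach, which is what makes sparse-reward, hard-exploration goal-conditioned problems tractable off-policy.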
Prismer: A Vision-Language Model with An Ensemble of Experts
Recent vision-language models have shown impressive multi-modal generation
capabilities. However, they typically require training huge models on massive
datasets. As a more scalable alternative, we introduce Prismer, a data- and
parameter-efficient vision-language model that leverages an ensemble of domain
experts. Prismer only requires training of a small number of components, with
the majority of network weights inherited from readily-available, pre-trained
domain experts, and kept frozen during training. By leveraging experts from a
wide range of domains, we show that Prismer can efficiently pool this expert
knowledge and adapt it to various vision-language reasoning tasks. In our
experiments, we show that Prismer achieves fine-tuned and few-shot learning
performance competitive with current state-of-the-art models, whilst
requiring up to two orders of magnitude less training data. Code is available
at https://github.com/NVlabs/prismer.
Comment: Tech Report. Project Page: https://shikun.io/projects/prismer Code: https://github.com/NVlabs/prismer. v2: fixed incorrect training cost estimate and zero-shot NoCaps performance of SimVL
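The core structural idea, most weights frozen in pre-trained experts with only a small number of components trained, can be sketched abstractly. The class names, the per-expert mixing weights, and the feature interface below are all hypothetical stand-ins, not Prismer's architecture.

```python
class FrozenExpert:
    """Stands in for a readily-available pre-trained domain expert
    (e.g. depth, OCR, segmentation). Its parameters are never updated;
    it only produces features for the trainable part to consume."""
    def __init__(self, scale):
        self.scale = scale  # frozen "weights"

    def features(self, x):
        return [self.scale * v for v in x]

class ExpertEnsemble:
    """Minimal sketch of the Prismer-style split: most capacity lives in
    frozen experts, and only a small set of weights (here, one mixing
    weight per expert) would be trained. Illustrative only."""
    def __init__(self, experts):
        self.experts = experts
        self.mix = [1.0 / len(experts)] * len(experts)  # the only trainable part

    def forward(self, x):
        feats = [e.features(x) for e in self.experts]
        return [sum(w * f[i] for w, f in zip(self.mix, feats))
                for i in range(len(x))]
```

Because gradients would only ever flow into the tiny mixing component, the trainable parameter count, and hence the data requirement, stays small even as expert capacity grows.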
A Comparison between Deep Neural Nets and Kernel Acoustic Models for Speech Recognition
We study large-scale kernel methods for acoustic modeling and compare them to
DNNs on performance metrics related to both acoustic modeling and recognition.
Measured by perplexity and frame-level classification accuracy, kernel-based
acoustic models are as effective as their DNN counterparts. However, on token
error rates, DNN models can be significantly better. We find that this gap may
be attributed to the DNN's strength in reducing both the perplexity and the
entropy of the predicted posterior probabilities. Motivated by these findings,
we propose a new model-selection technique, entropy regularized perplexity.
This technique noticeably improves the recognition performance of both types
of models and reduces the gap between them. While demonstrated on Broadcast
News, the technique should also be applicable to other tasks.
Comment: arXiv admin note: text overlap with arXiv:1411.400
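One plausible reading of the proposed criterion is frame-level perplexity penalized by the mean entropy of the predicted posteriors. The abstract does not give the exact combination, so the additive form and the `lam` weight below are assumptions for illustration.

```python
import math

def entropy_regularized_perplexity(posteriors, labels, lam=1.0):
    """Sketch of an entropy-regularized perplexity score for model
    selection: lower is better. `posteriors` is a list of per-frame
    probability distributions, `labels` the reference class per frame.
    The additive penalty and `lam` are assumptions, not the paper's
    exact formulation."""
    n = len(labels)
    # standard perplexity over the reference labels
    log_ppl = -sum(math.log(p[y]) for p, y in zip(posteriors, labels)) / n
    # mean entropy of the predicted posterior distributions
    entropy = -sum(q * math.log(q) for p in posteriors for q in p if q > 0) / n
    return math.exp(log_ppl) + lam * entropy
```

The intuition matches the finding above: a model can match another on perplexity yet predict flatter (higher-entropy) posteriors, and penalizing that entropy separates the two during model selection.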
Performance Evaluation of Internet Medical Services Based on Network Big Data: Taking Good Doctor Online as an Example
To foster new economic and social forms on a broad scale, China has been actively promoting the ‘Internet +’ initiative to encourage deep integration of the Internet with various fields. However, evaluating the performance of the Internet platforms built in these fields has remained difficult. Taking the ‘Internet + Medical’ platform Good Doctor Online as an example, this paper explores how an ‘Internet +’ platform conducts data collection, data cleaning, data conversion, data regulation, and data visualization, and thereby outlines a general approach to performance evaluation for ‘Internet +’ platforms. Based on this approach, the paper concludes that the performance of the Good Doctor Online platform is good, but that promoting the comprehensive development of Internet medical services still leaves room for improvement, such as standardizing subject classification, balancing the distribution of doctor resources across regions, sustaining health-insurance settlement, and enhancing satisfaction.
Voyager: An Open-Ended Embodied Agent with Large Language Models
We introduce Voyager, the first LLM-powered embodied lifelong learning agent
in Minecraft that continuously explores the world, acquires diverse skills, and
makes novel discoveries without human intervention. Voyager consists of three
key components: 1) an automatic curriculum that maximizes exploration, 2) an
ever-growing skill library of executable code for storing and retrieving
complex behaviors, and 3) a new iterative prompting mechanism that incorporates
environment feedback, execution errors, and self-verification for program
improvement. Voyager interacts with GPT-4 via black-box queries, which bypasses
the need for model parameter fine-tuning. The skills developed by Voyager are
temporally extended, interpretable, and compositional, which compounds the
agent's abilities rapidly and alleviates catastrophic forgetting. Empirically,
Voyager shows strong in-context lifelong learning capability and exhibits
exceptional proficiency in playing Minecraft. It obtains 3.3x more unique
items, travels 2.3x longer distances, and unlocks key tech tree milestones up
to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill
library in a new Minecraft world to solve novel tasks from scratch, while other
techniques struggle to generalize. We open-source our full codebase and prompts
at https://voyager.minedojo.org/.
Comment: Project website and open-source codebase: https://voyager.minedojo.org
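The three components, automatic curriculum, iterative prompting with environment feedback, and a growing skill library, fit together in a simple loop. The callables below stand in for GPT-4 queries and Minecraft execution; none of this mirrors Voyager's actual prompts or interfaces.

```python
def lifelong_loop(propose_task, write_code, execute, verify, max_rounds=3):
    """Sketch of a Voyager-style learning loop. `propose_task` plays the
    automatic curriculum, `write_code` the LLM programmer (it sees the
    previous round's feedback), `execute` the environment, and `verify`
    the self-verification check. Hypothetical interface, not Voyager's."""
    skill_library = {}
    task = propose_task(skill_library)
    feedback = None
    for _ in range(max_rounds):  # iterative prompting with feedback
        code = write_code(task, feedback, skill_library)
        result = execute(code)
        if verify(task, result):
            skill_library[task] = code  # store the working skill for reuse
            return skill_library
        feedback = result  # errors / env feedback go back into the prompt
    return skill_library
```

Storing skills as executable code is what makes them compositional: a later `write_code` call can retrieve and call earlier skills instead of re-deriving them, which is the mechanism behind the compounding abilities described above.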
MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations
Imitation learning from a large set of human demonstrations has proved to be
an effective paradigm for building capable robot agents. However, the
demonstrations can be extremely costly and time-consuming to collect. We
introduce MimicGen, a system for automatically synthesizing large-scale, rich
datasets from only a small number of human demonstrations by adapting them to
new contexts. We use MimicGen to generate over 50K demonstrations across 18
tasks with diverse scene configurations, object instances, and robot arms from
just ~200 human demonstrations. We show that robot agents can be effectively
trained on this generated dataset by imitation learning to achieve strong
performance in long-horizon and high-precision tasks, such as multi-part
assembly and coffee preparation, across broad initial state distributions. We
further demonstrate that the effectiveness and utility of MimicGen data compare
favorably to collecting additional human demonstrations, making it a powerful
and economical approach towards scaling up robot learning. Datasets, simulation
environments, videos, and more at https://mimicgen.github.io.
Comment: Conference on Robot Learning (CoRL) 202
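The core intuition, expressing demonstrated motions relative to the manipulated object and replaying them against the object's pose in a new scene, reduces to a coordinate transform. The 2-D, translation-only sketch below is purely illustrative; the real system works with full 6-DoF poses and segments demonstrations per object.

```python
def adapt_demo(demo, src_obj_pos, new_obj_pos):
    """Adapt an end-effector waypoint sequence to a new scene by keeping
    each waypoint's position relative to the object constant. `demo` is
    a list of (x, y) waypoints recorded with the object at `src_obj_pos`;
    the returned trajectory targets the object at `new_obj_pos`. A
    translation-only toy version of the MimicGen idea, not its actual
    pipeline."""
    dx = new_obj_pos[0] - src_obj_pos[0]
    dy = new_obj_pos[1] - src_obj_pos[1]
    return [(x + dx, y + dy) for x, y in demo]
```

Because one recorded demonstration can be replayed against many sampled object poses, a few hundred human demonstrations can be multiplied into tens of thousands of synthetic ones, which is the economics the abstract describes.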
ARDuP: Active Region Video Diffusion for Universal Policies
Sequential decision-making can be formulated as a text-conditioned video
generation problem, where a video planner, guided by a text-defined goal,
generates future frames visualizing planned actions, from which control actions
are subsequently derived. In this work, we introduce Active Region Video
Diffusion for Universal Policies (ARDuP), a novel framework for video-based
policy learning that emphasizes the generation of active regions, i.e.
potential interaction areas, enhancing the conditional policy's focus on
interactive areas critical for task execution. This innovative framework
integrates active region conditioning with latent diffusion models for video
planning and employs latent representations for direct action decoding during
inverse dynamic modeling. By utilizing motion cues in videos for automatic
active region discovery, our method eliminates the need for manual annotations
of active regions. We validate ARDuP's efficacy through extensive experiments
on the simulated CLIPort benchmark and the real-world BridgeData v2 dataset,
achieving notable improvements in success rates and generating convincingly
realistic video plans.
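The motion-cue heuristic for automatic active-region discovery can be caricatured as frame differencing: pixels that change between consecutive frames are marked active. The actual method operates on latent video features, so the pixel-level version below is only an analogy.

```python
def active_region_mask(frame_a, frame_b, thresh=0.1):
    """Toy motion-cue discovery in the spirit of ARDuP: mark as active
    any location whose value changes by more than `thresh` between two
    consecutive frames. Frames are 2-D lists of floats; this is an
    illustrative analogy, not the paper's latent-space mechanism."""
    return [[1 if abs(a - b) > thresh else 0
             for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]
```

A mask like this, conditioning the diffusion planner, is what lets the policy focus generation capacity on the interaction areas rather than static background, with no manual annotation.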
MimicPlay: Long-Horizon Imitation Learning by Watching Human Play
Imitation learning from human demonstrations is a promising paradigm for
teaching robots manipulation skills in the real world. However, learning
complex long-horizon tasks often requires an unattainable amount of
demonstrations. To reduce the high data requirement, we resort to human play
data - video sequences of people freely interacting with the environment using
their hands. Even with different morphologies, we hypothesize that human play
data contain rich and salient information about physical interactions that can
readily facilitate robot policy learning. Motivated by this, we introduce a
hierarchical learning framework named MimicPlay that learns latent plans from
human play data to guide low-level visuomotor control trained on a small number
of teleoperated demonstrations. With systematic evaluations of 14 long-horizon
manipulation tasks in the real world, we show that MimicPlay outperforms
state-of-the-art imitation learning methods in task success rate,
generalization ability, and robustness to disturbances. Code and videos are
available at https://mimic-play.github.io
Comment: 7th Conference on Robot Learning (CoRL 2023), oral presentation
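The hierarchical split, a high-level planner trained on cheap human play video emitting a latent plan that conditions a low-level visuomotor policy trained on a few teleoperated demonstrations, can be sketched as a two-level rollout. Both callables below are stand-ins, not the paper's networks.

```python
def hierarchical_rollout(high_level_plan, low_level_step, state, horizon):
    """Sketch of a MimicPlay-style rollout: the high-level planner
    (trained on play data) produces a latent plan once, and the
    low-level controller (trained on teleoperated demos) conditions on
    it at every control step. Hypothetical interface."""
    latent = high_level_plan(state)            # latent plan from play data
    trajectory = [state]
    for _ in range(horizon):
        state = low_level_step(state, latent)  # plan-conditioned control
        trajectory.append(state)
    return trajectory
```

The data asymmetry is the point of the design: the expensive teleoperated demonstrations only have to teach short-horizon control, while long-horizon structure comes from plentiful, unlabeled play video.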
Eureka: Human-Level Reward Design via Coding Large Language Models
Large Language Models (LLMs) have excelled as high-level semantic planners
for sequential decision-making tasks. However, harnessing them to learn complex
low-level manipulation tasks, such as dexterous pen spinning, remains an open
problem. We bridge this fundamental gap and present Eureka, a human-level
reward design algorithm powered by LLMs. Eureka exploits the remarkable
zero-shot generation, code-writing, and in-context improvement capabilities of
state-of-the-art LLMs, such as GPT-4, to perform evolutionary optimization over
reward code. The resulting rewards can then be used to acquire complex skills
via reinforcement learning. Without any task-specific prompting or pre-defined
reward templates, Eureka generates reward functions that outperform expert
human-engineered rewards. In a diverse suite of 29 open-source RL environments
that include 10 distinct robot morphologies, Eureka outperforms human experts
on 83% of the tasks, leading to an average normalized improvement of 52%. The
generality of Eureka also enables a new gradient-free in-context learning
approach to reinforcement learning from human feedback (RLHF), readily
incorporating human inputs to improve the quality and the safety of the
generated rewards without model updating. Finally, using Eureka rewards in a
curriculum learning setting, we demonstrate, for the first time, a simulated
Shadow Hand capable of performing pen-spinning tricks, adeptly manipulating a
pen in circles at rapid speed.
Comment: Project website and open-source code: https://eureka-research.github.io
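The evolutionary optimization over reward code reduces to a simple generate-evaluate-select loop in which the best candidate so far (and its score) is fed back to the generator. The two callables stand in for GPT-4 sampling and RL training; the interface is hypothetical, not Eureka's.

```python
def eureka_search(generate_rewards, evaluate, generations=3, pop=4):
    """Sketch of an Eureka-style search: each generation, the generator
    (an LLM in the paper) sees the best reward code and its score and
    proposes a new population; `evaluate` stands in for training an RL
    agent under each candidate reward. Illustrative interface only."""
    best_code, best_score = None, float("-inf")
    for _ in range(generations):
        for code in generate_rewards(best_code, best_score, pop):
            score = evaluate(code)  # fitness = RL performance under this reward
            if score > best_score:
                best_code, best_score = code, score
    return best_code, best_score
```

Feeding the best candidate and its score back into the prompt is what makes this in-context improvement rather than random search: each generation mutates a known-good reward instead of starting from scratch.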