Designing constraints: composing and performing with digital musical systems
This paper investigates two central terms in Human Computer Interaction (HCI) – affordances and constraints – and studies their relevance to the design and understanding of digital musical systems. It argues that in the analysis of complex systems, such as new interfaces for musical expression (NIME), constraints are a more productive analytical tool than the common HCI usage of affordances. Constraints are seen as limitations enabling the musician to encapsulate a specific search space of both physical and compositional gestures, proscribing complexity in favor of a relatively simple set of rules that engender creativity. By exploring the design of three different digital musical systems, the paper defines constraints as a core attribute of mapping, whether in instruments or compositional systems. The paper describes the aspiration for designing constraints as twofold: to save time, as musical performance is typically a real-time process, and to minimize the performer's cognitive load. Finally, it discusses skill and virtuosity in the realm of new interfaces for musical expression with regard to constraints.
GROOT: Learning to Follow Instructions by Watching Gameplay Videos
We study the problem of building a controller that can follow open-ended
instructions in open-world environments. We propose to follow reference videos
as instructions, which offer expressive goal specifications while eliminating
the need for expensive text-gameplay annotations. A new learning framework is
derived to allow learning such instruction-following controllers from gameplay
videos while producing a video instruction encoder that induces a structured
goal space. We implement our agent GROOT in a simple yet effective
encoder-decoder architecture based on causal transformers. We evaluate GROOT
against open-world counterparts and human players on a proposed Minecraft
SkillForge benchmark. The Elo ratings clearly show that GROOT is closing the
human-machine gap as well as exhibiting a 70% winning rate over the best
generalist agent baseline. Qualitative analysis of the induced goal space
further demonstrates some interesting emergent properties, including the goal
composition and complex gameplay behavior synthesis. The project page is
available at https://craftjarvis-groot.github.io
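The abstract's data flow — a video-instruction encoder that induces a goal space, and a causal decoder that produces actions conditioned on that goal — can be sketched in miniature. This is a toy illustration of the control flow only, with stand-in functions invented here (`encode_video`, `decode_action`, `rollout` are not the paper's API, and real components are transformers, not string matching):

```python
from typing import List

def encode_video(frames: List[str]) -> tuple:
    """Stand-in video-instruction encoder: summarize reference frames
    into a hashable 'goal embedding' (a real encoder is a transformer)."""
    return tuple(sorted(set(frames)))

def decode_action(history: List[str], goal: tuple) -> str:
    """Stand-in causal decoder: pursue the next unachieved goal item,
    conditioning on the observation history so far."""
    for g in goal:
        if g not in history:
            return f"do:{g}"
    return "idle"  # goal fully achieved

def rollout(reference_video: List[str], steps: int) -> List[str]:
    goal = encode_video(reference_video)  # video -> structured goal space
    history: List[str] = []
    actions = []
    for _ in range(steps):
        a = decode_action(history, goal)
        actions.append(a)
        if a.startswith("do:"):
            history.append(a[3:])  # assume the dispatched action succeeds
    return actions
```

The key design point the sketch mirrors is that the goal is fixed once per reference video, while the decoder is queried causally at every timestep.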
Qualitative Data Mining and Its Applications
In machine learning from numerical data, usually the target concept is a numerical function that facilitates quantitative prediction. In contrast to this, we consider qualitative data mining, which aims at finding qualitative patterns, or qualitative relationships, in numerical data. We present one approach to qualitative data mining, in which the target concepts are expressed as qualitative decision trees. We review some case studies in qualitative data mining, and discuss typical application scenarios that involve the learning of qualitative trees.
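The contrast the abstract draws — extracting a qualitative relation rather than fitting a numeric function — can be shown with a minimal example. This is not the authors' algorithm, just an illustration of the idea using common qualitative-reasoning notation (M+ for monotonically increasing, M- for decreasing):

```python
def qualitative_relation(points):
    """Classify the observed x -> y relation as 'M+' (monotonically
    increasing), 'M-' (decreasing), or 'none' (neither)."""
    pts = sorted(points)  # order samples by x
    diffs = [b[1] - a[1] for a, b in zip(pts, pts[1:])]
    if all(d > 0 for d in diffs):
        return "M+"
    if all(d < 0 for d in diffs):
        return "M-"
    return "none"
```

A qualitative decision tree would then split the input space on attribute tests and attach such relations to its leaves, instead of numeric predictions.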
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
Achieving human-like planning and control with multimodal observations in an
open world is a key milestone for more functional generalist agents. Existing
approaches can handle certain long-horizon tasks in an open world. However,
they still struggle when the number of open-world tasks could potentially be
infinite and lack the capability to progressively enhance task completion as
game time progresses. We introduce JARVIS-1, an open-world agent that can
perceive multimodal input (visual observations and human instructions),
generate sophisticated plans, and perform embodied control, all within the
popular yet challenging open-world Minecraft universe. Specifically, we develop
JARVIS-1 on top of pre-trained multimodal language models, which map visual
observations and textual instructions to plans. The plans will be ultimately
dispatched to the goal-conditioned controllers. We outfit JARVIS-1 with a
multimodal memory, which facilitates planning using both pre-trained knowledge
and its actual game survival experiences. JARVIS-1 is the most general agent
in Minecraft to date, capable of completing over 200 different tasks using
control and observation space similar to humans. These tasks range from
short-horizon tasks, e.g., "chopping trees" to long-horizon tasks, e.g.,
"obtaining a diamond pickaxe". JARVIS-1 performs exceptionally well in
short-horizon tasks, achieving nearly perfect performance. In the classic
long-term task of obtaining a diamond pickaxe, JARVIS-1 surpasses the
reliability of current state-of-the-art agents by 5 times and can successfully
complete longer-horizon and more challenging tasks. The project page is
available at https://craftjarvis.org/JARVIS-1
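The planning loop the abstract describes — plan with a multimodal model, dispatch sub-goals to goal-conditioned controllers, and write outcomes back into a multimodal memory so later plans improve — can be sketched as follows. All names here are illustrative stand-ins, not the released JARVIS-1 code, and the "planner" is a trivial placeholder for the pre-trained multimodal language model:

```python
class MultimodalMemory:
    """Toy memory of (task, plan, success) experiences."""
    def __init__(self):
        self.entries = []

    def retrieve(self, task):
        """Return previously successful plans for this task."""
        return [plan for t, plan, ok in self.entries if t == task and ok]

    def store(self, task, plan, success):
        self.entries.append((task, plan, success))

def plan(task, memory):
    """Stand-in planner: reuse a known-good plan from memory,
    else generate a fresh one."""
    recalled = memory.retrieve(task)
    if recalled:
        return recalled[-1]
    return [f"step_{i}:{task}" for i in range(2)]

def run_task(task, memory, controller):
    """Dispatch each planned sub-goal to a goal-conditioned controller
    and learn from the outcome."""
    steps = plan(task, memory)
    success = all(controller(s) for s in steps)
    memory.store(task, steps, success)
    return success
```

The point of the memory is visible in the loop: a second attempt at the same task retrieves the verified plan instead of replanning from scratch, which is how task completion can improve as game time progresses.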
Voyager: An Open-Ended Embodied Agent with Large Language Models
We introduce Voyager, the first LLM-powered embodied lifelong learning agent
in Minecraft that continuously explores the world, acquires diverse skills, and
makes novel discoveries without human intervention. Voyager consists of three
key components: 1) an automatic curriculum that maximizes exploration, 2) an
ever-growing skill library of executable code for storing and retrieving
complex behaviors, and 3) a new iterative prompting mechanism that incorporates
environment feedback, execution errors, and self-verification for program
improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses
the need for model parameter fine-tuning. The skills developed by Voyager are
temporally extended, interpretable, and compositional, which compounds the
agent's abilities rapidly and alleviates catastrophic forgetting. Empirically,
Voyager shows strong in-context lifelong learning capability and exhibits
exceptional proficiency in playing Minecraft. It obtains 3.3x more unique
items, travels 2.3x longer distances, and unlocks key tech tree milestones up
to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill
library in a new Minecraft world to solve novel tasks from scratch, while other
techniques struggle to generalize. We open-source our full codebase and prompts
at https://voyager.minedojo.org/.
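Voyager's third component — iterative prompting that folds environment feedback, execution errors, and self-verification into skill refinement, with verified skills stored in an ever-growing library — has a simple loop shape. The sketch below is illustrative only (the real system generates executable code via GPT-4; here a "skill" is just a list of steps and the environment is a caller-supplied function):

```python
skill_library = {}  # name -> verified executable skill

def iterative_prompting(name, draft_skill, env, max_rounds=3):
    """Refine a draft skill using execution feedback until it
    self-verifies, then store it in the skill library."""
    skill = draft_skill
    for _ in range(max_rounds):
        ok, feedback = env(skill)      # execute in the environment
        if ok:                         # self-verification passed
            skill_library[name] = skill
            return True
        skill = skill + [feedback]     # revise using the error feedback
    return False
```

Because stored skills are reused rather than regenerated, the library compounds the agent's abilities and avoids the forgetting that plain fine-tuning would risk.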
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
We investigate the challenge of task planning for multi-task embodied agents
in open-world environments. Two main difficulties are identified: 1) executing
plans in an open-world environment (e.g., Minecraft) necessitates accurate and
multi-step reasoning due to the long-term nature of tasks, and 2) as vanilla
planners do not consider how easy the current agent can achieve a given
sub-task when ordering parallel sub-goals within a complicated plan, the
resulting plan could be inefficient or even infeasible. To this end, we propose
"Describe, Explain, Plan and Select" (DEPS), an interactive planning approach based
on Large Language Models (LLMs). DEPS facilitates better error correction on
initial LLM-generated plans by integrating description of
the plan execution process and providing self-explanation of
feedback when encountering failures during the extended planning phases.
Furthermore, it includes a goal selector, which is a trainable
module that ranks parallel candidate sub-goals based on the estimated steps of
completion, consequently refining the initial plan. Our experiments mark the
milestone of the first zero-shot multi-task agent that can robustly accomplish
70+ Minecraft tasks and nearly double the overall performance. Further testing
reveals our method's general effectiveness in popularly adopted non-open-ended
domains as well (i.e., ALFWorld and tabletop manipulation). The ablation and
exploratory studies detail how our design beats the counterparts and provide a
promising update on the grand challenge with our
approach. The code is released at https://github.com/CraftJarvis/MC-Planner
Comment: NeurIPS 202
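The goal selector described above — a module that ranks parallel candidate sub-goals by estimated steps to completion so the plan is refined toward the easiest achievable one — reduces to a ranking problem. The sketch below is a stand-in scorer, not the trained module from the paper:

```python
def select_goal(candidates, estimate_steps):
    """Rank parallel candidate sub-goals by estimated completion
    cost and return the cheapest one (the 'Select' step of DEPS)."""
    ranked = sorted(candidates, key=estimate_steps)
    return ranked[0]
```

In the full system the cost estimates come from a learned model of the current agent state, which is what lets the same plan be reordered differently depending on how easy each sub-task currently is.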
Computationally-Mediated Interactions with Traditional Textile Crafts
Crafting is a fundamental part of humanity; it gives us purpose and satisfaction, and it produces items that are beautiful and functional. At the same time, our relationship to traditional physical crafts has seen a massive upheaval due to technological advancements. The freedom and support to create textile crafts for pleasure and passion rather than necessity has never been more widespread. Interdisciplinary explorations of computer science and textile crafts have been increasing due to more people being intrinsically motivated to craft, more forms of supportive technology constantly being built, and more interconnected communities growing via the internet. The work in this dissertation begins to coalesce these user-focused and craft-based experiences by presenting crafting as a series of interconnected and nested processes covering the design, manufacture, and experience of textile craft products. Within these processes, I examine how crafters experience physical and digital forms of agency, both high and low, as well as ludic engagement or the lack thereof, as a means of evaluating user perception of technology related to textiles. As exemplar artifacts, I present my own, as well as collaborative, research on design support software, a manufacturing machine appropriated as a game console, and a pair of electronic textile-based fidget toys. This research illustrates how many varied design choices affected audiences' agency and ludic engagement by examining crafters' perceptions and skill levels, as well as the craft domain interpretations by accompanying software and hardware. The interdisciplinary work presented in this dissertation crosses the boundaries of creativity support software, computational creativity, game studies, human-computer interaction, and crafting communities outside the realm of academia. 
More importantly, this research begins to explicitly join predominantly feminine craft and masculine technological communities across all ages for the enrichment of all those involved.
Turn-by-wire: Computationally mediated physical fabrication
Advances in digital fabrication have simultaneously created new capabilities while reinforcing outdated workflows that constrain how, and by whom, these fabrication tools are used. In this paper, we investigate how a new class of hybrid-controlled machines can collaborate with novice and expert users alike to yield a more lucid making experience. We demonstrate these ideas through our system, Turn-by-Wire. By combining the capabilities of a traditional lathe with haptic input controllers that modulate both position and force, we detail a series of novel interaction metaphors that invite a more fluid making process spanning digital, model-centric, computer control, and embodied, adaptive, human control. We evaluate our system through a user study and discuss how these concepts generalize to other fabrication tools.
ADaPT: As-Needed Decomposition and Planning with Language Models
Large Language Models (LLMs) are increasingly being used for interactive
decision-making tasks requiring planning and adapting to the environment.
Recent works employ LLMs-as-agents in broadly two ways: iteratively determining
the next action (iterative executors) or generating plans and executing
sub-tasks using LLMs (plan-and-execute). However, these methods struggle with
task complexity, as the inability to execute any sub-task may lead to task
failure. To address these shortcomings, we introduce As-Needed Decomposition
and Planning for complex Tasks (ADaPT), an approach that explicitly plans and
decomposes complex sub-tasks as-needed, i.e., when the LLM is unable to execute
them. ADaPT recursively decomposes sub-tasks to adapt to both task complexity
and LLM capability. Our results demonstrate that ADaPT substantially
outperforms established strong baselines, achieving success rates up to 28.3%
higher in ALFWorld, 27% in WebShop, and 33% in TextCraft -- a novel
compositional dataset that we introduce. Through extensive analysis, we
illustrate the importance of multilevel decomposition and establish that ADaPT
dynamically adjusts to the capabilities of the executor LLM as well as to task
complexity.
Comment: Project Page: https://allenai.github.io/adaptll
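ADaPT's core idea — try to execute a task directly, and only decompose it into sub-tasks when the executor fails, recursing in the same way — is naturally a short recursive function. The sketch below is a hedged illustration of that control flow with caller-supplied stand-ins for the executor and planner LLMs, not the paper's implementation:

```python
def adapt(task, execute, decompose, depth=3):
    """Execute `task` directly; decompose as-needed on failure,
    recursing on sub-tasks up to a maximum depth."""
    if execute(task):          # executor LLM handles it directly
        return True
    if depth == 0:             # decomposition budget exhausted
        return False
    subtasks = decompose(task) # planner LLM splits the failed task
    return all(adapt(t, execute, decompose, depth - 1) for t in subtasks)
```

Because decomposition happens only on failure, the recursion depth adapts to both task complexity and the executor's capability, which is the paper's central claim.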