3,116 research outputs found

    Designing constraints: composing and performing with digital musical systems

    Get PDF
    This paper investigates two central terms in Human Computer Interaction (HCI) – affordances and constraints – and studies their relevance to the design and understanding of digital musical systems. It argues that in the analysis of complex systems, such as new interfaces for musical expression (NIME), constraints are a more productive analytical tool than the common HCI usage of affordances. Constraints are seen as limitations enabling the musician to encapsulate a specific search space of both physical and compositional gestures, proscribing complexity in favor of a relatively simple set of rules that engender creativity. By exploring the design of three different digital musical systems, the paper defines constraints as a core attribute of mapping, whether in instruments or compositional systems. The paper describes the aspiration for designing constraints as twofold: to save time, as musical performance is typically a real-time process, and to minimize the performer’s cognitive load. Finally, it discusses skill and virtuosity in the realm of new musical interfaces for musical expression with regard to constraints

    GROOT: Learning to Follow Instructions by Watching Gameplay Videos

    Full text link
    We study the problem of building a controller that can follow open-ended instructions in open-world environments. We propose to follow reference videos as instructions, which offer expressive goal specifications while eliminating the need for expensive text-gameplay annotations. A new learning framework is derived to allow learning such instruction-following controllers from gameplay videos while producing a video instruction encoder that induces a structured goal space. We implement our agent GROOT in a simple yet effective encoder-decoder architecture based on causal transformers. We evaluate GROOT against open-world counterparts and human players on a proposed Minecraft SkillForge benchmark. The Elo ratings clearly show that GROOT is closing the human-machine gap as well as exhibiting a 70% winning rate over the best generalist agent baseline. Qualitative analysis of the induced goal space further demonstrates some interesting emergent properties, including the goal composition and complex gameplay behavior synthesis. The project page is available at https://craftjarvis-groot.github.io

    Qualitative Data Mining and Its Applications

    Get PDF
    In machine learning from numerical data, usually the target concept is a numerical function that facilitates quantitative prediction. In contrast to this, we consider qualitative data mining which aims at finding qualitative patterns, or qualitative relationships in numerical data. We present one approach to qualitative data mining, in which the target concepts are expressed as qualitative decision trees. We reviewsome case studies in qualitative data mining, and discuss typical application scenarios that involve the learning of qualitative trees

    JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

    Full text link
    Achieving human-like planning and control with multimodal observations in an open world is a key milestone for more functional generalist agents. Existing approaches can handle certain long-horizon tasks in an open world. However, they still struggle when the number of open-world tasks could potentially be infinite and lack the capability to progressively enhance task completion as game time progresses. We introduce JARVIS-1, an open-world agent that can perceive multimodal input (visual observations and human instructions), generate sophisticated plans, and perform embodied control, all within the popular yet challenging open-world Minecraft universe. Specifically, we develop JARVIS-1 on top of pre-trained multimodal language models, which map visual observations and textual instructions to plans. The plans will be ultimately dispatched to the goal-conditioned controllers. We outfit JARVIS-1 with a multimodal memory, which facilitates planning using both pre-trained knowledge and its actual game survival experiences. JARVIS-1 is the existing most general agent in Minecraft, capable of completing over 200 different tasks using control and observation space similar to humans. These tasks range from short-horizon tasks, e.g., "chopping trees" to long-horizon tasks, e.g., "obtaining a diamond pickaxe". JARVIS-1 performs exceptionally well in short-horizon tasks, achieving nearly perfect performance. In the classic long-term task of ObtainDiamondPickaxe\texttt{ObtainDiamondPickaxe}, JARVIS-1 surpasses the reliability of current state-of-the-art agents by 5 times and can successfully complete longer-horizon and more challenging tasks. The project page is available at https://craftjarvis.org/JARVIS-1Comment: update project pag

    Voyager: An Open-Ended Embodied Agent with Large Language Models

    Full text link
    We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at https://voyager.minedojo.org/.Comment: Project website and open-source codebase: https://voyager.minedojo.org

    Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

    Full text link
    We investigate the challenge of task planning for multi-task embodied agents in open-world environments. Two main difficulties are identified: 1) executing plans in an open-world environment (e.g., Minecraft) necessitates accurate and multi-step reasoning due to the long-term nature of tasks, and 2) as vanilla planners do not consider how easy the current agent can achieve a given sub-task when ordering parallel sub-goals within a complicated plan, the resulting plan could be inefficient or even infeasible. To this end, we propose "D‾\underline{D}escribe, E‾\underline{E}xplain, P‾\underline{P}lan and S‾\underline{S}elect" (DEPS\textbf{DEPS}), an interactive planning approach based on Large Language Models (LLMs). DEPS facilitates better error correction on initial LLM-generated plan\textit{plan} by integrating description\textit{description} of the plan execution process and providing self-explanation\textit{explanation} of feedback when encountering failures during the extended planning phases. Furthermore, it includes a goal selector\textit{selector}, which is a trainable module that ranks parallel candidate sub-goals based on the estimated steps of completion, consequently refining the initial plan. Our experiments mark the milestone of the first zero-shot multi-task agent that can robustly accomplish 70+ Minecraft tasks and nearly double the overall performances. Further testing reveals our method's general effectiveness in popularly adopted non-open-ended domains as well (i.e., ALFWorld and tabletop manipulation). The ablation and exploratory studies detail how our design beats the counterparts and provide a promising update on the ObtainDiamond\texttt{ObtainDiamond} grand challenge with our approach. The code is released at https://github.com/CraftJarvis/MC-Planner.Comment: NeurIPS 202

    Turn-by-wire: Computationally mediated physical fabrication

    Get PDF
    Advances in digital fabrication have simultaneously created new capabilities while reinforcing outdated workflows that constrain how, and by whom, these fabrication tools are used. In this paper, we investigate how a new class of hybrid-controlled machines can collaborate with novice and expert users alike to yield a more lucid making experience. We demonstrate these ideas through our system, Turn-by-Wire. By combining the capabilities of a traditional lathe with haptic input controllers that modulate both position and force, we detail a series of novel interaction metaphors that invite a more fluid making process spanning digital, model-centric, computer control, and embodied, adaptive, human control. We evaluate our system through a user study and discuss how these concepts generalize to other fabrication tools

    ADaPT: As-Needed Decomposition and Planning with Language Models

    Full text link
    Large Language Models (LLMs) are increasingly being used for interactive decision-making tasks requiring planning and adapting to the environment. Recent works employ LLMs-as-agents in broadly two ways: iteratively determining the next action (iterative executors) or generating plans and executing sub-tasks using LLMs (plan-and-execute). However, these methods struggle with task complexity, as the inability to execute any sub-task may lead to task failure. To address these shortcomings, we introduce As-Needed Decomposition and Planning for complex Tasks (ADaPT), an approach that explicitly plans and decomposes complex sub-tasks as-needed, i.e., when the LLM is unable to execute them. ADaPT recursively decomposes sub-tasks to adapt to both task complexity and LLM capability. Our results demonstrate that ADaPT substantially outperforms established strong baselines, achieving success rates up to 28.3% higher in ALFWorld, 27% in WebShop, and 33% in TextCraft -- a novel compositional dataset that we introduce. Through extensive analysis, we illustrate the importance of multilevel decomposition and establish that ADaPT dynamically adjusts to the capabilities of the executor LLM as well as to task complexity.Comment: Project Page: https://allenai.github.io/adaptll
    • …
    corecore