
    Characterization of Functionality in a Dynamic Environment

    Identifying the functionality of objects means being able to associate a purpose with them in a specific environment. The purpose depends on the intention of the agent and on the applicability of the object to a particular task. In our investigation of functionality we focus on functionalities that involve changes in the physical relations and properties of objects in the environment. A formal model, based on Discrete Event Dynamic System (DEDS) theory, is introduced to define an interactive task for recovering and describing functionality. To observe and control the recovery process we introduce the notion of piecewise observability of a task by different sensors, which allows the description of a dynamic system in which neither all events nor the times of their occurrence can be predicted in advance. We have developed an experimental system, consisting of actuators and both force and position sensors, for carrying out the interactive recovery of functionality. In particular, we demonstrate the approach with experiments investigating the functionality of piercing. Furthermore, we discuss the importance of a multisensory approach for the observation and interpretation of functionality.
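
    To make the piecewise-observability idea concrete, here is a minimal sketch of a task automaton for piercing in which each sensor witnesses only a subset of the events. The states, events, and sensor assignments are hypothetical illustrations, not the paper's actual model.

```python
# Minimal sketch of a discrete-event dynamic system (DEDS) with
# piecewise observability: each sensor reports only a subset of the
# events, so tracking the task means switching between observers.
# All states, events, and sensor masks here are hypothetical.

PIERCE_TASK = {
    # state: {event: next_state}
    "approach": {"contact": "pressing"},
    "pressing": {"breakthrough": "pierced", "slip": "approach"},
    "pierced":  {},
}

# Which events each sensor can actually observe.
SENSOR_EVENTS = {
    "force_sensor":    {"contact", "breakthrough"},
    "position_sensor": {"slip", "breakthrough"},
}

def track(event_stream, start="approach"):
    """Follow the task automaton, noting which sensor(s) witnessed
    each transition; an event outside every mask would go unseen."""
    state = start
    for event in event_stream:
        witnesses = [s for s, evs in SENSOR_EVENTS.items() if event in evs]
        if event in PIERCE_TASK[state]:
            state = PIERCE_TASK[state][event]
            print(f"{event!r} -> {state!r} (seen by {witnesses or 'no sensor'})")
    return state

if __name__ == "__main__":
    final = track(["contact", "breakthrough"])
    assert final == "pierced"
```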

    PackIt: A Virtual Environment for Geometric Planning

    The ability to jointly understand the geometry of objects and plan actions for manipulating them is crucial for intelligent agents. We refer to this ability as geometric planning. Recently, many interactive environments have been proposed to evaluate intelligent agents on various skills; however, none of them caters to the needs of geometric planning. We present PackIt, a virtual environment to evaluate and potentially learn the ability to do geometric planning, where an agent needs to take a sequence of actions to pack a set of objects into a box with limited space. We also construct a set of challenging packing tasks using an evolutionary algorithm. Further, we study various baselines for the task, including model-free learning-based and heuristic-based methods, as well as search-based optimization methods that assume access to the model of the environment. Code and data are available at https://github.com/princeton-vl/PackIt.
    Comment: Accepted to ICML 2020
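
    As a sketch of the geometric-planning problem PackIt poses, the toy baseline below greedily places each object's voxel footprint at the first collision-free offset in a box grid. It illustrates the packing task only; it is not the PackIt environment's actual API, and the shapes are toy examples.

```python
# Hedged sketch of a greedy packing baseline over a voxel grid:
# try each object in turn at every offset and keep the first
# placement that does not collide with what is already packed.
import numpy as np

def greedy_pack(box_shape, objects):
    """objects: list of boolean voxel arrays. Returns a list of
    (object_index, offset) pairs, with offset=None if it did not fit."""
    box = np.zeros(box_shape, dtype=bool)
    placements = []
    for idx, obj in enumerate(objects):
        placed = None
        ranges = [b - o + 1 for b, o in zip(box_shape, obj.shape)]
        for off in np.ndindex(*ranges):
            sl = tuple(slice(o, o + s) for o, s in zip(off, obj.shape))
            if not (box[sl] & obj).any():   # no collision at this offset
                box[sl] |= obj
                placed = off
                break
        placements.append((idx, placed))
    return placements

cube = np.ones((2, 2, 2), dtype=bool)
bar = np.ones((1, 1, 4), dtype=bool)
print(greedy_pack((4, 4, 4), [cube, bar, cube]))
```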

    Learning to Reason in Round-based Games: Multi-task Sequence Generation for Purchasing Decision Making in First-person Shooters

    Sequential reasoning is a complex human ability. While extensive previous research has focused on gaming AI within a single continuous game, round-based decision making that extends across a sequence of games remains less explored. Counter-Strike: Global Offensive (CS:GO), as a round-based game with abundant expert demonstrations, provides an excellent environment for multi-player round-based sequential reasoning. In this work, we propose a Sequence Reasoner with a Round Attribute Encoder and a Multi-Task Decoder to interpret the strategies behind round-based purchasing decisions. We adopt few-shot learning to sample multiple rounds in a match, and use a modified version of Reptile, a model-agnostic meta-learning algorithm, for the meta-learning loop. We formulate each round as a multi-task sequence generation problem. Our state representations combine an action encoder, a team encoder, player features, a round attribute encoder, and economy encoders to help our agent learn to reason in this specific multi-player round-based scenario. A complete ablation study and a comparison with a greedy approach certify the effectiveness of our model. Our research opens the door to interpretable AI for understanding episodic and long-term purchasing strategies beyond the gaming community.
    Comment: 16th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE-20)
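
    The Reptile meta-update the abstract adopts is simple enough to sketch: sample a task (here, a round), take a few inner SGD steps, then move the meta-parameters toward the adapted weights. The linear model and synthetic "round" tasks below are illustrative stand-ins, not the paper's architecture or data.

```python
# Toy Reptile meta-learning loop on a 2-parameter linear model.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                    # meta-parameters [w, b]

def sample_round_task():
    """Synthetic task: linear targets y = a*x + c with per-round (a, c)."""
    a, c = rng.uniform(-1, 1, size=2)
    x = rng.uniform(-1, 1, size=16)
    return x, a * x + c

def inner_sgd(params, x, y, steps=5, lr=0.1):
    """A few steps of SGD on squared error, starting from the meta-init."""
    w, b = params
    for _ in range(steps):
        pred = w * x + b
        grad_w = 2 * np.mean((pred - y) * x)
        grad_b = 2 * np.mean(pred - y)
        w, b = w - lr * grad_w, b - lr * grad_b
    return np.array([w, b])

for step in range(1000):               # meta-training loop
    x, y = sample_round_task()
    adapted = inner_sgd(theta, x, y)
    theta += 0.1 * (adapted - theta)   # Reptile: nudge toward adapted weights

print("meta-initialization:", theta)
```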

    The use of interactive computer vision and robot hand controllers for enhancing manufacturing safety

    Currently available robotic systems provide limited support for CAD-based model-driven visualization, sensing algorithm development and integration, and automated graphical planning systems. This paper describes ongoing work that provides the functionality necessary to apply advanced robotics to automated manufacturing and assembly operations. An interface has been built which incorporates 6-DOF tactile manipulation, displays for three-dimensional graphical models, and automated tracking functions that depend on automated machine vision. A set of tools for single and multiple focal-plane sensor image processing and understanding, which utilizes object recognition models, has been demonstrated. The resulting tool will enable sensing and planning from computationally simple graphical objects. A synergistic interplay between human and machine vision is created through programmable feedback received from the controller. This approach can serve as the basis for implementing enhanced safety in automated robotic manufacturing, assembly, repair, and inspection tasks in both ground and space applications. Thus, an interactive capability has been developed to match the modeled environment to the real task environment for safe and predictable task execution.
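
    As a rough illustration of the shared-control idea described above, the sketch below blends an operator's commanded motion with a vision-based tracking correction and clamps the result for predictable execution. All names, gains, and limits are hypothetical stand-ins, not the paper's actual interface.

```python
# Hedged sketch of a shared-control step: operator command plus a
# correction pulling the modeled pose toward the vision-tracked pose,
# clamped to a per-cycle safety limit before being sent to the robot.
import numpy as np

MAX_STEP = 0.01  # metres per cycle; a hypothetical safety limit

def shared_control_step(operator_cmd, model_pose, tracked_pose, gain=0.5):
    """Blend the operator's motion with a machine-vision correction."""
    correction = gain * (tracked_pose - model_pose)
    step = operator_cmd + correction
    norm = np.linalg.norm(step)
    if norm > MAX_STEP:                 # clamp for safe, predictable motion
        step *= MAX_STEP / norm
    return step

cmd = shared_control_step(
    operator_cmd=np.array([0.02, 0.0, 0.0]),
    model_pose=np.array([0.5, 0.10, 0.2]),
    tracked_pose=np.array([0.5, 0.12, 0.2]),
)
print(cmd)
```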

    WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

    Existing benchmarks for grounding language in interactive environments either lack real-world linguistic elements or prove difficult to scale up due to substantial human involvement in the collection of data or feedback signals. To bridge this gap, we develop WebShop -- a simulated e-commerce website environment with 1.18 million real-world products and 12,087 crowd-sourced text instructions. Given a text instruction specifying a product requirement, an agent needs to navigate multiple types of webpages and issue diverse actions to find, customize, and purchase an item. WebShop provides several challenges for language grounding, including understanding compositional instructions, query (re-)formulation, comprehending and acting on noisy text in webpages, and performing strategic exploration. We collect over 1,600 human demonstrations for the task, and train and evaluate a diverse range of agents using reinforcement learning, imitation learning, and pre-trained image and language models. Our best model achieves a task success rate of 29%, which outperforms rule-based heuristics (9.6%) but is far lower than human expert performance (59%). We also analyze agent and human trajectories and ablate various model components to provide insights for developing future agents with stronger language understanding and decision-making abilities. Finally, we show that agents trained on WebShop exhibit non-trivial sim-to-real transfer when evaluated on amazon.com and ebay.com, indicating the potential value of WebShop in developing practical web-based agents that can operate in the wild.
    Comment: Project page with code, data, demos: https://webshop-pnlp.github.io. v2 adds transfer to eBay
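
    WebShop agents act through a small text interface -- search[...] on the search page and click[...] elsewhere -- which a toy episode loop can illustrate. The stub environment below is a stand-in so the loop runs; the real environment and its API live at https://webshop-pnlp.github.io.

```python
# Hedged sketch of a rule-based WebShop-style episode: search with the
# raw instruction, then click toward a purchase. StubShopEnv is a toy
# stand-in, not the real WebShop environment.

class StubShopEnv:
    """Toy environment exposing WebShop-style search/click actions."""
    def reset(self, instruction):
        return f"[search page] goal: {instruction}"

    def step(self, action):
        if action.startswith("search["):
            return "[results] click[item A] click[Buy Now]", 0.0, False
        if action == "click[Buy Now]":
            return "[done]", 0.67, True     # reward = attribute-match score
        return "[item page] click[Buy Now]", 0.0, False

def rule_based_episode(env, instruction):
    """Roughly the flavor of rule-based heuristic the abstract reports
    at 9.6% success: no query reformulation, no exploration."""
    env.reset(instruction)
    obs, reward, done = env.step(f"search[{instruction}]")
    while not done:
        obs, reward, done = env.step("click[Buy Now]")
    return reward

print(rule_based_episode(StubShopEnv(), "red cotton t-shirt under $20"))
```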