Characterization of Functionality in a Dynamic Environment
Identifying the functionality of objects means being able to associate a purpose with them in a specific environment. The purpose depends on the intention of the agent and on the applicability of the object to a particular task. In our investigation we focus on functionalities that involve changes in the physical relations and properties between objects in the environment. A formal model, based on Discrete Event Dynamic System (DEDS) theory, is introduced to define an interactive task for recovering and describing functionality. To observe and control the recovery process, we introduce the notion of piecewise observability of a task by different sensors. This allows the description of a dynamic system in which neither all events nor the times of their occurrence can be predicted in advance. We have developed an experimental system, consisting of actuators and both force and position sensors, for carrying out the interactive recovery of functionality. In particular, we demonstrate the approach with experiments investigating the functionality of piercing. Finally, we discuss the importance of a multisensory approach for the observation and interpretation of functionality.
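The DEDS formulation above can be made concrete with a minimal sketch: a discrete-event system in which each event is visible only to a subset of sensors, mirroring the paper's notion of piecewise observability. All names, states, and the sensor mapping here are illustrative assumptions, not the paper's actual model.

```python
# Illustrative sketch (not the paper's model): a discrete-event system
# where each event is observable only by a subset of sensors.

class DiscreteEventSystem:
    def __init__(self, transitions, start):
        # transitions: {(state, event): next_state}
        self.transitions = transitions
        self.state = start
        self.trace = []  # (event, sensors that could observe it)

    def fire(self, event, sensor_map):
        """Apply an event; record which sensors could observe it."""
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"event {event!r} not enabled in state {self.state!r}")
        self.state = self.transitions[key]
        self.trace.append((event, sensor_map.get(event, set())))
        return self.state

# Hypothetical piercing task: approach, then push through the surface.
transitions = {
    ("free", "approach"): "touching",
    ("touching", "push"): "pierced",
}
# Piecewise observability: each event is seen by a different sensor subset.
sensor_map = {"approach": {"position"}, "push": {"force", "position"}}

deds = DiscreteEventSystem(transitions, "free")
deds.fire("approach", sensor_map)
deds.fire("push", sensor_map)
```

The point of the sketch is that no single sensor observes the whole trace: the position sensor covers the approach, while the piercing event itself needs force sensing as well.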
PackIt: A Virtual Environment for Geometric Planning
The ability to jointly understand the geometry of objects and plan actions
for manipulating them is crucial for intelligent agents. We refer to this
ability as geometric planning. Recently, many interactive environments have
been proposed to evaluate intelligent agents on various skills; however, none
of them cater to the needs of geometric planning. We present PackIt, a virtual
environment to evaluate and potentially learn the ability to do geometric
planning, where an agent needs to take a sequence of actions to pack a set of
objects into a box with limited space. We also construct a set of challenging
packing tasks using an evolutionary algorithm. Further, we study various
baselines for the task that include model-free learning-based and
heuristic-based methods, as well as search-based optimization methods that
assume access to the model of the environment. Code and data are available at
https://github.com/princeton-vl/PackIt.

Comment: Accepted to ICML 2020
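A heuristic baseline of the kind the abstract mentions can be sketched as a first-fit-decreasing packer: place axis-aligned boxes into the container largest-volume first, scanning candidate anchor positions on a coarse grid. This is not PackIt's actual API or baseline code, just an assumed toy version of the idea.

```python
# Toy first-fit-decreasing packing heuristic (an assumption, not
# PackIt's real code): axis-aligned boxes on an integer grid.

from itertools import product

def fits(placed, box, pos, container):
    """True if `box` at `pos` is inside `container` and overlaps nothing."""
    for d in range(3):
        if pos[d] + box[d] > container[d]:
            return False
    for obox, opos in placed:
        if all(pos[d] < opos[d] + obox[d] and opos[d] < pos[d] + box[d]
               for d in range(3)):
            return False
    return True

def greedy_pack(boxes, container, step=1):
    """Place boxes largest-volume first at the first feasible anchor."""
    placed = []
    for box in sorted(boxes, key=lambda b: -b[0] * b[1] * b[2]):
        anchors = product(*(range(0, container[d], step) for d in range(3)))
        for pos in anchors:
            if fits(placed, box, pos, container):
                placed.append((box, pos))
                break
        else:
            return None  # this greedy ordering fails to pack everything
    return placed

# Two unit cubes plus a 2x1x1 slab exactly fill a 2x2x1 container.
result = greedy_pack([(1, 1, 1), (1, 1, 1), (2, 1, 1)], (2, 2, 1))
```

Greedy placement like this can fail on instances that are feasible, which is exactly why the paper also studies learning-based and search-based methods.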
Learning to Reason in Round-based Games: Multi-task Sequence Generation for Purchasing Decision Making in First-person Shooters
Sequential reasoning is a complex human ability. While extensive previous
research has focused on gaming AI within a single continuous game, round-based
decision making that extends across a sequence of games remains less explored.
Counter-Strike: Global Offensive (CS:GO), as a round-based game with abundant
expert demonstrations, provides an excellent environment for multi-player
round-based sequential reasoning. In this work, we propose a Sequence Reasoner
with Round Attribute Encoder and Multi-Task Decoder to interpret the strategies
behind the round-based purchasing decisions. We adopt few-shot learning to
sample multiple rounds within a match, and a modified model-agnostic
meta-learning algorithm, Reptile, for the meta-learning loop. We formulate each round as a
multi-task sequence generation problem. Our state representations combine
action encoder, team encoder, player features, round attribute encoder, and
economy encoders to help our agent learn to reason under this specific
multi-player round-based scenario. A complete ablation study and comparison
with the greedy approach confirm the effectiveness of our model. Our research
will open doors for interpretable AI for understanding episodic and long-term
purchasing strategies beyond the gaming community.

Comment: 16th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE-20)
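The Reptile meta-update used in the meta-learning loop is simple to state: adapt the weights to one sampled task with a few SGD steps, then nudge the shared initialization toward the adapted weights. The sketch below illustrates this on toy 1-D regression tasks standing in for sampled rounds; every task detail here is an assumption.

```python
# Illustrative Reptile meta-learning loop on toy 1-D regression
# tasks (fit y = w * x); stands in for the paper's round sampling.

import numpy as np

def sgd_adapt(w, task, inner_steps=5, lr=0.1):
    """A few inner SGD steps on one task's squared-error loss."""
    xs, ys = task
    for _ in range(inner_steps):
        grad = np.mean(2 * (w * xs - ys) * xs)
        w = w - lr * grad
    return w

def reptile(tasks, meta_steps=100, eps=0.5):
    """Reptile: move the initialization toward task-adapted weights."""
    rng = np.random.default_rng(0)
    w = 0.0
    for _ in range(meta_steps):
        task = tasks[rng.integers(len(tasks))]
        w_adapted = sgd_adapt(w, task)
        w = w + eps * (w_adapted - w)  # the Reptile meta-update
    return w

xs = np.linspace(-1, 1, 20)
# Two tasks with slopes 2 and 4; the meta-learned init lands between them.
tasks = [(xs, 2 * xs), (xs, 4 * xs)]
w_star = reptile(tasks)
```

The meta-learned initialization ends up between the two task optima, so a handful of inner steps can then specialize it to either task, which is the property few-shot adaptation relies on.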
The use of interactive computer vision and robot hand controllers for enhancing manufacturing safety
Currently available robotic systems provide limited support for CAD-based, model-driven visualization, sensing algorithm development and integration, and automated graphical planning systems. This paper describes ongoing work which provides the functionality necessary to apply advanced robotics to automated manufacturing and assembly operations. An interface has been built which incorporates 6-DOF tactile manipulation, displays for three-dimensional graphical models, and automated tracking functions which depend on automated machine vision. A set of tools for single and multiple focal-plane sensor image processing and understanding has been demonstrated which utilizes object recognition models. The resulting tool will enable sensing and planning from computationally simple graphical objects. A synergistic interplay between human operator and machine vision is created through programmable feedback received from the controller. This approach can be used as the basis for implementing enhanced safety in automated robotics manufacturing, assembly, repair, and inspection tasks in both ground and space applications. Thus, an interactive capability has been developed to match the modeled environment to the real task environment for safe and predictable task execution.
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Existing benchmarks for grounding language in interactive environments either
lack real-world linguistic elements, or prove difficult to scale up due to
substantial human involvement in the collection of data or feedback signals. To
bridge this gap, we develop WebShop -- a simulated e-commerce website
environment with over a million real-world products and crowd-sourced
text instructions. Given a text instruction specifying a product requirement,
an agent needs to navigate multiple types of webpages and issue diverse actions
to find, customize, and purchase an item. WebShop provides several challenges
for language grounding including understanding compositional instructions,
query (re-)formulation, comprehending and acting on noisy text in webpages, and
performing strategic exploration. We collect human demonstrations
for the task, and train and evaluate a diverse range of agents using
reinforcement learning, imitation learning, and pre-trained image and language
models. Our best model achieves a task success rate that outperforms
rule-based heuristics but falls far short of human expert
performance. We also analyze agent and human trajectories and ablate
various model components to provide insights for developing future agents with
stronger language understanding and decision making abilities. Finally, we show
that agents trained on WebShop exhibit non-trivial sim-to-real transfer when
evaluated on amazon.com and ebay.com, indicating the potential value of WebShop
in developing practical web-based agents that can operate in the wild.

Comment: Project page with code, data, demos: https://webshop-pnlp.github.io. v2 adds transfer to eBay
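Agents in WebShop act by emitting text commands such as `search[...]` and `click[...]` (the two action types described in the paper). A minimal parser for utterances of that shape can be sketched as follows; the parser itself is an illustrative assumption, not the benchmark's actual code.

```python
# Minimal parser for WebShop-style text actions, e.g.
#   "search[red running shoes]"  or  "click[buy now]".
# The action names follow the paper; this parser is an assumption.

import re

ACTION_RE = re.compile(r"^(search|click)\[(.+)\]$")

def parse_action(text):
    """Split an agent utterance into (verb, argument), or None if malformed."""
    m = ACTION_RE.match(text.strip())
    if m is None:
        return None
    return m.group(1), m.group(2)

parse_action("search[red running shoes]")  # -> ('search', 'red running shoes')
```

Keeping the action space to two parameterized verbs is part of what makes the environment scalable: the hard part is choosing the argument (query or page element), not the command syntax.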