Embodied Question Answering
We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where
an agent is spawned at a random location in a 3D environment and asked a
question ("What color is the car?"). In order to answer, the agent must first
intelligently navigate to explore the environment, gather information through
first-person (egocentric) vision, and then answer the question ("orange").
This challenging task requires a range of AI skills -- active perception,
language understanding, goal-driven navigation, commonsense reasoning, and
grounding of language into actions. In this work, we develop the environments,
end-to-end-trained reinforcement learning agents, and evaluation protocols for
EmbodiedQA.
Comment: 20 pages, 13 figures, Webpage: https://embodiedqa.org
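To make the task structure concrete, below is a minimal sketch of an EmbodiedQA-style episode loop. The environment and agent classes here are hypothetical stand-ins, not the interfaces released with the paper; a real agent would navigate from egocentric vision and answer with a learned VQA head.

```python
# Sketch of an EmbodiedQA-style episode: spawn, navigate, then answer.
# ToyEnv and ToyAgent are illustrative placeholders only.
from dataclasses import dataclass

@dataclass
class Observation:
    rgb: list          # egocentric first-person frame (placeholder)
    question: str      # e.g. "What color is the car?"

class ToyEnv:
    """Stand-in 3D environment: spawns the agent and returns egocentric views."""
    def reset(self, question: str) -> Observation:
        self.steps = 0
        return Observation(rgb=[], question=question)

    def step(self, action: str) -> Observation:
        self.steps += 1
        return Observation(rgb=[], question="What color is the car?")

class ToyAgent:
    """Stand-in agent: explores for a step budget, then stops and answers."""
    NAV_ACTIONS = ["forward", "turn-left", "turn-right"]

    def act(self, obs: Observation, t: int, budget: int) -> str:
        # A learned policy would choose actions from egocentric vision;
        # here we simply stop once the exploration budget runs out.
        return self.NAV_ACTIONS[t % 3] if t < budget else "stop"

    def answer(self, obs: Observation) -> str:
        return "orange"  # a VQA head would decode this from the gathered views

env, agent, budget = ToyEnv(), ToyAgent(), 5
obs = env.reset("What color is the car?")
for t in range(budget + 1):
    action = agent.act(obs, t, budget)
    if action == "stop":
        break
    obs = env.step(action)
print(agent.answer(obs))
```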
Robotic Assistance in Coordination of Patient Care
We conducted a study to investigate trust in and
dependence upon robotic decision support among nurses and
doctors on a labor and delivery floor. There is evidence that
suggestions provided by embodied agents engender inappropriate
degrees of trust and reliance among humans. This concern is a
critical barrier that must be addressed before fielding intelligent
hospital service robots that take initiative to coordinate patient
care. Our experiment was conducted with nurses and physicians,
and evaluated the subjects’ levels of trust in and dependence
on high- and low-quality recommendations issued by robotic
versus computer-based decision support. The support, generated
through action-driven learning from expert demonstration, was
shown to produce high-quality recommendations that were accepted
by nurses and physicians at a compliance rate of 90%.
Rates of Type I and Type II errors were comparable between
robotic and computer-based decision support. Furthermore, embodiment
appeared to benefit performance, as indicated by a
higher degree of appropriate dependence after the quality of
recommendations changed over the course of the experiment.
These results support the notion that a robotic assistant may
be able to safely and effectively assist in patient care. Finally,
we conducted a pilot demonstration in which a robot assisted
resource nurses on a labor and delivery floor at a tertiary care
center.
National Science Foundation (U.S.) (Grant 2388357)
Agent AI: Surveying the Horizons of Multimodal Interaction
Multi-modal AI systems will likely become a ubiquitous presence in our
everyday lives. A promising approach to making these systems more interactive
is to embody them as agents within physical and virtual environments. At
present, systems leverage existing foundation models as the basic building
blocks for the creation of embodied agents. Embedding agents within such
environments facilitates the ability of models to process and interpret visual
and contextual data, which is critical for the creation of more sophisticated
and context-aware AI systems. For example, a system that can perceive user
actions, human behavior, environmental objects, audio expressions, and the
collective sentiment of a scene can be used to inform and direct agent
responses within the given environment. To accelerate research on agent-based
multimodal intelligence, we define "Agent AI" as a class of interactive systems
that can perceive visual stimuli, language inputs, and other
environmentally-grounded data, and can produce meaningful embodied actions. In
particular, we explore systems that aim to improve agents based on
next-embodied action prediction by incorporating external knowledge,
multi-sensory inputs, and human feedback. We argue that by developing agentic
AI systems in grounded environments, one can also mitigate the hallucinations
of large foundation models and their tendency to generate environmentally
incorrect outputs. The emerging field of Agent AI subsumes the broader embodied
and agentic aspects of multimodal interactions. Beyond agents acting and
interacting in the physical world, we envision a future where people can easily
create any virtual reality or simulated scene and interact with agents embodied
within the virtual environment.
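As one concrete reading of the "Agent AI" definition above, the sketch below shows an agent head that consumes visual and language inputs and predicts the next embodied action. All module names and dimensions are illustrative assumptions; in practice the encoders would be frozen foundation models rather than simple projections.

```python
# Hypothetical next-embodied-action predictor fusing visual and language inputs.
import torch
import torch.nn as nn

class NextActionPredictor(nn.Module):
    def __init__(self, vis_dim=512, txt_dim=512, hidden=256, num_actions=6):
        super().__init__()
        # Stand-ins for foundation-model features: simple linear projections
        # keep the sketch self-contained.
        self.vis_proj = nn.Linear(vis_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        self.policy = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, vis_feat, txt_feat):
        fused = torch.cat([self.vis_proj(vis_feat), self.txt_proj(txt_feat)], dim=-1)
        return self.policy(fused)  # logits over embodied actions

model = NextActionPredictor()
vis = torch.randn(1, 512)   # stand-in for perceived visual stimuli
txt = torch.randn(1, 512)   # stand-in for an instruction embedding
print(model(vis, txt).argmax(dim=-1).item())
```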
Boosting Reinforcement Learning and Planning with Demonstrations: A Survey
Although reinforcement learning has seen tremendous success recently, this
kind of trial-and-error learning can be impractical or inefficient in complex
environments. The use of demonstrations, on the other hand, enables agents to
benefit from expert knowledge rather than having to discover the best action to
take through exploration. In this survey, we discuss the advantages of using
demonstrations in sequential decision making, various ways to apply
demonstrations in learning-based decision making paradigms (for example,
reinforcement learning and planning in the learned models), and how to collect
the demonstrations in various scenarios. Additionally, we exemplify a practical
pipeline for generating and utilizing demonstrations in the recently proposed
ManiSkill robot learning benchmark.
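To illustrate two of the standard ways demonstrations enter learning-based decision making, here is a minimal sketch under toy assumptions (random transition tuples, no real environment): supervised behavior cloning on expert pairs, and seeding an off-policy replay buffer with demonstration transitions, in the spirit of DQfD-style methods.

```python
# Toy sketch: (1) behavior cloning on demonstrations, (2) demo-seeded replay buffer.
import random
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Hypothetical demonstrations: (state, action, reward, next_state) tuples.
demos = [(torch.randn(4), random.randint(0, 1), 1.0, torch.randn(4)) for _ in range(64)]

# (1) Behavior cloning: supervised learning on the expert's actions.
for s, a, _, _ in demos:
    loss = nn.functional.cross_entropy(policy(s).unsqueeze(0), torch.tensor([a]))
    opt.zero_grad(); loss.backward(); opt.step()

# (2) Demo-seeded replay buffer: the learner samples mixed batches of
# demonstration and self-collected transitions for off-policy updates.
replay_buffer = list(demos)                                      # start from expert data
replay_buffer.append((torch.randn(4), 0, 0.0, torch.randn(4)))   # agent's own experience
batch = random.sample(replay_buffer, k=8)
print(len(batch), "transitions sampled for an off-policy update")
```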
PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav
We study ObjectGoal Navigation -- where a virtual robot situated in a new
environment is asked to navigate to an object. Prior work has shown that
imitation learning (IL) using behavior cloning (BC) on a dataset of human
demonstrations achieves promising results. However, this has limitations -- 1)
BC policies generalize poorly to new states, since the training mimics actions
not their consequences, and 2) collecting demonstrations is expensive. On the
other hand, reinforcement learning (RL) is trivially scalable, but requires
careful reward engineering to achieve desirable behavior. We present PIRLNav, a
two-stage learning scheme for BC pretraining on human demonstrations followed
by RL-finetuning. This leads to a policy that achieves a new state-of-the-art
success rate on ObjectNav. Using this BC→RL training recipe, we present a rigorous empirical
analysis of design choices. First, we investigate whether human demonstrations
can be replaced with `free' (automatically generated) sources of
demonstrations, e.g. shortest paths (SP) or task-agnostic frontier exploration
(FE) trajectories. We find that BC→RL on human demonstrations
outperforms BC→RL on SP and FE trajectories, even when controlled
for same BC-pretraining success on train, and even on a subset of val episodes
where BC-pretraining success favors the SP or FE policies. Next, we study how
RL-finetuning performance scales with the size of the BC pretraining dataset.
We find that as we increase the size of the BC-pretraining dataset and reach high
BC accuracies, improvements from RL-finetuning are smaller, and that most of
the performance of our best BC→RL policy can be achieved with less
than half the number of BC demonstrations. Finally, we analyze failure modes of
our ObjectNav policies, and present guidelines for further improving them.
Comment: 8 pages + supplement
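The two-stage recipe described above can be sketched as follows. This is a bare-bones illustration under simplified assumptions (toy policy, random placeholder data, REINFORCE standing in for the PPO-style finetuning actually used), not the PIRLNav implementation.

```python
# Sketch of BC pretraining followed by RL finetuning on the same policy weights.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stage 1: behavior cloning on (state, action) pairs from human demonstrations.
demos = [(torch.randn(8), torch.randint(0, 4, (1,)).item()) for _ in range(256)]
for s, a in demos:
    loss = nn.functional.cross_entropy(policy(s).unsqueeze(0), torch.tensor([a]))
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: RL finetuning starting from the BC-initialized weights.
for _ in range(100):
    s = torch.randn(8)
    dist = torch.distributions.Categorical(logits=policy(s))
    a = dist.sample()
    reward = torch.rand(())            # placeholder for a navigation reward
    loss = -dist.log_prob(a) * reward  # policy-gradient objective
    opt.zero_grad(); loss.backward(); opt.step()
```

Starting stage 2 from the cloned weights, rather than from scratch, is what lets RL refine the policy without the reward engineering a cold start would require.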
Language-conditioned Learning for Robotic Manipulation: A Survey
Language-conditioned robotic manipulation represents a cutting-edge area of
research, enabling seamless communication and cooperation between humans and
robotic agents. This field focuses on teaching robotic systems to comprehend
and execute instructions conveyed in natural language. To achieve this, the
development of robust language understanding models capable of extracting
actionable insights from textual input is essential. In this comprehensive
survey, we systematically explore recent advancements in language-conditioned
approaches within the context of robotic manipulation. We analyze these
approaches based on their learning paradigms, which encompass reinforcement
learning, imitation learning, and the integration of foundational models, such
as large language models and vision-language models. Furthermore, we conduct an
in-depth comparative analysis, considering aspects like semantic information
extraction, environment & evaluation, auxiliary tasks, and task representation.
Finally, we outline potential future research directions in the realm of
language-conditioned learning for robotic manipulation, focusing on
generalization capabilities and safety issues. The GitHub repository of this
paper can be found at
https://github.com/hk-zh/language-conditioned-robot-manipulation-model
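As a minimal sketch of the language-conditioned policies the survey covers, the code below concatenates an instruction embedding with the robot's observation and maps the result to a continuous action. The encoder, dimensions, and action parameterization are illustrative assumptions; real systems typically use pretrained language or vision-language models as the instruction encoder.

```python
# Hypothetical language-conditioned manipulation policy.
import torch
import torch.nn as nn

class LanguageConditionedPolicy(nn.Module):
    def __init__(self, obs_dim=10, lang_dim=32, act_dim=7):
        super().__init__()
        # A learned embedding table keeps the sketch self-contained; in practice
        # the instruction would be encoded by a pretrained language model.
        self.lang_embed = nn.EmbeddingBag(1000, lang_dim)
        self.net = nn.Sequential(
            nn.Linear(obs_dim + lang_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),   # e.g. end-effector deltas + gripper
        )

    def forward(self, obs, token_ids):
        lang = self.lang_embed(token_ids)
        return self.net(torch.cat([obs, lang], dim=-1))

policy = LanguageConditionedPolicy()
obs = torch.randn(1, 10)                # proprioception / object features (toy)
tokens = torch.tensor([[12, 57, 300]])  # toy token ids for an instruction
print(policy(obs, tokens).shape)        # torch.Size([1, 7])
```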