Robot Navigation in Unseen Spaces using an Abstract Map
Human navigation in built environments depends on symbolic spatial
information which has unrealised potential to enhance robot navigation
capabilities. Information sources such as labels, signs, maps, planners, spoken
directions, and navigational gestures communicate a wealth of spatial
information to the navigators of built environments, yet robots typically
ignore it. We present a robot navigation system that uses
the same symbolic spatial information employed by humans to purposefully
navigate in unseen built environments with a level of performance comparable to
humans. The navigation system uses a novel data structure called the abstract
map to imagine malleable spatial models for unseen spaces from spatial symbols.
Sensorimotor perceptions from a robot are then employed to provide purposeful
navigation to symbolic goal locations in the unseen environment. We show how a
dynamic system can be used to create malleable spatial models for the abstract
map, and provide an open source implementation to encourage future work in the
area of symbolic navigation. Symbolic navigation performance of humans and a
robot is evaluated in a real-world built environment. The paper concludes with
a qualitative analysis of human navigation strategies, providing further
insights into how the symbolic navigation capabilities of robots in unseen
built environments can be improved in the future.
Comment: 15 pages; published in IEEE Transactions on Cognitive and
Developmental Systems (http://doi.org/10.1109/TCDS.2020.2993855); see
https://btalb.github.io/abstract_map/ for access to software.
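
To make the "malleable spatial model" idea concrete, here is a minimal,
hypothetical Python sketch of one way such a model could work: symbolic hints
between places become springs between imagined 2D coordinates, and a simple
dynamic system relaxes the layout as observations anchor places. All names,
constraints, and constants here are illustrative assumptions, not the
authors' implementation (which is in the repository linked above).

```python
# Sketch: symbolic place hints (e.g. "Room 305 is down the hall from
# Room 301") become springs between imagined 2D coordinates, and a
# simple dynamic system relaxes the layout. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

places = ["lobby", "hall", "room_301", "room_305"]
pos = {p: rng.normal(size=2) for p in places}  # imagined starting layout

# (a, b, rest_length): symbolic constraint "a is ~rest_length from b"
springs = [
    ("lobby", "hall", 5.0),
    ("hall", "room_301", 3.0),
    ("room_301", "room_305", 2.0),
]

def relax(pos, springs, steps=500, k=0.1):
    """Relax the spring energy by iterated local updates."""
    for _ in range(steps):
        for a, b, rest in springs:
            d = pos[b] - pos[a]
            dist = np.linalg.norm(d) + 1e-9
            force = k * (dist - rest) * d / dist
            pos[a] += force   # pull/push the pair toward the rest length
            pos[b] -= force
    return pos

relax(pos, springs)
# Re-anchor a place at an observed location and relax again; the
# imagined layout deforms around the new sensorimotor information.
pos["lobby"] = np.zeros(2)
relax(pos, springs)
print({p: np.round(x, 2) for p, x in pos.items()})
```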
Core Challenges in Embodied Vision-Language Planning
Recent advances in the areas of multimodal machine learning and artificial
intelligence (AI) have led to the development of challenging tasks at the
intersection of Computer Vision, Natural Language Processing, and Embodied AI.
Whereas many approaches and previous surveys have characterised one or
two of these dimensions, there has not been a holistic analysis at the intersection
of all three. Moreover, even when combinations of these topics are considered,
more focus is placed on describing, e.g., current architectural methods, as
opposed to also illustrating high-level challenges and opportunities for the
field. In this survey paper, we discuss Embodied Vision-Language Planning
(EVLP) tasks, a family of prominent embodied navigation and manipulation
problems that jointly use computer vision and natural language. We propose a
taxonomy to unify these tasks and provide an in-depth analysis and comparison
of the new and current algorithmic approaches, metrics, simulated environments,
as well as the datasets used for EVLP tasks. Finally, we present the core
challenges that we believe new EVLP works should seek to address, and we
advocate for task construction that enables model generalizability and furthers
real-world deployment.
Comment: 35 pages.
The Waggle Dance as an Intended Flight: A Cognitive Perspective
The notion of the waggle dance simulating a flight towards a goal in a walking pattern has been proposed in the context of evolutionary considerations. Behavioral components, like its arousing effect on the social community, the attention of hive mates induced by this behavior, the direction of the waggle run relative to the sun azimuth or to gravity, as well as the number of waggles per run, have been tentatively related to peculiar behavioral patterns in both solitary and social insect species and are thought to reflect phylogenetic pre-adaptations. Here, I ask whether these thoughts can be substantiated from a functional perspective. Communication in the waggle dance is a group phenomenon involving the dancer and the followers, which perform partially overlapping movements that encode and decode the message, respectively. It is thus assumed that the dancer and follower engage in closely related cognitive processes. This gives us access to these cognitive processes during dance communication, because the follower's flight performance can be tested when it becomes a recruit. I argue that the dance message and the landscape experience are processed in the same navigational memory, allowing the bee to fly novel direct routes, a property understood as an indication of a cognitive map.
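
As a concrete illustration of the dance code referenced in this abstract
(the waggle-run angle relative to gravity standing in for the flight bearing
relative to the sun's azimuth, and waggle duration scaling with distance),
here is a toy Python decoder. The calibration constant is an illustrative
assumption; real calibrations vary across colonies and landscapes.

```python
# Toy decoder for the classic waggle dance code: on a vertical comb,
# the run angle relative to gravity encodes the bearing relative to the
# sun's azimuth, and the waggle duration scales with flight distance.
# The ~750 m-per-second-of-waggling calibration is an assumption here.

def decode_waggle_run(run_angle_deg, sun_azimuth_deg, waggle_duration_s,
                      metres_per_second=750.0):
    """Return (compass bearing in degrees, distance in metres)."""
    bearing = (sun_azimuth_deg + run_angle_deg) % 360.0
    distance = waggle_duration_s * metres_per_second
    return bearing, distance

# A run 40 degrees clockwise of vertical, sun at azimuth 120 degrees,
# waggle phase lasting 2 seconds:
print(decode_waggle_run(40.0, 120.0, 2.0))  # -> (160.0, 1500.0)
```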
NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
Trained with an unprecedented scale of data, large language models (LLMs)
like ChatGPT and GPT-4 exhibit the emergence of significant reasoning abilities
from model scaling. Such a trend underscores the potential of training LLMs
on vast amounts of language data, advancing the development of a universal embodied
agent. In this work, we introduce NavGPT, a purely LLM-based
instruction-following navigation agent, to reveal the reasoning capability of
GPT models in complex embodied scenes by performing zero-shot sequential action
prediction for vision-and-language navigation (VLN). At each step, NavGPT takes
the textual descriptions of visual observations, navigation history, and future
explorable directions as inputs to reason about the agent's current status, and makes
the decision to approach the target. Through comprehensive experiments, we
demonstrate that NavGPT can explicitly perform high-level planning for navigation,
including decomposing instructions into sub-goals, integrating commonsense
knowledge relevant to the navigation task, identifying landmarks in
observed scenes, tracking navigation progress, and adapting to exceptions with
plan adjustment. Furthermore, we show that LLMs are capable of generating
high-quality navigational instructions from observations and actions along a
path, as well as drawing accurate top-down metric trajectories given the agent's
navigation history. Although NavGPT's zero-shot performance on R2R tasks
still falls short of trained models, we suggest adapting multi-modal
inputs for LLMs to serve as visual navigation agents, and applying the explicit
reasoning of LLMs to benefit learning-based models.
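
A minimal sketch of the per-step loop this abstract describes might look like
the following, where query_llm and all prompt wording are hypothetical
stand-ins rather than NavGPT's actual prompts or parsing:

```python
# Sketch of a per-step LLM navigation loop: textual scene descriptions,
# navigation history, and candidate directions are packed into a prompt,
# and the model's reply is parsed into an action. Hypothetical only.
from typing import List

def query_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (e.g. to GPT-4)."""
    raise NotImplementedError

def navgpt_step(instruction: str, history: List[str],
                observation: str, candidates: List[str]) -> str:
    prompt = (
        f"Instruction: {instruction}\n"
        f"History so far: {'; '.join(history) or 'none'}\n"
        f"Current view: {observation}\n"
        "Candidate directions:\n"
        + "\n".join(f"  ({i}) {c}" for i, c in enumerate(candidates))
        + "\nThink step by step about your progress, then answer with "
          "'Action: <index>' or 'Action: stop'."
    )
    reply = query_llm(prompt)
    # Keep the chain-of-thought for inspection; act on the final line.
    action_line = reply.strip().splitlines()[-1]
    return action_line.removeprefix("Action:").strip()
```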
From Seeing to Moving: A Survey on Learning for Visual Indoor Navigation (VIN)
The Visual Indoor Navigation (VIN) task has drawn increasing attention from the
data-driven machine learning community, especially given the recently reported
successes of learning-based methods. Due to the innate complexity of this task,
researchers have tried approaching the problem from a variety of different
angles, the full scope of which has not yet been captured within an overarching
report. This survey first summarizes the representative work of learning-based
approaches for the VIN task, then identifies and discusses lingering issues
impeding VIN performance, and motivates future research in key
areas worth exploring for the community.