Search CORE

8,852 research outputs found

Unbiased Directed Object Attention Graph for Object Navigation

Authors: Chen Qijun, Dang Ronghao, He Zongtao, Liu Chengju, Shi Zhuofan, Wang Liuyi
Publication date: 09/04/2022

Object navigation tasks require agents to locate specific objects in unknown environments based on visual information. Previously, graph convolutions were used to implicitly explore the relationships between objects. However, because objects differ in visibility, object attention is prone to bias. In this paper, we therefore propose a directed object attention (DOA) graph that guides the agent to explicitly learn the attention relationships between objects, thereby reducing object attention bias. In particular, we use the DOA graph to perform unbiased adaptive object attention (UAOA) on the object features and unbiased adaptive image attention (UAIA) on the raw images. To distinguish the features of the different branches, we propose a concise adaptive branch energy distribution (ABED) method. We evaluate our methods on the AI2-Thor dataset. Compared with the state-of-the-art (SOTA) method, our method improves success rate (SR), success weighted by path length (SPL), and success weighted by action efficiency (SAE) by 7.4%, 8.1%, and 17.6%, respectively. Comment: 13 pages, ready for ACM Multimedia, under review

arXiv.org e-Print Archive

The Development of LLMs for Embodied Navigation

Authors: Feng Xuxiang, Gao Han, Guo Li, Lin Jinzhou, Wang Changwei, Xu Rongtao, Xu Shibiao, Zhang Man
Publication date: 17/11/2023

In recent years, the rapid advancement of Large Language Models (LLMs) such as the Generative Pre-trained Transformer (GPT) has attracted increasing attention due to their potential in a variety of practical applications. The application of LLMs to Embodied Intelligence has emerged as a significant area of focus. Among the many applications of LLMs, navigation tasks are particularly noteworthy because they demand a deep understanding of the environment and quick, accurate decision-making. Leveraging their robust language and image-processing capabilities, LLMs can augment embodied intelligence systems with sophisticated environmental perception and decision-making support. This article offers an exhaustive summary of the symbiosis between LLMs and embodied intelligence, with a focus on navigation. It reviews state-of-the-art models and research methodologies, and assesses the advantages and disadvantages of existing embodied navigation models and datasets. Finally, based on current research, the article elucidates the role of LLMs in embodied intelligence and forecasts future directions in the field. A comprehensive list of studies in this survey is available at https://github.com/Rongtao-Xu/Awesome-LLM-E

arXiv.org e-Print Archive

A Survey of Embodied AI: From Simulators to Research Tasks

Authors: Duan Jiafei, Tan Cheston, Tan Hui Li, Yu Samson, Zhu Hongyuan
Publication date: 30/09/2021

There has been an emerging paradigm shift from the era of "internet AI" to "embodied AI", in which AI algorithms and agents no longer learn from datasets of images, videos, or text curated primarily from the internet. Instead, they learn through interactions with their environments from an egocentric perception similar to that of humans. Consequently, demand for embodied AI simulators that support various embodied AI research tasks has grown substantially. This growing interest in embodied AI benefits the broader pursuit of Artificial General Intelligence (AGI), but there has not been a contemporary and comprehensive survey of this field. This paper aims to provide an encyclopedic survey of the field of embodied AI, from its simulators to its research. By evaluating nine current embodied AI simulators against seven proposed features, this paper assesses how well the simulators support embodied AI research and identifies their limitations. In addition, this paper surveys the three main research tasks in embodied AI (visual exploration, visual navigation, and embodied question answering (QA)), covering the state-of-the-art approaches, evaluation metrics, and datasets. Finally, drawing on the new insights revealed through surveying the field, the paper provides suggestions for simulator-for-task selection and recommendations for future directions of the field. Comment: Under Review for IEEE TETC

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

mSpace: What do Numbers and Totals Mean in a Flexible Semantic Browser

Authors: schraefel m.c., Wilson Max L.
Publication date: 01/01/2006

With the Semantic Web community’s growing interest in Human Computer Interaction, this paper addresses a challenge for user interface design and future shifts in search paradigms. Whereas browsers built on current search paradigms often use numeric values to indicate the volumes of sub-hierarchies, future semantic browsers will not be limited to fixed hierarchical datasets but will allow flexible exploration across multiple intersecting domains. Because the future use of similar numeric indicators is uncertain, the research here suggests that such indicators should be based around the focal data objects within each information domain. Further research is required, as participants expressed a significant number of contradictory expectations. The Semantic Web community must ensure that future semantic search paradigms best support their users.

CiteSeerX

Southampton (e-Prints Soton)

End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon

Authors: Antsfeld Leonid, Bono Guillaume, Chidlovskii Boris, Weinzaepfel Philippe, Wolf Christian
Publication date: 28/09/2023

Most recent work in goal-oriented visual navigation resorts to large-scale machine learning in simulated environments. The main challenges lie in learning compact representations that generalize to unseen environments and in learning high-capacity perception modules capable of reasoning on high-dimensional input. The latter is particularly difficult when the goal is given not as a category ("ObjectNav") but as an exemplar image ("ImageNav"), because the perception module must learn a comparison strategy that requires solving an underlying visual correspondence problem. This has been shown to be difficult to learn from reward alone or with standard auxiliary tasks. We address this problem through a sequence of two pretext tasks, which serve as a prior for what we argue is one of the main bottlenecks in perception: extremely wide-baseline relative pose estimation and visibility prediction in complex scenes. The first pretext task, cross-view completion, is a proxy for the underlying visual correspondence problem, while the second task addresses goal detection and finding directly. We propose a new dual encoder with a large-capacity binocular ViT model and show that correspondence solutions naturally emerge from the training signals. Experiments show significant improvements and SOTA performance on two benchmarks, ImageNav and the Instance-ImageNav variant, in which camera intrinsics and height differ between observation and goal.

arXiv.org e-Print Archive