As Large Language Models (LLMs) have become popular, an important trend has
emerged of using multimodality to augment their generation ability, enabling
LLMs to better interact with the world. However, there is no unified
understanding of at which stage and how to incorporate different modalities. In
this survey, we review methods that assist and augment generative models by
retrieving multimodal knowledge, whose formats range from images, code,
tables, and graphs to audio. Such methods offer a promising solution to important
concerns such as factuality, reasoning, interpretability, and robustness. By
providing an in-depth review, this survey is expected to give scholars a
deeper understanding of these methods' applications and to encourage them to
adapt existing techniques to the fast-growing field of LLMs.