17 research outputs found

A Survey on Multimodal Large Language Models

Author: Chen Enhong
Fu Chaoyou
Li Ke
Sun Xing
Xu Tong
Yin Shukang
Zhao Sirui
Publication date: 23/06/2023

Multimodal Large Language Models (MLLMs) have recently become a rising research hotspot; they use powerful Large Language Models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLMs, such as writing stories based on images and OCR-free math reasoning, are rare in traditional methods, suggesting a potential path to artificial general intelligence. In this paper, we aim to trace and summarize the recent progress of MLLMs. First, we present the formulation of the MLLM and delineate its related concepts. Then, we discuss the key techniques and applications, including Multimodal Instruction Tuning (M-IT), Multimodal In-Context Learning (M-ICL), Multimodal Chain of Thought (M-CoT), and LLM-Aided Visual Reasoning (LAVR). Finally, we discuss existing challenges and point out promising research directions. Because the era of MLLMs has only just begun, we will keep updating this survey and hope it can inspire more research. An associated GitHub repository collecting the latest papers is available at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models.

Comment: Project page: https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Model

arXiv.org e-Print Archive

Woodpecker: Hallucination Correction for Multimodal Large Language Models

Author: Chen Enhong
Fu Chaoyou
Li Ke
Shen Yunhang
Sui Dianbo
Sun Xing
Wang Hao
Xu Tong
Yin Shukang
Zhao Sirui
Publication date: 24/10/2023

Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs); it refers to the phenomenon in which the generated text is inconsistent with the image content. To mitigate hallucinations, existing studies mainly resort to instruction tuning, which requires retraining the models with specific data. In this paper, we take a different route and introduce a training-free method named Woodpecker. Just as a woodpecker heals trees, it picks out and corrects hallucinations in the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Implemented as a post-hoc remedy, Woodpecker can easily serve different MLLMs, while remaining interpretable through the intermediate outputs of the five stages. We evaluate Woodpecker both quantitatively and qualitatively and show the huge potential of this new paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://github.com/BradyFU/Woodpecker.

Comment: 16 pages, 7 figures. Code Website: https://github.com/BradyFU/Woodpecke

arXiv.org e-Print Archive

Recommended from our members

Local Tournament Incentives and Firm Risk

Author: Yin Sirui
Publication venue: The University of Arizona.
Publication date: 01/01/2018

Using the compensation gap between a CEO and the highest-paid CEO in the same Metropolitan Statistical Area (MSA) as a proxy for local tournament incentives, I document a positive relation between local tournament incentives and firm risk. Specifically, CEOs who face higher local incentives implement riskier policies, including higher R&D expenditures and less diversification. Exploiting quasi-shocks to local incentives and cross-sectional variation in the probability of winning, I show that the incentive effects vary systematically with theoretical predictions. The results are robust to alternative measures of local tournament incentives, sample periods, and firm risk proxies.

The University of Arizona

Shareholder Litigation Rights and the Cost of Debt: Evidence from Derivative Lawsuits

Author: Sirui Yin
Xiaoran Ni
Publication venue: Elsevier BV
Publication date: 01/01/2016

Crossref

Theoretical and Experimental Investigations of Tunable Microwave Signal Generation Based on a 1-GHz All-Polarization-Maintaining Mode-Locked Fiber Laser

Author: Denghui Song
Ke Yin
Sirui Kong
Xiya Chen
Zhongjie Xu
Publication venue: MDPI AG
Publication date: 01/10/2022

Photonics-based microwave generation brings the advantages of photonic oscillators, such as high stability, wide bandwidth, and low loss, to the microwave domain. In this paper, the generation of tunable microwave signals was investigated both theoretically and experimentally based on an all-polarization-maintaining 1-GHz mode-locked fiber laser. Tunable microwave signals could be obtained from the beating, at the photodetector, between two highly chirped optical pulse trains with a relative time delay. The numerical simulations show that microwave signals of 40 GHz or higher could be obtained by tuning the time delay and dispersion. To experimentally validate the theoretical model, the generation of tunable microwave signals from 2 to 4 GHz was demonstrated. Owing to the use of polarization-maintaining devices, the optical output has a degree of linear polarization of more than 99%, which verifies the enhanced system stability. These demonstrations are imperative for solidifying the advancements of recent years and could promote the utilization of photonics-based microwave generation in microwave photonics.

Directory of Open Access Journals

Theoretical and Experimental Investigations of Tunable Microwave Signal Generation Based on a 1-GHz All-Polarization-Maintaining Mode-Locked Fiber Laser

Author: Denghui Song
Ke Yin
Sirui Kong
Xiya Chen
Zhongjie Xu
Publication venue: MDPI AG
Publication date: 01/10/2022

Multidisciplinary Digital Publishing Institute