BSDAR: Beam Search Decoding with Attention Reward in Neural Keyphrase Generation

Authors: Menkovski Vlado; Ni'mah Iftitahu; Pechenizkiy Mykola
Publication date: 17/09/2019

This study mainly investigates two decoding problems in neural keyphrase generation: sequence length bias and beam diversity. We introduce an extension of beam search inference based on word-level and n-gram-level attention scores to adjust and constrain Seq2Seq predictions at test time. Results show that our proposed solution can overcome the algorithm's bias toward shorter and nearly identical sequences, resulting in a significant improvement in decoding performance when generating keyphrases that are both present in and absent from the source text.

arXiv.org e-Print Archive
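The two decoding problems named in the BSDAR abstract, length bias and low beam diversity, can be illustrated with a generic beam search. This is a minimal sketch under stated assumptions, not BSDAR's attention-based reward (the abstract does not give its formula). It swaps in two standard, simpler remedies: ranking finished hypotheses by log-probability divided by length^alpha, and an optional penalty on n-grams shared with beams already kept at the same step. All names (`beam_search`, `toy_model`, `alpha`, `ngram_penalty`) and the toy probabilities are hypothetical.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical sketch: generic length normalization + cross-beam n-gram penalty,
# NOT the BSDAR attention reward described in the abstract.
EOS = "</s>"
Seq = Tuple[str, ...]


def ngrams(tokens: Seq, n: int) -> Set[Seq]:
    """Return the set of contiguous n-grams in a token sequence."""
    return {tokens[i:i + n] for i in range(len(tokens) - n + 1)}


def beam_search(
    next_logprobs: Callable[[Seq], Dict[str, float]],
    beam_size: int = 4,
    max_len: int = 6,
    alpha: float = 1.0,          # length-normalization exponent (0.0 = raw log-prob)
    ngram_penalty: float = 0.0,  # cost per n-gram shared with already-kept beams
    n: int = 2,
) -> List[Tuple[Seq, float]]:
    beams: List[Tuple[Seq, float]] = [((), 0.0)]  # (tokens, cumulative log-prob)
    finished: List[Tuple[Seq, float]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        pool = [
            (tokens + (tok,), score + lp)
            for tokens, score in beams
            for tok, lp in next_logprobs(tokens).items()
        ]
        # Greedily keep beam_size candidates. The penalty only affects which
        # candidates survive; their stored log-probability is left unchanged.
        selected: List[Tuple[Seq, float]] = []
        while pool and len(selected) < beam_size:
            used = set().union(*(ngrams(t, n) for t, _ in selected))
            best = max(
                range(len(pool)),
                key=lambda j: pool[j][1] - ngram_penalty * len(ngrams(pool[j][0], n) & used),
            )
            selected.append(pool.pop(best))
        # Hypotheses ending in EOS are complete; the rest stay live.
        finished += [c for c in selected if c[0][-1] == EOS]
        beams = [c for c in selected if c[0][-1] != EOS]
        if not beams:
            break
    finished += beams  # keep hypotheses still open when max_len is reached
    # Rank by log-prob / length**alpha (the length counts the EOS token).
    ranked = [(t, s / len(t) ** alpha) for t, s in finished if t]
    return sorted(ranked, key=lambda c: c[1], reverse=True)


def toy_model(prefix: Seq) -> Dict[str, float]:
    """Hypothetical next-token distribution over a tiny keyphrase vocabulary."""
    table = {
        (): {"deep": 0.5, "neural": 0.5},
        ("deep",): {EOS: 0.5, "learning": 0.5},
        ("deep", "learning"): {EOS: 0.9, "model": 0.1},
        ("neural",): {EOS: 0.7, "network": 0.3},
    }
    probs = table.get(prefix, {EOS: 1.0})  # unseen prefixes simply end
    return {tok: math.log(p) for tok, p in probs.items()}


if __name__ == "__main__":
    for a in (0.0, 1.0):
        tokens, score = beam_search(toy_model, alpha=a)[0]
        print(f"alpha={a}: best={' '.join(tokens)} ({score:.3f})")
```

With this toy model, raw log-probability (`alpha=0.0`) prefers the short hypothesis `neural </s>`, while `alpha=1.0` promotes `deep learning </s>`: a small instance of the short-sequence bias the abstract targets. Raising `ngram_penalty` makes beams that repeat each other's bigrams less likely to survive, a crude stand-in for a diversity constraint.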

VisualGPTScore: Visio-Linguistic Reasoning with Multimodal Generative Pre-Training Scores

Authors: Chen Xinyue; Lin Zhiqiu; Pathak Deepak; Ramanan Deva; Zhang Pengchuan
Publication date: 02/06/2023

Vision-language models (VLMs) discriminatively pre-trained with contrastive image-text matching losses such as $P(\text{match} \mid \text{text}, \text{image})$ have been criticized for lacking compositional understanding. This means they might output similar scores even if the original caption is rearranged into a different semantic statement. To address this, we propose to use the Visual Generative Pre-Training Score (VisualGPTScore) of $P(\text{text} \mid \text{image})$, a multimodal generative score that captures the likelihood of a text caption conditioned on an image using an image-conditioned language model. Contrary to the belief that VLMs are mere bag-of-words models, our off-the-shelf VisualGPTScore demonstrates top-tier performance on recently proposed image-text retrieval benchmarks like ARO and Crepe that assess compositional reasoning. Furthermore, we factorize VisualGPTScore into a product of the marginal $P(\text{text})$ and the Pointwise Mutual Information (PMI). This helps to (a) diagnose datasets with strong language bias, and (b) debias results on other benchmarks like Winoground using an information-theoretic framework. VisualGPTScore provides valuable insights and serves as a strong baseline for future evaluation of visio-linguistic compositionality.
Comment: Website: https://linzhiqiu.github.io/papers/visual_gpt_score/ Code: https://github.com/linzhiqiu/visual_gpt_score
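The factorization mentioned in the VisualGPTScore abstract follows from the standard definition of pointwise mutual information between a caption and an image. Written out (a textbook identity, not a formula quoted from the paper), the multiplicative factor is the exponentiated PMI, i.e. the ratio of conditional to marginal likelihood:

```latex
\begin{align}
\mathrm{PMI}(\text{text}, \text{image})
  &= \log \frac{P(\text{text}, \text{image})}{P(\text{text})\,P(\text{image})}
   = \log \frac{P(\text{text} \mid \text{image})}{P(\text{text})}, \\
P(\text{text} \mid \text{image})
  &= P(\text{text}) \cdot \exp\bigl(\mathrm{PMI}(\text{text}, \text{image})\bigr).
\end{align}
```

Because $P(\text{text})$ does not depend on the image, ranking candidate captions for a fixed image by PMI instead of by $P(\text{text} \mid \text{image})$ removes the language-prior term, which is one way to read the abstract's "debias" claim.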

arXiv.org e-Print Archive
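The VisualGPTScore abstract defines the score as a caption's likelihood under an image-conditioned language model. A minimal sketch of how that likelihood and its PMI term could be computed with the chain rule, assuming two per-token probability functions: `cond_prob`, a stand-in for an image-conditioned LM with the image already fixed, and `prior_prob`, a stand-in for some text-only estimate of $P(\text{text})$. Both functions, `sequence_logprob`, and `score_caption` are hypothetical; the paper's actual models and its way of estimating $P(\text{text})$ are not described in the abstract.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical interface: next_prob(prefix, token) -> P(token | prefix[, image]).
NextProb = Callable[[Tuple[str, ...], str], float]


def sequence_logprob(tokens: Sequence[str], next_prob: NextProb) -> float:
    """Chain rule: log P(t) = sum over k of log P(t_k | t_<k)."""
    total = 0.0
    for k, tok in enumerate(tokens):
        total += math.log(next_prob(tuple(tokens[:k]), tok))
    return total


def score_caption(
    tokens: Sequence[str], cond_prob: NextProb, prior_prob: NextProb
) -> Tuple[float, float, float]:
    """Return (log P(text|image), log P(text), PMI) for one caption.

    cond_prob: assumed image-conditioned LM, image already bound (hypothetical).
    prior_prob: assumed text-only estimate of P(text) (hypothetical).
    """
    log_cond = sequence_logprob(tokens, cond_prob)
    log_prior = sequence_logprob(tokens, prior_prob)
    return log_cond, log_prior, log_cond - log_prior  # PMI = log P(t|i) - log P(t)


if __name__ == "__main__":
    caption = ["a", "horse", "eating", "grass"]
    cond = lambda prefix, tok: 0.5    # toy stand-in for the conditioned LM
    prior = lambda prefix, tok: 0.25  # toy stand-in for the text-only prior
    print(score_caption(caption, cond, prior))
```

With constant toy models (0.5 per token conditioned, 0.25 per token prior), a four-token caption gets a PMI of 4·log 2. In a real setup each function would wrap a model forward pass that returns next-token probabilities.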