BSDAR: Beam Search Decoding with Attention Reward in Neural Keyphrase Generation
This study mainly investigates two decoding problems in neural keyphrase
generation: sequence length bias and beam diversity. We introduce an extension
of beam search inference based on word-level and n-gram-level attention scores
to adjust and constrain Seq2Seq prediction at test time. Results show that our
proposed solution can overcome the algorithm's bias toward shorter and nearly
identical sequences, resulting in a significant improvement in decoding
performance when generating keyphrases that are present in or absent from the
source text.
VisualGPTScore: Visio-Linguistic Reasoning with Multimodal Generative Pre-Training Scores
Vision-language models (VLMs) discriminatively pre-trained with contrastive
image-text matching losses such as P(match | text, image)
have been criticized for lacking compositional understanding. This means they
might output similar scores even if the original caption is rearranged into a
different semantic statement. To address this, we propose to use the Visual
Generative Pre-Training Score (VisualGPTScore) of P(text | image), a score that
captures the likelihood of a text caption conditioned
on an image using an image-conditioned language model. Contrary to the belief
that VLMs are mere bag-of-words models, our off-the-shelf VisualGPTScore
demonstrates top-tier performance on recently proposed image-text retrieval
benchmarks like ARO and Crepe that assess compositional reasoning. Furthermore,
we factorize VisualGPTScore into a product of the marginal P(text)
and the Pointwise Mutual Information (PMI). This helps to (a)
diagnose datasets with strong language bias, and (b) debias results on other
benchmarks like Winoground using an information-theoretic framework.
VisualGPTScore provides valuable insights and serves as a strong baseline for
future evaluation of visio-linguistic compositionality.
Comment: Website: https://linzhiqiu.github.io/papers/visual_gpt_score/ Code: https://github.com/linzhiqiu/visual_gpt_score
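The factorization mentioned in the abstract can be illustrated in log space, where the product becomes a sum: log P(text | image) = log P(text) + PMI(text, image), with PMI(text, image) = log [P(text | image) / P(text)]. The following is a minimal sketch of that identity and of how subtracting the text prior could change a ranking. It is not the authors' implementation: the function names, the exponent `alpha`, and all log-probabilities are hypothetical values chosen for illustration, and in practice the two log-probabilities would come from an image-conditioned language model and a text-only estimate of P(text).

```python
import math


def pmi(log_p_text_given_image: float, log_p_text: float) -> float:
    """Pointwise mutual information in log space: log P(t|i) - log P(t)."""
    return log_p_text_given_image - log_p_text


def debiased_score(log_p_text_given_image: float,
                   log_p_text: float,
                   alpha: float = 1.0) -> float:
    """Hypothetical debiased score: log P(t|i) - alpha * log P(t).

    alpha = 0 returns the raw conditional log-likelihood;
    alpha = 1 returns the PMI. The name `alpha` is an assumption
    made for this sketch.
    """
    return log_p_text_given_image - alpha * log_p_text


# Made-up log-probabilities for two candidate captions of one image.
# "correct" matches the image but is unusual as text alone;
# "fluent" does not match the image but is very likely as text alone.
candidates = {
    "correct": {"log_p_t_given_i": -6.0, "log_p_t": -10.0},
    "fluent": {"log_p_t_given_i": -5.0, "log_p_t": -4.0},
}


def best_caption(alpha: float) -> str:
    """Return the candidate with the highest score for a given alpha."""
    return max(
        candidates,
        key=lambda name: debiased_score(
            candidates[name]["log_p_t_given_i"],
            candidates[name]["log_p_t"],
            alpha,
        ),
    )


if __name__ == "__main__":
    for alpha in (0.0, 1.0):
        print(f"alpha={alpha}: best caption is {best_caption(alpha)!r}")
```

With these toy numbers the raw conditional score favors the fluent caption, while the PMI term favors the caption that the image actually makes more likely. This is the kind of language bias the abstract says the factorization helps to diagnose.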