Measuring the Contributions of Vision and Text Modalities

Parcalabescu, Letitia

research article

oai:jlcl.org:article/261

Authors: Letitia Parcalabescu
Publication date: 27 February 2025
Publisher: German Society for Computational Linguistics and Language Technology (GSCL)

Abstract

This dissertation investigates multimodal transformers that jointly process image and text inputs to generate outputs for various tasks (such as answering questions about images). Specifically, it develops methods to assess how effectively vision and language models combine, understand, use, and explain information from these two modalities. The dissertation advances the field in three ways: (i) by measuring specific, task-independent capabilities of vision and language models, (ii) by interpreting these models to quantify the extent to which they use and integrate information from both modalities, and (iii) by evaluating their ability to provide self-consistent explanations of their outputs to users.

Last time updated on 22/03/2025

This paper was published in Journal for Language Technology and Computational Linguistics (JLCL).

Licence: https://creativecommons.org/licenses/by-sa/4.0