
    Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering

    Visual question answering (VQA) has the potential to make the Internet more accessible in an interactive way, allowing people who cannot see images to ask questions about them. However, multiple studies have shown that people who are blind or have low vision prefer image explanations that incorporate the context in which an image appears, yet current VQA datasets focus on images in isolation. We argue that VQA models will not fully succeed at meeting people's needs unless they take context into account. To further motivate and analyze the distinction between different contexts, we introduce Context-VQA, a VQA dataset that pairs images with contexts, specifically types of websites (e.g., a shopping website). We find that the types of questions vary systematically across contexts. For example, images presented in a travel context garner 2 times more "Where?" questions, and images on social media and news garner 2.8 and 1.8 times more "Who?" questions than the average. We also find that context effects are especially important when participants can't see the image. These results demonstrate that context affects the types of questions asked and that VQA models should be context-sensitive to better meet people's needs, especially in accessibility settings. Comment: Proceedings of the ICCV 2023 Workshop on Closing the Loop Between Vision and Language.
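    As a rough illustration of the kind of data the abstract describes, the sketch below shows how context-paired VQA records might be represented and how question-type rates could be compared across contexts. The field names (image_id, context, questions) and the wh-word heuristic are illustrative assumptions, not the dataset's actual schema or analysis.

```python
# Hypothetical sketch: context-paired VQA records and per-context question-type rates.
# Field names and the wh-word bucketing heuristic are assumptions for illustration only.
from collections import Counter, defaultdict

records = [
    {"image_id": "img_001", "context": "travel",       "questions": ["Where is this beach?", "Who took this photo?"]},
    {"image_id": "img_002", "context": "shopping",     "questions": ["What color is the jacket?"]},
    {"image_id": "img_003", "context": "social media", "questions": ["Who is in this picture?"]},
]

def question_type(q: str) -> str:
    """Crude heuristic: bucket a question by its leading wh-word."""
    return q.split()[0].lower().rstrip("?")

by_context = defaultdict(Counter)
for rec in records:
    for q in rec["questions"]:
        by_context[rec["context"]][question_type(q)] += 1

for ctx, counts in by_context.items():
    total = sum(counts.values())
    rates = {t: round(c / total, 2) for t, c in counts.items()}
    print(ctx, rates)
```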

    Concadia: Towards Image-Based Text Generation with a Purpose

    Current deep learning models often achieve excellent results on benchmark image-to-text datasets but fail to generate texts that are useful in practice. We argue that to close this gap, it is vital to distinguish descriptions from captions based on their distinct communicative roles. Descriptions focus on visual features and are meant to replace an image (often to increase accessibility), whereas captions appear alongside an image to supply additional information. To motivate this distinction and help people put it into practice, we introduce the publicly available Wikipedia-based dataset Concadia, consisting of 96,918 images with corresponding English-language descriptions, captions, and surrounding context. Using insights from Concadia, models trained on it, and a preregistered human-subjects experiment with human- and model-generated texts, we characterize the commonalities and differences between descriptions and captions. In addition, we show that, for generating both descriptions and captions, it is useful to augment image-to-text models with representations of the textual context in which the image appeared. Comment: Proceedings of EMNLP 2022.
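    The closing claim, that image-to-text models benefit from representations of the surrounding textual context, can be pictured with the minimal sketch below. This is not the paper's architecture; the module sizes, the fusion-by-concatenation choice, and the toy GRU decoder are all assumptions made only to show where a context representation could enter an image-to-text model.

```python
# Minimal sketch (assumed architecture, not the paper's): fuse an image embedding
# with a context-text embedding before feeding a simple text decoder.
import torch
import torch.nn as nn

class ContextAugmentedCaptioner(nn.Module):
    def __init__(self, img_dim=512, ctx_dim=256, hidden=512, vocab=10000):
        super().__init__()
        self.fuse = nn.Linear(img_dim + ctx_dim, hidden)  # combine image + context features
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)                # per-step token logits

    def forward(self, img_feats, ctx_feats, steps=20):
        # img_feats: (batch, img_dim) from a vision encoder
        # ctx_feats: (batch, ctx_dim) from a text encoder over the surrounding context
        fused = torch.tanh(self.fuse(torch.cat([img_feats, ctx_feats], dim=-1)))
        # Feed the fused representation at every decoding step
        # (a deliberately simple stand-in for a proper autoregressive decoder).
        inputs = fused.unsqueeze(1).repeat(1, steps, 1)
        hidden_states, _ = self.decoder(inputs)
        return self.out(hidden_states)  # (batch, steps, vocab)

model = ContextAugmentedCaptioner()
logits = model(torch.randn(2, 512), torch.randn(2, 256))
print(logits.shape)  # torch.Size([2, 20, 10000])
```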

    Measurement of the charge asymmetry in top-quark pair production in the lepton-plus-jets final state in pp collision data at $\sqrt{s}=8\,\mathrm{TeV}$ with the ATLAS detector


    ATLAS Run 1 searches for direct pair production of third-generation squarks at the Large Hadron Collider


    Measurement of inclusive jet charged-particle fragmentation functions in Pb+Pb collisions at $\sqrt{s_{\mathrm{NN}}}=2.76\,\mathrm{TeV}$ with the ATLAS detector


    A Semantics for Causing, Enabling, and Preventing Verbs Using Structural Causal Models

    When choosing how to describe what happened, we have a number of causal verbs at our disposal. In this paper, we develop a model-theoretic formal semantics for nine causal verbs that span the categories of CAUSE, ENABLE, and PREVENT. We use structural causal models (SCMs) to represent participants' mental construction of a scene when assessing the correctness of causal expressions relative to a presented context. Furthermore, SCMs enable us to model events involving both the physical world and agents' mental states. In experimental evaluations, we find that the proposed semantics aligns more closely with human evaluations than prior accounts of these verb families.
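    To make the SCM machinery concrete, here is a toy sketch of evaluating a causal-verb statement by intervening on a structural causal model. It does not reproduce the paper's semantics; the spark-and-oxygen scene and the but-for test for "cause" are standard textbook illustrations, used only to show how interventions on an SCM support such judgments.

```python
# Toy illustration (not the paper's actual semantics): evaluate "A caused B"
# with a but-for test by intervening on a tiny structural causal model.

def run_scm(exogenous, do=None):
    """Evaluate a tiny SCM in which a spark and oxygen jointly produce fire."""
    do = do or {}
    spark = do.get("spark", exogenous["spark"])
    oxygen = do.get("oxygen", exogenous["oxygen"])
    fire = do.get("fire", spark and oxygen)
    return {"spark": spark, "oxygen": oxygen, "fire": fire}

context = {"spark": True, "oxygen": True}

actual = run_scm(context)
# But-for test for "the spark caused the fire": the fire occurred, and
# intervening to remove the spark removes the fire.
counterfactual = run_scm(context, do={"spark": False})
caused = actual["fire"] and not counterfactual["fire"]
print("'The spark caused the fire':", caused)  # True
```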