
    Can You Explain That? Lucid Explanations Help Human-AI Collaborative Image Retrieval

    While there have been many proposals for making AI algorithms explainable, few have attempted to evaluate the impact of AI-generated explanations on human performance in human-AI collaborative tasks. To bridge this gap, we propose a Twenty-Questions-style collaborative image retrieval game, Explanation-assisted Guess Which (ExAG), as a method of evaluating the efficacy of explanations (visual evidence or textual justification) in the context of Visual Question Answering (VQA). In ExAG, a human user must guess a secret image picked by the VQA agent by asking it natural language questions. We show that, overall, users guess the secret image correctly more often when the AI explains its answers. Notably, even when VQA answers are mostly incorrect, a few correct explanations can readily improve human performance compared to no-explanation games. Furthermore, we show that while explanations rated as "helpful" significantly improve human performance, "incorrect" and "unhelpful" explanations can degrade performance relative to no-explanation games. Our experiments therefore demonstrate that ExAG is an effective means of evaluating the efficacy of AI-generated explanations in a human-AI collaborative task. Comment: 2019 AAAI Conference on Human Computation and Crowdsourcing
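    The following is a minimal Python sketch of the ExAG game loop as described in the abstract. The interface is hypothetical: `vqa_agent.answer(image, question)`, `ask_user`, and `get_user_guess` are assumed callables for illustration, not the authors' actual implementation.

```python
# Minimal sketch of one round of Explanation-assisted Guess Which (ExAG).
# Assumed (hypothetical) API: vqa_agent.answer(image, question) returns an
# answer string plus an optional explanation (visual evidence or a textual
# justification); ask_user / get_user_guess stand in for the human interface.

def play_exag(vqa_agent, image_pool, secret_image, ask_user, get_user_guess,
              max_questions=5, show_explanations=True):
    """Run one ExAG round and return True if the user finds the secret image."""
    history = []
    for _ in range(max_questions):
        question = ask_user(image_pool, history)  # user types a question, or None to guess early
        if question is None:
            break
        answer, explanation = vqa_agent.answer(secret_image, question)
        history.append({
            "question": question,
            "answer": answer,
            # In the no-explanation condition, explanations are withheld.
            "explanation": explanation if show_explanations else None,
        })
    guess = get_user_guess(image_pool, history)
    return guess == secret_image
```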

    The Impact of Explanations on AI Competency Prediction in VQA

    Explainability is one of the key elements for building trust in AI systems. Despite numerous attempts to make AI explainable, quantifying the effect of explanations on human-AI collaborative tasks remains a challenge. Beyond predicting the overall behavior of an AI, in many applications users need to understand an AI agent's competency in different aspects of the task domain. In this paper, we evaluate the impact of explanations on the user's mental model of AI agent competency within the task of visual question answering (VQA). We quantify users' understanding of competency based on the correlation between the actual system performance and user rankings. We introduce an explainable VQA system that uses spatial and object features and is powered by the BERT language model. Each group of users sees only one kind of explanation when ranking the competencies of the VQA model. The proposed model is evaluated through between-subject experiments to probe the impact of explanations on the user's perception of competency. The comparison between two VQA models shows that BERT-based explanations and the use of object features improve the user's prediction of the model's competencies. Comment: Submitted to HCCAI 202
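    A minimal sketch of the competency-prediction measure described above: correlate a user's ranking of the agent's competencies with the agent's actual per-category accuracy. The category names, accuracy numbers, and use of Spearman rank correlation are illustrative assumptions, not necessarily the paper's exact protocol.

```python
# Quantify how well a user's mental model tracks actual VQA competency by
# rank-correlating perceived competency with measured accuracy per category.
from scipy.stats import spearmanr

# Hypothetical per-category accuracy of the VQA model.
actual_accuracy = {"counting": 0.45, "color": 0.80, "spatial": 0.60, "activity": 0.70}

# One user's ranking of the same categories (1 = judged most competent).
user_ranking = {"color": 1, "activity": 2, "spatial": 3, "counting": 4}

categories = sorted(actual_accuracy)
# Higher accuracy should correspond to a better (lower-numbered) rank,
# so negate the ranks to make an aligned mental model yield a positive rho.
rho, p_value = spearmanr(
    [actual_accuracy[c] for c in categories],
    [-user_ranking[c] for c in categories],
)
print(f"Perceived-vs-actual competency correlation: rho={rho:.2f}, p={p_value:.3f}")
```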

    Advancing Multi-Modal Deep Learning: Towards Language-Grounded Visual Understanding

    Using deep learning, computer vision now rivals people at object recognition and detection, opening doors to tackle new challenges in image understanding. Among these challenges, understanding and reasoning about language-grounded visual content is of fundamental importance to advancing artificial intelligence. Recently, multiple datasets and algorithms have been created as proxy tasks toward this goal, with visual question answering (VQA) being the most widely studied. In VQA, an algorithm needs to produce an answer to a natural language question about an image. However, our survey of datasets and algorithms for VQA uncovered several sources of dataset bias and sub-optimal evaluation metrics that allowed algorithms to perform well by merely exploiting superficial statistical patterns. In this dissertation, we describe new algorithms and datasets that address these issues. We developed two new datasets and evaluation metrics that enable a more accurate measurement of the abilities of a VQA model, and also expand VQA to include new abilities, such as reading text, handling out-of-vocabulary words, and understanding data visualizations. We also created new algorithms that have helped advance the state of the art in VQA, including an algorithm that surpasses humans on two different chart question answering datasets covering bar charts, line graphs, and pie charts. Finally, we provide a holistic overview of several yet-unsolved challenges not only in VQA but in vision-and-language research at large. Despite enormous progress, we find that a robust understanding and integration of vision and language is still an elusive goal, and much of the apparent progress may be misleading due to dataset bias, superficial correlations, and flaws in standard evaluation metrics. We carefully study and categorize these issues for several vision-and-language tasks and outline several possible paths toward the development of safe, robust, and trustworthy AI for language-grounded visual understanding.
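    To make the dataset-bias point concrete, here is a small Python sketch of the kind of "blind" probe such surveys commonly use: a baseline that never sees the image and answers each question with the most frequent training answer for its question prefix. If this baseline scores well, the benchmark rewards superficial statistical patterns. The data schema and prefix heuristic are illustrative assumptions, not tied to any specific dataset in the dissertation.

```python
# Question-prior ("blind") baseline: answer from question statistics alone,
# ignoring the image entirely, to estimate how much of a VQA benchmark can be
# solved via superficial language correlations.
from collections import Counter, defaultdict

def train_question_prior(train_set):
    """Map each question prefix (first two words) to its most common answer."""
    answers_by_prefix = defaultdict(Counter)
    for example in train_set:  # each example: {"question": str, "answer": str}
        prefix = " ".join(example["question"].lower().split()[:2])
        answers_by_prefix[prefix][example["answer"]] += 1
    return {p: counts.most_common(1)[0][0] for p, counts in answers_by_prefix.items()}

def blind_accuracy(prior, test_set, fallback="yes"):
    """Accuracy when answering from the question prior alone, never seeing the image."""
    correct = 0
    for example in test_set:
        prefix = " ".join(example["question"].lower().split()[:2])
        correct += prior.get(prefix, fallback) == example["answer"]
    return correct / len(test_set)
```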