
    Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering

    Many vision and language tasks require commonsense reasoning beyond data-driven image and natural language processing. Here we adopt Visual Question Answering (VQA) as an example task, where a system is expected to answer a natural-language question about an image. Current state-of-the-art systems attempt to solve the task using deep neural architectures and achieve promising performance. However, the resulting systems are generally opaque, and they struggle to understand questions that require extra knowledge. In this paper, we present an explicit reasoning layer on top of a set of penultimate neural-network-based systems. The reasoning layer enables reasoning about and answering questions that require additional knowledge, and at the same time provides an interpretable interface to end users. Specifically, the reasoning layer adopts a Probabilistic Soft Logic (PSL) based engine to reason over a basket of inputs: visual relations, the semantic parse of the question, and background ontological knowledge from word2vec and ConceptNet. Experimental analysis of the answers and the key evidential predicates generated on the VQA dataset validates our approach.
    Comment: 9 pages, 3 figures, AAAI 201
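    The abstract does not list the PSL rules used by the reasoning layer. As a minimal sketch of how a PSL engine scores a single soft rule, the code below assumes PSL's standard Lukasiewicz relaxation: a rule body is a soft conjunction of truth values in [0, 1], and a grounded rule body -> head incurs a hinge penalty ("distance to satisfaction") of max(0, body - head). The function names and the example predicates in the comments are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def lukasiewicz_and(*truths: float) -> float:
    """Lukasiewicz t-norm: soft AND of truth values in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def distance_to_satisfaction(body: Sequence[float], head: float) -> float:
    """Hinge penalty for a grounded rule body -> head (0 when fully satisfied)."""
    return max(0.0, lukasiewicz_and(*body) - head)


def weighted_penalty(rules: Sequence[Tuple[float, Sequence[float], float]]) -> float:
    """Sum of weight * distance over grounded rules; PSL inference seeks
    truth values for unobserved atoms that minimize this kind of objective."""
    return sum(w * distance_to_satisfaction(body, head) for w, body, head in rules)


# Hypothetical grounding:
#   word(Q, "dog") [0.9] AND detected(Img, "dog") [0.8] -> answer(Q, "dog") [0.5]
# Body = max(0, 0.9 + 0.8 - 1) ≈ 0.7, so the penalty is ≈ 0.2.
print(distance_to_satisfaction([0.9, 0.8], 0.5))
```

    Raising the truth value of the hypothetical answer atom to 0.7 or above drives the penalty to zero, which is how evidence from several inputs can push a candidate answer upward.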

    On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law

    Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set. OOD benchmarks are designed to present a different joint distribution of data and labels between training and test time. VQA-CP has become the standard OOD benchmark for visual question answering, but we discovered three troubling practices in its current use. First, most published methods rely on explicit knowledge of the construction of the OOD splits. They often rely on "inverting" the distribution of labels, e.g. answering mostly 'yes' when the common training answer is 'no'. Second, the OOD test set is used for model selection. Third, a model's in-domain performance is assessed after retraining it on in-domain splits (VQA v2) that exhibit a more balanced distribution of labels. These three practices defeat the objective of evaluating generalization and call into question the value of methods specifically designed for this dataset. We show that embarrassingly simple methods, including one that generates answers at random, surpass the state of the art on some question types. We provide short- and long-term solutions to avoid these pitfalls and realize the benefits of OOD evaluation.
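    To make the criticized "inverting" practice concrete, here is a minimal sketch, assuming training data given as hypothetical (question_type, answer) pairs: for each question type, it predicts whichever of 'yes'/'no' was less frequent in training, which only helps when one already knows the test split reverses the training prior. A seeded random-answer baseline is included for comparison. This illustrates the practice described above; it is not the authors' code or their exact baselines.

```python
import random
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def invert_yes_no_prior(train: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Per question type, predict the yes/no answer that was LESS common in training."""
    counts: Dict[str, Counter] = {}
    for qtype, answer in train:
        if answer in ("yes", "no"):  # ignore non-binary answers
            counts.setdefault(qtype, Counter())[answer] += 1
    # Ties default to "yes" (an arbitrary choice for this sketch).
    return {q: ("yes" if c["no"] >= c["yes"] else "no") for q, c in counts.items()}


def random_answer(candidates: Sequence[str], rng: random.Random) -> str:
    """Baseline that ignores the image and question entirely."""
    return rng.choice(candidates)


train = [("is this", "no"), ("is this", "no"), ("is this", "yes"),
         ("are there", "yes"), ("are there", "yes"), ("what color", "red")]
print(invert_yes_no_prior(train))  # {'is this': 'yes', 'are there': 'no'}
print(random_answer(["yes", "no"], random.Random(0)))
```

    Neither function looks at an image, which is the point: under a label distribution that is inverted by construction, such shortcuts can score well without any visual understanding.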

    Processing of false belief passages during natural story comprehension: An fMRI study

    The neural correlates of theory of mind (ToM) are typically studied using paradigms that require participants to draw explicit, task-related inferences (e.g., in the false belief task). In a natural setup, such as listening to stories, false belief mentalizing occurs incidentally as part of narrative processing. In our experiment, participants listened to auditorily presented stories with false belief passages (implicit false belief processing) and immediately after each story answered comprehension questions (explicit false belief processing), while neural responses were measured with functional magnetic resonance imaging (fMRI). All stories included (among other situations) one false belief condition and one closely matched control condition. For implicit ToM processing, we modeled the hemodynamic response during the false belief passages in the story and compared it to the hemodynamic response during the closely matched control passages. For implicit mentalizing, we found activation in typical ToM processing regions, that is, the angular gyrus (AG), superior medial frontal gyrus (SmFG), precuneus (PCUN), and middle temporal gyrus (MTG), as well as in the inferior frontal gyrus (IFG) bilaterally. For explicit ToM, we found only AG activation. The conjunction analysis highlighted the left AG and MTG as well as the bilateral IFG as overlapping ToM processing regions for both implicit and explicit modes. Implicit ToM processing while listening to false belief passages recruits the left SmFG and bilateral PCUN in addition to the “mentalizing network” known from explicit processing tasks.