52,658 research outputs found
Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering
Many vision and language tasks require commonsense reasoning beyond
data-driven image and natural language processing. Here we adopt Visual
Question Answering (VQA) as an example task, where a system is expected to
answer a question in natural language about an image. Current state-of-the-art
systems attempted to solve the task using deep neural architectures and
achieved promising performance. However, the resulting systems are generally
opaque and they struggle in understanding questions for which extra knowledge
is required. In this paper, we present an explicit reasoning layer on top of a
set of penultimate neural network based systems. The reasoning layer enables
reasoning and answering questions where additional knowledge is required, and
at the same time provides an interpretable interface to the end users.
Specifically, the reasoning layer adopts a Probabilistic Soft Logic (PSL) based
engine to reason over a basket of inputs: visual relations, the semantic parse
of the question, and background ontological knowledge from word2vec and
ConceptNet. Experimental analysis of the answers and the key evidential
predicates generated on the VQA dataset validate our approach.Comment: 9 pages, 3 figures, AAAI 201
On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
Out-of-distribution (OOD) testing is increasingly popular for evaluating a
machine learning system's ability to generalize beyond the biases of a training
set. OOD benchmarks are designed to present a different joint distribution of
data and labels between training and test time. VQA-CP has become the standard
OOD benchmark for visual question answering, but we discovered three troubling
practices in its current use. First, most published methods rely on explicit
knowledge of the construction of the OOD splits. They often rely on
``inverting'' the distribution of labels, e.g. answering mostly 'yes' when the
common training answer is 'no'. Second, the OOD test set is used for model
selection. Third, a model's in-domain performance is assessed after retraining
it on in-domain splits (VQA v2) that exhibit a more balanced distribution of
labels. These three practices defeat the objective of evaluating
generalization, and put into question the value of methods specifically
designed for this dataset. We show that embarrassingly-simple methods,
including one that generates answers at random, surpass the state of the art on
some question types. We provide short- and long-term solutions to avoid these
pitfalls and realize the benefits of OOD evaluation
Processing of false belief passages during natural story comprehension: An fMRI study
The neural correlates of theory of mind (ToM) are typically studied using paradigms which require participants to draw explicit, task-related inferences (e.g., in the false belief task). In a natural setup, such as listening to stories, false belief mentalizing occurs incidentally as part of narrative processing. In our experiment, participants listened to auditorily presented stories with false belief passages (implicit false belief processing) and immediately after each story answered comprehension questions (explicit false belief processing), while neural responses were measured with functional magnetic resonance imaging (fMRI). All stories included (among other situations) one false belief condition and one closely matched control condition. For the implicit ToM processing, we modeled the hemodynamic response during the false belief passages in the story and compared it to the hemodynamic response during the closely matched control passages. For implicit mentalizing, we found activation in typical ToM processing regions, that is the angular gyrus (AG), superior medial frontal gyrus (SmFG), precuneus (PCUN), middle temporal gyrus (MTG) as well as in the inferior frontal gyrus (IFG) billaterally. For explicit ToM, we only found AG activation. The conjunction analysis highlighted the left AG and MTG as well as the bilateral IFG as overlapping ToM processing regions for both implicit and explicit modes. Implicit ToM processing during listening to false belief passages, recruits the left SmFG and billateral PCUN in addition to the “mentalizing network” known form explicit processing tasks
- …