355 research outputs found
A thesis presented to the faculty of the Humanities and Teacher Education Division, Pepperdine University
Storyteller as cultural leader -- The answer : pilot Ice cream addiction -- Ten years later -- The last train to Death Valley
Visual7W: Grounded Question Answering in Images
We have seen great progress in basic perceptual tasks such as object
recognition and detection. However, AI models still fail to match humans in
high-level vision tasks due to the lack of capacities for deeper reasoning.
Recently the new task of visual question answering (QA) has been proposed to
evaluate a model's capacity for deep image understanding. Previous works have
established a loose, global association between QA sentences and images.
However, many questions and answers, in practice, relate to local regions in
the images. We establish a semantic link between textual descriptions and image
regions by object-level grounding. It enables a new type of QA with visual
answers, in addition to textual answers used in previous work. We study the
visual QA tasks in a grounded setting with a large collection of 7W
multiple-choice QA pairs. Furthermore, we evaluate human performance and
several baseline models on the QA tasks. Finally, we propose a novel LSTM model
with spatial attention to tackle the 7W QA tasks.Comment: CVPR 201
- …