Grounding semantics in robots for Visual Question Answering
In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.
User centred evaluation of an automatically constructed hyper-textbook
As hypertext systems become widely available and their popularity increases, attention has turned to converting existing textual documents into hypertextual form. An important issue in this area is the fully automatic production of hypertext for learning, teaching, training, or self-referencing. Although many studies have addressed the problem of producing hyper-books, either manually or semi-automatically, the actual usability of hyper-book tools is still an area of ongoing research. This article presents an effort to investigate the effectiveness of a hyper-textbook for self-referencing produced in a fully automatic way. The hyper-textbook is produced using the Hyper-TextBook methodology. We developed a task-based evaluation scheme and performed a comparative user-centred evaluation between a hyper-textbook and a conventional, printed form of the same textbook. The results indicate that the hyper-textbook, in most cases, improves speed, accuracy, and user satisfaction in comparison to the printed form of the textbook.
Visual7W: Grounded Question Answering in Images
We have seen great progress in basic perceptual tasks such as object recognition and detection. However, AI models still fail to match humans in high-level vision tasks due to the lack of capacities for deeper reasoning. Recently the new task of visual question answering (QA) has been proposed to evaluate a model's capacity for deep image understanding. Previous works have established a loose, global association between QA sentences and images. However, many questions and answers, in practice, relate to local regions in the images. We establish a semantic link between textual descriptions and image regions by object-level grounding. It enables a new type of QA with visual answers, in addition to textual answers used in previous work. We study the visual QA tasks in a grounded setting with a large collection of 7W multiple-choice QA pairs. Furthermore, we evaluate human performance and several baseline models on the QA tasks. Finally, we propose a novel LSTM model with spatial attention to tackle the 7W QA tasks. Comment: CVPR 201