Grounding semantics in robots for Visual Question Answering
In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.
User centred evaluation of an automatically constructed hyper-textbook
As hypertext systems become widely available and their popularity increases, attention has turned to converting existing textual documents into hypertextual form. An important issue in this area is the fully automatic production of hypertext for learning, teaching, training, or self-referencing. Although many studies have addressed the problem of producing hyper-books, either manually or semi-automatically, the actual usability of hyper-book tools is still an area of ongoing research. This article presents an effort to investigate the effectiveness of a hyper-textbook for self-referencing produced in a fully automatic way. The hyper-textbook is produced using the Hyper-TextBook methodology. We developed a task-based evaluation scheme and performed a comparative user-centred evaluation between a hyper-textbook and a conventional, printed form of the same textbook. The results indicate that the hyper-textbook, in most cases, improves speed, accuracy, and user satisfaction in comparison to the printed form of the textbook.
Visual7W: Grounded Question Answering in Images
We have seen great progress in basic perceptual tasks such as object recognition and detection. However, AI models still fail to match humans in high-level vision tasks due to the lack of capacities for deeper reasoning. Recently the new task of visual question answering (QA) has been proposed to evaluate a model's capacity for deep image understanding. Previous works have established a loose, global association between QA sentences and images. However, many questions and answers, in practice, relate to local regions in the images. We establish a semantic link between textual descriptions and image regions by object-level grounding. It enables a new type of QA with visual answers, in addition to textual answers used in previous work. We study the visual QA tasks in a grounded setting with a large collection of 7W multiple-choice QA pairs. Furthermore, we evaluate human performance and several baseline models on the QA tasks. Finally, we propose a novel LSTM model with spatial attention to tackle the 7W QA tasks. Comment: CVPR 201