    Grounding semantics in robots for Visual Question Answering

    In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.

    User centred evaluation of an automatically constructed hyper-textbook

    As hypertext systems become widely available and their popularity increases, attention has turned to converting existing textual documents into hypertextual form. An important issue in this area is the fully automatic production of hypertext for learning, teaching, training, or self-referencing. Although many studies have addressed the problem of producing hyper-books, either manually or semi-automatically, the actual usability of hyper-book tools is still an area of ongoing research. This article presents an effort to investigate the effectiveness of a hyper-textbook for self-referencing produced in a fully automatic way. The hyper-textbook is produced using the Hyper-TextBook methodology. We developed a task-based evaluation scheme and performed a comparative user-centred evaluation between a hyper-textbook and a conventional, printed form of the same textbook. The results indicate that the hyper-textbook, in most cases, improves speed, accuracy, and user satisfaction in comparison to the printed form of the textbook.

    Visual7W: Grounded Question Answering in Images

    We have seen great progress in basic perceptual tasks such as object recognition and detection. However, AI models still fail to match humans in high-level vision tasks due to the lack of capacity for deeper reasoning. Recently the new task of visual question answering (QA) has been proposed to evaluate a model's capacity for deep image understanding. Previous works have established a loose, global association between QA sentences and images. In practice, however, many questions and answers relate to local regions in the images. We establish a semantic link between textual descriptions and image regions by object-level grounding. This grounding enables a new type of QA with visual answers, in addition to the textual answers used in previous work. We study visual QA tasks in a grounded setting with a large collection of 7W multiple-choice QA pairs. Furthermore, we evaluate human performance and several baseline models on these QA tasks. Finally, we propose a novel LSTM model with spatial attention to tackle the 7W QA tasks. Comment: CVPR 2016
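    The abstract names the architecture (an LSTM with spatial attention) but gives no details. As a rough sketch of what spatial attention over image regions typically means, the following illustrative NumPy snippet scores each region feature against the current LSTM hidden state and pools the regions by the resulting softmax weights; all names, shapes, and the additive scoring function here are assumptions for illustration, not taken from the paper.

        import numpy as np

        def spatial_attention(region_feats, hidden, W_r, W_h, w_score):
            # Hypothetical attention pooling (illustrative, not the paper's model).
            # region_feats: (num_regions, feat_dim) image-region features
            # hidden:       (hid_dim,) current LSTM hidden state
            # W_r: (feat_dim, att_dim), W_h: (hid_dim, att_dim) projections
            # w_score: (att_dim,) maps each projected region to a scalar score
            scores = np.tanh(region_feats @ W_r + hidden @ W_h) @ w_score
            weights = np.exp(scores - scores.max())   # softmax over regions
            weights /= weights.sum()
            attended = weights @ region_feats          # weighted sum of regions
            return attended, weights

        # Example with made-up shapes: a 7x7 CNN feature map flattened to 49 regions.
        rng = np.random.default_rng(0)
        att, w = spatial_attention(rng.standard_normal((49, 512)),
                                   rng.standard_normal(256),
                                   rng.standard_normal((512, 128)),
                                   rng.standard_normal((256, 128)),
                                   rng.standard_normal(128))

    The returned weight vector can be read as where the model "looks" in the image for the current question word, which is what makes attention-based VQA models partially interpretable.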