
    Language-Based Image Editing with Recurrent Attentive Models

    We investigate the problem of Language-Based Image Editing (LBIE): given a source image and a natural language description, we want to generate a target image by editing the source image according to the description. We propose a generic modeling framework for two sub-tasks of LBIE: language-based image segmentation and image colorization. The framework uses recurrent attentive models to fuse image and language features. Instead of using a fixed step size, we introduce a termination gate for each region of the image to dynamically determine, after each inference step, whether to continue extrapolating additional information from the textual description. The effectiveness of the framework is validated on three datasets. First, we introduce a synthetic dataset, called CoSaL, to evaluate the end-to-end performance of our LBIE system. Second, we show that the framework leads to state-of-the-art performance on image segmentation on the ReferIt dataset. Third, we present the first language-based colorization result on the Oxford-102 Flowers dataset. Comment: Accepted to CVPR 2018 as a Spotlight.
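
    To make the termination-gate idea concrete, here is a minimal PyTorch sketch of recurrent attentive fusion with a per-region soft halting gate. All module names, dimensions, and the step budget are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RecurrentAttentiveFusion(nn.Module):
    """Sketch of LBIE-style fusion: each image region repeatedly attends to
    the text, and a per-region termination gate softly decides when that
    region stops reading. Names and sizes are illustrative assumptions."""

    def __init__(self, dim, max_steps=5):
        super().__init__()
        self.max_steps = max_steps
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.gru = nn.GRUCell(dim, dim)
        self.gate = nn.Linear(dim, 1)   # per-region termination logit

    def forward(self, regions, words):
        # regions: (B, R, D) image-region features; words: (B, T, D) text features
        B, R, D = regions.shape
        state = regions.reshape(B * R, D)
        words_rep = words.repeat_interleave(R, dim=0)          # one text copy per region
        for _ in range(self.max_steps):
            ctx, _ = self.attn(state.unsqueeze(1), words_rep, words_rep)
            new_state = self.gru(ctx.squeeze(1), state)
            p_stop = torch.sigmoid(self.gate(new_state))       # halting probability
            state = p_stop * state + (1 - p_stop) * new_state  # halted regions freeze
        return state.reshape(B, R, D)
```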

    Interpretation of Natural Language Rules in Conversational Machine Reading

    Most work in machine reading focuses on question answering problems where the answer is directly expressed in the text to read. However, many real-world question answering problems require reading a text not because it contains the literal answer, but because it contains a recipe for deriving an answer together with the reader's background knowledge. One example is the task of interpreting regulations to answer "Can I...?" or "Do I have to...?" questions such as "I am working in Canada. Do I have to carry on paying UK National Insurance?" after reading a UK government website about this topic. This task requires both the interpretation of rules and the application of background knowledge. It is further complicated by the fact that, in practice, most questions are underspecified, and a human assistant will regularly have to ask clarification questions such as "How long have you been working abroad?" when the answer cannot be directly derived from the question and text. In this paper, we formalise this task and develop a crowd-sourcing strategy to collect 32k task instances based on real-world rules and crowd-generated questions and scenarios. We analyse the challenges of this task and assess its difficulty by evaluating the performance of rule-based and machine-learning baselines. We observe promising results when no background knowledge is necessary, and substantial room for improvement whenever background knowledge is needed. Comment: EMNLP 2018.
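
    To illustrate the decision structure of the task, the toy Python baseline below answers "Yes" or "No" when a rule's conditions are all resolved by the user's scenario, and otherwise asks a clarification question. The rule encoding and function names are assumptions for exposition, not the paper's baselines.

```python
from typing import Dict

def interpret_rule(conditions: Dict[str, str], known: Dict[str, bool]) -> str:
    """Toy decision procedure for conversational machine reading.
    `conditions` maps a condition id to its clarification question;
    `known` holds truth values already derived from the user's scenario.
    (A schematic baseline, not the paper's actual models.)"""
    for cond, question in conditions.items():
        if cond not in known:
            return f"Clarify: {question}"  # underspecified, so ask the user
        if not known[cond]:
            return "No"                    # a required condition fails
    return "Yes"                           # every condition is satisfied

rule = {
    "works_abroad": "How long have you been working abroad?",
    "pays_ni": "Are you currently paying UK National Insurance?",
}
print(interpret_rule(rule, {"works_abroad": True}))  # asks about the unknown condition
```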

    Teaching Machines to Read and Comprehend

    Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed about the contents of documents they have seen, but until now large-scale training and test datasets have been missing for this type of evaluation. In this work we define a new methodology that resolves this bottleneck and provides large-scale supervised reading comprehension data. This allows us to develop a class of attention-based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure. Comment: Appears in Advances in Neural Information Processing Systems 28 (NIPS 2015). 14 pages, 13 figures.
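
    For intuition, the following is a minimal PyTorch sketch in the spirit of the attention-based readers the paper develops: encode the document and query, attend over document tokens conditioned on the query, and classify the answer. Layer choices and dimensions are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AttentiveReader(nn.Module):
    """Sketch of an attention-based reader: a query-conditioned attention
    over document token states feeds an answer classifier.
    (Sizes and layer choices are illustrative assumptions.)"""

    def __init__(self, vocab, dim, n_answers):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.doc_rnn = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
        self.qry_rnn = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
        self.score = nn.Bilinear(2 * dim, 2 * dim, 1)  # token-query match score
        self.out = nn.Linear(2 * dim, n_answers)

    def forward(self, doc, qry):
        d, _ = self.doc_rnn(self.embed(doc))           # (B, T, 2D) token states
        _, (h, _) = self.qry_rnn(self.embed(qry))
        q = torch.cat([h[-2], h[-1]], dim=-1)          # (B, 2D) query summary
        a = self.score(d, q.unsqueeze(1).expand_as(d)).squeeze(-1)
        ctx = (torch.softmax(a, dim=1).unsqueeze(-1) * d).sum(1)
        return self.out(ctx)                           # logits over answer candidates
```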

    Pathologies of Neural Models Make Interpretations Difficult

    One way to interpret neural model predictions is to highlight the most important input features, for example with a heatmap visualization over the words in an input sentence. In existing interpretation methods for NLP, a word's importance is determined either by input perturbation (measuring the decrease in model confidence when that word is removed) or by the gradient with respect to that word. To understand the limitations of these methods, we use input reduction, which iteratively removes the least important word from the input. This exposes pathological behaviors of neural models: the remaining words appear nonsensical to humans and are not the ones determined as important by interpretation methods. As we confirm with human experiments, the reduced examples lack information to support the prediction of any label, but models still make the same predictions with high confidence. To explain these counterintuitive results, we draw connections to adversarial examples and confidence calibration: pathological behaviors reveal difficulties in interpreting neural models trained with maximum likelihood. To mitigate these deficiencies, we fine-tune the models by encouraging high-entropy outputs on reduced examples. Fine-tuned models become more interpretable under input reduction without accuracy loss on regular examples. Comment: EMNLP 2018 camera ready.
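
    The input-reduction loop can be sketched in a few lines. The version below uses leave-one-out confidence as the importance measure (one of the two heuristics the abstract mentions); the `model` interface is an assumption for illustration.

```python
import torch

def input_reduction(model, tokens, label, min_len=1):
    """Iteratively delete the token whose removal least hurts the model's
    confidence in `label`, while the prediction stays unchanged.
    Schematic version; `model` is assumed to map a (1, T) token-id tensor
    to class logits."""
    tokens = list(tokens)
    while len(tokens) > min_len:
        best = None
        for i in range(len(tokens)):                  # try removing each token
            reduced = tokens[:i] + tokens[i + 1:]
            logits = model(torch.tensor([reduced]))
            if logits.argmax(dim=-1).item() != label:
                continue                              # prediction flipped: keep token
            conf = torch.softmax(logits, dim=-1)[0, label].item()
            if best is None or conf > best[0]:
                best = (conf, i)
        if best is None:                              # no removal preserves the label
            break
        tokens.pop(best[1])
    return tokens                                     # often nonsensical yet confident
```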

    The Limited Effect of Graphic Elements in Video and Augmented Reality on Children’s Listening Comprehension

    There is currently significant interest in the use of instructional strategies in learning environments, thanks to the emergence of new multimedia systems that combine text, audio, graphics, and video, such as augmented reality (AR). In this light, this study compares the effectiveness of AR and video for listening comprehension tasks. The sample consisted of thirty-two elementary school students with different reading comprehension levels. First, the experience, instructions, and objectives were introduced to all the students. Next, they were divided into two groups to perform activities: one group watched an educational video story of the dog Laika and her space journey via the mobile app Blue Planet Tales, while the other used AR to visualize the same story through the app Augment Sales. Once the activities were completed, participants answered a comprehension test. Results (p = 0.180) indicate that there are no meaningful differences between lesson format and test performance, but there are differences among the participants of the AR group according to their reading comprehension level. With respect to the time taken to complete the comprehension test, there is no significant difference between the two groups, but there is a difference between participants with high and low comprehension levels. Finally, the SUS (System Usability Scale) questionnaire was used to measure the usability of the AR app on a smartphone. An average score of 77.5 out of 100 was obtained, which indicates that the app has a fairly good user-centered design.
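
    For reference, the reported 77.5 lies on the standard 0-100 SUS scale, which is computed from ten 1-5 Likert responses as in the sketch below (the example responses are invented).

```python
def sus_score(responses):
    """Standard System Usability Scale scoring for ten 1-5 Likert items:
    odd-numbered items contribute (r - 1), even-numbered items (5 - r),
    and the sum is scaled to 0-100 by multiplying by 2.5."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 0 else (5 - r)   # i is 0-based: even index = odd item
                for i, r in enumerate(responses))
    return total * 2.5

print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # -> 80.0
```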