
    Generating Natural Questions About an Image

There has been an explosion of work in the vision & language community during the past few years, from image captioning to video transcription and answering questions about images. These tasks have focused on literal descriptions of the image. To move beyond the literal, we choose to explore how questions about an image are often directed at commonsense inference and the abstract events evoked by objects in the image. In this paper, we introduce the novel task of Visual Question Generation (VQG), where the system is tasked with asking a natural and engaging question when shown an image. We provide three datasets which cover a variety of images from object-centric to event-centric, with considerably more abstract training data than provided to state-of-the-art captioning systems thus far. We train and test several generative and retrieval models to tackle the task of VQG. Evaluation results show that while such models ask reasonable questions for a variety of images, there is still a wide gap with human performance, which motivates further work on connecting images with commonsense knowledge and pragmatics. Our proposed task offers a new challenge to the community which we hope furthers interest in exploring deeper connections between vision & language.
Comment: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics
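For illustration, here is a minimal sketch of one generative approach to VQG: a pretrained CNN image embedding conditions a GRU decoder that emits a question token by token. The architecture, names, and dimensions are assumptions for illustration only, not the paper's actual generative or retrieval models.

```python
# Illustrative generative VQG model: an image feature vector initializes
# a GRU decoder that predicts question tokens. All sizes and the design
# itself are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class VQGDecoder(nn.Module):
    def __init__(self, vocab_size, img_dim=2048, emb_dim=300, hid_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hid_dim)  # image -> initial hidden state
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, img_feats, question_tokens):
        # img_feats: (batch, img_dim); question_tokens: (batch, seq_len)
        h0 = torch.tanh(self.img_proj(img_feats)).unsqueeze(0)
        outputs, _ = self.gru(self.embed(question_tokens), h0)
        return self.out(outputs)  # per-step vocabulary logits
```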

    A Survey of Current Datasets for Vision and Language Research

Integrating vision and language has long been a dream in work on artificial intelligence (AI). In the past two years, we have witnessed an explosion of work that brings together vision and language from images to videos and beyond. The available corpora have played a crucial role in advancing this area of research. In this paper, we propose a set of quality metrics for evaluating and analyzing the vision & language datasets and categorize them accordingly. Our analyses show that the most recent datasets have been using more complex language and more abstract concepts; however, each has different strengths and weaknesses.
Comment: To appear in EMNLP 2015, short proceedings. Dataset analysis and discussion expanded, including an initial examination into reporting bias for one of them. F.F. and N.M. contributed equally to this work.
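As a hedged illustration, metrics of this kind can often be computed directly from a dataset's text. The sketch below computes a few plausible corpus-level statistics (vocabulary size, type-token ratio, mean caption length, frequency concentration); these particular metrics are assumptions for illustration, not necessarily the set proposed in the paper.

```python
# Illustrative corpus statistics of the kind used to compare vision &
# language datasets; the concrete metric set here is an assumption.
from collections import Counter

def corpus_metrics(captions):
    tokens = [tok.lower() for cap in captions for tok in cap.split()]
    counts = Counter(tokens)
    n_tokens, n_types = len(tokens), len(counts)
    return {
        "vocabulary_size": n_types,
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        "mean_caption_length": n_tokens / len(captions) if captions else 0.0,
        # Share of mass in the 100 most frequent words: a rough proxy
        # for how repetitive or templated a dataset's language is.
        "top100_mass": (sum(c for _, c in counts.most_common(100)) / n_tokens
                        if n_tokens else 0.0),
    }

print(corpus_metrics(["a dog on a couch", "two people talk at a party"]))
```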

    Beyond SumBasic: Task-Focused Summarization with Sentence Simplification and Lexical Expansion

In recent years, there has been increased interest in topic-focused multi-document summarization. In this task, automatic summaries are produced in response to a specific information request, or topic, stated by the user. The system we have designed to accomplish this task comprises four main components: a generic extractive summarization system, a topic-focusing component, sentence simplification, and lexical expansion of topic words. This paper details each of these components, together with experiments designed to quantify their individual contributions. We include an analysis of our results on two large datasets commonly used to evaluate task-focused summarization, the DUC2005 and DUC2006 datasets, using automatic metrics. Additionally, we include an analysis of our results on the DUC2006 task according to human evaluation metrics. In the human evaluation of system summaries compared to human summaries, i.e., the Pyramid method, our system ranked first out of 22 systems in terms of overall mean Pyramid score, and in the human evaluation of summary responsiveness to the topic, our system ranked third out of 35 systems.
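The SumBasic algorithm the title refers to is a simple frequency-based extractive summarizer, and a common simplified variant is easy to sketch: score each sentence by the average unigram probability of its words, greedily select the best sentence, then square the probabilities of the selected words to discourage redundancy. The tokenization and word budget below are illustrative simplifications.

```python
# Simplified SumBasic variant: frequency-based greedy sentence selection
# with a squared-probability redundancy update after each pick.
from collections import Counter

def sumbasic(sentences, max_words=100):
    tokenized = [[w.lower() for w in s.split()] for s in sentences]
    counts = Counter(w for sent in tokenized for w in sent)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    summary, used = [], 0
    remaining = list(range(len(sentences)))
    while remaining and used < max_words:
        # Score each candidate sentence by its average word probability.
        best = max(remaining,
                   key=lambda i: sum(prob[w] for w in tokenized[i])
                   / max(len(tokenized[i]), 1))
        summary.append(sentences[best])
        used += len(tokenized[best])
        for w in tokenized[best]:
            prob[w] **= 2  # downweight words already covered by the summary
        remaining.remove(best)
    return " ".join(summary)
```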

    Observing and Simulating the Summertime Low-Level Jet in Central Iowa

In the U.S. state of Iowa, the increase in wind power production has motivated interest in the impacts of low-level jets (LLJs) on turbine performance. In this study, two commercial lidar systems were used to sample wind profiles in August 2013. Jets were systematically detected and assigned an intensity rating from 0 (weak) to 3 (strong). Many similarities were found between observed jets and the well-studied Great Plains low-level jet in summer, including average jet heights between 300 and 500 m above ground level, a preference for southerly wind directions, and a nighttime bias for stronger jets. Strong vertical wind shear and veer were observed, as well as veering over time associated with the LLJs. Speed, shear, and veer increases extended into the turbine-rotor layer during intense jets. Ramp events, in which winds rapidly increase or decrease in the rotor layer, were also commonly observed during jet formation periods. The lidar data were also used to evaluate various configurations of the Weather Research and Forecasting Model: jet occurrence depended more strongly on the choice of initial and boundary condition data, while reproduction of the strongest jets was influenced more strongly by the choice of planetary boundary layer scheme. A decomposition of mean model winds suggested that the main forcing mechanism for observed jets was the inertial oscillation. These results have implications for wind energy forecasting and site assessment in the Midwest.
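As a rough illustration of how jets can be systematically detected and assigned an intensity rating, the sketch below applies a Bonner-style nose-and-falloff criterion to a wind profile: find the wind speed maximum below a height cap and require the speed to fall off above it by a threshold. The specific thresholds and the 0-3 rating bins are assumptions for illustration, not necessarily the study's exact criteria.

```python
# Illustrative low-level jet detector: locate the wind speed maximum
# ("nose") below a height cap and rate it by nose speed and the speed
# decrease above the nose. Thresholds and bins are assumptions.
import numpy as np

def rate_llj(heights_m, speeds_ms, max_height=1000.0):
    """Return (nose_height, nose_speed, rating 0-3) or None if no jet."""
    heights_m = np.asarray(heights_m, dtype=float)
    speeds_ms = np.asarray(speeds_ms, dtype=float)
    below = heights_m <= max_height
    i_nose = int(np.argmax(np.where(below, speeds_ms, -np.inf)))
    nose_speed = speeds_ms[i_nose]
    falloff = nose_speed - speeds_ms[i_nose:].min()  # decrease above the nose
    # (minimum nose speed, minimum falloff) thresholds in m/s, one pair
    # per rating; the exact numbers used in the study are assumptions here.
    thresholds = [(10.0, 5.0), (12.0, 6.0), (16.0, 8.0), (20.0, 10.0)]
    rating = None
    for r, (s_min, f_min) in enumerate(thresholds):
        if nose_speed >= s_min and falloff >= f_min:
            rating = r
    return None if rating is None else (heights_m[i_nose], nose_speed, rating)
```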

    CaTeRS: Causal and Temporal Relation Scheme for Semantic Annotation of Event Structures

Learning commonsense causal and temporal relations between events is one of the major steps towards deeper language understanding. This is even more crucial for understanding stories and script learning. A prerequisite for learning scripts is a semantic framework which enables capturing rich event structures. In this paper we introduce a novel semantic annotation framework, called Causal and Temporal Relation Scheme (CaTeRS), which is unique in simultaneously capturing a comprehensive set of temporal and causal relations between events. By annotating a total of 1,600 sentences in the context of 320 five-sentence short stories sampled from the ROCStories corpus, we demonstrate that these stories are indeed full of causal and temporal relations. Furthermore, we show that the CaTeRS annotation scheme enables high inter-annotator agreement for broad-coverage event entity annotation and moderate agreement on semantic link annotation.
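A scheme like this lends itself to a simple programmatic representation of events and the typed links between them. The sketch below is a hypothetical data model; the relation labels shown are examples of causal and temporal link types, not the full CaTeRS inventory.

```python
# Hypothetical data model for event annotations with causal and temporal
# links; the label set is illustrative, not the full CaTeRS inventory.
from dataclasses import dataclass, field

@dataclass
class Event:
    event_id: str
    text: str            # the event-denoting span, e.g. "slipped"
    sentence_index: int  # position within the five-sentence story

@dataclass
class Relation:
    source: str  # event_id of the first event
    target: str  # event_id of the second event
    label: str   # e.g. "CAUSE", "ENABLE", "BEFORE", "OVERLAPS"

@dataclass
class AnnotatedStory:
    story_id: str
    events: list = field(default_factory=list)
    relations: list = field(default_factory=list)

story = AnnotatedStory("rocstories-0001")
story.events += [Event("e1", "slipped", 0), Event("e2", "fell", 0)]
story.relations.append(Relation("e1", "e2", "CAUSE"))
```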

    Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation

The popularity of image sharing on social media reflects the important role visual context plays in everyday conversation. In this paper, we present a novel task, Image-Grounded Conversations (IGC), in which natural-sounding conversations are generated about shared photographic images. We investigate this task using training data derived from image-grounded conversations on social media and introduce a new dataset of crowd-sourced conversations for benchmarking progress. Experiments using deep neural network models trained on social media data show that the combination of visual and textual context can enhance the quality of generated conversational turns. In human evaluation, a gap between human performance and that of both neural and retrieval architectures suggests that IGC presents an interesting challenge for vision and language research.
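As an illustration of combining visual and textual context, the sketch below concatenates an image embedding with a GRU encoding of the conversation history to produce the initial state for a response decoder. The fusion-by-concatenation design, names, and dimensions are all assumptions for illustration, not the paper's architecture.

```python
# Illustrative multimodal encoder: fuse an image feature vector with a
# GRU encoding of conversational context to seed a response decoder.
# Design and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn

class MultimodalContextEncoder(nn.Module):
    def __init__(self, vocab_size, img_dim=2048, emb_dim=300, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.text_gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.fuse = nn.Linear(img_dim + hid_dim, hid_dim)

    def forward(self, img_feats, context_tokens):
        # img_feats: (batch, img_dim); context_tokens: (batch, seq_len)
        _, h = self.text_gru(self.embed(context_tokens))  # h: (1, batch, hid)
        fused = torch.cat([img_feats, h.squeeze(0)], dim=-1)
        return torch.tanh(self.fuse(fused))  # initial decoder state
```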