
    Automatic Generation of Story Highlights

    In this paper we present a joint content selection and compression model for single-document summarization. The model operates over a phrase-based representation of the source document which we obtain by merging information from PCFG parse trees and dependency graphs. Using an integer linear programming formulation, the model learns to select and combine phrases subject to length, coverage and grammar constraints. We evaluate the approach on the task of generating “story highlights”—a small number of brief, self-contained sentences that allow readers to quickly gather information on news stories. Experimental results show that the model’s output is comparable to human-written highlights in terms of both grammaticality and content.
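As an illustrative aside (not from the paper), the combinatorial core of such an ILP formulation is a constrained selection problem: choose phrases that maximise total relevance subject to a length budget. The sketch below uses hypothetical phrases and scores, with exhaustive search standing in for a real ILP solver; the paper's full model additionally handles compression, coverage and grammaticality constraints.

```python
from itertools import combinations

def select_phrases(phrases, scores, budget):
    """Exhaustively solve the 0-1 selection problem that an ILP would
    formalise: maximise total relevance score subject to a word-count
    budget. Feasible only for small toy inputs."""
    best, best_score = [], 0.0
    idx = range(len(phrases))
    for r in range(1, len(phrases) + 1):
        for combo in combinations(idx, r):
            length = sum(len(phrases[i].split()) for i in combo)
            if length > budget:
                continue  # violates the length constraint
            score = sum(scores[i] for i in combo)
            if score > best_score:
                best, best_score = list(combo), score
    return [phrases[i] for i in best]

# hypothetical phrase-level representation of a news sentence
phrases = ["the mayor announced", "a new transit plan", "on Tuesday",
           "amid budget concerns"]
scores = [0.9, 0.8, 0.2, 0.5]
print(select_phrases(phrases, scores, budget=7))
```

A real ILP solver scales to full documents; the brute-force loop above only illustrates the objective and the length constraint.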

    Variational recurrent sequence-to-sequence retrieval for stepwise illustration

    We address and formalise the task of sequence-to-sequence (seq2seq) cross-modal retrieval. Given a sequence of text passages as query, the goal is to retrieve a sequence of images that best describes and aligns with the query. This new task extends traditional cross-modal retrieval, in which each image-text pair is treated independently, ignoring broader context. We propose a novel variational recurrent seq2seq (VRSS) retrieval model for this seq2seq task. Unlike most cross-modal methods, we generate an image vector corresponding to the latent topic obtained from combining the text semantics and context. This synthetic image embedding point associated with every text embedding point can then be employed for either image generation or image retrieval as desired. We evaluate the model for the application of stepwise illustration of recipes, where a sequence of relevant images is retrieved to best match the steps described in the text. To this end, we build and release a new Stepwise Recipe dataset for research purposes, containing 10K recipes (sequences of image-text pairs) with a total of 67K image-text pairs. To our knowledge, it is the first publicly available dataset to offer rich semantic descriptions in a focused category such as food or recipes. Our model is shown to outperform several competitive and relevant baselines in the experiments. We also provide qualitative analysis, through human evaluation and comparison with relevant existing methods, of how semantically meaningful the results produced by our model are.
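As a simplified aside, the retrieval step described above amounts to matching a synthetic image vector against a bank of image embeddings. The sketch below uses hypothetical three-dimensional embeddings and cosine similarity; the actual VRSS model produces its query vector from a learned latent topic rather than by hand.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_vec, image_bank):
    """Return the key of the image embedding closest (by cosine
    similarity) to the synthetic image vector produced for a text step."""
    return max(image_bank, key=lambda k: cosine(query_vec, image_bank[k]))

# hypothetical embeddings for three recipe-step images
bank = {"whisk_eggs": [0.9, 0.1, 0.0],
        "fold_batter": [0.2, 0.8, 0.1],
        "bake_tray": [0.0, 0.2, 0.9]}
print(retrieve([0.85, 0.2, 0.05], bank))
```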

    Key Phrase Extraction of Lightly Filtered Broadcast News

    This paper explores the impact of light filtering on automatic key phrase extraction (AKE) applied to Broadcast News (BN). Key phrases are words and expressions that best characterize the content of a document, and are often used to index the document or as features in further processing. This makes improvements in AKE accuracy particularly important. We hypothesized that filtering out marginally relevant sentences from a document would improve AKE accuracy, and our experiments confirmed this hypothesis: eliminating as little as 10% of the document sentences led to a 2% improvement in AKE precision and recall. Our AKE system is built on the MAUI toolkit, which follows a supervised learning approach. We trained and tested our AKE method on a gold standard made of 8 BN programs containing 110 manually annotated news stories. The experiments were conducted within a Multimedia Monitoring Solution (MMS) system for TV and radio news/programs, running daily and monitoring 12 TV and 4 radio channels.
    Comment: In the 15th International Conference on Text, Speech and Dialogue (TSD 2012).
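The light-filtering idea above can be sketched independently of the paper's pipeline: drop the lowest-scoring fraction of sentences before extraction. Everything below is a toy stand-in; sentence relevance scores are assumed given, and a simple frequency counter substitutes for the supervised MAUI extractor.

```python
import re
from collections import Counter

def light_filter(sentences, scores, drop_frac=0.10):
    """Drop the lowest-scoring fraction of sentences (the paper removes
    ~10%) before key-phrase extraction, preserving document order."""
    keep_n = max(1, round(len(sentences) * (1 - drop_frac)))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i],
                    reverse=True)
    kept = sorted(ranked[:keep_n])  # restore original sentence order
    return [sentences[i] for i in kept]

def top_phrases(sentences, k=2):
    """Toy frequency-based stand-in for a supervised key-phrase extractor."""
    words = re.findall(r"[a-z]+", " ".join(sentences).lower())
    stop = {"the", "a", "of", "in", "on", "and", "to", "is"}
    counts = Counter(w for w in words if w not in stop)
    return [w for w, _ in counts.most_common(k)]
```

In the real system the relevance scores come from the filtering model and the extractor is trained on annotated news stories; the sketch only shows how the two stages compose.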

    Hierarchically-Attentive RNN for Album Summarization and Storytelling

    We address the problem of end-to-end visual storytelling. Given a photo album, our model first selects the most representative (summary) photos, and then composes a natural language story for the album. For this task, we make use of the Visual Storytelling dataset and a model composed of three hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album photos, select representative (summary) photos, and compose the story. Automatic and human evaluations show our model achieves better performance on selection, generation, and retrieval than baselines.
    Comment: To appear at EMNLP-2017 (7 pages).

    Creating stories for reflection from multimodal lifelog content: An initial investigation

    Using lifelogging tools, digital artifacts can be collected continuously and passively throughout our day. These may include a stream of images recorded passively using tools such as the Microsoft SenseCam; documents, emails and webpages accessed; text messages and mobile activity; and context sensing to uncover the current location and proximal individuals. The wealth of information such an archive contains on our personal life history provides us with the opportunity to review, reflect and reminisce upon our past experience. However, the complexity, volume and multimodal nature of such collections create a barrier to such activities. We are currently exploring the potential of digital narratives formed from these collections as a means to overcome these challenges. By reducing the content to that most appropriate to the story, and then presenting it in a coherent and usable manner, we can hope to better enable reflection. How content reduction and presentation should occur is investigated through card sorting activities and probe sessions in which nine participants engaged. The initial results are discussed, as well as the opportunity, as seen in these sessions, for lifelog-based stories to provide utility in personal reflection and reminiscence.

    InfoLink: analysis of Dutch broadcast news and cross-media browsing

    In this paper, a cross-media browsing demonstrator named InfoLink is described. InfoLink automatically links the content of Dutch broadcast news videos to related information sources in parallel collections containing text and/or video. Automatic segmentation, speech recognition and available meta-data are used to index and link items. The concept is visualised using SMIL scripts for presenting the streaming broadcast news video and the information links.

    Extraction and Classification of Self-consumable Sport Video Highlights

    This paper aims to automatically extract and classify self-consumable sport video highlights. For this purpose, we emphasize the benefits of using play-break sequences as the effective inputs for an HMM-based classifier. The HMM is used to model the stochastic pattern of high-level states during specific sport highlights, corresponding to the sequence of generic audio-visual measurements extracted from raw video data. This paper uses soccer as the domain of study, focusing on the extraction and classification of goal, shot and foul highlights. Experimental work using 183 play-break sequences from 6 soccer matches is presented to demonstrate the performance of our proposed scheme.
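As an illustrative aside, HMM-based highlight classification typically scores an observation sequence under one HMM per highlight class and picks the most likely class. The sketch below implements the standard forward algorithm in plain Python; the two-state models, integer-coded observations, and class names are hypothetical stand-ins for the paper's audio-visual measurements.

```python
import math

def forward_loglik(obs, pi, A, B):
    """Forward algorithm: log-likelihood of an observation sequence
    under an HMM with initial probs pi, transition matrix A, and
    discrete emission matrix B."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for t in range(1, len(obs)):
        alpha = [sum(alpha[p] * A[p][s] for p in range(n)) * B[s][obs[t]]
                 for s in range(n)]
    return math.log(sum(alpha))

def classify(obs, models):
    """Label a play-break sequence with the highlight class whose HMM
    assigns it the highest likelihood."""
    return max(models, key=lambda c: forward_loglik(obs, *models[c]))

# hypothetical two-state HMMs for two highlight classes
models = {
    "goal": ([0.8, 0.2],
             [[0.7, 0.3], [0.4, 0.6]],
             [[0.9, 0.1], [0.2, 0.8]]),
    "foul": ([0.5, 0.5],
             [[0.5, 0.5], [0.5, 0.5]],
             [[0.3, 0.7], [0.7, 0.3]]),
}
# integer-coded audio-visual observations from one play-break sequence
print(classify([0, 0, 1, 1], models))
```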