3 research outputs found

    SHARI -- An Integration of Tools to Visualize the Story of the Day

    Tools such as Google News and Flipboard exist to convey daily news, but what about the news of the past? In this paper, we describe how to combine several existing tools and web archive holdings to convey the “biggest story” for a given date in the past. StoryGraph clusters news articles together to identify a common news story. Hypercane leverages ArchiveNow to store URLs produced by StoryGraph in web archives. Hypercane analyzes these URLs to identify the most common terms, entities, and highest quality images for social media storytelling. Raintale then takes the output of these tools to produce a visualization of the news story for a given day. We name this process SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration). With SHARI, a user can visualize the articles belonging to a past date’s news story.

    SHARI -- An Integration of Tools to Visualize the Story of the Day

    Tools such as Google News and Flipboard exist to convey daily news, but what about the past? In this paper, we describe how to combine several existing tools with web archive holdings to perform news analysis and visualization of the "biggest story" for a given date. StoryGraph clusters news articles together to identify a common news story. Hypercane leverages ArchiveNow to store URLs produced by StoryGraph in web archives. Hypercane analyzes these URLs to identify the most common terms, entities, and highest quality images for social media storytelling. Raintale then uses the output of these tools to produce a visualization of the news story for a given day. We name this process SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration). Comment: 19 pages, 16 figures, 1 table

    Improving Collection Understanding for Web Archives with Storytelling: Shining Light Into Dark and Stormy Archives

    Collections are the tools that people use to make sense of an ever-increasing number of archived web pages. As collections themselves grow, we need tools to make sense of them. Tools that work on the general web, like search engines, are not a good fit for these collections because search engines do not currently represent multiple document versions well. Web archive collections are vast, some containing hundreds of thousands of documents. Thousands of collections exist, many of which cover the same topic. Few collections include standardized metadata. Too many documents from too many collections with insufficient metadata make collection understanding an expensive proposition. This dissertation establishes a five-process model to assist with web archive collection understanding. This model aims to produce a social media story – a visualization with which most web users are familiar. Each social media story contains surrogates, which are summaries of individual documents. These surrogates, when presented together, summarize the topic of the story. After applying our storytelling model, they summarize the topic of a web archive collection. We develop and test a framework to select the best exemplars that represent a collection. We establish that algorithms produced from these primitives select exemplars that are otherwise undiscoverable using conventional search engine methods. We generate story metadata to improve the information scent of a story so users can understand it better. After an analysis showing that existing platforms perform poorly for web archives and a user study establishing the best surrogate type, we generate document metadata for the exemplars with machine learning. We then visualize the story and document metadata together and distribute it to satisfy the information needs of multiple personas who benefit from our model. Our tools serve as a reference implementation of our Dark and Stormy Archives storytelling model. Hypercane selects exemplars and generates story metadata. MementoEmbed generates document metadata. Raintale visualizes and distributes the story based on the story metadata and the document metadata of these exemplars. By providing understanding immediately, our stories save users the time and effort of reading thousands of documents and, most importantly, help them understand web archive collections.