    Bootstrapping Web Archive Collections From Micro-Collections in Social Media

    In a Web plagued by disappearing resources, Web archive collections provide a valuable means of preserving Web resources important to the study of past events. These archived collections start with seed URIs (Uniform Resource Identifiers) hand-selected by curators. Curators produce high-quality seeds by removing non-relevant URIs and adding URIs from credible and authoritative sources, but this ability comes at a cost: collecting these seeds is time-consuming. The result is a shortage of curators, a lack of Web archive collections for various important news events, and a need for an automatic system for generating seeds. We investigate the problem of generating seed URIs automatically and explore the state of the art in collection building and seed selection. Attempts to generate seeds automatically have mostly relied on scraping Web or social media Search Engine Result Pages (SERPs). In this work, we introduce a novel source of seeds: URIs in the threaded conversations of social media posts created by single or multiple users. Users on social media sites routinely create and share narratives about news events consisting of hand-selected URIs of news stories, tweets, videos, etc. We call these posts Micro-collections, whether shared on Reddit or Twitter, and we consider them an important source of seeds because the effort taken to create a Micro-collection is an indication of editorial activity and a demonstration of domain expertise. Therefore, we propose a model for generating seeds from Micro-collections. We begin by introducing a simple vocabulary, called post class, for describing social media posts across different platforms, and extract seeds from the Micro-collections post class. We further propose Quality Proxies for seeds by extending the idea of collection comparison to evaluation, and present our Micro-collection/Quality Proxy (MCQP) framework for bootstrapping Web archive collections from Micro-collections in social media.
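    The seed-extraction step the model describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the Post structure, the min_uris threshold, and all function names are hypothetical.

    ```python
    import re
    from dataclasses import dataclass, field

    URI_PATTERN = re.compile(r"https?://[^\s)>\"']+")

    @dataclass
    class Post:
        """A platform-agnostic social media post (the 'post class' idea)."""
        author: str
        text: str
        replies: list["Post"] = field(default_factory=list)

    def extract_uris(post: Post) -> list[str]:
        """Pull candidate seed URIs out of a single post's text."""
        return URI_PATTERN.findall(post.text)

    def micro_collection_seeds(root: Post, min_uris: int = 3) -> list[str]:
        """Treat a threaded conversation as a Micro-collection when its posts
        collectively curate at least `min_uris` distinct URIs, and return
        those URIs as candidate seeds (an empty list otherwise)."""
        seeds, stack = set(), [root]
        while stack:
            post = stack.pop()
            seeds.update(extract_uris(post))
            stack.extend(post.replies)
        return sorted(seeds) if len(seeds) >= min_uris else []
    ```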

    Improving Collection Understanding for Web Archives with Storytelling: Shining Light Into Dark and Stormy Archives

    Collections are the tools that people use to make sense of an ever-increasing number of archived web pages. As collections themselves grow, we need tools to make sense of them. Tools that work on the general web, like search engines, are not a good fit for these collections because search engines do not currently represent multiple document versions well. Web archive collections are vast, some containing hundreds of thousands of documents. Thousands of collections exist, many of which cover the same topic, and few include standardized metadata. Too many documents from too many collections with insufficient metadata make collection understanding an expensive proposition. This dissertation establishes a five-process model to assist with web archive collection understanding. This model aims to produce a social media story – a visualization with which most web users are familiar. Each social media story contains surrogates, which are summaries of individual documents. These surrogates, when presented together, summarize the topic of the story; after applying our storytelling model, they summarize the topic of a web archive collection. We develop and test a framework to select the best exemplars that represent a collection, and establish that algorithms produced from these primitives select exemplars that are otherwise undiscoverable using conventional search engine methods. We generate story metadata to improve the information scent of a story so users can understand it better. After an analysis showing that existing platforms perform poorly for web archives and a user study establishing the best surrogate type, we generate document metadata for the exemplars with machine learning. We then visualize the story and document metadata together and distribute them to satisfy the information needs of multiple personas who benefit from our model. Our tools serve as a reference implementation of our Dark and Stormy Archives storytelling model: Hypercane selects exemplars and generates story metadata, MementoEmbed generates document metadata, and Raintale visualizes and distributes the story based on the story metadata and the document metadata of these exemplars. By providing understanding immediately, our stories save users the time and effort of reading thousands of documents and, most importantly, help them understand web archive collections.
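    The division of labor among the three tools can be shown schematically. The Python sketch below is purely illustrative: the function names are invented, and the placeholder bodies only mark where Hypercane, MementoEmbed, and Raintale do their real work; it is not those tools' actual API.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Surrogate:
        """Document metadata used to render one card in the story."""
        uri: str
        title: str
        snippet: str

    def select_exemplars(collection: list[str], k: int = 20) -> list[str]:
        """Stage 1 (Hypercane's role): reduce a large collection to k
        representative mementos. Hypercane clusters and samples; this
        placeholder simply takes the first k URIs."""
        return collection[:k]

    def make_surrogate(uri: str) -> Surrogate:
        """Stage 2 (MementoEmbed's role): derive card metadata for one
        archived document. Placeholder values stand in for real extraction."""
        return Surrogate(uri=uri, title=uri, snippet="")

    def render_story(surrogates: list[Surrogate]) -> str:
        """Stage 3 (Raintale's role): lay the surrogates out as a story."""
        return "\n".join(f"- {s.title} ({s.uri})" for s in surrogates)

    collection = ["https://example.org/memento/1", "https://example.org/memento/2"]
    print(render_story([make_surrogate(u) for u in select_exemplars(collection)]))
    ```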

    The Archigram Archive

    The Archigram archival project made the works of the seminal experimental architectural group Archigram available free online for an academic and general audience. It was a major archival work, and a new kind of digital academic archive, displaying material held in different places around the world and variously owned. It was aimed at a wide online design community, discovering it through Google or social media, as well as a traditional academic audience, and has been widely acclaimed in both fields. The project has three distinct but interlinked aims: firstly, to assess, catalogue and present the vast range of Archigram's prolific work, of which only a small portion was previously available; secondly, to provide reflective academic material on Archigram and on the wider picture of their work presented; thirdly, to develop a new type of non-ownership online archive, suitable both for academic research at the highest level and for casual public browsing. The project hybridised several existing methodologies, combining practical archival and editorial methods for the recovery, presentation and contextualisation of Archigram's work with digital web design and with the provision of reflective academic and scholarly material. It was designed by the EXP Research Group in the Department of Architecture in collaboration with Archigram and their heirs, and with the Centre for Parallel Computing, School of Electronics and Computer Science, also at the University of Westminster. It was rated 'outstanding' in the AHRC's own final report and was shortlisted for the RIBA research awards in 2010. It attracted 40,000 users and more than 250,000 page views in its first two weeks live, taking the site into Twitter's Top 1000 sites, and a steady flow of visitors thereafter. Further statistics are included in the accompanying portfolio. This output will also be returned by Murray Fraser for UCL.

    Archiving Interactive Narratives at the British Library

    This paper describes the creation of the Interactive Narratives collection in the UK Web Archive, as part of the UK Legal Deposit Libraries Emerging Formats Project. The aim of the project is to identify, collect and preserve complex digital publications that are in scope for collection under UK Non-Print Legal Deposit Regulations. The article traces the process of building the Interactive Narratives collection, analysing the different tools and methods used and placing the collection within the wider context of Emerging Formats work and engagement activities at the British Library.

    The Unending Lives of Net-Based Artworks: Web Archives, Browser Emulators, and New Conceptual Frameworks

    Research into net-based artworks is an undertaking divergent from much prior art historical scholarship. While most objects of art history are stable analog works, largely in museum collections, net-based artworks are vital and complex entities, existing on artists' websites alongside older versions captured in web archives. Scholars can profitably use web archives, browser emulators, and other digital methods to study the history of these works, but these new methods raise critical methodological issues. Art historians must contend with how the artwork changes over time, as well as the ever-evolving environment of the web itself. Probing the piece Homework by Alexei Shulgin as a test case, I investigate the methodological issues that arise when conducting art history research using web archives. In applying these methods, scholars must also attend to the evolving and multiple nature of these artworks. Drawing on the archival theory of Wolfgang Ernst and the records continuum model developed by Frank Upward and Sue McKemmish, I present a framework for conceptualising net-based artworks as plural and heterogeneous archives. This framework generates new readings of net-based artworks, accommodates new methods, and can also usefully equip scholars approaching dynamic cultural heritage objects in web archives more broadly.
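    As a concrete entry point for this kind of study, the archived versions of a work can be enumerated with the Internet Archive's CDX API before any close reading begins. A minimal sketch using only Python's standard library; the URI used for Homework is an assumption for illustration and should be verified against the actual work.

    ```python
    import json
    from urllib.parse import urlencode
    from urllib.request import urlopen

    def list_captures(url: str, limit: int = 10) -> list[tuple[str, str]]:
        """Ask the Internet Archive's CDX API for captures of `url` and
        return (timestamp, original-URI) pairs, one per archived version."""
        query = urlencode({"url": url, "output": "json",
                           "fl": "timestamp,original", "limit": limit})
        with urlopen(f"https://web.archive.org/cdx/search/cdx?{query}") as resp:
            rows = json.load(resp)
        return [tuple(row) for row in rows[1:]]  # rows[0] is the header row

    # Assumed URI for Shulgin's Homework; each capture replays at
    # https://web.archive.org/web/{timestamp}/{original}
    for ts, original in list_captures("http://www.easylife.org/homework/"):
        print(f"https://web.archive.org/web/{ts}/{original}")
    ```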

    Proceedings of the 12th International Conference on Digital Preservation

    The 12th International Conference on Digital Preservation (iPRES) was held November 2-6, 2015, in Chapel Hill, North Carolina, USA. There were 327 delegates from 22 countries. The program included 12 long papers, 15 short papers, 33 posters, 3 demos, 6 workshops, 3 tutorials and 5 panels, as well as several interactive sessions and a Digital Preservation Showcase.

    Simple identification tools in FishBase

    Simple identification tools for fish species were included in the FishBase information system from its inception. Early tools made use of the relational model and characters like fin ray meristics. Soon pictures and drawings were added as a further help, similar to a field guide. Later came the computerization of existing dichotomous keys, again in combination with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further development. It explores the possibility of a holistic and integrated computer-aided strategy.
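    The computerized dichotomous keys mentioned above are, at heart, binary decision trees walked one couplet at a time. A minimal sketch of that structure in Python; the questions and species names are invented for illustration and are not taken from FishBase.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Couplet:
        """One couplet in a dichotomous key: a yes/no question whose answers
        lead either to another couplet or to a species name (a leaf)."""
        question: str
        yes: "Couplet | str"
        no: "Couplet | str"

    # A toy two-couplet key; a real key combines such questions with
    # meristic characters (e.g. fin ray counts) and geographic filters.
    key = Couplet(
        question="Adipose fin present?",
        yes=Couplet(question="More than 10 dorsal fin rays?",
                    yes="Species A", no="Species B"),
        no="Species C",
    )

    def identify(node: "Couplet | str", answers: dict[str, bool]) -> str:
        """Walk the key using recorded yes/no answers until reaching a leaf."""
        while isinstance(node, Couplet):
            node = node.yes if answers[node.question] else node.no
        return node

    print(identify(key, {"Adipose fin present?": True,
                         "More than 10 dorsal fin rays?": False}))  # Species B
    ```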
