    Bootstrapping Web Archive Collections From Micro-Collections in Social Media

    In a Web plagued by disappearing resources, Web archive collections provide a valuable means of preserving Web resources important to the study of past events. These archived collections start with seed URIs (Uniform Resource Identifiers) hand-selected by curators. Curators produce high-quality seeds by removing non-relevant URIs and adding URIs from credible and authoritative sources, but this ability comes at a cost: collecting these seeds is time-consuming. The result is a shortage of curators, a lack of Web archive collections for various important news events, and a need for an automatic system for generating seeds. We investigate the problem of generating seed URIs automatically and explore the state of the art in collection building and seed selection. Attempts to generate seeds automatically have mostly relied on scraping Web or social media Search Engine Result Pages (SERPs). In this work, we introduce a novel source of seeds: URIs in the threaded conversations of social media posts created by single or multiple users. Users on social media sites routinely create and share narratives about news events consisting of hand-selected URIs of news stories, tweets, videos, etc. We call these posts Micro-collections, whether shared on Reddit or Twitter, and we consider them an important source of seeds because the effort taken to create a Micro-collection is an indication of editorial activity and a demonstration of domain expertise. Therefore, we propose a model for generating seeds from Micro-collections. We begin by introducing a simple vocabulary, called post class, for describing social media posts across different platforms, and extract seeds from the Micro-collections post class. We further propose Quality Proxies for seeds by extending the idea of collection comparison to evaluation, and present our Micro-collection/Quality Proxy (MCQP) framework for bootstrapping Web archive collections from Micro-collections in social media.
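    The seed-extraction step the model describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the Post structure, the min_uris threshold, and all function names are hypothetical.

    ```python
    import re
    from dataclasses import dataclass, field

    URI_PATTERN = re.compile(r"https?://[^\s)>\"']+")

    @dataclass
    class Post:
        """A platform-agnostic social media post (the 'post class' idea)."""
        author: str
        text: str
        replies: list["Post"] = field(default_factory=list)

    def extract_uris(post: Post) -> list[str]:
        """Pull candidate seed URIs out of a single post's text."""
        return URI_PATTERN.findall(post.text)

    def micro_collection_seeds(root: Post, min_uris: int = 3) -> list[str]:
        """Treat a threaded conversation as a Micro-collection when its posts
        collectively curate at least `min_uris` distinct URIs, and return
        those URIs as candidate seeds (an empty list otherwise)."""
        seeds, stack = set(), [root]
        while stack:
            post = stack.pop()
            seeds.update(extract_uris(post))
            stack.extend(post.replies)
        return sorted(seeds) if len(seeds) >= min_uris else []
    ```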

    Improving Collection Understanding for Web Archives with Storytelling: Shining Light Into Dark and Stormy Archives

    Collections are the tools that people use to make sense of an ever-increasing number of archived web pages. As collections themselves grow, we need tools to make sense of them. Tools that work on the general web, like search engines, are not a good fit for these collections because search engines do not currently represent multiple document versions well. Web archive collections are vast, some containing hundreds of thousands of documents. Thousands of collections exist, many of which cover the same topic, and few include standardized metadata. Too many documents from too many collections with insufficient metadata make collection understanding an expensive proposition. This dissertation establishes a five-process model to assist with web archive collection understanding. This model aims to produce a social media story – a visualization with which most web users are familiar. Each social media story contains surrogates, which are summaries of individual documents. These surrogates, when presented together, summarize the topic of the story; after applying our storytelling model, they summarize the topic of a web archive collection. We develop and test a framework to select the best exemplars that represent a collection, and establish that algorithms produced from these primitives select exemplars that are otherwise undiscoverable using conventional search engine methods. We generate story metadata to improve the information scent of a story so users can understand it better. After an analysis showing that existing platforms perform poorly for web archives and a user study establishing the best surrogate type, we generate document metadata for the exemplars with machine learning. We then visualize the story and document metadata together and distribute them to satisfy the information needs of multiple personas who benefit from our model. Our tools serve as a reference implementation of our Dark and Stormy Archives storytelling model: Hypercane selects exemplars and generates story metadata, MementoEmbed generates document metadata, and Raintale visualizes and distributes the story based on the story metadata and the document metadata of these exemplars. By providing understanding immediately, our stories save users the time and effort of reading thousands of documents and, most importantly, help them understand web archive collections.
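    The division of labor among the three tools can be shown schematically. The Python sketch below is purely illustrative: the function names are invented, and the placeholder bodies only mark where Hypercane, MementoEmbed, and Raintale do their real work; it is not those tools' actual API.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Surrogate:
        """Document metadata used to render one card in the story."""
        uri: str
        title: str
        snippet: str

    def select_exemplars(collection: list[str], k: int = 20) -> list[str]:
        """Stage 1 (Hypercane's role): reduce a large collection to k
        representative mementos. Hypercane clusters and samples; this
        placeholder simply takes the first k URIs."""
        return collection[:k]

    def make_surrogate(uri: str) -> Surrogate:
        """Stage 2 (MementoEmbed's role): derive card metadata for one
        archived document. Placeholder values stand in for real extraction."""
        return Surrogate(uri=uri, title=uri, snippet="")

    def render_story(surrogates: list[Surrogate]) -> str:
        """Stage 3 (Raintale's role): lay the surrogates out as a story."""
        return "\n".join(f"- {s.title} ({s.uri})" for s in surrogates)

    collection = ["https://example.org/memento/1", "https://example.org/memento/2"]
    print(render_story([make_surrogate(u) for u in select_exemplars(collection)]))
    ```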

    The Archigram Archive

    The Archigram archival project made the works of the seminal experimental architectural group Archigram available free online for an academic and general audience. It was a major archival work, and a new kind of digital academic archive, displaying material held in different places around the world and variously owned. It was aimed at a wide online design community, discovering it through Google or social media, as well as a traditional academic audience, and has been widely acclaimed in both fields. The project has three distinct but interlinked aims: firstly, to assess, catalogue and present the vast range of Archigram's prolific work, of which only a small portion was previously available; secondly, to provide reflective academic material on Archigram and on the wider picture of their work presented; thirdly, to develop a new type of non-ownership online archive, suitable both for academic research at the highest level and for casual public browsing. The project hybridised several existing methodologies, combining practical archival and editorial methods for the recovery, presentation and contextualisation of Archigram's work with digital web design and with the provision of reflective academic and scholarly material. It was designed by the EXP Research Group in the Department of Architecture in collaboration with Archigram and their heirs, and with the Centre for Parallel Computing, School of Electronics and Computer Science, also at the University of Westminster. It was rated 'outstanding' in the AHRC's own final report and was shortlisted for the RIBA research awards in 2010. It attracted 40,000 users and more than 250,000 page views in its first two weeks live, taking the site into Twitter's Top 1000 sites, and a steady flow of visitors thereafter. Further statistics are included in the accompanying portfolio. This output will also be returned by Murray Fraser for UCL.

    Archiving Interactive Narratives at the British Library

    This paper describes the creation of the Interactive Narratives collection in the UK Web Archive, as part of the UK Legal Deposit Libraries Emerging Formats Project. The aim of the project is to identify, collect and preserve complex digital publications that are in scope for collection under UK Non-Print Legal Deposit Regulations. The article traces the process of building the Interactive Narratives collection, analysing the different tools and methods used and placing the collection within the wider context of Emerging Formats work and engagement activities at the British Library.

    The Unending Lives of Net-Based Artworks: Web Archives, Browser Emulators, and New Conceptual Frameworks

    Research into net-based artworks is an undertaking divergent from much prior art historical scholarship. While most objects of art history are stable analog works, largely in museum collections, net-based artworks are vital and complex entities, existing on artists' websites alongside older versions captured in web archives. Scholars can profitably use web archives, browser emulators, and other digital methods to study the history of these works, but these new methods raise critical methodological issues. Art historians must contend with how the artwork changes over time, as well as the ever-evolving environment of the web itself. Probing the piece Homework by Alexei Shulgin as a test case, I investigate the methodological issues that arise when conducting art history research using web archives. In applying these methods, scholars must also attend to the evolving and multiple nature of these artworks. Drawing on the archival theory of Wolfgang Ernst and the records continuum model developed by Frank Upward and Sue McKemmish, I present a framework for conceptualising net-based artworks as plural and heterogeneous archives. This framework generates new readings of net-based artworks, accommodates new methods, and can also usefully equip scholars approaching dynamic cultural heritage objects in web archives more broadly.
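    As a concrete entry point for this kind of study, the archived versions of a work can be enumerated with the Internet Archive's CDX API before any close reading begins. A minimal sketch using only Python's standard library; the URI used for Homework is an assumption for illustration and should be verified against the actual work.

    ```python
    import json
    from urllib.parse import urlencode
    from urllib.request import urlopen

    def list_captures(url: str, limit: int = 10) -> list[tuple[str, str]]:
        """Ask the Internet Archive's CDX API for captures of `url` and
        return (timestamp, original-URI) pairs, one per archived version."""
        query = urlencode({"url": url, "output": "json",
                           "fl": "timestamp,original", "limit": limit})
        with urlopen(f"https://web.archive.org/cdx/search/cdx?{query}") as resp:
            rows = json.load(resp)
        return [tuple(row) for row in rows[1:]]  # rows[0] is the header row

    # Assumed URI for Shulgin's Homework; each capture replays at
    # https://web.archive.org/web/{timestamp}/{original}
    for ts, original in list_captures("http://www.easylife.org/homework/"):
        print(f"https://web.archive.org/web/{ts}/{original}")
    ```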

    Proceedings of the 12th International Conference on Digital Preservation

    The 12th International Conference on Digital Preservation (iPRES) was held November 2-6, 2015, in Chapel Hill, North Carolina, USA. There were 327 delegates from 22 countries. The program included 12 long papers, 15 short papers, 33 posters, 3 demos, 6 workshops, 3 tutorials and 5 panels, as well as several interactive sessions and a Digital Preservation Showcase.

    Simple identification tools in FishBase

    Simple identification tools for fish species were included in the FishBase information system from its inception. Early tools made use of the relational model and characters like fin ray meristics. Soon pictures and drawings were added as a further help, similar to a field guide. Later came the computerization of existing dichotomous keys, again in combination with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further development. It explores the possibility of a holistic and integrated computer-aided strategy.
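    The computerized dichotomous keys mentioned above are, at heart, binary decision trees walked one couplet at a time. A minimal sketch of that structure in Python; the questions and species names are invented for illustration and are not taken from FishBase.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Couplet:
        """One couplet in a dichotomous key: a yes/no question whose answers
        lead either to another couplet or to a species name (a leaf)."""
        question: str
        yes: "Couplet | str"
        no: "Couplet | str"

    # A toy two-couplet key; a real key combines such questions with
    # meristic characters (e.g. fin ray counts) and geographic filters.
    key = Couplet(
        question="Adipose fin present?",
        yes=Couplet(question="More than 10 dorsal fin rays?",
                    yes="Species A", no="Species B"),
        no="Species C",
    )

    def identify(node: "Couplet | str", answers: dict[str, bool]) -> str:
        """Walk the key using recorded yes/no answers until reaching a leaf."""
        while isinstance(node, Couplet):
            node = node.yes if answers[node.question] else node.no
        return node

    print(identify(key, {"Adipose fin present?": True,
                         "More than 10 dorsal fin rays?": False}))  # Species B
    ```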
