Search CORE

59,172 research outputs found

Lost in the Infinite Archive: The Promise and Pitfalls of Web Archives

Author: Milligan Ian
Publication venue: 'Edinburgh University Press'
Publication date: 14/03/2016

The version of record can be found at http://www.euppublishing.com/doi/10.3366/ijhac.2016.0161. Contemporary and future historians need to grapple with and confront the challenges posed by web archives. These large collections of material, accessed either through the Internet Archive's Wayback Machine or through other computational methods, represent both a challenge and an opportunity to historians. Through these collections, we have the potential to access the voices of millions of non-elite individuals (recognizing of course the cleavages in both Web access and method of access). To put this in perspective, the Old Bailey Online currently describes its monumental holdings of 197,745 trials between 1674 and 1913 as the "largest body of texts detailing the lives of non-elite people ever published." GeoCities.com, a platform for everyday web publishing in the mid-to-late 1990s and early 2000s, amounted to over thirty-eight million individual webpages. Historians will have access, in some form, to millions of pages: written by everyday people of various classes, genders, ethnicities, and ages. While the Web was not a perfect democracy by any means – it was and is unevenly accessed across each of those categories – this still represents a massive collection of non-elite speech. Yet a figure like thirty-eight million webpages is both a blessing and a curse. We cannot read every website, and must instead rely upon discovery tools to find the information that we need. Yet these tools largely do not exist for web archives, or are in a very early state of development: what will they look like? What information do historians want to access? We cannot simply map over web tools optimized for discovering current information through online searches or metadata analysis. We need to find information that mattered at the time, to diverse and very large communities. Furthermore, web pages cannot be viewed in isolation, outside of the networks that they inhabited.
In theory, amongst corpuses of millions of pages, researchers can find whatever they want to confirm. The trick is situating it into a larger social and cultural context: is it representative? Unique? In this paper, "Lost in the Infinite Archive," I explore what the future of digital methods for historians will be when they need to explore web archives. Historical research of periods beginning in the mid-1990s will need to use web archives, and right now we are not ready. This article draws on first-hand research with the Internet Archive and Archive-It web archiving teams. It draws upon three exhaustive datasets: the large Web ARChive (WARC) files that make up Wide Web Scrapes of the Web; the metadata-intensive WAT files that provide networked contextual information; and the lifted-straight-from-the-web guerilla archives generated by groups like Archive Team. Through these case studies, we can see – hands-on – what richness and potentials lie in these new cultural records, and what approaches we may need to adopt. It helps underscore the need to have humanists involved at this early, crucial stage. Social Sciences and Humanities Research Council || 430-2013-0616 Ontario Early Researcher Award

University of Waterloo's Institutional Repository

We Could, but Should We? Ethical Considerations for Providing Access to GeoCities and Other Historical Digital Collections

Author: Bastian M.
Cohen D.
Dent P.
Frankel C.
Hallinan B.
Jenkins H.
Klein E.
Lin J.
McTavish S.
Milligan I.
Smith M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/03/2020

We live in an era in which the ways that we can make sense of our past are evolving as more artifacts from that past become digital. At the same time, the responsibilities of traditional gatekeepers who have negotiated the ethics of historical data collection and use, such as librarians and archivists, are increasingly being sidelined by the system builders who decide whether and how to provide access to historical digital collections, often without sufficient reflection on the ethical issues at hand. It is our aim to better prepare system builders to grapple with these issues. This paper focuses discussions around one such digital collection from the dawn of the web, asking what sorts of analyses can and should be conducted on archival copies of the GeoCities web hosting platform that dates to 1994. This research was supported by the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, the US National Science Foundation (grants 1618695 and 1704369), the Andrew W. Mellon Foundation, Start Smart Labs, and Compute Canada.

Crossref

YorkSpace

Metadata enrichment for digital heritage: users as co-creators

Author: Alemu Getaneh
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2018

This paper espouses the concept of metadata enrichment through an expert and user-focused approach to metadata creation and management. To this end, it is argued that the Web 2.0 paradigm enables users to be proactive metadata creators. As Shirky (2008, p. 47) argues, Web 2.0’s social tools enable “action by loosely structured groups, operating without managerial direction and outside the profit motive”. Lagoze (2010, p. 37) advises, “the participatory nature of Web 2.0 should not be dismissed as just a popular phenomenon [or fad]”. Carletti (2016) proposes a participatory digital cultural heritage approach where Web 2.0 approaches such as crowdsourcing can be used to enrich digital cultural objects. It is argued that “heritage crowdsourcing, community-centred projects or other forms of public participation”. On the other hand, the new collaborative approaches of Web 2.0 neither negate nor replace contemporary standards-based metadata approaches. Hence, this paper proposes a mixed metadata approach where user-created metadata augments expert-created metadata and vice versa. Metadata creation is no longer the sole prerogative of the metadata expert. The Web 2.0 collaborative environment now allows users to participate in both adding and re-using metadata. The expert-created (standards-based, top-down) and user-generated (socially-constructed, bottom-up) approaches to metadata are complementary rather than mutually exclusive. The two approaches are often mistakenly considered a dichotomy (Gruber, 2007; Wright, 2007). This paper espouses the importance of enriching digital information objects with descriptions pertaining to the about-ness of information objects. Such richness and diversity of description, it is argued, could chiefly be achieved by involving users in the metadata creation process.
This paper presents the importance of the paradigm of metadata enriching and metadata filtering for the cultural heritage domain. Metadata enriching states that a priori metadata that is instantiated and granularly structured by metadata experts is continually enriched through socially-constructed (post-hoc) metadata, whereby users are proactively engaged in co-creating metadata. The principle also states that metadata that is enriched is also contextually and semantically linked and openly accessible. In addition, metadata filtering states that metadata resulting from implementing the principle of enriching should be displayed for users in line with their needs and convenience. In both enriching and filtering, users should be considered as prosumers, resulting in what is called collective metadata intelligence.

Web archives: the future

Author: Arthur Thomas
Eric T. Meyer
Ralph Schroeder
Publication venue: International Internet Preservation Consortium (IIPC)

This report is structured first, to engage in some speculative thought about the possible futures of the web as an exercise in prompting us to think about what we need to do now in order to make sure that we can reliably and fruitfully use archives of the web in the future. Next, we turn to considering the methods and tools being used to research the live web, as a pointer to the types of things that can be developed to help understand the archived web. Then, we turn to a series of topics and questions that researchers want or may want to address using the archived web. In this final section, we identify some of the challenges individuals, organizations, and international bodies can target to increase our ability to explore these topics and answer these questions. We end the report with some conclusions based on what we have learned from this exercise.

Analysis and Policy Observatory (APO)

Jazz History Database Global Contributor Project

Author: Matticoli Mikel
Varella Lucas
Publication venue: Digital WPI
Publication date: 09/06/2020

The JazzHistoryDatabase is a non-profit organization at WPI that archives recordings, photographs, and other jazz artifacts from around the world that might otherwise deteriorate. This archive, accessible at jazzhistorydatabase.com, was previously maintained by students and faculty who built the website by hand. Our goal was to design, build, and document a web-based tool that lets volunteers and international correspondents upload digitized jazz artifacts through a simple web form, which then generates and publishes template-based web pages.

DigitalCommons@WPI

Seven Dimensions of Portability for Language Documentation and Description

Author: Bird Steven
Simons Gary
Publication date: 01/01/2002

The process of documenting and describing the world's languages is undergoing radical transformation with the rapid uptake of new digital technologies for capture, storage, annotation and dissemination. However, uncritical adoption of new tools and technologies is leading to resources that are difficult to reuse and which are less portable than the conventional printed resources they replace. We begin by reviewing current uses of software tools and digital technologies for language documentation and description. This sheds light on how digital language documentation and description are created and managed, leading to an analysis of seven portability problems under the following headings: content, format, discovery, access, citation, preservation and rights. After characterizing each problem we provide a series of value statements, and this provides the framework for a broad range of best practice recommendations. Comment: 8 pages

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Melbourne Institutional Repository

A Guide to Distributed Digital Preservation

Author: Schultz Matt
Skinner Katherine
Publication venue: Educopia Institute
Publication date: 01/01/2010

"This volume is devoted to the broad topic of distributed digital preservation, a still-emerging field of practice for the cultural memory arena. Replication and distribution hold out the promise of indefinite preservation of materials without degradation, but establishing effective organizational and technical processes to enable this form of digital preservation is daunting. Institutions need practical examples of how this task can be accomplished in manageable, low-cost ways."--P. [4] of cover

Boston University Institutional Repository (OpenBU)

Finding the way: improving access to the collections of the Royal Scottish Geographical Society

Author: Fenton C.
Publication venue: 'Emerald'
Publication date: 01/10/2007

This case study describes and discusses the ‘Images for All’ project at the Royal Scottish Geographical Society and the lessons learned from it. The background to the project and the collections held are described. The case study focuses on the development of the project website, the digitisation of 100 images from the collection, and the nature of project management in a small-scale project. The paper finds that project managers working in small voluntary organisations face many potential challenges, but that these can be overcome.

Crossref

Enlighten

Off the Beaten tracks: Exploring Three Aspects of Web Navigation

Author: Herder E.
Mayer M.
Obendorf H.
Weinreich H.
Publication venue: ACM Press
Publication date: 01/01/2006

This paper presents results of a long-term client-side Web usage study, updating previous studies that range in age from five to ten years. We focus on three aspects of Web navigation: changes in the distribution of navigation actions, speed of navigation and within-page navigation. “Navigation actions” corresponding to users’ individual page requests are discussed by type. We reconfirm links to be the most important navigation element, while backtracking has lost more than half of its previously reported share and form submission has become far more common. Changes in the Web and in browser interfaces are candidate causes of these shifts. Analyzing the time users stayed on pages, we confirm Web navigation to be a rapidly interactive activity. A breakdown of page characteristics shows that users often do not take the time to read the available text or consider all links. The performance of the Web is analyzed and reassessed against the resulting requirements. Finally, habits of within-page navigation are presented. Although most selected hyperlinks are located in the top left corner of the screen, in nearly a quarter of all cases people choose links that require scrolling. We analyzed the available browser real estate to gain insights for the design of non-scrolling Web pages.

CiteSeerX

Crossref

University of Twente Research Information