386 research outputs found
To Relive the Web: A Framework for the Transformation and Archival Replay of Web Pages
When replaying an archived web page (known as a memento), the fundamental expectation is that the page should be viewable and function exactly as it did at archival time. However, meeting this expectation requires web archives to modify the page and its embedded resources. After modification, they no longer reference (link to) the original server(s) from which they were archived and instead point back to the archive. Although these modifications necessarily change the state of the representation, it is understood that without them the replay of mementos from the archive would not be possible. Unfortunately, the replay of mementos, and the modifications web archives make to facilitate it, vary between archives, so no terminology exists for describing replay and the modifications made to mementos to facilitate it. In this thesis, we propose terminology for describing the existing styles of replay and the modifications web archives make to mementos to facilitate replay. In the process of defining terminology for the modifications that client-side rewriting libraries make to the browser's JavaScript execution environment during replay, this thesis also proposes a general framework for the auto-generation of client-side rewriting libraries. Finally, we evaluate the effectiveness of using a generated client-side rewriting library to augment the existing replay systems of web archives. To do so, we crawl mementos replayed from the Internet Archive’s Wayback Machine with and without the generated client-side rewriter. Using the generated client-side rewriter, we decreased the cumulative number of requests blocked by the Wayback Machine's content security policy for 577 mementos by 87.5% and increased the cumulative number of requests made by 32.8%. The generated client-side rewriter also allowed us to replay mementos that were previously not replayable from the Internet Archive.
JISC Preservation of Web Resources (PoWR) Handbook
Handbook of Web Preservation produced by the JISC-PoWR project which ran from April to November 2008.
The handbook specifically addresses digital preservation issues that are relevant to the UK HE/FE web management community.
The project was undertaken jointly by UKOLN at the University of Bath and the ULCC Digital Archives department.
An Updated Portrait of the Portuguese Web
This study presents an updated characterization of the Portuguese Web derived from a crawl of 48 million content items of all media types (2.5 TB of data), performed in March 2008. The resulting data was analyzed to characterize contents, sites and domains. This study was performed within the scope of the Portuguese Web Archive.
POSC/EU, UMI
Web Archiving in the UK: Current Developments and Reflections for the Future
This work presents a brief overview of the history of Web archiving projects in some English-speaking countries, paying particular attention to the development of, and main problems faced by, the UK Web Archive Consortium (UKWAC) and the UK Web Archive partnership in Britain. It particularly highlights two factors and discusses how they affect Web archiving projects. The first is the changeable nature of Web pages through constant content removal and/or alteration. The second is the recent technological innovations brought by Web 2.0 applications. It also examines different collecting approaches and the limitations of harvesting software. It further considers how current UK copyright and deposit regulations covering digital content are failing to support Web archive projects in the country. From the perspective of user access, this dissertation analyzes UK Web archive interfaces, identifying their main drawbacks. It suggests how these interfaces could be improved to better meet users' information needs and their access to archived Web content.
Digital archives : comparative study and interoperability framework
Internship carried out at ParadigmaXis and supervised by Eng. Filipe Correia. Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 200
Building a New Infrastructure for Digital Media: Northwestern University Library
The Northwestern University Library has been a pioneer in text and media digitization. Its projects range from early efforts focused primarily on enhancing access to reserve material to current projects involving vast quantities of streaming media. They have largely been the result of close collaboration between the library and other units on campus, particularly Academic Technologies. As the depth and breadth of digitization efforts have increased, so have the technological and organizational issues. This article examines the history of digitization efforts at Northwestern University as a context for exploring the emerging issues most libraries face as digitization enters a new era.
The Feminist Library: “History is Herstory, Too”
The Feminist Library is not a typical public library; it is an organization with roots in the historical revolution. Its history, services, and classification system are unique, and its collection is irreplaceable. The purpose of this study is to document the history, resources, and organization of the Feminist Library in London, England.
The development of a set of principles for the through-life management of engineering information
Belgium Herbarium image of Meise Botanic Garden