    Web Archiving in the UK: Current Developments and Reflections for the Future

    This work presents a brief overview of the history of Web archiving projects in several English-speaking countries, paying particular attention to the development of, and main problems faced by, the UK Web Archiving Consortium (UKWAC) and the UK Web Archive partnership in Britain. It highlights, in particular, the changeable nature of Web pages, whose content is constantly removed or altered, and the technological innovations recently introduced by Web 2.0 applications, discussing how these factors affect Web archiving projects. It also examines different collecting approaches, the limitations of harvesting software, and how current UK copyright and legal deposit regulations covering digital content are failing to support Web archiving projects in the country. From the perspective of user access, this dissertation analyzes UK Web archive interfaces, identifying their main drawbacks and suggesting how they could be improved to better meet users' information needs and their access to archived Web content.

    Digital contemporary history: sources, tools, methods, issues

    This essay suggests that there has been a relative lack of digitally enabled historical research on the recent past, compared with earlier periods of history. It explores why this might be the case, focussing in particular on the obstacles to, and some missing drivers of, mass digitisation of primary sources for the 20th century. It suggests that the situation is likely to change, and relatively soon, as a result of the increasing availability of born-digital sources, and of Web archives in particular. The article ends with some reflections on several shifts in method and approach that this changed situation is likely to entail.

    The Pandemic at Home: Learning from Community-engaged Covid-19 Documentation Efforts in the Southeastern US

    Cultural heritage institutions of all kinds around the world responded to the Covid-19 pandemic by launching community-engaged collecting efforts that solicited documents capturing the daily experience of a historically significant phenomenon. While the pandemic is global in scale, these collecting efforts document the impact of Covid-19 at local or regional levels. This article reports on research to better understand how cultural heritage institutions in the Southeastern United States have developed community-engaged collecting projects. Analyzing data collected from the public websites of 30 institutions, as well as semi-structured interviews with 10 cultural heritage professionals active in the Covid-19 documentation projects at these institutions, this research broadly characterizes the nature of these collecting efforts and surfaces key issues and challenges that have affected the launch, development, and ongoing management of these collections. These collecting efforts have required the adaptation of existing workflows along with the acquisition of new skills and archival practices, particularly in digital curation. In planning and managing these projects, practitioners have grappled with complex ethical questions about how to engage communities responsibly and equitably in the midst of a traumatic event. In both acquiring new skills and reframing an ethics of collecting, practitioners have turned to many sources for learning and growth; notably, communities of fellow practitioners involved in Covid-19 documentation projects have proven instrumental in sharing resources and discussing emergent issues and challenges.

    Users, technologies, organisations: Towards a cultural history of world web archiving

    If 2015 marked 25 years since the birth of the web, 2016 marked the 20th anniversary of web archiving: of systematic attempts to preserve web content and make it accessible to scholars and the public. The time is therefore ripe for an initial assessment of the history of the movement and of the patterns into which it has already fallen. Although short sketches of this history exist, this chapter is the first attempt to document the subject at length. It concentrates on what might be termed the cultural history of the movement, addressing not how web archiving has been carried out, but why, by whom, and on whose behalf. If this chapter helps orient users to some of the questions they should be asking of their sources, and of the institutions that provide them, it will have achieved its aim.

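    Web Archiving and Preservation: Procedures for Collecting and Storing the Institutional Websites of Brazilian State Public Archives (Arquivamento e preservação da Web: procedimentos de coleta e armazenamento de sites institucionais dos arquivos públicos estaduais brasileiros)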

    Because the information we produce and make available on the web is being lost at an increasing rate, this study analyzes the web archiving process through the collection and preservation procedures applied to the institutional websites of Brazilian state public archives. Designed as exploratory and descriptive research combining quantitative and qualitative approaches, it sought to describe the history of web archiving and the main policies and technologies involved in preserving digital content. After surveying the state public archives that have institutional websites, it archived every URL found using the “Save Page Now” capture feature of the Internet Archive platform. It compared how many institutions have their own institutional website or a page linked to another site, as well as which public archives publish most on the internet. It analyzed the amount of information and content published by each archive on its website, as well as the functionality of the web pages captured on the Internet Archive platform, using the Wayback Machine search tool. It also analyzed the institutional websites as the digital memory of these memory institutions. It concluded that the number of URLs referring to public archives with their own institutional website is clearly greater than the number of URLs for archives whose web page is linked to the website of a State Secretariat. It proposed the creation of a national web preservation platform and the continuous archiving of the institutional websites of Brazilian state public archives, with the aim of preserving digital content for future generations.

    Identifying the Bounds of an Internet Resource

    Systems for retrieving or archiving Internet resources often assume that a URL acts as a delimiter for the resource. But in many situations Internet resources do not map one-to-one onto URLs. For a URL that points to the first page of a document broken up over multiple pages, users are likely to consider the whole article as the resource, even though it is spread across multiple URLs. Comments, tags, ratings, and advertising may or may not be perceived as part of the resource, whether they are retrieved as part of the primary URL or accessed via a link. Understanding what people perceive as part of a resource is necessary before developing algorithms to detect and make use of resource boundaries. A pilot study examined how well content similarity, URL similarity, and the combination of the two matched human expectations. It showed that more nuanced techniques were needed, ones that take into account the particular content and context of the resource and related content. Based on the lessons from the pilot study, a study was performed that focused on two research questions: (1) how particular relationships between the content of pages affect expectations, and (2) how encountered implementations of saving and perceptions of content value relate to the notion of Internet resource bounds. Results showed that human expectations are affected by expected relationships, such as two web pages showing parts of the same news article. They are also affected when two content elements belong to the same set of content, as when two photos are presented as members of the same collection or presentation. Expectations were also affected by the role of the content: advertisements presented alongside articles or photos were less likely to be considered part of a resource.
The exploration of web resource boundaries found that people’s assessments of resource bounds rely on understanding relationships between content fragments on the same web page and between content fragments on different web pages. These results were obtained in the context of personal archiving scenarios. Would institutional archives carry different expectations? A follow-on study gathered perceptions in the context of institutional archiving to explore whether they change depending on whether the archive is for personal use or institutional in nature. Results show similar expectations for preserving continuations of the main content in personal and institutional archiving scenarios. Institutional archives are more likely to be expected to preserve the context of the main content, such as additional linked content, advertisements, and author information. This implies alternative resource bounds based on the type of content, the relationships between content elements, and the type of archive under consideration. Based on the predictive features gathered, an automatic classifier was designed to determine whether two pieces of content should be considered part of the same resource. This classifier is an example of taking into account the features identified as important in the studies of human perception when developing techniques that bound the materials captured while archiving online resources.