    Tools for Managing the Past Web

    PDF of a PowerPoint presentation from an Old Dominion University ECE Department seminar, February 20, 2015. Also available on SlideShare.

    Leveraging Heritrix and the Wayback Machine on a Corporate Intranet: A Case Study on Improving Corporate Archives

    In this work, we present a case study in which we investigate using open-source, web-scale web archiving tools (i.e., Heritrix and the Wayback Machine installed on the MITRE Intranet) to automatically archive a corporate Intranet. We use this case study to outline the challenges of Intranet web archiving, identify situations in which the open source tools are not well suited for the needs of the corporate archivists, and make recommendations for future corporate archivists wishing to use such tools. We performed a crawl of 143,268 URIs (125 GB and 25 hours) to demonstrate that the crawlers are easy to set up, efficiently crawl the Intranet, and improve archive management. However, challenges exist when the Intranet contains sensitive information, areas with potential archival value require user credentials, or archival targets make extensive use of internally developed and customized web services. We elaborate on and recommend approaches for overcoming these challenges.

    Recognizing Co-Creators in Four Configurations: Critical Questions for Web Archiving

    Four categories of co-creator shape web archivists' practice and influence the development of web archives: social forces, users and uses, subjects of web archives, and technical agents. This paper illustrates how these categories of co-creator overlap and interact in four specific web archiving contexts. It recommends that web archivists acknowledge this complex array of contributors as a way to imagine web archives differently. A critical approach to web archiving recognizes relationships and blended roles among stakeholders; seeks opportunities for non-extractive archival activity; and acknowledges the value of creative reuse as an important aspect of preservation.

    Políticas E Tecnologias De Preservação Digital No Arquivamento Da Web

    The objective of this paper was to analyze digital preservation from the perspective of web archiving, addressing the technologies involved in the archiving process, the policies for selecting, preserving, and providing access to this content, and the international institutions that work on web preservation. The methodology uses bibliographic and documentary research on international web archiving initiatives and aims to foster discussion in Brazil, as well as to support applied studies. It analyzes scientific publications in journals indexed by Scopus over the five years 2012-2016 that deal with web archiving, web content selection policies, and technologies applied to the harvesting, storage, and access of archived websites. It also provides an overview of the technologies used by the community of web archiving initiatives, based on data available on the website of the International Internet Preservation Consortium. It concludes that countries that do not yet have their own initiatives, such as Brazil, can not only preserve their digital memory but also contribute to the international web archiving community by establishing selection policies with specific approaches (institutional, thematic, domain-based, etc.), managing the web archiving life cycle, and adopting open-source technologies. Keywords: Digital preservation; Preservation policy; Web archiving.

    Aggregating Private and Public Web Archives Using the Mementity Framework

    Web archives preserve the live Web for posterity, but the content on the Web one cares about may not be preserved. The ability to access this content in the future requires the assurance that those sites will continue to exist on the Web until the content is requested and that the content will remain accessible. It is ultimately the responsibility of the individual to preserve this content, but attempting to replay personally preserved pages segregates archived pages by individuals and organizations of personal, private, and public Web content. This is misrepresentative of the Web as it was. While the Memento Framework may be used for inter-archive aggregation, no dynamics exist for the special consideration needed for the contents of these personal and private captures. In this work we introduce a framework for aggregating private and public Web archives. We introduce three mementities that serve the roles of the aforementioned aggregation, access control to personal Web archives, and negotiation of Web archives in dimensions beyond time, inclusive of the dimension of privacy. These three mementities serve as the foundation of the Mementity Framework. We investigate the difficulties and dynamics of preserving, replaying, aggregating, propagating, and collaborating with live Web captures of personal and private content. We offer a systematic solution to these outstanding issues through the application of the framework. We ensure the framework's applicability beyond the use cases we describe as well as the extensibility of reusing the mementities for currently unforeseen access patterns. We evaluate the framework by justifying the mementity design decisions, formulaically abstracting the anticipated temporal and spatial costs, and providing reference implementations, usage, and examples for the framework.

    Sistema Informático para el proceso de Customer Engagement en SOA Professionals

    This research involved developing a software system to replace an existing one in order to improve the Customer Engagement business process at the company SOA Professionals. The research is applied in type, and the project design is pre-experimental. The main purpose of this report was to specify the impact of the new software system on the Customer Engagement process at SOA Professionals. For the software development methodology, the Scrum framework was used, since it was determined that this agile framework would enable faster periodic communication among the members of the teams involved and adapts better to constant change. The software architecture is based on microservices, and most of the development was done with JavaScript and Node.js. The databases used are Amazon DynamoDB and PostgreSQL. According to the results obtained through statistical analysis, it was determined that implementing the newly developed software solution helps the Customer Engagement process at SOA Professionals achieve better indicators with respect to the stated objectives.