research

ArchivePress: A Really Simple Solution to Archiving Blog Content

Abstract

ArchivePress is a new technical solution for collecting and archiving content from blogs. Current solutions are commonly based on typical web archiving activities, whereby a crawler is configured to harvest a copy of the blog and return the copy to a web archive. This approach is perfectly acceptable if the requirement is that the site is presented as an integral whole. However, ArchivePress is based upon the premise that blogs are a distinct class of web-based resource, in which the post, not the page, is atomic, and certain properties, such as layouts and colours, are demonstrably superfluous for many (if not most) users. As a result, an approach that builds on the functionality provided by web feeds to capture only selected aspects of the blog offers more potential. This is particularly the case when institutions wish to develop collections of aggregated blog content from a range of different sources. The presentation will describe our research to develop such an approach, including work to define the significant properties of blogs, details of the technical development, and pilot collections against which the tool has been tested

    Similar works