An Extensible Framework for Creating Personal Archives of Web Resources Requiring Authentication

Abstract

The key factors for the success of the World Wide Web are its large size and the lack of a centralized control over its contents. In recent years, many advances have been made in preserving web content but much of this content (namely, social media content) was not archived, or still to this day is not being archived,for various reasons. Tools built to accomplish this frequently break because of the dynamic structure of social media websites. Because many social media websites exhibit a commonality in hierarchy of the content, it would be worthwhile to setup a means to reference this hierarchy for tools to leverage and become adaptive as the target websites evolve. As relying on the service to provide this means is problematic in the context of archiving, we can surmise that the only way to assure that all of these shortcomings are not experienced is to rely on the original context in which the user views the content, i.e. the webbrowser. In this thesis I will describe an abstract specification and concrete implementations of the specification that allow tools to leverage the context of theweb browser to capture content into personal web archives. These tools will then be able to accomplish personal web archiving in a way that makes them more robust. As evaluation, I will make a change in the hierarchy of a synthetic social media website and its respective specification. Then, I will show that anadapted tool, using the specification, continues to function and is able to archive the social media website

    Similar works