    Tools for Discovering and Archiving the Mobile Web

    Many websites are adapting their content for users who are accessing the Web using smartphones and tablets. The growth of this Mobile Web has required web archivists to change their practices in order to collect this ephemeral web content. We have created a tool called MobileFinder which can be used to automatically detect mobile pages when given the URL of a desktop web page. We used this tool in an experiment to gauge what techniques popular websites are currently using to expose mobile content, and we incorporated the tool into Heritrix to demonstrate its usefulness to the web archiving community.

    Information scraps: how and why information eludes our personal information management tools

    In this paper we describe information scraps -- a class of personal information whose content is scribbled on Post-it notes, scrawled on corners of random sheets of paper, buried inside the bodies of e-mail messages sent to ourselves, or typed haphazardly into text files. Information scraps hold our great ideas, sketches, notes, reminders, driving directions, and even our poetry. We define information scraps to be the body of personal information that is held outside of its natural or We have much still to learn about these loose forms of information capture. Why are they so often held outside of our traditional PIM locations and instead on Post-its or in text files? Why must we sometimes go around our traditional PIM applications to hold on to our scraps, such as by e-mailing ourselves? What is the role of information scraps in the larger space of personal information management, and what do they uniquely offer that we find so appealing? If these unorganized bits truly indicate the failure of our PIM tools, how might we begin to build better tools? We have pursued these questions by undertaking a study of 27 knowledge workers. In our findings we describe information scraps from several angles: their content, their location, and the factors that lead to their use, which we identify as ease of capture, flexibility of content and organization, and availability at the time of need. We also consider the personal emotive responses around scrap management. We present a set of design considerations that we have derived from the analysis of our study results. We present our work on an application platform, jourknow, to test some of these design and usability findings.

    Library Resources: Procurement, Innovation and Exploitation in a Digital World

    The possibilities of the digital future require new models for procurement, innovation and exploitation. Emma Crowley and Chris Spencer describe the skills staff need to deliver resources in hybrid and digital environments. The chapter demonstrates the innovative ways in which librarians procure and exploit the wealth of resources available in a digital world. They also describe the technological developments that can be adopted to improve workflow processes, and they highlight the challenges faced on this fascinating journey.

    Scripts in a Frame: A Framework for Archiving Deferred Representations

    Web archives provide a view of the Web as seen by Web crawlers. Because of rapid advancements and adoption of client-side technologies like JavaScript and Ajax, coupled with the inability of crawlers to execute these technologies effectively, Web resources become harder to archive as they become more interactive. At Web scale, we cannot capture client-side representations using the current state-of-the-art toolsets because of the migration from Web pages to Web applications. Web applications increasingly rely on JavaScript and other client-side programming languages to load embedded resources and change client-side state. We demonstrate that Web crawlers and other automatic archival tools are unable to archive the resulting JavaScript-dependent representations (what we term deferred representations), resulting in missing or incorrect content in the archives and the general inability to replay the archived resource as it existed at the time of capture. Building on prior studies on Web archiving, client-side monitoring of events and embedded resources, and studies of the Web, we establish an understanding of the trends contributing to the increasing unarchivability of deferred representations. We show that JavaScript leads to lower-quality mementos (archived Web resources) due to the archival difficulties it introduces. We measure the historical impact of JavaScript on mementos, demonstrating that the increased adoption of JavaScript and Ajax correlates with the increase in missing embedded resources. To measure memento and archive quality, we propose and evaluate a metric to assess memento quality closer to Web users’ perception. We propose a two-tiered crawling approach that enables crawlers to capture embedded resources dependent upon JavaScript.
Measuring the performance differences between crawl approaches, we propose a classification method that mitigates the performance impacts of the two-tiered crawling approach, and we measure the frontier size improvements observed with the two-tiered approach. Using the two-tiered crawling approach, we measure the number of client-side states associated with each URI-R and propose a mechanism for storing the mementos of deferred representations. In short, this dissertation details a body of work that explores the following: why JavaScript and deferred representations are difficult to archive (establishing the term deferred representation to describe JavaScript-dependent representations); the extent to which JavaScript impacts archivability along with its impact on current archival tools; a metric for measuring the quality of mementos, which we use to describe the impact of JavaScript on archival quality; the performance trade-offs between traditional archival tools and technologies that better archive JavaScript; and a two-tiered crawling approach for discovering and archiving currently unarchivable descendants (representations generated by client-side user events) of deferred representations to mitigate the impact of JavaScript on our archives. In summary, what we archive is increasingly different from what we as interactive users experience. Using the approaches detailed in this dissertation, archives can create mementos closer to what users experience rather than archiving the crawlers’ experiences on the Web.

    Archiving Interactive Narratives at the British Library

    This paper describes the creation of the Interactive Narratives collection in the UK Web Archive, as part of the UK Legal Deposit Libraries Emerging Formats Project. The aim of the project is to identify, collect and preserve complex digital publications that are in scope for collection under UK Non-Print Legal Deposit Regulations. This article traces the process of building the Interactive Narratives collection, analysing the different tools and methods used and placing the collection within the wider context of Emerging Formats work and engagement activities at the British Library.

    Using Technology Enabled Qualitative Research to Develop Products for the Social Good, An Overview

    This paper discusses the potential benefits of the convergence of three recent trends for the design of socially beneficial products and services: the increasing application of qualitative research techniques in a wide range of disciplines, the rapid mainstreaming of social media and mobile technologies, and the emergence of software as a service. We present a scenario that facilitates the complex data collection, analysis, storage, and reporting required for the qualitative research recommended for designing relevant solutions that address the needs of the underserved. A pilot study is used as a basis for describing the infrastructure and services required to realize this scenario. Implications for innovation of enhanced forms of qualitative research are presented.

    Easy on that trigger dad: a study of long term family photo retrieval

    We examine the effects of new technologies for digital photography on people's longer term storage and access to collections of personal photos. We report an empirical study of parents' ability to retrieve photos related to salient family events from more than a year ago. Performance was relatively poor, with people failing to find almost 40% of pictures. We analyze participants' organizational and access strategies to identify reasons for this poor performance. Possible reasons for retrieval failure include: storing too many pictures, rudimentary organization, use of multiple storage systems, failure to maintain collections, and participants' false beliefs about their ability to access photos. We conclude by exploring the technical and theoretical implications of these findings.

    A Systematic Approach Towards Web Preservation

    The main purpose of the article is to divide the web preservation process into small explicable stages and design a step-by-step web preservation process that leads to creating a well-organized web archive. We studied a number of research articles on web preservation projects and web archives and designed a step-by-step systematic approach for web preservation. The proposed comprehensive web preservation process describes and combines the strengths of the different techniques observed during the study for preserving digital web contents in a digital web archive. For each web preservation step, different approaches and possible implementation techniques have been identified that can be adopted in digital archiving. The potential value of the proposed model is to guide archivists, related personnel, and organizations in effectively preserving their intellectual digital contents for future use. Moreover, the model can help to initiate a web preservation process and create a well-organized web archive to efficiently manage the archived web contents. A section briefly describes the implementation of the proposed approach in a digital news stories preservation framework for archiving news published online from different sources.