4,179 research outputs found

    Impact of URI Canonicalization on Memento Count

    Quantifying the captures of a URI over time is useful for researchers to identify the extent to which a Web page has been archived. Memento TimeMaps provide a format to list mementos (URI-Ms) for captures along with brief metadata, like Memento-Datetime, for each URI-M. However, when some URI-Ms are dereferenced, they simply provide a redirect to a different URI-M (instead of a unique representation at the datetime), often also present in the TimeMap. This implies that an accurate count of the non-forwarding captures for a URI-R cannot be obtained confidently from a TimeMap alone, and that the magnitude of a TimeMap is not equivalent to the number of representations it identifies. In this work we discuss this phenomenon in depth. We also break down the dynamics of counting mementos for a particular URI-R (google.com) and quantify the prevalence of the various canonicalization patterns that complicate attempts at counting using only a TimeMap. For google.com we found that 84.9% of the URI-Ms result in an HTTP redirect when dereferenced. We expand on and apply this metric to TimeMaps for seven other URI-Rs of large Web sites and thirteen academic institutions. Using a ratio metric DI for the number of URI-Ms without redirects to those requiring a redirect when dereferenced, five of the eight large Web sites' and two of the thirteen academic institutions' TimeMaps had a ratio less than one, indicating that more than half of the URI-Ms in these TimeMaps result in redirects when dereferenced. Comment: 43 pages, 8 figures
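The ratio described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula for DI, so the simple quotient below (URI-Ms without redirects divided by URI-Ms with redirects) is an assumption, as are the function name `redirect_ratio` and the choice to treat any 3xx status code as a redirect.

```python
def redirect_ratio(status_codes):
    """Ratio of URI-Ms without redirects to URI-Ms that redirect.

    `status_codes` holds one HTTP status code per dereferenced URI-M.
    Assumption: any 3xx status counts as a redirect.
    """
    redirects = sum(1 for code in status_codes if 300 <= code < 400)
    non_redirects = len(status_codes) - redirects
    if redirects == 0:
        # No redirecting URI-Ms: the ratio is unbounded.
        return float("inf")
    return non_redirects / redirects


# Hypothetical TimeMap of 1000 URI-Ms in which 84.9% redirect.
sample = [200] * 151 + [302] * 849
ratio = redirect_ratio(sample)
```

Under this assumed definition, a TimeMap in which 84.9% of URI-Ms redirect yields a ratio of roughly 0.18, well below one.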

    WARCreate - Create Wayback-Consumable WARC Files From Any Webpage

    [First slide] What is WARCreate? Google Chrome extension; creates WARC files; enables preservation by users from their browser; first steps in bringing institutional archiving facilities to the P

    WARCreate and WAIL: WARC, Wayback, and Heritrix Made Easy

    [First slide] The Problem: Institutional Tools, Personal Archivists. ON YOUR MACHINE: complex to operate; require infrastructure. DELEGATED TO INSTITUTIONS: $; lose original perspective: locale content tailoring (DC vs. San Francisco), observation medium (PC web browser vs. crawler)

    Client-Assisted Memento Aggregation Using The Prefer Header

    [First paragraph] Preservation of the Web ensures that future generations have a picture of how the Web was. Web archives like Internet Archive's Wayback Machine, WebCite, and archive.is allow individuals to submit URIs to be archived, but the captures they preserve then reside at the archives. Traversing these captures in time as preserved by multiple archive sources (using Memento [8]) provides a more comprehensive picture of the past Web than relying on a single archive. Some content on the Web, such as content behind authentication, may be unsuitable or inaccessible for preservation by these organizations. Furthermore, this content may be inappropriate for the organizations to preserve due to reasons of privacy or exposure of personally identifiable information [4]. However, preserving this content would ensure an even more comprehensive picture of the Web and may be useful for future historians who wish to analyze content beyond the capability or suitability of archives created to preserve the public Web.

    Avoiding Zombies in Archival Replay Using ServiceWorker

    [First paragraph] A Composite Memento is an archived representation of a web page with all the page requisites such as images and stylesheets. All embedded resources have their own URIs; hence, they are archived independently. For a meaningful archival replay, it is important to load all the page requisites from the archive within the temporal neighborhood of the base HTML page. To achieve this goal, archival replay systems try to rewrite all the resource references to appropriate archived versions before serving HTML, CSS, or JS. However, effective server-side URL rewriting is difficult when URLs are generated dynamically using JavaScript. A failure to rewrite a URL correctly might yield an invalid/unintended URI or resolve to a live resource. Such live resources, leaking into a composite memento, are called zombies.
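The rewriting step and the notion of a zombie described above can be sketched as follows. This is an illustration only, not the ServiceWorker approach the title refers to: the archive prefix `https://archive.example/web/`, the Wayback-style `<prefix><timestamp>/<original URL>` layout, and the helper names `rewrite` and `is_zombie` are all assumptions made for the example.

```python
from urllib.parse import urljoin

# Hypothetical replay endpoint; real archives use their own prefixes.
ARCHIVE_PREFIX = "https://archive.example/web/"


def rewrite(reference, base_url, timestamp):
    """Rewrite a resource reference to an assumed archived form.

    The reference is first resolved against the original page URL,
    then placed behind the archive prefix and a 14-digit timestamp.
    """
    absolute = urljoin(base_url, reference)
    return f"{ARCHIVE_PREFIX}{timestamp}/{absolute}"


def is_zombie(requested_url):
    """A request that escapes the archive prefix targets the live Web."""
    return not requested_url.startswith(ARCHIVE_PREFIX)


archived = rewrite("img/logo.png", "http://example.com/page/", "20200101000000")
```

A reference that is rewritten stays within the archive, while a URL assembled at run time by JavaScript and never passed through `rewrite` would be flagged by `is_zombie`.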

    A Survey of Archival Replay Banners

    We surveyed various archival systems to compare and contrast different techniques used to implement an archival replay banner. We found that inline plain HTML injection is the most common approach, but it is prone to style conflicts. Iframe-based banners are also very common, and while they do not have style conflicts, they waste screen real estate and limit design choices. Custom Elements-based banners are promising, but because Custom Elements is a new web standard, they are not yet widely deployed.

    Unobtrusive and Extensible Archival Replay Banners Using Custom Elements

    We compare and contrast three different ways to implement an archival replay banner. We propose an implementation that utilizes Custom Elements and adds some unique behaviors, not common in existing archival replay systems, to enhance the user experience. Our approach has a minimal user interface footprint and resource overhead while still providing rich interactivity and extended on-demand provenance information about the archived resources

    Why do patients decline surgical trials? Findings from a qualitative interview study embedded in the Cancer Research UK BOLERO trial (Bladder cancer: Open versus Laparoscopic or RObotic cystectomy)

    Background: Surgical trials have typically experienced recruitment difficulties when compared with other types of oncology trials. Qualitative studies have an important role to play in exploring reasons for low recruitment, although to date few such studies have been carried out that are embedded in surgical trials. The BOLERO trial (Bladder cancer: Open versus Laparoscopic or RObotic cystectomy) is a study to determine the feasibility of randomisation to open versus laparoscopic access/robotic cystectomy in patients with bladder cancer. We describe the results of a qualitative study embedded within the clinical trial that explored why patients decline randomisation. Methods: Ten semi-structured interviews with patients who declined randomisation to the clinical trial, and two interviews with recruiting research nurses, were conducted. Data were analysed for key themes. Results: The majority of patients declined the trial because they had preferences for a particular treatment arm, and in usual practice could choose which surgical method they would be given. In most cases the robotic option was preferred. Patients described an intuitive ‘sense’ that favoured the new technology and had carried out their own inquiries, including Internet research and talking with previous patients and friends and family with medical backgrounds. Medical histories and lifestyle considerations also shaped these personalised choices. Of importance too, however, were the messages patients perceived from their clinical encounters. Whilst some patients felt their surgeon favoured the robotic option, others interpreted ‘indirect’ cues such as the ‘established’ reputation of the surgeon and surgical method and comments made during clinical assessments. Many patients expressed a wish for greater direction from their surgeon when making these decisions. Conclusion: For trials where the ‘new technology’ is available to patients, there will likely be difficulties with recruitment. Greater attention could be paid to how messages about treatment options and the trial are conveyed across the whole clinical setting. However, if it is too difficult to challenge such messages, then questions should be asked about whether genuine and convincing equipoise can be presented and perceived in such trials. This calls for consideration of whether alternative methods of generating evidence could be used when evaluating surgical techniques which are established and routinely available.

    A Method for Identifying Personalized Representations in Web Archives

    Web resources are becoming increasingly personalized: two different users clicking on the same link at the same time can see content customized for each individual user. These changes result in multiple representations of a resource that cannot be canonicalized in Web archives. We identify characteristics of this problem by presenting a potential solution to generalize personalized representations in archives. We also present our proof-of-concept prototype that analyzes WARC (Web ARChive) format files, inserts metadata establishing relationships, and provides archive users the ability to navigate on the additional dimension of environment variables in a modified Wayback Machine.

    A Review of the Genomic Landscape of early cutaneous Squamous Cell Carcinoma
