
    Invisible Pixels Are Dead, Long Live Invisible Pixels!

    Privacy on the world wide web has deteriorated ever since the 1990s, and the tracking of browsing habits by third parties has been at the center of this deterioration. Web cookies and so-called web beacons have been the classical means of implementing third-party tracking. Given the introduction of more sophisticated technical tracking solutions and other fundamental transformations, classical image-based web beacons might be expected to have lost their appeal. Based on a sample of over thirty thousand images collected from popular websites, this paper shows that such an assumption is a fallacy: classical 1 x 1 images are still commonly used for third-party tracking on the contemporary world wide web. While ad-blockers appear unable to fully block these classical image-based tracking beacons, the paper further demonstrates that even limited information can be used to accurately distinguish third-party 1 x 1 images from other images; an average classification accuracy of 0.956 is reached in the empirical experiment. With these results the paper contributes to the ongoing attempts to better understand the lack of privacy on the world wide web, and the means by which the situation might eventually be improved.
    Comment: Forthcoming in the 17th Workshop on Privacy in the Electronic Society (WPES 2018), Toronto, Canada
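    The abstract does not describe the paper's actual feature set or model, so the sketch below is only an illustration of the kind of lightweight signals such a classifier might use: image dimensions, whether the image host is a third party relative to the page, and tracker-like tokens in the URL. The hint list and threshold logic are entirely hypothetical.

    ```python
    from urllib.parse import urlparse

    # Hypothetical tokens often seen in tracking-pixel URLs; NOT from the paper.
    TRACKER_HINTS = ("pixel", "beacon", "track", "collect")

    def looks_like_tracking_pixel(width, height, url, page_host):
        """Heuristic sketch: flag tiny third-party images with tracker-ish URLs.

        The paper trains a classifier reaching ~0.956 accuracy; this simple
        rule only illustrates the kind of limited information involved.
        """
        if (width, height) != (1, 1):
            return False
        parsed = urlparse(url)
        third_party = bool(parsed.netloc) and not parsed.netloc.endswith(page_host)
        tokens = parsed.netloc.lower() + parsed.path.lower()
        return third_party and any(h in tokens for h in TRACKER_HINTS)

    print(looks_like_tracking_pixel(1, 1, "https://ads.example.net/pixel.gif", "example.org"))
    ```

    A first-party 1 x 1 spacer image (same host as the page) would be rejected by the third-party check, which is why dimensions alone are not enough.
    
    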

    Data-Driven Reporting and Processing of Digital Archives with Brunnhilde

    [Excerpt] Archivists are now several decades into appraising, arranging, describing, preserving, and providing access to digital archives, and have developed and adopted a number of tools to aid in specific tasks along the way. This article discusses Brunnhilde, a new tool developed to address one of the first steps in working with born-digital materials: characterizing the overall contents of directories or disks to enable smart, evidence-based decision-making in the appraisal, arrangement, and description processes.
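    As a rough illustration of what "characterizing the overall contents of directories" can mean, the sketch below tallies file counts and total bytes per extension. Note this is an assumption-laden stand-in: Brunnhilde itself relies on Siegfried for signature-based format identification rather than file extensions.

    ```python
    import os
    from collections import Counter

    def characterize(path):
        """Tally file counts and sizes by extension under a directory tree.

        A simplified stand-in for a format report; real tools identify
        formats by signature, not by extension.
        """
        counts, sizes = Counter(), Counter()
        for root, _dirs, files in os.walk(path):
            for name in files:
                ext = os.path.splitext(name)[1].lower() or "(none)"
                counts[ext] += 1
                try:
                    sizes[ext] += os.path.getsize(os.path.join(root, name))
                except OSError:
                    pass  # skip files that vanish or are unreadable
        return counts, sizes
    ```

    Running this over an accession gives the kind of high-level overview (how much of what, and where) that supports appraisal decisions before any item-level work begins.
    
    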

    Summary of IMLS NLG Collections

    The creation of a collection registry for digital collections developed with funding from the IMLS National Leadership Grant (NLG) program, from inception to date, has provided an opportunity to observe commonalities and differences among and between these collections. Initial analyses of collection characteristics and the different approaches taken by NLG projects to collection definition inform us regarding current practice and have suggested avenues for fruitful research.
    IMLS National Leadership Grant LG-02-02-0281; unpublished; not peer reviewed

    Dynamic Web File Format Transformations with Grace

    Web-accessible content stored in obscure, unpopular, or obsolete formats represents a significant problem for digital preservation. The file formats that encode web content represent the implicit and explicit choices of web site maintainers at a particular point in time. Older file formats that have fallen out of favor are obviously a problem, but so are new file formats that are not yet fully supported by browsers. Browsers often use plug-in software for displaying old and new formats, but plug-ins can be difficult to find, install, and replicate across all environments that one may use. We introduce Grace, an HTTP proxy server that transparently converts browser-incompatible and obsolete web content into content that a browser is able to display without the use of plug-ins. Grace is configurable on a per-user basis and can be expanded to provide an array of conversion services. We illustrate how the Grace prototype transforms several image formats (XBM, PNG with various alpha channels, and JPEG 2000) so they are viewable in Internet Explorer.
    Comment: 12 pages, 9 figures
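    The core idea in the abstract, a proxy that routes responses in unsupported formats through a converter, can be sketched as a content-type dispatch table. This is not Grace's actual code; the registry, converter names, and the placeholder conversion are all assumptions for illustration.

    ```python
    # Maps a source media type to (target media type, converter function).
    CONVERTERS = {}

    def register(src_type, dst_type):
        """Decorator registering a converter for one media type."""
        def wrap(fn):
            CONVERTERS[src_type] = (dst_type, fn)
            return fn
        return wrap

    @register("image/x-xbitmap", "image/png")
    def xbm_to_png(body):
        # Placeholder transform; a real proxy would invoke an image library
        # (e.g. a raster conversion tool) here.
        return b"PNG:" + body

    def maybe_convert(content_type, body):
        """Return (content_type, body), converting when a handler exists.

        A Grace-style proxy would call this on each response before
        forwarding it to the browser; unregistered types pass through.
        """
        if content_type in CONVERTERS:
            dst_type, fn = CONVERTERS[content_type]
            return dst_type, fn(body)
        return content_type, body
    ```

    Per-user configurability, as described in the abstract, would amount to maintaining a separate converter table (or enabled subset) per user.
    
    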

    ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation

    Web archives are a valuable resource for researchers of various disciplines. However, to use them as a scholarly source, researchers require a tool that provides efficient access to Web archive data for extraction and derivation of smaller datasets. Besides efficient access, we identify five other objectives based on practical researcher needs, such as ease of use, extensibility, and reusability. Towards these objectives we propose ArchiveSpark, a framework for efficient, distributed Web archive processing that builds a research corpus by working on existing and standardized data formats commonly held by Web archiving institutions. Performance optimizations in ArchiveSpark, facilitated by the use of a widely available metadata index, result in significant speed-ups of data processing. Our benchmarks show that ArchiveSpark is faster than alternative approaches without depending on any additional data stores, while improving usability by seamlessly integrating queries and derivations with external tools.
    Comment: JCDL 2016, Newark, NJ, USA
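    The "widely available metadata index" the abstract refers to is the CDX index that most web archiving institutions keep alongside their WARC files. The sketch below illustrates the optimization in miniature: filter on cheap CDX metadata first, and only seek into WARC payloads for surviving records. The field layout here is one common 11-column CDX arrangement and is an assumption; real indexes vary.

    ```python
    def parse_cdx(line):
        """Parse one CDX line (assumed 11-field layout:
        surt timestamp url mime status digest - - offset length filename)."""
        f = line.split()
        return {"url": f[2], "mime": f[3], "status": f[4],
                "offset": int(f[8]), "length": int(f[9]), "file": f[10]}

    def select(cdx_lines, mime="text/html", status="200"):
        """Metadata-level selection: no WARC payload bytes are read here.

        Callers later seek to rec["offset"] in rec["file"] for just the
        records that survive the filter -- the essence of index-first
        processing as described for ArchiveSpark.
        """
        for line in cdx_lines:
            rec = parse_cdx(line)
            if rec["mime"] == mime and rec["status"] == status:
                yield rec
    ```

    Because the index is orders of magnitude smaller than the archive itself, filtering it first avoids decompressing and parsing the vast majority of WARC records.
    
    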