Invisible Pixels Are Dead, Long Live Invisible Pixels!
Privacy has deteriorated in the world wide web ever since the 1990s. The
tracking of browsing habits by different third-parties has been at the center
of this deterioration. Web cookies and so-called web beacons have been the
classical ways to implement third-party tracking. Due to the introduction of
more sophisticated technical tracking solutions and other fundamental
transformations, classical image-based web beacons might be expected
to have lost their appeal. Based on a sample of over thirty thousand images
collected from popular websites, this paper shows that such an assumption is a
fallacy: classical 1 x 1 images are still commonly used for third-party
tracking in the contemporary world wide web. While it seems that ad-blockers
are unable to fully block these classical image-based tracking beacons, the
paper further demonstrates that even limited information can be used to
accurately distinguish third-party 1 x 1 images from other images. An average
classification accuracy of 0.956 is reached in the empirical experiment. With
these results the paper contributes to the ongoing attempts to better
understand the lack of privacy in the world wide web, and the means by which
the situation might eventually be improved.
Comment: Forthcoming in the 17th Workshop on Privacy in the Electronic Society
(WPES 2018), Toronto, AC
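As an illustration of the category the paper targets, the rule below labels an image as a third-party 1 x 1 pixel when its size is exactly 1 x 1 and its host belongs to a different site than the embedding page. This is a hypothetical sketch, not the paper's classifier (which works from limited information to reach the reported 0.956 accuracy); `ImageRecord`, `registrable_host`, and `is_third_party_pixel` are illustrative names, and the site comparison deliberately ignores public-suffix rules.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class ImageRecord:
    """Hypothetical record of one image observed while loading a page."""
    page_url: str   # URL of the page that embedded the image
    src_url: str    # URL the image was fetched from
    width: int      # width in pixels
    height: int     # height in pixels


def registrable_host(url: str) -> str:
    """Crude site key: the last two labels of the hostname.

    Assumption: public-suffix rules (e.g. 'co.uk') are ignored here.
    """
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def is_third_party_pixel(img: ImageRecord) -> bool:
    """True for a 1 x 1 image served from a different site than its page."""
    is_pixel = img.width == 1 and img.height == 1
    third_party = registrable_host(img.src_url) != registrable_host(img.page_url)
    return is_pixel and third_party
```

For example, a 1 x 1 GIF from `tracker.example.net` on a `news.example.com` page is flagged, while a 1 x 1 GIF from `cdn.example.com` on the same page is not.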
Data-Driven Reporting and Processing of Digital Archives with Brunnhilde
[Excerpt] Archivists are now several decades into appraising, arranging, describing, preserving, and providing access to digital archives and have developed and adopted a number of tools to aid in specific tasks along the way. This article discusses Brunnhilde, a new tool developed to address one of the first steps in working with born-digital materials: characterizing the overall contents of directories or disks to enable evidence-based decision-making in the appraisal, arrangement, and description processes.
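The characterization step described above can be sketched in its simplest form: walk a directory tree, tally files by extension, and sum their sizes so an archivist sees at a glance what a disk contains. This is a hypothetical illustration, not Brunnhilde's implementation; `summarize_directory` is an illustrative name, and extensions are used only as a rough stand-in for format identification, which real characterization tools perform by file signature.

```python
import os
from collections import Counter
from pathlib import Path


def summarize_directory(root: str) -> dict:
    """Walk a directory tree and tally files by extension and total size."""
    by_extension: Counter = Counter()
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Extension is only a weak proxy for format; files without one
            # are grouped under "(none)".
            ext = path.suffix.lower() or "(none)"
            by_extension[ext] += 1
            total_bytes += path.stat().st_size
            file_count += 1
    return {
        "files": file_count,
        "bytes": total_bytes,
        # Most common extensions first.
        "by_extension": dict(by_extension.most_common()),
    }
```

Calling `summarize_directory("/path/to/disk")` returns the file count, total size, and an extension breakdown that could feed an appraisal report.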
Summary of IMLS NLG Collections
The creation of a collection registry for digital collections developed with funding from the IMLS National Leadership Grant (NLG) program since its inception has provided an opportunity to observe commonalities and differences among these collections. Initial analyses of collection characteristics, and of the different approaches NLG projects have taken to collection definition, inform us about current practice and suggest avenues for fruitful research.
IMLS National Leadership Grant LG-02-02-0281. Unpublished; not peer reviewed.
Dynamic Web File Format Transformations with Grace
Web accessible content stored in obscure, unpopular or obsolete formats
represents a significant problem for digital preservation. The file formats
that encode web content represent the implicit and explicit choices of web site
maintainers at a particular point in time. Older file formats that have fallen
out of favor are obviously a problem, but so are new file formats that have not
yet been fully supported by browsers. Browsers often rely on plug-in software to
display old and new formats, but plug-ins can be difficult to find, install,
and replicate across every environment one may use. We introduce Grace, an
HTTP proxy server that transparently converts browser-incompatible and obsolete
web content into web content that a browser is able to display without the use
of plug-ins. Grace is configurable on a per-user basis and can be extended to
provide an array of conversion services. We illustrate how the Grace prototype
transforms several image formats (XBM, PNG with various alpha channels, and
JPEG 2000) so they are viewable in Internet Explorer.
Comment: 12 pages, 9 figures
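The abstract does not describe Grace's configuration format, so the following is only a hypothetical model of the per-user decision such a proxy has to make: given a response's `Content-Type`, pass it through if the user's browser displays that type natively, otherwise look up an enabled conversion. `UserProfile` and `plan_conversion` are illustrative names; the proxying and the image conversion itself are out of scope.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical per-user settings for a format-converting proxy."""
    # MIME types this user's browser displays without plug-ins.
    supported: set[str] = field(
        default_factory=lambda: {"image/png", "image/jpeg", "image/gif"})
    # Enabled conversions: source MIME type -> target MIME type.
    conversions: dict[str, str] = field(default_factory=dict)


def plan_conversion(content_type: str, profile: UserProfile) -> str | None:
    """Return the MIME type to convert a response to, or None to pass it through."""
    # Drop parameters such as "; charset=..." and normalize case.
    mime = content_type.split(";")[0].strip().lower()
    if mime in profile.supported:
        return None  # the browser can display it as-is
    # Unsupported types with no configured conversion also pass through unchanged.
    return profile.conversions.get(mime)
```

With `conversions={"image/jp2": "image/png"}`, a JPEG 2000 response is planned for conversion to PNG, while a native PNG passes through untouched. Keeping the policy in a per-user object mirrors the abstract's point that Grace is configurable per user.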
ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation
Web archives are a valuable resource for researchers of various disciplines.
However, to use them as a scholarly source, researchers require a tool that
provides efficient access to Web archive data for extraction and derivation of
smaller datasets. Besides efficient access, we identify five other objectives
based on practical researcher needs, such as ease of use, extensibility and
reusability.
Toward these objectives, we propose ArchiveSpark, a framework for efficient,
distributed Web archive processing that builds a research corpus by working on
existing and standardized data formats commonly held by Web archiving
institutions. Performance optimizations in ArchiveSpark, facilitated by the use
of a widely available metadata index, result in significant speed-ups of data
processing. Our benchmarks show that ArchiveSpark is faster than alternative
approaches without depending on any additional data stores while improving
usability by seamlessly integrating queries and derivations with external
tools.
Comment: JCDL 2016, Newark, NJ, US
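The speed-ups described above come from consulting a metadata index before touching the archive files. Assuming that index is a CDX file in a simplified nine-field layout (an assumption; CDX files come in several layouts), the sketch below parses index lines, filters on MIME type and HTTP status, and returns only the (file, offset, length) byte ranges that would have to be read from the WARC files. `CdxRecord`, `parse_cdx`, `select`, and `read_plan` are hypothetical names, not ArchiveSpark's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class CdxRecord:
    """One capture in an assumed 9-field CDX layout (N b a m s k S V g)."""
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    status: str
    digest: str
    length: int     # compressed record length in bytes
    offset: int     # byte offset of the record within the WARC file
    filename: str   # WARC file holding the record


def parse_cdx(lines: Iterable[str]) -> Iterator[CdxRecord]:
    """Parse whitespace-separated CDX lines, skipping blanks and the header."""
    for line in lines:
        if not line.strip() or line.lstrip().startswith("CDX"):
            continue
        f = line.split()
        yield CdxRecord(f[0], f[1], f[2], f[3], f[4], f[5],
                        int(f[6]), int(f[7]), f[8])


def select(records: Iterable[CdxRecord], mimetype: str,
           status: str = "200") -> Iterator[CdxRecord]:
    """Keep only captures with the wanted MIME type and HTTP status."""
    return (r for r in records if r.mimetype == mimetype and r.status == status)


def read_plan(records: Iterable[CdxRecord]) -> list[tuple[str, int, int]]:
    """Byte ranges (file, offset, length) to fetch; nothing else is read."""
    return [(r.filename, r.offset, r.length) for r in records]
```

Because filtering happens on small index lines, the expensive step (decompressing WARC records) is limited to the captures a study actually needs.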