915 research outputs found

Invisible Pixels Are Dead, Long Live Invisible Pixels!

Author: Buchanan W. J.
Gamalielsson J.
Lerner A.
Schwenk J.
Vastel A.
West S. M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/08/2018

Privacy has deteriorated in the world wide web ever since the 1990s. The tracking of browsing habits by different third parties has been at the center of this deterioration. Web cookies and so-called web beacons have been the classical ways to implement third-party tracking. Due to the introduction of more sophisticated technical tracking solutions and other fundamental transformations, the use of classical image-based web beacons might be expected to have lost its appeal. According to a sample of over thirty thousand images collected from popular websites, this paper shows that such an assumption is a fallacy: classical 1 x 1 images are still commonly used for third-party tracking in the contemporary world wide web. While it seems that ad-blockers are unable to fully block these classical image-based tracking beacons, the paper further demonstrates that even limited information can be used to accurately distinguish the third-party 1 x 1 images from other images. An average classification accuracy of 0.956 is reached in the empirical experiment. With these results the paper contributes to the ongoing attempts to better understand the lack of privacy in the world wide web, and the means by which the situation might eventually be improved. Comment: Forthcoming in the 17th Workshop on Privacy in the Electronic Society (WPES 2018), Toronto, AC

arXiv.org e-Print Archive

Crossref

Data-Driven Reporting and Processing of Digital Archives with Brunnhilde

Author: Walsh Tim
Publication venue: DigitalCommons@ILR
Publication date: 01/07/2017

[Excerpt] Archivists are now several decades into appraising, arranging, describing, preserving, and providing access to digital archives and have developed and adopted a number of tools to aid in specific tasks along the way. This article discusses Brunnhilde, a new tool developed to address one of the first steps in working with born-digital materials: characterizing the overall contents of directories or disks to enable smart, evidence-based decision-making in the appraisal, arrangement, and description processes.

DigitalCommons@ILR

eCommons@Cornell

Summary of IMLS NLG Collections

Author: Benevento Jenny
Publication date: 22/09/2005

The creation of a collection registry for digital collections developed with funding from the IMLS National Leadership Grant (NLG) program from inception to date has provided an opportunity to observe commonalities and differences among and between these collections. Initial analyses of collection characteristics and the different approaches taken by NLG projects to collection definition inform us regarding current practice and have suggested avenues for fruitful research. IMLS National Leadership Grant LG-02-02-0281; unpublished; not peer reviewed

Illinois Digital Environment for Access to Learning and Scholarship Repository

Dynamic Web File Format Transformations with Grace

Author: McCown Frank
Nelson Michael L.
Swaney Daniel S.
Publication date: 01/01/2005

Web-accessible content stored in obscure, unpopular or obsolete formats represents a significant problem for digital preservation. The file formats that encode web content represent the implicit and explicit choices of web site maintainers at a particular point in time. Older file formats that have fallen out of favor are obviously a problem, but so are new file formats that have not yet been fully supported by browsers. Often browsers use plug-in software for displaying old and new formats, but plug-ins can be difficult to find, install and replicate across all environments that one may use. We introduce Grace, an HTTP proxy server that transparently converts browser-incompatible and obsolete web content into web content that a browser is able to display without the use of plug-ins. Grace is configurable on a per-user basis and can be expanded to provide an array of conversion services. We illustrate how the Grace prototype transforms several image formats (XBM, PNG with various alpha channels, and JPEG 2000) so they are viewable in Internet Explorer. Comment: 12 pages, 9 figures

arXiv.org e-Print Archive

CiteSeerX

ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation

Author: AlSum Ahmed
Brügger Niels
Gomes Daniel
Zaharia Matei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/02/2017

Web archives are a valuable resource for researchers of various disciplines. However, to use them as a scholarly source, researchers require a tool that provides efficient access to Web archive data for extraction and derivation of smaller datasets. Besides efficient access, we identify five other objectives based on practical researcher needs such as ease of use, extensibility and reusability. Towards these objectives we propose ArchiveSpark, a framework for efficient, distributed Web archive processing that builds a research corpus by working on existing and standardized data formats commonly held by Web archiving institutions. Performance optimizations in ArchiveSpark, facilitated by the use of a widely available metadata index, result in significant speed-ups of data processing. Our benchmarks show that ArchiveSpark is faster than alternative approaches without depending on any additional data stores while improving usability by seamlessly integrating queries and derivations with external tools. Comment: JCDL 2016, Newark, NJ, USA

arXiv.org e-Print Archive

Crossref

SERIF: A Semantic ExeRcise Interchange Format

Author: De Meester Ben
De Nies Tom
Ghaem Sigarchian Hajar
Mannens Erik
Salliau Frank
Van de Walle Rik
Verborgh Ruben
Publication date: 01/01/2015

Ghent University Academic Bibliography