
    Analyzing Large Lists of URLs by Visualization using Hilbert curves

    Search engines like Google provide an aggregation mechanism for the web and constitute the main access point to the Internet for a large portion of their users. For this reason, biases and personalization schemes of search results may have huge societal implications that require scientific inquiry and monitoring. The data obtained from such an inquiry might come in the form of a corpus-like data set containing a large collection of URLs, each of which is associated with a user, a date, a search keyword, and a rank in a particular user’s result list. This work is dedicated to the visualization task of providing insights into the structure of such a data set as well as the observation of developments over time. The visualization of unstructured data is an innately challenging task. We argue that the aforementioned data structure is very akin to text corpora, but possesses some distinct characteristics that require visualization methods beyond the current repertoire of corpus visualization techniques. The key differences between URLs and other textual data are their lack of internal cohesion, their relatively short lengths, and—most importantly—their semi-structured nature that is attributable to their standardized constituents (protocol, top-level domain, country domain, etc.). We present a novel technique to spatially represent such data while retaining comparability over time: A corpus of URLs in alphabetical order is evenly distributed onto the so-called Hilbert curve, a space-filling curve which can be used to map one-dimensional spaces into higher dimensions. Rank and other associated meta-data can then be mapped to other visualization primitives. We demonstrate the viability of this technique by applying it to a data set of Google search result lists of the form introduced above. The data retains much of its spatial structure (i.e., closeness between similar URLs) and the spatial stability of the Hilbert curve enables comparisons over time. 
To make our technique accessible to other practitioners in the field, we implemented it in the R programming language, building on the implementation of the grammar of graphics provided by the package ggplot2.
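
The core mapping described in the abstract (sort the URLs alphabetically, then spread them evenly along a Hilbert curve) can be illustrated with a minimal sketch. This is not the authors' R/ggplot2 implementation: it is written in Python, the function names `d2xy` and `layout_urls` are hypothetical, and the even-spacing rule (`i * cells // len(urls)`) and the automatic choice of curve order are assumptions about what "evenly distributed" might mean in practice.

```python
def d2xy(side, d):
    """Convert distance d along a Hilbert curve into (x, y) grid coordinates.

    `side` is the grid's side length and must be a power of two;
    valid distances are 0 <= d < side * side.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def layout_urls(urls, order=None):
    """Place alphabetically sorted, de-duplicated URLs on a Hilbert curve.

    Returns a dict mapping each URL to an (x, y) cell. The spacing rule
    below is an assumption made for illustration.
    """
    ordered = sorted(set(urls))
    if order is None:
        # Smallest curve order whose grid has at least one cell per URL.
        order = 1
        while 4 ** order < len(ordered):
            order += 1
    side = 2 ** order
    cells = side * side
    positions = {}
    for i, url in enumerate(ordered):
        d = i * cells // len(ordered)  # evenly spaced distances (assumed rule)
        positions[url] = d2xy(side, d)
    return positions


if __name__ == "__main__":
    # Made-up example URLs, not taken from the paper's data set.
    sample = [
        "https://example.org/a",
        "https://example.org/b",
        "https://example.com/x",
        "http://example.net/",
    ]
    for url, cell in sorted(layout_urls(sample).items()):
        print(cell, url)
```

Because neighbouring distances on the curve map to neighbouring grid cells, URLs that sort next to each other (for example, pages sharing a protocol and host prefix) land close together in the plane. Rank or other metadata could then be attached to each cell as colour or size, as the abstract suggests.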