Understanding how people interact with the web is key for a variety of
applications, from the design of effective web pages to the definition of
successful online marketing campaigns. Browsing behavior has traditionally been
represented and studied by means of clickstreams, i.e., graphs whose vertices
are web pages and whose edges are the paths followed by users. Obtaining large
and representative data to extract clickstreams is, however, challenging. The
evolution of the web raises the question of whether browsing behavior is
changing and, as a consequence, whether properties of clickstreams are changing.
This paper presents a longitudinal study of clickstreams from 2013 to 2016. We evaluate
an anonymized dataset of HTTP traces captured at a large ISP to which thousands
of households are connected. We first propose a methodology to identify actual
URLs requested by users from the massive set of requests automatically fired by
browsers when rendering web pages. Then, we characterize web usage patterns and
clickstreams, taking into account both the temporal evolution and the impact of
the device used to explore the web. Our analyses precisely quantify various
aspects of clickstreams and uncover interesting patterns, such as the typically
short paths followed by people while navigating the web, the fast-growing
trend in browsing from mobile devices, and the different roles of search engines
and social networks in promoting content. Finally, we contribute a dataset of
anonymized clickstreams to the community to foster new studies (anonymized
clickstreams are available to the public at
http://bigdata.polito.it/clickstream).
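
To make the clickstream definition above concrete, the following is a minimal sketch of such a graph, in which vertices are pages and a directed edge links the page a user came from to the page visited next. It assumes that each user-requested page is already available as a (referer, url) pair; this input format, the function names `build_clickstream` and `depth_from`, and the example URLs are illustrative assumptions and do not reproduce the methodology or the dataset format of the paper.

```python
from collections import defaultdict, deque


def build_clickstream(visits):
    """Build a directed graph from (referer, url) pairs.

    Assumption: a referer of None marks a page opened without a
    preceding page (e.g., typed directly or opened from a bookmark).
    """
    graph = defaultdict(set)
    for referer, url in visits:
        graph[url]  # touching the key registers the page as a vertex
        if referer is not None:
            graph[referer].add(url)
    return {page: set(nexts) for page, nexts in graph.items()}


def depth_from(graph, root):
    """Minimum number of clicks needed to reach each page from root (BFS)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, ()):
            if nxt not in depth:
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return depth


# Hypothetical visits, for illustration only.
visits = [
    (None, "search.example/q"),
    ("search.example/q", "news.example/home"),
    ("news.example/home", "news.example/article"),
    ("search.example/q", "shop.example/item"),
]
graph = build_clickstream(visits)
depths = depth_from(graph, "search.example/q")
```

In this sketch, `max(depths.values())` gives the length of the longest path starting at the chosen root, which is one simple way to look at how far users navigate from an entry page.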