12,680 research outputs found
Agents, Bookmarks and Clicks: A topical model of Web traffic
Analysis of aggregate and individual Web traffic has shown that PageRank is a
poor model of how people navigate the Web. Using the empirical traffic patterns
generated by a thousand users, we characterize several properties of Web
traffic that cannot be reproduced by Markovian models. We examine both
aggregate statistics capturing collective behavior, such as page and link
traffic, and individual statistics, such as entropy and session size. No model
currently explains all of these empirical observations simultaneously. We show
that all of these traffic patterns can be explained by an agent-based model
that takes into account several realistic browsing behaviors. First, agents
maintain individual lists of bookmarks (a non-Markovian memory mechanism) that
are used as teleportation targets. Second, agents can retreat along visited
links, a branching mechanism that also allows us to reproduce behaviors such as
the use of a back button and tabbed browsing. Finally, agents are sustained by
visiting novel pages of topical interest, with adjacent pages being more
topically related to each other than distant ones. This modulates the
probability that an agent continues to browse or starts a new session, allowing
us to recreate heterogeneous session lengths. The resulting model is capable of
reproducing the collective and individual behaviors we observe in the empirical
data, reconciling the narrowly focused browsing patterns of individual users
with the extreme heterogeneity of aggregate traffic measurements. This result
allows us to identify a few salient features that are necessary and sufficient
to interpret the browsing patterns observed in our data. In addition to the
descriptive and explanatory power of such a model, our results may lead the way
to more sophisticated, realistic, and effective ranking and crawling
algorithms.Comment: 10 pages, 16 figures, 1 table - Long version of paper to appear in
Proceedings of the 21th ACM conference on Hypertext and Hypermedi
- …