Computing the entropy of user navigation in the web

By Mark Levene and George Loizou

Abstract

Navigation through the web, colloquially known as "surfing", is one of the main activities of users during web interaction. When users follow a navigation trail they often tend to get disoriented in terms of the goals of their original query and thus the discovery of typical user trails could be useful in providing navigation assistance. Herein, we give a theoretical underpinning of user navigation in terms of the entropy of an underlying Markov chain modelling the web topology. We present a novel method for online incremental computation of the entropy and a large deviation result regarding the length of a trail to realize the said entropy. We provide an error analysis for our estimation of the entropy in terms of the divergence between the empirical and actual probabilities. We then indicate applications of our algorithm in the area of web data mining. Finally, we present an extension of our technique to higher-order Markov chains by a suitable reduction of a higher-order Markov chain model to a first-order one.
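
For orientation, the entropy of a first-order Markov chain can be estimated from a single navigation trail by weighting each page's outgoing-link entropy by how often the page is left. The sketch below is a plain batch plug-in estimate, not the paper's online incremental method, and the function name `empirical_entropy_rate` and the list-of-page-labels input are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def empirical_entropy_rate(trail):
    """Plug-in estimate, in bits per step, of the entropy rate of a
    first-order Markov chain, computed from one trail of page visits.

    Hypothetical helper for illustration; `trail` is a sequence of
    hashable page labels in visiting order.
    """
    # Count observed transitions src -> dst along the trail.
    transitions = defaultdict(Counter)
    for src, dst in zip(trail, trail[1:]):
        transitions[src][dst] += 1

    total = len(trail) - 1  # number of observed transitions
    if total <= 0:
        return 0.0

    entropy = 0.0
    for counts in transitions.values():
        out = sum(counts.values())
        weight = out / total  # empirical frequency of leaving this page
        for n in counts.values():
            p = n / out  # empirical transition probability
            entropy -= weight * p * math.log2(p)
    return entropy
```

A trail that alternates deterministically between two pages gives an estimate of zero, while a page whose outgoing links are followed with equal frequency contributes one bit for each visit that leaves it. Short trails give rough estimates, since the empirical probabilities may be far from the actual ones.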

Topics: csis
Publisher: World Scientific Publishing Company
Year: 2003
OAI identifier: oai:eprints.bbk.ac.uk.oai2:212
