Navigation through the web, colloquially known as "surfing", is one of the main activities of users during web interaction. When users follow a navigation trail, they often become disoriented with respect to the goals of their original query, so the discovery of typical user trails could be useful in providing navigation assistance. Herein, we give a theoretical underpinning of user navigation in terms of the entropy of an underlying Markov chain modelling the web topology. We present a novel method for online incremental computation of the entropy and a large deviation result regarding the length of a trail needed to realize the said entropy. We provide an error analysis for our estimation of the entropy in terms of the divergence between the empirical and actual probabilities. We then indicate applications of our algorithm in the area of web data mining. Finally, we present an extension of our technique to higher-order Markov chains by a suitable reduction of a higher-order Markov chain model to a first-order one.
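To make the quantities named in the abstract concrete, the following is a minimal sketch of a plug-in entropy estimate for a first-order Markov chain, updated one transition at a time. It is an illustration under stated assumptions, not the paper's algorithm: the class `TrailEntropyEstimator`, the helper `lift_trail`, and the choice of base-2 logarithms are all hypothetical. The estimate uses the standard identity H = (1/N) [ sum_i n_i log n_i - sum_ij n_ij log n_ij ], where n_ij counts observed transitions from page i to page j, n_i = sum_j n_ij, and N is the total number of transitions.

```python
import math
from collections import defaultdict


def _xlogx(n):
    """Return n * log2(n), with the convention 0 * log 0 = 0."""
    return 0.0 if n == 0 else n * math.log2(n)


class TrailEntropyEstimator:
    """Hypothetical helper: empirical entropy (in bits) of a first-order
    Markov chain, maintained incrementally from observed transitions."""

    def __init__(self):
        self.pair_counts = defaultdict(int)  # n_ij: transitions i -> j
        self.out_counts = defaultdict(int)   # n_i: transitions leaving i
        self.total = 0                       # N: all transitions seen
        self._sum_out = 0.0                  # running sum_i n_i log2 n_i
        self._sum_pair = 0.0                 # running sum_ij n_ij log2 n_ij

    def update(self, src, dst):
        """Record one transition src -> dst in constant time."""
        n_i = self.out_counts[src]
        n_ij = self.pair_counts[(src, dst)]
        self._sum_out += _xlogx(n_i + 1) - _xlogx(n_i)
        self._sum_pair += _xlogx(n_ij + 1) - _xlogx(n_ij)
        self.out_counts[src] = n_i + 1
        self.pair_counts[(src, dst)] = n_ij + 1
        self.total += 1

    def entropy(self):
        """Current plug-in estimate; 0.0 before any transition is seen."""
        if self.total == 0:
            return 0.0
        return (self._sum_out - self._sum_pair) / self.total


def lift_trail(trail, order):
    """Map a trail to overlapping tuples of `order` pages, so that an
    order-k chain over pages becomes a first-order chain over tuples."""
    return [tuple(trail[i:i + order]) for i in range(len(trail) - order + 1)]


def estimate_from_trail(trail, order=1):
    """Feed every consecutive pair of (lifted) states to the estimator."""
    states = lift_trail(trail, order)
    estimator = TrailEntropyEstimator()
    for src, dst in zip(states, states[1:]):
        estimator.update(src, dst)
    return estimator.entropy()
```

For example, a trail that alternates deterministically between two pages yields an estimate of zero, while a trail in which one page is followed equally often by two different pages yields a positive value. Keeping the two running sums means each new click costs constant time, which is one simple way to obtain an online estimate; the paper's own method and its error analysis may differ.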

Topics:
csis

Publisher: World Scientific Publishing Company

Year: 2003

OAI identifier:
oai:eprints.bbk.ac.uk.oai2:212

Provided by:
Birkbeck Institutional Research Online

Downloaded from
http://eprints.bbk.ac.uk/212/1/entropy.pdf

