Topic Continuity for Web Document Categorization and Ranking
PageRank is based primarily on link-structure analysis. Recent work has shown that content information can be used to improve link analysis. We propose a novel algorithm that uses the information contained in a surfer's browsing history to determine their topic of interest when they are on a given page. Because this history is not available until query time, we estimate it probabilistically, so that the computation can be performed offline. This yields better web page categorization and, in turn, better ranking of web pages.
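The abstract does not give the update rules, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that each page has a normalized content-topic vector. It also assumes that the surfer's topic on a page mixes the page's own content with the topics carried in by surfers arriving over links, weighted by a hypothetical continuity parameter `beta`. The expected history is therefore propagated along links offline instead of being observed. The function names `pagerank`, `topic_estimates`, and `topic_scores` are illustrative only.

```python
from typing import Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a non-negative vector to sum to 1 (returned unchanged if all zero)."""
    s = sum(v)
    return [x / s for x in v] if s > 0 else list(v)


def pagerank(links: Dict[int, List[int]], n: int, d: float = 0.85,
             iters: int = 100) -> Vector:
    """Standard power-iteration PageRank; dangling pages spread rank uniformly."""
    r = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - d) / n] * n
        dangling = 0.0
        for q in range(n):
            outs = links.get(q, [])
            if outs:
                share = d * r[q] / len(outs)
                for p in outs:
                    nxt[p] += share
            else:
                dangling += d * r[q]
        r = [x + dangling / n for x in nxt]
    return r


def topic_estimates(links: Dict[int, List[int]], content: List[Vector], r: Vector,
                    d: float = 0.85, beta: float = 0.5, iters: int = 100) -> List[Vector]:
    """Hypothetical offline estimate of the surfer's topic on each page.

    Teleporting surfers start with the page's own content topic. Link-following
    surfers bring the topic estimate of the page they came from, weighted by the
    rank mass they carry. `beta` sets how much of that inherited history is kept.
    """
    n = len(content)
    content = [normalize([float(x) for x in c]) for c in content]
    t = [list(c) for c in content]
    for _ in range(iters):
        incoming = [[(1.0 - d) / n * x for x in content[p]] for p in range(n)]
        for q in range(n):
            targets = links.get(q, []) or list(range(n))
            w = d * r[q] / len(targets)
            for p in targets:
                incoming[p] = [a + w * b for a, b in zip(incoming[p], t[q])]
        t = [normalize([(1 - beta) * c + beta * h
                        for c, h in zip(content[p], normalize(incoming[p]))])
             for p in range(n)]
    return t


def topic_scores(r: Vector, t: List[Vector], topic: int) -> Vector:
    """Topic-specific ranking score: rank mass times probability of that topic."""
    return [r[p] * t[p][topic] for p in range(len(r))]
```

For example, with `links = {0: [1], 1: [2], 2: [0], 3: [0]}` and two topics, a page whose content is entirely topic 1 still receives some weight on topic 0 when it is mostly reached from topic-0 pages. This is the kind of history-dependent categorization the abstract describes. The fixed iteration count stands in for a convergence test to keep the sketch short.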