Constructing a Reliable Web Graph with Information on Browsing Behavior

Accepted Manuscript

Abstract

Page quality estimation is one of the greatest challenges for Web search engines. Hyperlink analysis algorithms such as PageRank and TrustRank are usually adopted for this task. However, low-quality, unreliable, and even spam data in the Web hyperlink graph makes it increasingly difficult to estimate page quality effectively. By analyzing large-scale user browsing behavior logs, we found that a more reliable Web graph can be constructed by incorporating browsing behavior information. The experimental results show that hyperlink graphs constructed with the proposed methods are much smaller than the original graph. In addition, algorithms based on the proposed "surfing with prior knowledge" model obtain better estimation results with these graphs for both high-quality page and spam page identification tasks. Hyperlink graphs constructed with the proposed methods therefore evaluate Web page quality more precisely and with less computational effort.

Highlights

1. With user browsing behavior information, it is possible to improve the quality estimation performance of commercial search engines.
2. Three different kinds of Web graphs are proposed that combine original hyperlink information with user browsing behavior information.
3. Differences between the constructed graphs and the original Web graph show that the constructed graphs provide more reliable information and can be adopted for practical quality estimation tasks.
4. The incorporation of user browsing information is more important than the selection of link analysis algorithms for the task of quality estimation.
