PageRank: Standing on the shoulders of giants
PageRank is a Web page ranking technique that has been a fundamental
ingredient in the development and success of the Google search engine. The
method is still one of the many signals that Google uses to determine which
pages are most important. The main idea behind PageRank is to determine the
importance of a Web page in terms of the importance assigned to the pages
hyperlinking to it. This idea is not new, however, and had previously been
successfully exploited in other contexts. We review the PageRank method and
link it to some renowned earlier techniques from the fields of Web information
retrieval, bibliometrics, sociometry, and econometrics.
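As background, the basic iteration behind the PageRank idea described above can be sketched as follows; the toy graph, damping factor, and tolerance here are illustrative assumptions, not details taken from the paper.

```python
def pagerank(links, damping=0.85, tol=1e-9, max_iter=100):
    """links: dict mapping each page to the list of pages it links to."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Each page keeps a baseline share and receives rank from its in-links.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:  # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += damping * rank[p] / n
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            rank = new
            break
        rank = new
    return rank

# Toy web: page "c" is linked by both "a" and "b", so it ends up ranked highest.
toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy)
```

The scores form a probability distribution over pages, so they sum to one, and a page's score grows with the scores of the pages pointing at it, which is exactly the "importance from importance" principle the abstract describes.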
Local Ranking Problem on the BrowseGraph
The "Local Ranking Problem" (LRP) is related to the computation of a
centrality-like rank on a local graph, where the scores of the nodes could
significantly differ from the ones computed on the global graph. Previous work
has studied the LRP on the hyperlink graph, but never on the BrowseGraph, namely
a graph whose nodes are webpages and whose edges are browsing transitions.
Recently, this graph has received increasing attention in tasks such as
ranking, prediction, and recommendation. However, a web server observes only the
browsing traffic performed on its own pages (the local BrowseGraph), so
computing ranks locally can introduce estimation errors, which hinder the
growing number of applications that rely on them. Also, although
the divergence between the local and global ranks has been measured, the
possibility of estimating such divergence using only local knowledge has been
mainly overlooked. These aspects are of great interest for online service
providers who want to: (i) gauge their ability to correctly assess the
importance of their resources only based on their local knowledge, and (ii)
take into account real user browsing fluxes that better capture the actual user
interest than the static hyperlink network. We study the LRP problem on a
BrowseGraph from a large news provider, considering as subgraphs the
aggregations of browsing traces of users coming from different domains. We show
that the distance between the rankings can be accurately predicted from
structural information of the local graph alone, achieving an average
rank correlation as high as 0.8.
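The measurement at the heart of the abstract can be illustrated with a small sketch: score nodes on the full graph and on a local subgraph, then compare the two rankings with a Spearman rank correlation over the shared nodes. Everything here is a toy assumption; out-degree merely stands in for a PageRank-style centrality, and the graph is invented.

```python
def out_degree_scores(edges, nodes):
    """Toy centrality: out-degree, standing in for a PageRank-style score."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        if u in deg:
            deg[u] += 1
    return deg

def spearman(xs, ys):
    """Spearman rho between two score dicts over their common keys (no ties assumed)."""
    keys = sorted(set(xs) & set(ys))
    def ranks(scores):
        order = sorted(keys, key=lambda k: -scores[k])
        return {k: i for i, k in enumerate(order)}
    rx, ry = ranks(xs), ranks(ys)
    n = len(keys)
    d2 = sum((rx[k] - ry[k]) ** 2 for k in keys)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Global browsing graph vs. the subgraph a single server would observe:
# node "b" sends much of its traffic to external pages, so its local score
# understates its global importance.
global_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
                ("b", "e"), ("b", "f"), ("b", "g"), ("c", "d")]
local_nodes = {"a", "b", "c", "d"}
local_edges = [(u, v) for u, v in global_edges
               if u in local_nodes and v in local_nodes]
g = out_degree_scores(global_edges, {n for e in global_edges for n in e})
l = out_degree_scores(local_edges, local_nodes)
rho = spearman(g, l)
```

In this toy example the local and global orderings disagree only on the top two nodes, and the correlation quantifies exactly the kind of local-versus-global divergence the paper studies.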
Ranking Spaces for Predicting Human Movement in an Urban Environment
A city can be topologically represented as a connectivity graph, consisting
of nodes representing individual spaces and links connecting spaces that
intersect. The space syntax literature shows that certain topological metrics
defined on this graph can capture human movement rates in individual spaces. In
other words, the topological metrics are significantly correlated to human
movement rates, and individual spaces can be ranked by the metrics for
predicting human movement. However, this correlation has never been well
justified. In this paper, we study the same issue by applying the weighted
PageRank algorithm to the connectivity graph or space-space topology for
ranking the individual spaces, and find surprisingly that (1) the PageRank
scores are better correlated to human movement rates than the space syntax
metrics, and (2) the underlying space-space topology demonstrates small world
and scale free properties. The findings provide a novel justification as to why
space syntax, or topological analysis in general, can be used to predict human
movement. We further conjecture that this kind of analysis amounts to
predicting a drunkard's walk on a small-world, scale-free network.
Keywords: Space syntax, topological analysis of networks, small world, scale
free, human movement, and PageRank
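A weighted PageRank of the kind the abstract applies to a space-space connectivity graph can be sketched as follows: rank flows along links in proportion to their weights rather than uniformly. The graph, weights, and parameters below are invented for illustration and are not the paper's data.

```python
def weighted_pagerank(wedges, damping=0.85, iters=100):
    """wedges: dict node -> {neighbor: weight}; rank flows in proportion to weight."""
    nodes = sorted(set(wedges) | {v for nbrs in wedges.values() for v in nbrs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for u in nodes:
            nbrs = wedges.get(u, {})
            total = sum(nbrs.values())
            if total > 0:
                # Split u's rank among neighbors proportionally to link weight.
                for v, w in nbrs.items():
                    new[v] += damping * rank[u] * w / total
            else:  # isolated space: spread its rank uniformly
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

# Four toy "spaces": the heavily weighted links into "hub" pull rank toward it.
spaces = {
    "hub": {"s1": 1, "s2": 1, "s3": 1},
    "s1":  {"hub": 3, "s2": 1},
    "s2":  {"hub": 3, "s3": 1},
    "s3":  {"hub": 3, "s1": 1},
}
scores = weighted_pagerank(spaces)
```

A highly connected, heavily weighted space accumulates the most rank, which is the intuition behind using such scores as a proxy for where human movement concentrates.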
FolkRank: A Ranking Algorithm for Folksonomies
In social bookmarking tools, users create lightweight conceptual structures called folksonomies. Currently, information retrieval support for them is limited. We present a formal model and a new search algorithm for folksonomies, called FolkRank, that exploits the structure of the folksonomy. The algorithm is also applied to find communities within the folksonomy and to structure search results. All findings are demonstrated on a large-scale dataset. A long version of this paper was published at the European Semantic Web Conference 2006.
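A simplified, hypothetical sketch of a FolkRank-style computation: tag assignments (user, tag, resource) become an undirected co-occurrence graph, and the score is the difference between a preference-biased random-walk rank and an unbiased baseline. The damping factor, iteration count, and data are illustrative assumptions, not the paper's formulation.

```python
from collections import defaultdict

def folkrank(assignments, preferred, damping=0.7, iters=200):
    # Undirected weighted graph over users, tags, and resources:
    # each triple contributes user-tag, tag-resource, and user-resource edges.
    edges = defaultdict(lambda: defaultdict(int))
    for user, tag, resource in assignments:
        for a, b in ((user, tag), (tag, resource), (user, resource)):
            edges[a][b] += 1
            edges[b][a] += 1
    nodes = sorted(edges)
    n = len(nodes)

    def walk(pref):
        """Random walk with teleport distribution pref."""
        rank = dict(pref)
        for _ in range(iters):
            new = {v: (1 - damping) * pref[v] for v in nodes}
            for u in nodes:
                total = sum(edges[u].values())
                for v, w in edges[u].items():
                    new[v] += damping * rank[u] * w / total
            rank = new
        return rank

    uniform = {v: 1.0 / n for v in nodes}
    pref = dict(uniform)
    pref[preferred] += 1.0          # extra teleport mass on the preferred node
    s = sum(pref.values())
    pref = {v: w / s for v, w in pref.items()}
    biased, base = walk(pref), walk(uniform)
    # FolkRank-style score: how much each node gains from the preference bias.
    return {v: biased[v] - base[v] for v in nodes}

# Toy folksonomy: "doc1" is tagged "python" twice, "doc2" is not.
tas = [("ann", "python", "doc1"), ("bob", "python", "doc1"),
       ("bob", "java", "doc2")]
fr = folkrank(tas, "python")
```

Biasing the walk toward the tag "python" lifts the resources most strongly tied to it, which is how such a differential rank can be used to structure search results around a query tag.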
PageRank optimization applied to spam detection
We give a new link spam detection and PageRank demotion algorithm called
MaxRank. Like TrustRank and AntiTrustRank, it starts with a seed of hand-picked
trusted and spam pages. We define the MaxRank of a page as the frequency of
visit of this page by a random surfer minimizing an average cost per time unit.
On a given page, the random surfer selects a set of hyperlinks and clicks with
uniform probability on any of these hyperlinks. The cost function penalizes
spam pages and hyperlink removals. The goal is to determine a hyperlink
deletion policy that minimizes this score. The MaxRank is interpreted as a
modified PageRank vector, used to sort web pages instead of the usual PageRank
vector. The bias vector of this ergodic control problem, which is unique up to
an additive constant, is a measure of the "spamicity" of each page, used to
detect spam pages. We give a scalable algorithm for computing MaxRank, which
allowed us to run experiments on the WEBSPAM-UK2007 dataset. We show that our
algorithm outperforms both TrustRank and AntiTrustRank for spam and nonspam
page detection.
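As background for the seed-based baselines the abstract compares against, here is a minimal sketch in the spirit of TrustRank, not the paper's MaxRank: ordinary PageRank with the teleport distribution concentrated on hand-picked trusted seed pages. The web graph and seed set are invented.

```python
def trustrank(links, seeds, damping=0.85, iters=100):
    """links: dict page -> list of linked pages; seeds: set of trusted pages."""
    nodes = sorted(set(links) | {v for ts in links.values() for v in ts})
    # Teleport only to trusted seeds, so trust propagates out along links.
    teleport = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {v: (1 - damping) * teleport[v] for v in nodes}
        for u in nodes:
            ts = links.get(u, [])
            if ts:
                for v in ts:
                    new[v] += damping * rank[u] / len(ts)
            else:  # dangling page: return its rank to the seeds
                for v in nodes:
                    new[v] += damping * rank[u] * teleport[v]
        rank = new
    return rank

# Pages reachable from the trusted seed inherit trust; the isolated
# "s1"/"s2" cluster, unreachable from any seed, gets none.
web = {"trusted": ["a"], "a": ["trusted", "b"], "b": ["c"], "c": ["trusted"],
       "s1": ["s2"], "s2": ["s1"]}
tr = trustrank(web, {"trusted"})
```

Pages with near-zero trust relative to their ordinary PageRank are flagged as likely spam; MaxRank replaces this propagation with the cost-minimizing surfer described in the abstract.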