PageRank: Standing on the shoulders of giants
PageRank is a Web page ranking technique that has been a fundamental
ingredient in the development and success of the Google search engine. The
method is still one of the many signals that Google uses to determine which
pages are most important. The main idea behind PageRank is to determine the
importance of a Web page in terms of the importance assigned to the pages
hyperlinking to it. This idea is not new, however, and had previously been
successfully exploited in other contexts. We review the PageRank method and
link it to some renowned earlier techniques from the fields of Web information
retrieval, bibliometrics, sociometry, and econometrics.
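As background, the basic iteration behind the PageRank idea described above can be sketched as follows; the toy graph, damping factor, and tolerance here are illustrative assumptions, not details taken from the paper.

```python
def pagerank(links, damping=0.85, tol=1e-9, max_iter=100):
    """links: dict mapping each page to the list of pages it links to."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Each page keeps a baseline share and receives rank from its in-links.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:  # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += damping * rank[p] / n
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            rank = new
            break
        rank = new
    return rank

# Toy web: page "c" is linked by both "a" and "b", so it ends up ranked highest.
toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy)
```

The scores form a probability distribution over pages, so they sum to one, and a page's score grows with the scores of the pages pointing at it, which is exactly the "importance from importance" principle the abstract describes.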
Local Ranking Problem on the BrowseGraph
The "Local Ranking Problem" (LRP) is related to the computation of a
centrality-like rank on a local graph, where the scores of the nodes could
significantly differ from the ones computed on the global graph. Previous work
has studied the LRP on the hyperlink graph, but never on the BrowseGraph, namely
a graph whose nodes are webpages and whose edges are browsing transitions.
Recently, this graph has received increasing attention in tasks such as
ranking, prediction, and recommendation. However, a web server observes only the
browsing traffic performed on its own pages (the local BrowseGraph), so
computing ranks locally can introduce estimation errors, which hinder the
growing number of applications that rely on them. Also, although
the divergence between the local and global ranks has been measured, the
possibility of estimating such divergence using only local knowledge has been
mainly overlooked. These aspects are of great interest for online service
providers who want to: (i) gauge their ability to correctly assess the
importance of their resources only based on their local knowledge, and (ii)
take into account real user browsing fluxes that better capture the actual user
interest than the static hyperlink network. We study the LRP problem on a
BrowseGraph from a large news provider, considering as subgraphs the
aggregations of browsing traces of users coming from different domains. We show
that the distance between the rankings can be accurately predicted from
structural information of the local graph alone, achieving an average
rank correlation as high as 0.8.
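The measurement at the heart of the abstract can be illustrated with a small sketch: score nodes on the full graph and on a local subgraph, then compare the two rankings with a Spearman rank correlation over the shared nodes. Everything here is a toy assumption; out-degree merely stands in for a PageRank-style centrality, and the graph is invented.

```python
def out_degree_scores(edges, nodes):
    """Toy centrality: out-degree, standing in for a PageRank-style score."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        if u in deg:
            deg[u] += 1
    return deg

def spearman(xs, ys):
    """Spearman rho between two score dicts over their common keys (no ties assumed)."""
    keys = sorted(set(xs) & set(ys))
    def ranks(scores):
        order = sorted(keys, key=lambda k: -scores[k])
        return {k: i for i, k in enumerate(order)}
    rx, ry = ranks(xs), ranks(ys)
    n = len(keys)
    d2 = sum((rx[k] - ry[k]) ** 2 for k in keys)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Global browsing graph vs. the subgraph a single server would observe:
# node "b" sends much of its traffic to external pages, so its local score
# understates its global importance.
global_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
                ("b", "e"), ("b", "f"), ("b", "g"), ("c", "d")]
local_nodes = {"a", "b", "c", "d"}
local_edges = [(u, v) for u, v in global_edges
               if u in local_nodes and v in local_nodes]
g = out_degree_scores(global_edges, {n for e in global_edges for n in e})
l = out_degree_scores(local_edges, local_nodes)
rho = spearman(g, l)
```

In this toy example the local and global orderings disagree only on the top two nodes, and the correlation quantifies exactly the kind of local-versus-global divergence the paper studies.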
Ranking Spaces for Predicting Human Movement in an Urban Environment
A city can be topologically represented as a connectivity graph, consisting
of nodes representing individual spaces and links connecting spaces that
intersect. The space syntax literature shows that certain topological metrics
defined on this graph can capture human movement rates in individual spaces. In
other words, the topological metrics are significantly correlated to human
movement rates, and individual spaces can be ranked by the metrics for
predicting human movement. However, this correlation has never been well
justified. In this paper, we study the same issue by applying the weighted
PageRank algorithm to the connectivity graph or space-space topology for
ranking the individual spaces, and find surprisingly that (1) the PageRank
scores are better correlated to human movement rates than the space syntax
metrics, and (2) the underlying space-space topology demonstrates small world
and scale free properties. The findings provide a novel justification as to why
space syntax, or topological analysis in general, can be used to predict human
movement. We further conjecture that this kind of analysis amounts to
predicting a drunkard's walk on a small-world, scale-free network.
Keywords: Space syntax, topological analysis of networks, small world, scale
free, human movement, and PageRank
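A weighted PageRank of the kind the abstract applies to a space-space connectivity graph can be sketched as follows: rank flows along links in proportion to their weights rather than uniformly. The graph, weights, and parameters below are invented for illustration and are not the paper's data.

```python
def weighted_pagerank(wedges, damping=0.85, iters=100):
    """wedges: dict node -> {neighbor: weight}; rank flows in proportion to weight."""
    nodes = sorted(set(wedges) | {v for nbrs in wedges.values() for v in nbrs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for u in nodes:
            nbrs = wedges.get(u, {})
            total = sum(nbrs.values())
            if total > 0:
                # Split u's rank among neighbors proportionally to link weight.
                for v, w in nbrs.items():
                    new[v] += damping * rank[u] * w / total
            else:  # isolated space: spread its rank uniformly
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

# Four toy "spaces": the heavily weighted links into "hub" pull rank toward it.
spaces = {
    "hub": {"s1": 1, "s2": 1, "s3": 1},
    "s1":  {"hub": 3, "s2": 1},
    "s2":  {"hub": 3, "s3": 1},
    "s3":  {"hub": 3, "s1": 1},
}
scores = weighted_pagerank(spaces)
```

A highly connected, heavily weighted space accumulates the most rank, which is the intuition behind using such scores as a proxy for where human movement concentrates.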
FolkRank: A Ranking Algorithm for Folksonomies
In social bookmarking tools, users create lightweight conceptual structures called folksonomies. Currently, information retrieval support for them is limited. We present a formal model and a new search algorithm for folksonomies, called FolkRank, that exploits the structure of the folksonomy. The algorithm is also applied to find communities within the folksonomy and to structure search results. All findings are demonstrated on a large-scale dataset. A long version of this paper was published at the European Semantic Web Conference 2006.
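A simplified, hypothetical sketch of a FolkRank-style computation: tag assignments (user, tag, resource) become an undirected co-occurrence graph, and the score is the difference between a preference-biased random-walk rank and an unbiased baseline. The damping factor, iteration count, and data are illustrative assumptions, not the paper's formulation.

```python
from collections import defaultdict

def folkrank(assignments, preferred, damping=0.7, iters=200):
    # Undirected weighted graph over users, tags, and resources:
    # each triple contributes user-tag, tag-resource, and user-resource edges.
    edges = defaultdict(lambda: defaultdict(int))
    for user, tag, resource in assignments:
        for a, b in ((user, tag), (tag, resource), (user, resource)):
            edges[a][b] += 1
            edges[b][a] += 1
    nodes = sorted(edges)
    n = len(nodes)

    def walk(pref):
        """Random walk with teleport distribution pref."""
        rank = dict(pref)
        for _ in range(iters):
            new = {v: (1 - damping) * pref[v] for v in nodes}
            for u in nodes:
                total = sum(edges[u].values())
                for v, w in edges[u].items():
                    new[v] += damping * rank[u] * w / total
            rank = new
        return rank

    uniform = {v: 1.0 / n for v in nodes}
    pref = dict(uniform)
    pref[preferred] += 1.0          # extra teleport mass on the preferred node
    s = sum(pref.values())
    pref = {v: w / s for v, w in pref.items()}
    biased, base = walk(pref), walk(uniform)
    # FolkRank-style score: how much each node gains from the preference bias.
    return {v: biased[v] - base[v] for v in nodes}

# Toy folksonomy: "doc1" is tagged "python" twice, "doc2" is not.
tas = [("ann", "python", "doc1"), ("bob", "python", "doc1"),
       ("bob", "java", "doc2")]
fr = folkrank(tas, "python")
```

Biasing the walk toward the tag "python" lifts the resources most strongly tied to it, which is how such a differential rank can be used to structure search results around a query tag.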
PageRank optimization applied to spam detection
We give a new link spam detection and PageRank demotion algorithm called
MaxRank. Like TrustRank and AntiTrustRank, it starts with a seed of hand-picked
trusted and spam pages. We define the MaxRank of a page as the frequency of
visit of this page by a random surfer minimizing an average cost per time unit.
On a given page, the random surfer selects a set of hyperlinks and clicks with
uniform probability on any of these hyperlinks. The cost function penalizes
spam pages and hyperlink removals. The goal is to determine a hyperlink
deletion policy that minimizes this score. The MaxRank is interpreted as a
modified PageRank vector, used to sort web pages instead of the usual PageRank
vector. The bias vector of this ergodic control problem, which is unique up to
an additive constant, is a measure of the "spamicity" of each page, used to
detect spam pages. We give a scalable algorithm for computing MaxRank, which
allowed us to run experiments on the WEBSPAM-UK2007 dataset. We show that our
algorithm outperforms both TrustRank and AntiTrustRank for spam and nonspam
page detection.
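As background for the seed-based baselines the abstract compares against, here is a minimal sketch in the spirit of TrustRank, not the paper's MaxRank: ordinary PageRank with the teleport distribution concentrated on hand-picked trusted seed pages. The web graph and seed set are invented.

```python
def trustrank(links, seeds, damping=0.85, iters=100):
    """links: dict page -> list of linked pages; seeds: set of trusted pages."""
    nodes = sorted(set(links) | {v for ts in links.values() for v in ts})
    # Teleport only to trusted seeds, so trust propagates out along links.
    teleport = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {v: (1 - damping) * teleport[v] for v in nodes}
        for u in nodes:
            ts = links.get(u, [])
            if ts:
                for v in ts:
                    new[v] += damping * rank[u] / len(ts)
            else:  # dangling page: return its rank to the seeds
                for v in nodes:
                    new[v] += damping * rank[u] * teleport[v]
        rank = new
    return rank

# Pages reachable from the trusted seed inherit trust; the isolated
# "s1"/"s2" cluster, unreachable from any seed, gets none.
web = {"trusted": ["a"], "a": ["trusted", "b"], "b": ["c"], "c": ["trusted"],
       "s1": ["s2"], "s2": ["s1"]}
tr = trustrank(web, {"trusted"})
```

Pages with near-zero trust relative to their ordinary PageRank are flagged as likely spam; MaxRank replaces this propagation with the cost-minimizing surfer described in the abstract.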