71 research outputs found
Maximizing PageRank via outlinks
We analyze linkage strategies for a set I of webpages for which the webmaster
wants to maximize the sum of Google's PageRank scores. The webmaster can only
choose the hyperlinks starting from the webpages of I and has no control on the
hyperlinks from other webpages. We provide an optimal linkage strategy under
some reasonable assumptions.Comment: 27 pages, 14 figures, submitted to Linear Algebra App
Ergodic Control and Polyhedral approaches to PageRank Optimization
We study a general class of PageRank optimization problems which consist in
finding an optimal outlink strategy for a web site subject to design
constraints. We consider both a continuous problem, in which one can choose the
intensity of a link, and a discrete one, in which in each page, there are
obligatory links, facultative links and forbidden links. We show that the
continuous problem, as well as its discrete variant when there are no
constraints coupling different pages, can both be modeled by constrained Markov
decision processes with ergodic reward, in which the webmaster determines the
transition probabilities of websurfers. Although the number of actions turns
out to be exponential, we show that an associated polytope of transition
measures has a concise representation, from which we deduce that the continuous
problem is solvable in polynomial time, and that the same is true for the
discrete problem when there are no coupling constraints. We also provide
efficient algorithms, adapted to very large networks. Then, we investigate the
qualitative features of optimal outlink strategies, and identify in particular
assumptions under which there exists a "master" page to which all controlled
pages should point. We report numerical results on fragments of the real web
graph.Comment: 39 page
Perron vector optimization applied to search engines
In the last years, Google's PageRank optimization problems have been
extensively studied. In that case, the ranking is given by the invariant
measure of a stochastic matrix. In this paper, we consider the more general
situation in which the ranking is determined by the Perron eigenvector of a
nonnegative, but not necessarily stochastic, matrix, in order to cover
Kleinberg's HITS algorithm. We also give some results for Tomlin's HOTS
algorithm. The problem consists then in finding an optimal outlink strategy
subject to design constraints and for a given search engine.
We study the relaxed versions of these problems, which means that we should
accept weighted hyperlinks. We provide an efficient algorithm for the
computation of the matrix of partial derivatives of the criterion, that uses
the low rank property of this matrix. We give a scalable algorithm that couples
gradient and power iterations and gives a local minimum of the Perron vector
optimization problem. We prove convergence by considering it as an approximate
gradient method.
We then show that optimal linkage stategies of HITS and HOTS optimization
problems verify a threshold property. We report numerical results on fragments
of the real web graph for these search engine optimization problems.Comment: 28 pages, 5 figure
PageRank optimization applied to spam detection
We give a new link spam detection and PageRank demotion algorithm called
MaxRank. Like TrustRank and AntiTrustRank, it starts with a seed of hand-picked
trusted and spam pages. We define the MaxRank of a page as the frequency of
visit of this page by a random surfer minimizing an average cost per time unit.
On a given page, the random surfer selects a set of hyperlinks and clicks with
uniform probability on any of these hyperlinks. The cost function penalizes
spam pages and hyperlink removals. The goal is to determine a hyperlink
deletion policy that minimizes this score. The MaxRank is interpreted as a
modified PageRank vector, used to sort web pages instead of the usual PageRank
vector. The bias vector of this ergodic control problem, which is unique up to
an additive constant, is a measure of the "spamicity" of each page, used to
detect spam pages. We give a scalable algorithm for MaxRank computation that
allowed us to perform experimental results on the WEBSPAM-UK2007 dataset. We
show that our algorithm outperforms both TrustRank and AntiTrustRank for spam
and nonspam page detection.Comment: 8 pages, 6 figure
PageRank Optimization by Edge Selection
The importance of a node in a directed graph can be measured by its PageRank.
The PageRank of a node is used in a number of application contexts - including
ranking websites - and can be interpreted as the average portion of time spent
at the node by an infinite random walk. We consider the problem of maximizing
the PageRank of a node by selecting some of the edges from a set of edges that
are under our control. By applying results from Markov decision theory, we show
that an optimal solution to this problem can be found in polynomial time. Our
core solution results in a linear programming formulation, but we also provide
an alternative greedy algorithm, a variant of policy iteration, which runs in
polynomial time, as well. Finally, we show that, under the slight modification
for which we are given mutually exclusive pairs of edges, the problem of
PageRank optimization becomes NP-hard.Comment: 30 pages, 3 figure
The Open Research Web: A Preview of the Optimal and the Inevitable
The multiple online research impact metrics we are developing will allow the rich new database , the Research Web, to be navigated, analyzed, mined and evaluated in powerful new ways that were not even conceivable in the paper era – nor even in the online era, until the database and the tools became openly accessible for online use by all: by researchers, research institutions, research funders, teachers, students, and even by the general public that funds the research and for whose benefit it is being conducted: Which research is being used most? By whom? Which research is growing most quickly? In what direction? under whose influence? Which research is showing immediate short-term usefulness, which shows delayed, longer term usefulness, and which has sustained long-lasting impact? Which research and researchers are the most authoritative? Whose research is most using this authoritative research, and whose research is the authoritative research using? Which are the best pointers (“hubs”) to the authoritative research? Is there any way to predict what research will have later citation impact (based on its earlier download impact), so junior researchers can be given resources before their work has had a chance to make itself felt through citations? Can research trends and directions be predicted from the online database? Can text content be used to find and compare related research, for influence, overlap, direction? Can a layman, unfamiliar with the specialized content of a field, be guided to the most relevant and important work? These are just a sample of the new online-age questions that the Open Research Web will begin to answer
- …