71 research outputs found

Maximizing PageRank via outlinks

Author: de Kerchove Cristobald
Ninove Laure
Van Dooren Paul
Publication date: 19/11/2007

We analyze linkage strategies for a set I of webpages for which the webmaster wants to maximize the sum of Google's PageRank scores. The webmaster can only choose the hyperlinks starting from the webpages of I and has no control over the hyperlinks from other webpages. We provide an optimal linkage strategy under some reasonable assumptions. Comment: 27 pages, 14 figures, submitted to Linear Algebra App

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Ergodic Control and Polyhedral approaches to PageRank Optimization

Author: Akian Marianne
Bouhtou Mustapha
Fercoq Olivier
Gaubert Stéphane
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/09/2011

We study a general class of PageRank optimization problems which consist in finding an optimal outlink strategy for a web site subject to design constraints. We consider both a continuous problem, in which one can choose the intensity of a link, and a discrete one, in which each page has obligatory links, facultative (optional) links and forbidden links. We show that the continuous problem, as well as its discrete variant when there are no constraints coupling different pages, can both be modeled by constrained Markov decision processes with ergodic reward, in which the webmaster determines the transition probabilities of websurfers. Although the number of actions turns out to be exponential, we show that an associated polytope of transition measures has a concise representation, from which we deduce that the continuous problem is solvable in polynomial time, and that the same is true for the discrete problem when there are no coupling constraints. We also provide efficient algorithms, adapted to very large networks. Then, we investigate the qualitative features of optimal outlink strategies, and identify in particular assumptions under which there exists a "master" page to which all controlled pages should point. We report numerical results on fragments of the real web graph. Comment: 39 page

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL-Polytechnique

Perron vector optimization applied to search engines

Author: Fercoq Olivier
Publication venue: 'Elsevier BV'
Publication date: 09/11/2011

In recent years, Google's PageRank optimization problems have been extensively studied. In that case, the ranking is given by the invariant measure of a stochastic matrix. In this paper, we consider the more general situation in which the ranking is determined by the Perron eigenvector of a nonnegative, but not necessarily stochastic, matrix, in order to cover Kleinberg's HITS algorithm. We also give some results for Tomlin's HOTS algorithm. The problem then consists in finding an optimal outlink strategy subject to design constraints and for a given search engine. We study the relaxed versions of these problems, which means that we accept weighted hyperlinks. We provide an efficient algorithm for the computation of the matrix of partial derivatives of the criterion, which uses the low-rank property of this matrix. We give a scalable algorithm that couples gradient and power iterations and gives a local minimum of the Perron vector optimization problem. We prove convergence by considering it as an approximate gradient method. We then show that optimal linkage strategies of HITS and HOTS optimization problems satisfy a threshold property. We report numerical results on fragments of the real web graph for these search engine optimization problems. Comment: 28 pages, 5 figure
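As a rough illustration of the "Perron eigenvector of a nonnegative, but not necessarily stochastic, matrix" mentioned in this abstract, the sketch below runs a plain power iteration in pure Python. It is not the paper's coupled gradient and power algorithm; the function name `perron_vector`, its parameters, and the 2x2 example matrix are assumptions made for illustration only. The iteration is assumed to be applied to a primitive matrix, so that it converges to the unique positive eigenvector.

```python
def perron_vector(matrix, tol=1e-12, max_iter=10_000):
    """Power iteration for the dominant eigenpair of a nonnegative square matrix.

    Hypothetical helper for illustration. Assumes the matrix is primitive,
    so the iterates converge to the positive Perron vector.
    Returns (eigenvalue estimate, eigenvector normalized to sum to 1).
    """
    n = len(matrix)
    x = [1.0 / n] * n  # start from the uniform vector
    eigenvalue = 0.0
    for _ in range(max_iter):
        # y = M x
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Because x sums to 1, sum(y) tends to the Perron eigenvalue.
        eigenvalue = sum(y)
        if eigenvalue == 0.0:
            raise ValueError("iteration collapsed to the zero vector")
        y = [value / eigenvalue for value in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return eigenvalue, y
        x = y
    return eigenvalue, x


# Example with a positive, non-stochastic matrix (rows do not sum to 1).
eigenvalue, vector = perron_vector([[1.0, 2.0], [3.0, 2.0]])
# eigenvalue is close to 4 and vector is close to [0.4, 0.6]
```

For a stochastic matrix the same routine returns an eigenvalue of 1, which matches the PageRank setting described in the abstract as the special case.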

arXiv.org e-Print Archive

Crossref

PageRank optimization applied to spam detection

Author: Fercoq Olivier
Publication date: 07/03/2012

We give a new link spam detection and PageRank demotion algorithm called MaxRank. Like TrustRank and AntiTrustRank, it starts with a seed of hand-picked trusted and spam pages. We define the MaxRank of a page as the visit frequency of this page by a random surfer minimizing an average cost per time unit. On a given page, the random surfer selects a set of hyperlinks and clicks with uniform probability on any of these hyperlinks. The cost function penalizes spam pages and hyperlink removals. The goal is to determine a hyperlink deletion policy that minimizes this score. The MaxRank is interpreted as a modified PageRank vector, used to sort web pages instead of the usual PageRank vector. The bias vector of this ergodic control problem, which is unique up to an additive constant, is a measure of the "spamicity" of each page, used to detect spam pages. We give a scalable algorithm for MaxRank computation that allowed us to run experiments on the WEBSPAM-UK2007 dataset. We show that our algorithm outperforms both TrustRank and AntiTrustRank for spam and nonspam page detection. Comment: 8 pages, 6 figure

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Polytechnique

PageRank Optimization by Edge Selection

Author: Balázs Csanád Csáji
Raphaël M. Jungers
Vincent D. Blondel
Publication venue: 'Elsevier BV'
Publication date: 18/01/2012

The importance of a node in a directed graph can be measured by its PageRank. The PageRank of a node is used in a number of application contexts, including ranking websites, and can be interpreted as the average portion of time spent at the node by an infinite random walk. We consider the problem of maximizing the PageRank of a node by selecting some of the edges from a set of edges that are under our control. By applying results from Markov decision theory, we show that an optimal solution to this problem can be found in polynomial time. Our core solution results in a linear programming formulation, but we also provide an alternative greedy algorithm, a variant of policy iteration, which also runs in polynomial time. Finally, we show that, under the slight modification for which we are given mutually exclusive pairs of edges, the problem of PageRank optimization becomes NP-hard. Comment: 30 pages, 3 figure
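To make the edge-selection problem in this abstract concrete, here is a minimal pure-Python sketch that computes PageRank by power iteration and then tries every subset of the controllable edges by brute force. This is not the paper's linear programming or policy iteration method: it takes exponential time and is only meant for toy graphs. The names `pagerank` and `best_edge_subset`, the damping factor 0.85, the uniform treatment of dangling nodes, and the three-node example are all assumptions chosen for illustration.

```python
from itertools import combinations


def pagerank(n, edges, damping=0.85, tol=1e-12, max_iter=1000):
    """PageRank of nodes 0..n-1 by power iteration (illustrative helper)."""
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n  # teleportation mass
        for u in range(n):
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
            else:
                # Assumption: a dangling node spreads its mass uniformly.
                share = damping * rank[u] / n
                for v in range(n):
                    new[v] += share
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


def best_edge_subset(n, fixed_edges, free_edges, target):
    """Brute-force search over subsets of free_edges maximizing PageRank[target]."""
    best_score, best_subset = -1.0, ()
    for size in range(len(free_edges) + 1):
        for subset in combinations(free_edges, size):
            score = pagerank(n, list(fixed_edges) + list(subset))[target]
            if score > best_score + 1e-12:
                best_score, best_subset = score, subset
    return best_score, best_subset


# Toy graph: edges 0->1 and 1->0 are fixed; 1->2 and 2->0 are optional.
score, subset = best_edge_subset(3, [(0, 1), (1, 0)], [(1, 2), (2, 0)], target=0)
# subset is ((2, 0),): adding 2->0 helps node 0, while adding 1->2 diverts rank away
```

The enumeration visits all 2^k subsets of k optional edges, which is exactly the cost that the polynomial-time results described in the abstract avoid.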

arXiv.org e-Print Archive

Crossref

SZTAKI Publication Repository

Repository of the Academy's Library

DIAL UCLouvain

The Open Research Web: A Preview of the Optimal and the Inevitable

Author: Brody Tim
Carr Les
Harnad Stevan
Shadbolt Nigel
Publication venue: Chandos
Publication date: 01/01/2006

The multiple online research impact metrics we are developing will allow the rich new database, the Research Web, to be navigated, analyzed, mined and evaluated in powerful new ways that were not even conceivable in the paper era – nor even in the online era, until the database and the tools became openly accessible for online use by all: by researchers, research institutions, research funders, teachers, students, and even by the general public that funds the research and for whose benefit it is being conducted: Which research is being used most? By whom? Which research is growing most quickly? In what direction? Under whose influence? Which research is showing immediate short-term usefulness, which shows delayed, longer-term usefulness, and which has sustained long-lasting impact? Which research and researchers are the most authoritative? Whose research is most using this authoritative research, and whose research is the authoritative research using? Which are the best pointers (“hubs”) to the authoritative research? Is there any way to predict what research will have later citation impact (based on its earlier download impact), so junior researchers can be given resources before their work has had a chance to make itself felt through citations? Can research trends and directions be predicted from the online database? Can text content be used to find and compare related research, for influence, overlap, direction? Can a layman, unfamiliar with the specialized content of a field, be guided to the most relevant and important work? These are just a sample of the new online-age questions that the Open Research Web will begin to answer

Southampton (e-Prints Soton)

CogPrints Cognitive Sciences Eprint Archive