Ergodic Control and Polyhedral approaches to PageRank Optimization
We study a general class of PageRank optimization problems which consist in
finding an optimal outlink strategy for a web site subject to design
constraints. We consider both a continuous problem, in which one can choose the
intensity of a link, and a discrete one, in which each page has
obligatory links, facultative links and forbidden links. We show that the
continuous problem, as well as its discrete variant when there are no
constraints coupling different pages, can both be modeled by constrained Markov
decision processes with ergodic reward, in which the webmaster determines the
transition probabilities of websurfers. Although the number of actions turns
out to be exponential, we show that an associated polytope of transition
measures has a concise representation, from which we deduce that the continuous
problem is solvable in polynomial time, and that the same is true for the
discrete problem when there are no coupling constraints. We also provide
efficient algorithms, adapted to very large networks. Then, we investigate the
qualitative features of optimal outlink strategies, and identify in particular
assumptions under which there exists a "master" page to which all controlled
pages should point. We report numerical results on fragments of the real web
graph. Comment: 39 pages
PageRank optimization applied to spam detection
We give a new link spam detection and PageRank demotion algorithm called
MaxRank. Like TrustRank and AntiTrustRank, it starts with a seed of hand-picked
trusted and spam pages. We define the MaxRank of a page as the frequency with
which a random surfer minimizing an average cost per time unit visits this page.
On a given page, the random surfer selects a set of hyperlinks and clicks with
uniform probability on any of these hyperlinks. The cost function penalizes
spam pages and hyperlink removals. The goal is to determine a hyperlink
deletion policy that minimizes this score. The MaxRank is interpreted as a
modified PageRank vector, used to sort web pages instead of the usual PageRank
vector. The bias vector of this ergodic control problem, which is unique up to
an additive constant, is a measure of the "spamicity" of each page, used to
detect spam pages. We give a scalable algorithm for MaxRank computation that
allowed us to run experiments on the WEBSPAM-UK2007 dataset. We
show that our algorithm outperforms both TrustRank and AntiTrustRank for spam
and nonspam page detection. Comment: 8 pages, 6 figures
Ergodic Randomized Algorithms and Dynamics over Networks
Algorithms and dynamics over networks often involve randomization, and
randomization may result in oscillating dynamics which fail to converge in a
deterministic sense. In this paper, we observe this undesired feature in three
applications, in which the dynamics is the randomized asynchronous counterpart
of a well-behaved synchronous one. These three applications are network
localization, PageRank computation, and opinion dynamics. Motivated by their
formal similarity, we show the following general fact, under the assumptions of
independence across time and linearity of the updates: if the expected
dynamics is stable and converges to the same limit as the original synchronous
dynamics, then the oscillations are ergodic and the desired limit can be
locally recovered via time-averaging. Comment: 11 pages; submitted for publication. Revised version with fixed
technical flaw and updated references
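The time-averaging claim in this abstract can be illustrated with a toy scalar example. The sketch below is not taken from the paper: the update rule, the coefficient values 0.2 and 0.8, and the function name `time_averaged_iterate` are assumptions chosen for illustration. The random coefficient has mean 0.5, so the expected dynamics x = 0.5 x + 1 is stable with limit 2; the iterate itself keeps oscillating, while its running average settles near 2.

```python
import random


def time_averaged_iterate(steps, seed=0):
    """Run x <- a_k * x + 1 with i.i.d. random a_k and return (last x, time average)."""
    rng = random.Random(seed)
    x = 0.0
    running_sum = 0.0
    for _ in range(steps):
        # a_k is drawn independently at each step; its mean is 0.5.
        a = rng.choice((0.2, 0.8))
        x = a * x + 1.0
        running_sum += x
    return x, running_sum / steps


# The last iterate fluctuates from run to run; the average approaches
# the fixed point of the expected dynamics, which is 1 / (1 - 0.5) = 2.
last, avg = time_averaged_iterate(200_000)
print("last iterate:", last)
print("time average:", avg)
```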
Controlling the Katz-Bonacich Centrality in Social Network: Application to gossip in Online Social Networks
Recent papers studied the control of spectral centrality measures of a network by manipulating the topology of the network. We extend these works by focusing on a specific spectral centrality measure, the Katz-Bonacich centrality. The optimization of the Katz-Bonacich centrality using a topological control is called the Katz-Bonacich optimization problem. We first prove that this problem is equivalent to a linear optimization problem. Thus, in the context of large graphs, we can use state-of-the-art algorithms. We provide a specific application of the Katz-Bonacich centrality minimization problem, based on the minimization of gossip propagation, and report experiments on real networks.
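For readers unfamiliar with the measure, the following is a minimal sketch of how a Katz-Bonacich centrality vector can be computed. It assumes one common convention, x = 1 + alpha * A x, which may differ from the paper's by a transpose or a scaling; the function name `katz_bonacich` and the example graph are hypothetical. The iteration converges when alpha is below the inverse of the largest eigenvalue of A. The optimization problem itself is not shown.

```python
def katz_bonacich(adj, alpha, iters=500):
    """Fixed-point iteration for x = 1 + alpha * A x (assumed convention)."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        x = [
            1.0 + alpha * sum(adj[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
    return x


# Undirected path 0 - 1 - 2; its largest eigenvalue is sqrt(2),
# so alpha = 0.25 is inside the convergence region.
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
centrality = katz_bonacich(path, alpha=0.25)
print(centrality)  # the middle node gets the largest value
```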
PageRank Optimization by Edge Selection
The importance of a node in a directed graph can be measured by its PageRank.
The PageRank of a node is used in a number of application contexts - including
ranking websites - and can be interpreted as the average portion of time spent
at the node by an infinite random walk. We consider the problem of maximizing
the PageRank of a node by selecting some of the edges from a set of edges that
are under our control. By applying results from Markov decision theory, we show
that an optimal solution to this problem can be found in polynomial time. Our
core solution results in a linear programming formulation, but we also provide
an alternative greedy algorithm, a variant of policy iteration, which runs in
polynomial time, as well. Finally, we show that, under the slight modification
for which we are given mutually exclusive pairs of edges, the problem of
PageRank optimization becomes NP-hard. Comment: 30 pages, 3 figures
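The edge-selection problem described in this abstract can be stated concretely on a tiny graph. The sketch below is a brute-force illustration only, not the paper's linear program or greedy algorithm: it enumerates every subset of the controllable edges, which takes exponential time. The damping factor 0.85, the uniform treatment of dangling nodes, and the names `pagerank` and `best_edge_subset` are assumptions.

```python
from itertools import combinations


def pagerank(n, edges, damping=0.85, iters=200):
    """Power iteration; dangling nodes jump uniformly (assumed convention)."""
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for u in range(n):
            targets = out[u] if out[u] else range(n)
            share = damping * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        rank = new
    return rank


def best_edge_subset(n, fixed_edges, optional_edges, target):
    """Try every subset of optional edges; keep the one maximizing rank[target]."""
    best, best_score = None, -1.0
    for k in range(len(optional_edges) + 1):
        for subset in combinations(optional_edges, k):
            score = pagerank(n, list(fixed_edges) + list(subset))[target]
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


# Node 1 may link to node 0, to node 2, to both, or to neither.
fixed = [(0, 1), (2, 0)]
optional = [(1, 0), (1, 2)]
best, score = best_edge_subset(3, fixed, optional, target=0)
print(best, score)
```

In this example the search selects only the edge from node 1 back to the target, which matches the intuition that the return time to the target should be as short as possible.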