Fast Distributed PageRank Computation
Over the last decade, PageRank has gained importance in a wide range of
applications and domains, ever since it first proved to be effective in
determining node importance in large graphs (and was a pioneering idea behind
Google's search engine). In distributed computing alone, the PageRank vector, or
more generally random-walk-based quantities, have been used for several
different applications, including determining important nodes, load
balancing, search, and identifying connectivity structures. Surprisingly,
however, there has been little work towards designing provably efficient
fully-distributed algorithms for computing PageRank. The difficulty is that
traditional matrix-vector multiplication style iterative methods may not always
adapt well to the distributed setting owing to communication bandwidth
restrictions and convergence rates.
In this paper, we present fast random walk-based distributed algorithms for
computing PageRanks in general graphs and prove strong bounds on the round
complexity. We first present a distributed algorithm that takes
$O(\log n/\epsilon)$ rounds with high probability on any graph (directed or
undirected), where $n$ is the network size and $\epsilon$ is the reset
probability used in the PageRank computation (typically $\epsilon$ is a fixed
constant). We then present a faster algorithm that takes
$O(\sqrt{\log n}/\epsilon)$ rounds in undirected graphs. Both of the above
algorithms are scalable, as each node sends only a small
($\operatorname{polylog} n$) number of bits over each edge per round.
To the best of our knowledge, these are the first fully distributed algorithms
for computing the PageRank vector with provably efficient running time.
Comment: 14 pages
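The random-walk idea behind such algorithms can be illustrated with a small centralized simulation. The sketch below is not the paper's distributed algorithm. It is a standard Monte Carlo estimator in which every node starts several walks, each walk stops with probability equal to the reset probability at every step, and PageRank is approximated by normalized visit counts. The function name `monte_carlo_pagerank`, the parameter `walks_per_node`, and the choice to stop a walk at a node with no outgoing edges are assumptions made for illustration.

```python
import random
from collections import Counter


def monte_carlo_pagerank(graph, eps=0.15, walks_per_node=200, seed=0):
    """Estimate PageRank from visit counts of short random walks.

    graph: dict mapping each node to a list of its out-neighbours
           (every neighbour must also be a key of the dict).
    eps:   reset probability; each walk stops with this probability per step.
    """
    rng = random.Random(seed)
    visits = Counter()
    for start in graph:
        for _ in range(walks_per_node):
            node = start
            while True:
                visits[node] += 1
                # Stop with probability eps, or when the walk is stuck at a
                # node without out-edges (an assumption of this sketch).
                if rng.random() < eps or not graph[node]:
                    break
                node = rng.choice(graph[node])
    total = sum(visits.values())
    # Normalised visit counts approximate the PageRank vector.
    return {v: visits[v] / total for v in graph}


if __name__ == "__main__":
    star = {"hub": ["x", "y"], "x": ["hub"], "y": ["hub"]}
    print(monte_carlo_pagerank(star))
```

Each walk has a geometrically distributed length with mean 1/eps, so the longest of polynomially many walks has length O(log n/eps) with high probability. This gives an intuition for why a round bound of that form is plausible when walks advance one step per round.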
Applied Aspects of Using Ranking Algorithms for Directed Weighted Graphs (the Case of Social Network Graphs)
The article deals with applied aspects of preliminary vertex ranking for directed weighted graphs. Particular attention is paid to the widespread use of this technique in developing heuristic algorithms for discrete optimization. The ranking problem is directly related to the problem of centrality in social networks and to the processing of large real-world data sets, but, as the article shows, ranking is also used explicitly or implicitly as the initial stage of constructing a solution in algorithms for applied problems. Examples of preliminary ranking are given that demonstrate increased efficiency in solving applied problems widely used in mathematical optimization and decision-making methods. The article describes the structure of the first phase of the computational experiment, which concerns obtaining the test data sets. The obtained data are weighted graphs corresponding to several groups of the social network VKontakte, with between 9,000 and 24,000 participants. The structural characteristics of these graphs are shown to differ significantly in the number of connected components. Some centrality characteristics (degree sequence distributions) are shown to be exponential. The main attention is given to the analysis of three algorithms for ranking graph vertices, and new approaches are proposed for computing vertex ranks using information about user activity in social networks. The distributions of the resulting sets of ranks are compared. The notion of convergence for graph vertex ranking algorithms is introduced, and differences in their use are discussed for large-scale data and for cases where the solution must be built taking only local changes into account.
Local dependency in networks
Many real-world data sets and processes have a network structure and can usefully be represented as graphs. Network analysis focuses on the relations among nodes, exploring the properties of each network. We introduce a method for measuring the strength of the relationship between two nodes of a network and for ranking them. This method is applicable to all kinds of networks, including directed and weighted networks. The approach extracts dependency relations among the network's nodes from the structure in the local surroundings of individual nodes. For the tasks we deal with in this article, the key technical parameter is locality. Since only the surroundings of the examined nodes are used in computations, there is no need to analyze the entire network. This allows the application of our approach to large-scale networks. We present several experiments using small networks as well as large-scale artificial and real-world networks. The results of the experiments show high effectiveness due to the locality of our approach, as well as high-quality node ranking comparable to PageRank.
On the Distributed Complexity of Large-Scale Graph Computations
Motivated by the increasing need to understand the distributed algorithmic
foundations of large-scale graph computations, we study some fundamental graph
problems in a message-passing model for distributed computing where $k \geq 2$
machines jointly perform computations on graphs with $n$ nodes (typically, $n \gg k$). The input graph is assumed to be initially randomly partitioned among
the $k$ machines, a common implementation in many real-world systems.
Communication is point-to-point, and the goal is to minimize the number of
communication rounds of the computation.
Our main contribution is the General Lower Bound Theorem, a theorem
that can be used to show non-trivial lower bounds on the round complexity of
distributed large-scale data computations. The General Lower Bound Theorem is
established via an information-theoretic approach that relates the round
complexity to the minimal amount of information required by machines to solve
the problem. Our approach is generic and this theorem can be used in a
"cookbook" fashion to show distributed lower bounds in the context of several
problems, including non-graph problems. We present two applications by showing
(almost) tight lower bounds for the round complexity of two fundamental graph
problems, namely PageRank computation and triangle enumeration. Our
approach, as demonstrated in the case of PageRank, can yield tight lower bounds
for problems (including, and especially, under a stochastic partition of the
input) where communication complexity techniques are not obvious.
Our approach, as demonstrated in the case of triangle enumeration, can yield
stronger round lower bounds as well as message-round tradeoffs compared to
approaches that use communication complexity techniques.
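The input model described above, a graph randomly partitioned among machines, can be made concrete with a short sketch. It assumes that "randomly partitioned" means each vertex, together with its adjacency list, is assigned to one machine chosen uniformly at random. That reading, and the names `random_vertex_partition`, `count_cross_edges`, and the parameter `k` for the number of machines, are assumptions for illustration rather than details taken from the abstract.

```python
import random


def random_vertex_partition(adjacency, k, seed=0):
    """Assign each vertex (with its adjacency list) to one of k machines.

    adjacency: dict mapping a vertex to the list of its neighbours.
    Returns (home, machines): home[v] is the machine index of v, and
    machines[i] is the sub-dictionary of vertices stored on machine i.
    """
    rng = random.Random(seed)
    home = {v: rng.randrange(k) for v in adjacency}
    machines = [dict() for _ in range(k)]
    for v, neighbours in adjacency.items():
        machines[home[v]][v] = list(neighbours)
    return home, machines


def count_cross_edges(adjacency, home):
    """Count undirected edges whose endpoints live on different machines.

    Assumes a symmetric adjacency dict with comparable vertex labels, so
    each edge {u, v} is counted once via the u < v check.
    """
    return sum(
        1
        for u, neighbours in adjacency.items()
        for v in neighbours
        if u < v and home[u] != home[v]
    )


if __name__ == "__main__":
    triangle_plus_isolated = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}
    home, machines = random_vertex_partition(triangle_plus_isolated, k=2)
    print(home, count_cross_edges(triangle_plus_isolated, home))
```

Edges that cross machines are the ones that force communication. This is why round complexity, rather than local computation, is the quantity being bounded in this setting.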
Evaluating the Credibility of Historical Sources
Research in history relies mainly on the study of historical information sources. The results of this research depend largely on the quality of those sources. The aim of this article is to describe the first elements of an approach for automatically assessing the credibility of digitized historical information sources. Grounded in a design science approach, our contribution comprises a conceptual model describing the main characteristics of historical information sources and an algorithmic procedure for estimating credibility based on this model. The next stage of this research will apply the approach to medieval prosopographical research.