30 research outputs found

    Fast Distributed PageRank Computation

    Over the last decade, PageRank has gained importance in a wide range of applications and domains, ever since it first proved effective in determining node importance in large graphs (and was a pioneering idea behind Google's search engine). In distributed computing alone, the PageRank vector, or more generally random-walk-based quantities, have been used for applications ranging from determining important nodes to load balancing, search, and identifying connectivity structures. Surprisingly, however, there has been little work towards designing provably efficient fully distributed algorithms for computing PageRank. The difficulty is that traditional iterative methods in the style of matrix-vector multiplication may not always adapt well to the distributed setting, owing to communication bandwidth restrictions and convergence rates. In this paper, we present fast random-walk-based distributed algorithms for computing PageRank in general graphs and prove strong bounds on the round complexity. We first present a distributed algorithm that takes $O(\log n/\epsilon)$ rounds with high probability on any graph (directed or undirected), where $n$ is the network size and $\epsilon$ is the reset probability used in the PageRank computation (typically $\epsilon$ is a fixed constant). We then present a faster algorithm that takes $O(\sqrt{\log n}/\epsilon)$ rounds in undirected graphs. Both algorithms are scalable, as each node sends only a small ($\mathrm{polylog}\, n$) number of bits over each edge per round. To the best of our knowledge, these are the first fully distributed algorithms for computing the PageRank vector with provably efficient running time. Comment: 14 pages
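    The random-walk idea behind this abstract can be illustrated with a small sequential simulation. The sketch below is not the paper's distributed algorithm; it is a minimal single-machine Monte Carlo estimate, assuming each walk stops with probability `eps` at every step and that normalised visit counts approximate PageRank. The function name `monte_carlo_pagerank`, the parameter `walks_per_node`, and the choice to end a walk at a node with no out-links are assumptions made for illustration.

```python
import random
from collections import Counter


def monte_carlo_pagerank(graph, eps=0.15, walks_per_node=200, seed=0):
    """Estimate PageRank from terminating random walks (illustrative sketch).

    graph: dict mapping each node to a list of its out-neighbours.
    eps:   reset probability; a walk stops with this probability at each step.
    """
    rng = random.Random(seed)
    visits = Counter()
    for start in graph:
        for _ in range(walks_per_node):
            node = start
            while True:
                visits[node] += 1
                # Assumption: a walk also ends at a node with no out-links.
                if rng.random() < eps or not graph[node]:
                    break
                node = rng.choice(graph[node])
    total = sum(visits.values())
    # Normalise visit counts so the estimates sum to 1.
    return {v: visits[v] / total for v in graph}


if __name__ == "__main__":
    toy = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
    for node, score in sorted(monte_carlo_pagerank(toy).items()):
        print(node, round(score, 3))
```

    On the toy graph, node `d` has no in-links and is visited only at the start of its own walks, so it receives the lowest estimate.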

    Applied aspects of using ranking algorithms for directed weighted graphs (the case of social network graphs)

    The article deals with applied aspects of preliminary vertex ranking for directed weighted graphs. Particular attention is paid to the widespread use of this technique in developing heuristic algorithms for discrete optimization. The ranking problem is directly related to centrality in social networks and to the processing of large real-world data sets, but, as the article shows, ranking is also used, explicitly or implicitly, as the initial stage of building a solution in algorithms for applied problems. Examples of such preliminary ranking are given; they demonstrate increased efficiency in solving several applied problems that are widely used in mathematical optimization methods and decision-making. The article describes the structure of the first phase of the computational experiment, which concerns obtaining the test data sets. The obtained data are represented as weighted graphs corresponding to several groups of the social network VKontakte, with between 9,000 and 24,000 participants. The structural characteristics of these graphs are shown to differ significantly in the number of connected components. Some centrality characteristics (distributions of degree sequences) are shown to be exponential in nature. The main attention is given to the analysis of three algorithms for ranking graph vertices, and new approaches to computing vertex ranks are proposed that use information about user activity in social networks. The distributions of the resulting sets of ranks are compared. A notion of convergence for graph vertex ranking algorithms is introduced, and the differences in their use are discussed for large data sets and for cases where a solution must be built taking only local changes into account.

    Local dependency in networks

    Many real-world data and processes have a network structure and can usefully be represented as graphs. Network analysis focuses on the relations among the nodes, exploring the properties of each network. We introduce a method for measuring the strength of the relationship between two nodes of a network and for ranking them. This method is applicable to all kinds of networks, including directed and weighted networks. The approach extracts dependency relations among the network's nodes from the structure in the local surroundings of individual nodes. For the tasks we deal with in this article, the key technical parameter is locality. Since only the surroundings of the examined nodes are used in computations, there is no need to analyze the entire network. This allows our approach to be applied to large-scale networks. We present several experiments using small networks as well as large-scale artificial and real-world networks. The results of the experiments show high effectiveness due to the locality of our approach, and also high-quality node ranking comparable to PageRank.

    On the Distributed Complexity of Large-Scale Graph Computations

    Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where $k \geq 2$ machines jointly perform computations on graphs with $n$ nodes (typically, $n \gg k$). The input graph is assumed to be initially randomly partitioned among the $k$ machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem, a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. The General Lower Bound Theorem is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic, and this theorem can be used in a "cookbook" fashion to show distributed lower bounds in the context of several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds for the round complexity of two fundamental graph problems, namely PageRank computation and triangle enumeration. Our approach, as demonstrated in the case of PageRank, can yield tight lower bounds for problems (including, and especially, under a stochastic partition of the input) where communication complexity techniques are not obvious. Our approach, as demonstrated in the case of triangle enumeration, can yield stronger round lower bounds as well as message-round tradeoffs compared to approaches that use communication complexity techniques.
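    The input model in this abstract, a graph randomly partitioned among $k$ machines, can be pictured with a short sketch. The abstract does not spell out the partitioning scheme, so the code below assumes one plausible reading: each node is assigned independently and uniformly at random to a home machine, and each machine stores the edges incident to its nodes. The function name `random_vertex_partition` and the data layout are hypothetical.

```python
import random


def random_vertex_partition(edges, k, seed=0):
    """Assign each node to one of k machines uniformly at random (sketch).

    edges: iterable of (u, v) pairs.
    Returns (home, machines), where home maps node -> machine index and
    machines[i] holds the nodes and incident edges stored on machine i.
    """
    edges = list(edges)
    rng = random.Random(seed)
    nodes = sorted({x for e in edges for x in e})
    home = {v: rng.randrange(k) for v in nodes}
    machines = [{"nodes": set(), "edges": []} for _ in range(k)]
    for v, m in home.items():
        machines[m]["nodes"].add(v)
    for u, v in edges:
        # Assumption: an edge is stored on the home machine of each endpoint.
        for m in {home[u], home[v]}:
            machines[m]["edges"].append((u, v))
    return home, machines


if __name__ == "__main__":
    toy_edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
    home, machines = random_vertex_partition(toy_edges, k=2)
    for i, m in enumerate(machines):
        print(i, sorted(m["nodes"]), m["edges"])
```

    An edge whose endpoints land on different machines is the kind of edge that forces communication, which is the cost that round-complexity bounds measure.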

    Evaluating the credibility of historical sources

    Historical research relies mainly on the study of historical information sources, and its results depend largely on the quality of those sources. The aim of this article is to describe the first elements of an approach for automatically assessing the credibility of digitized historical information sources. Based on a design science approach, our contribution comprises a conceptual model describing the main characteristics of historical information sources and an algorithmic procedure for estimating credibility based on this model. The next stage of this research will apply the approach to medieval prosopographical research.