A Web Aggregation Approach for Distributed Randomized PageRank Algorithms
The PageRank algorithm employed at Google assigns a measure of importance to
each web page for rankings in search results. In our recent papers, we have
proposed a distributed randomized approach for this algorithm, where web pages
are treated as agents computing their own PageRank by communicating with linked
pages. This paper builds upon this approach to reduce the computation and
communication loads for the algorithms. In particular, we develop a method to
systematically aggregate the web pages into groups by exploiting the sparsity
inherent in the web. For each group, an aggregated PageRank value is computed,
which can then be distributed among the group members. We provide a distributed
update scheme for the aggregated PageRank along with an analysis of its
convergence properties. The method is especially motivated by results on
singular perturbation techniques for large-scale Markov chains and multi-agent
consensus.
Comment: To appear in the IEEE Transactions on Automatic Control, 201
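For orientation, the centralized PageRank computation that this abstract takes as its starting point can be sketched as a plain power iteration. This is only the standard textbook baseline, not the distributed randomized or aggregated scheme the paper proposes. The function name `pagerank`, the damping factor of 0.85, the uniform handling of dangling pages, and the four-page toy graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank (illustrative baseline only).

    `links` maps every page to the list of pages it links to; every
    link target must itself be a key of `links`.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        # Each page passes its rank to its out-neighbours in equal shares.
        for p in pages:
            for q in links[p]:
                new_rank[q] += damping * rank[p] / len(links[p])
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy web: page "d" has no incoming links.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

Every page reads the ranks of all pages that link to it in each sweep, which is the computation and communication load that the aggregation into groups is meant to reduce.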
Learning Reputation in an Authorship Network
The problem of searching for experts in a given academic field is hugely
important in both industry and academia. We study exactly this issue with
respect to a database of authors and their publications. The idea is to use
Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA) to perform
topic modelling in order to find authors who have worked in a query field. We
then construct a coauthorship graph and motivate the use of influence
maximisation and a variety of graph centrality measures to obtain a ranked list
of experts. The ranked lists are further improved using a Markov Chain-based
rank aggregation approach. The complete method is readily scalable to large
datasets. To demonstrate the efficacy of the approach we report on an extensive
set of computational simulations using the Arnetminer dataset. An improvement
in mean average precision is demonstrated over the baseline case of simply
using the order of authors found by the topic models.
FrogWild! -- Fast PageRank Approximations on Graph Engines
We propose FrogWild, a novel algorithm for fast approximation of high
PageRank vertices, geared towards reducing network costs of running traditional
PageRank algorithms. Our algorithm can be seen as a quantized version of power
iteration that performs multiple parallel random walks over a directed graph.
One important innovation is that we introduce a modification to the GraphLab
framework that only partially synchronizes mirror vertices. This partial
synchronization vastly reduces the network traffic generated by traditional
PageRank algorithms, thus greatly reducing the per-iteration cost of PageRank.
On the other hand, this partial synchronization also creates dependencies
between the random walks used to estimate PageRank. Our main theoretical
innovation is the analysis of the correlations introduced by this partial
synchronization process and a bound establishing that our approximation is
close to the true PageRank vector.
We implement our algorithm in GraphLab and compare it against the default
PageRank implementation. We show that our algorithm is very fast, performing
each iteration in less than one second on the Twitter graph, and can be up to 7x
faster than the standard GraphLab PageRank implementation.
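The idea of estimating high-PageRank vertices from many random walks can be illustrated with a small single-machine Monte Carlo sketch. This is not FrogWild itself: it has no GraphLab integration and no partial synchronization of mirror vertices, so the walks here are fully independent. The function names, the damping factor, the number of walks, and the toy graph are assumptions chosen for the example.

```python
import random
from collections import Counter


def random_walk_pagerank(links, num_walks=20000, damping=0.85, seed=0):
    """Estimate PageRank from the end points of independent random walks.

    Each walk starts at a uniformly random page and, at every step,
    continues with probability `damping` and stops otherwise. The
    fraction of walks that stop at a page estimates its PageRank.
    """
    rng = random.Random(seed)
    pages = sorted(links)
    end_counts = Counter()
    for _ in range(num_walks):
        page = rng.choice(pages)
        while rng.random() < damping:
            out = links[page]
            # From a dangling page, jump to a uniformly random page.
            page = rng.choice(out) if out else rng.choice(pages)
        end_counts[page] += 1
    return {p: end_counts[p] / num_walks for p in pages}


def top_k(scores, k):
    """Return the k pages with the largest estimated score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy graph, used only to exercise the functions.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
estimate = random_walk_pagerank(toy_graph)
leaders = top_k(estimate, 2)
```

Because only the identity of the top vertices matters, a fairly coarse estimate is often enough, which is what makes a walk-based approximation attractive when network traffic dominates the cost.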
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.
Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks
How can we estimate the importance of nodes in a knowledge graph (KG)? A KG
is a multi-relational graph that has proven valuable for many tasks including
question answering and semantic search. In this paper, we present GENI, a
method for tackling the problem of estimating node importance in KGs, which
enables several downstream applications such as item recommendation and
resource allocation. While a number of approaches have been developed to
address this problem for general graphs, they do not fully utilize information
available in KGs, or lack the flexibility needed to model complex relationships
between entities and their importance. To address these limitations, we explore
supervised machine learning algorithms. In particular, building upon recent
advances in graph neural networks (GNNs), we develop GENI, a GNN-based
method designed to deal with distinctive challenges involved with predicting
node importance in KGs. Our method performs an aggregation of importance scores
instead of aggregating node embeddings via a predicate-aware attention mechanism
and flexible centrality adjustment. In our evaluation of GENI and existing
methods on predicting node importance in real-world KGs with different
characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art.
Comment: KDD 2019 Research Track. 11 pages. Changelog: Type 3 font removed,
and minor updates made in the Appendix (v2)
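The NDCG@100 figure quoted in the abstract refers to normalized discounted cumulative gain truncated at rank k. A minimal sketch of that metric is given below, assuming the linear-gain variant with a log2 position discount; the abstract does not say which variant the authors use, and the function names and the three-node example are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevance values."""
    return sum(rel / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted_scores, true_importance, k):
    """NDCG@k of a predicted ranking against ground-truth importance.

    Both arguments map node identifiers to numbers. Nodes are ranked by
    predicted score, and the gain of each node is its true importance.
    """
    nodes = list(true_importance)
    by_prediction = sorted(nodes, key=lambda n: predicted_scores[n],
                           reverse=True)
    by_truth = sorted(nodes, key=lambda n: true_importance[n],
                      reverse=True)
    ideal = dcg_at_k([true_importance[n] for n in by_truth], k)
    if ideal == 0:
        return 0.0
    actual = dcg_at_k([true_importance[n] for n in by_prediction], k)
    return actual / ideal


# Hypothetical ground truth and two predictions over three nodes.
truth = {"x": 3.0, "y": 2.0, "z": 1.0}
perfect = {"x": 0.9, "y": 0.5, "z": 0.1}
reversed_order = {"x": 0.1, "y": 0.5, "z": 0.9}
score_perfect = ndcg_at_k(perfect, truth, 2)
score_reversed = ndcg_at_k(reversed_order, truth, 2)
```

A ranking that orders nodes exactly as the ground truth does scores 1, and any ranking that places less important nodes ahead of more important ones within the top k scores lower.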