GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
High-performance implementations of graph algorithms are difficult to
write for new parallel hardware such as GPUs because of three challenges:
(1) the difficulty of coming up with graph building blocks, (2) load imbalance
on parallel hardware, and (3) graph problems having low arithmetic intensity.
To address some of these challenges, GraphBLAS is an innovative, on-going
effort by the graph analytics community to propose building blocks based on
sparse linear algebra, which will allow graph algorithms to be expressed in a
performant, succinct, composable and portable manner. In this paper, we examine
the performance challenges of a linear-algebra-based approach to building graph
frameworks and describe new design principles for overcoming these bottlenecks.
Among the new design principles is exploiting input sparsity, which allows
users to write graph algorithms without specifying push and pull direction.
Exploiting output sparsity allows users to tell the backend which values of the
output in a single vectorized computation they do not want computed.
Load balancing distributes work evenly among parallel
workers. We describe the important load-balancing features for handling graphs
with different characteristics. The design principles described in this paper
have been implemented in "GraphBLAST", the first open-source, high-performance,
linear-algebra-based graph framework on NVIDIA GPUs. The results
show that on a single GPU, GraphBLAST has on average at least an order of
magnitude speedup over previous GraphBLAS implementations SuiteSparse and GBTL,
comparable performance to the fastest GPU hardwired primitives and
shared-memory graph frameworks Ligra and Gunrock, and better performance than
any other GPU graph framework, while offering a simpler and more concise
programming model.
Comment: 50 pages, 14 figures, 14 tables
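As an illustration of the push/pull idea described in the abstract above, the following is a minimal sketch in plain Python, not GraphBLAST's actual API. It treats one breadth-first-search step as a vector-matrix product over a Boolean semiring: the push step visits only the nonzeros of the frontier vector (input sparsity), while the pull step computes only the outputs allowed by a mask of unvisited vertices (output sparsity). The function names and the `pull_threshold` heuristic are assumptions made for this example.

```python
def push_step(out_edges, frontier, visited):
    """Sparse vector times matrix: touch only rows named by the frontier."""
    nxt = set()
    for u in frontier:
        for v in out_edges.get(u, ()):
            if v not in visited:
                nxt.add(v)
    return nxt


def pull_step(in_edges, frontier, visited, n):
    """Masked matrix times vector: compute only outputs for unvisited vertices."""
    nxt = set()
    for v in range(n):
        if v in visited:  # the mask: this output is not wanted
            continue
        if any(u in frontier for u in in_edges.get(v, ())):
            nxt.add(v)
    return nxt


def bfs_levels(edges, n, source, pull_threshold=0.5):
    """Return {vertex: BFS depth}, choosing push or pull per iteration.

    pull_threshold is a hypothetical heuristic: pull when the frontier
    holds more than that fraction of all n vertices.
    """
    out_edges, in_edges = {}, {}
    for u, v in edges:
        out_edges.setdefault(u, []).append(v)
        in_edges.setdefault(v, []).append(u)

    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        visited = levels.keys()
        if len(frontier) > pull_threshold * n:
            nxt = pull_step(in_edges, frontier, visited, n)
        else:
            nxt = push_step(out_edges, frontier, visited)
        for v in nxt:
            levels[v] = depth
        frontier = nxt
    return levels
```

Both steps produce the same next frontier, so the caller never has to name a direction; the backend is free to pick whichever is cheaper for the current sparsity.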
A Tool for Programming Embarrassingly Task Parallel Applications on CoW and NoW
Embarrassingly parallel problems can be split into parts that exchange very
little (or sometimes no) information while they are computed in parallel. As a
consequence, they can be computed effectively on commodity hardware, without
particularly sophisticated interconnection networks. In practice, this means
Clusters, Networks of Workstations and Desktops, and Computational Clouds.
Despite the simplicity of this computational model, it can be exploited to
solve quite a large range of problems. This paper describes JJPF, a tool for
developing task parallel applications based on Java and Jini that has proved
to be an effective and efficient solution in environments such as Clusters and
Networks of Workstations and Desktops.
Comment: 7 pages
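The task-parallel model described above can be sketched as a simple task farm. This is a single-machine Python illustration using the standard library's thread pool, not JJPF itself (which is built on Java and Jini and targets networked workers); the function name `run_task_farm` is an assumption for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task_farm(tasks, worker, max_workers=4):
    """Apply `worker` to every task independently and collect the results.

    Tasks never communicate with one another, which is what makes the
    problem embarrassingly parallel. Results come back in task order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, tasks))
```

For example, `run_task_farm(range(5), lambda x: x * x)` squares each input independently. In a cluster or network-of-workstations setting, the pool would be replaced by remote workers discovered at run time.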
A Web Aggregation Approach for Distributed Randomized PageRank Algorithms
The PageRank algorithm employed at Google assigns a measure of importance to
each web page for rankings in search results. In our recent papers, we have
proposed a distributed randomized approach for this algorithm, where web pages
are treated as agents computing their own PageRank by communicating with linked
pages. This paper builds upon this approach to reduce the computation and
communication loads for the algorithms. In particular, we develop a method to
systematically aggregate the web pages into groups by exploiting the sparsity
inherent in the web. For each group, an aggregated PageRank value is computed,
which can then be distributed among the group members. We provide a distributed
update scheme for the aggregated PageRank along with an analysis of its
convergence properties. The method is especially motivated by results on
singular perturbation techniques for large-scale Markov chains and multi-agent
consensus.
Comment: To appear in the IEEE Transactions on Automatic Control, 201
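For orientation, the sketch below computes ordinary PageRank by centralized power iteration and then sums the values per group of pages. It is not the distributed randomized scheme or the aggregation method of the paper; it only shows what a per-group aggregated value represents. The names `pagerank` and `aggregate`, the teleportation weight `m = 0.15`, and the uniform treatment of pages without out-links are assumptions for this example.

```python
def pagerank(links, n, m=0.15, iters=100):
    """Power iteration for PageRank on pages 0..n-1.

    links maps a page to the list of pages it links to.
    m is the teleportation weight (assumed value 0.15).
    """
    x = [1.0 / n] * n
    for _ in range(iters):
        nxt = [m / n] * n
        for j in range(n):
            outs = links.get(j, [])
            if outs:
                share = (1 - m) * x[j] / len(outs)
                for i in outs:
                    nxt[i] += share
            else:
                # Assumption: a page without out-links spreads its value uniformly.
                share = (1 - m) * x[j] / n
                for i in range(n):
                    nxt[i] += share
        x = nxt
    return x


def aggregate(x, groups):
    """Sum PageRank values over each group of page indices."""
    return [sum(x[i] for i in g) for g in groups]
```

Because the values always sum to one, the group sums also sum to one, so an aggregated value can be read as the total importance of a group before it is divided among the members.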
Random Surfing Without Teleportation
In the standard Random Surfer Model, the teleportation matrix is necessary to
ensure that the final PageRank vector is well-defined. The introduction of this
matrix, however, results in serious problems and imposes fundamental
limitations on the quality of the ranking vectors. In this work, building on
the recently proposed NCDawareRank framework, we exploit the decomposition of
the underlying space into blocks, and we derive easy-to-check necessary and
sufficient conditions for random surfing without teleportation.
Comment: 13 pages. Published in the volume "Algorithms, Probability, Networks
and Games", Springer-Verlag, 2015. (The updated version corrects small
typos/errors)
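For reference, the standard Random Surfer Model mentioned above is usually written as follows. The symbols are assumptions introduced here, not notation taken from the paper: H is the row-stochastic hyperlink matrix, v is a teleportation distribution with strictly positive entries, and α is the damping factor.

```latex
G = \alpha H + (1 - \alpha)\, \mathbf{1} v^{\top}, \qquad 0 < \alpha < 1,
\qquad \pi^{\top} G = \pi^{\top}, \quad \pi^{\top} \mathbf{1} = 1 .
```

The rank-one term \mathbf{1} v^{\top} is the teleportation matrix. Because every entry of v is positive, G is irreducible and aperiodic, which is what guarantees a unique PageRank vector \pi; dropping that term requires other conditions on the chain to ensure uniqueness.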
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
From social networks to language modeling, the growing scale and importance
of graph data have driven the development of numerous new graph-parallel systems
(e.g., Pregel, GraphLab). By restricting the computation that can be expressed
and introducing new techniques to partition and distribute the graph, these
systems can efficiently execute iterative graph algorithms orders of magnitude
faster than more general data-parallel systems. However, the same restrictions
that enable the performance gains also make it difficult to express many of the
important stages in a typical graph-analytics pipeline: constructing the graph,
modifying its structure, or expressing computation that spans multiple graphs.
As a consequence, existing graph analytics pipelines compose graph-parallel and
data-parallel systems using external storage systems, leading to extensive data
movement and a complicated programming model.
To address these challenges, we introduce GraphX, a distributed graph
computation framework that unifies graph-parallel and data-parallel
computation. GraphX provides a small, core set of graph-parallel operators
expressive enough to implement the Pregel and PowerGraph abstractions, yet
simple enough to be cast in relational algebra. GraphX uses a collection of
query optimization techniques such as automatic join rewrites to efficiently
implement these graph-parallel operators. We evaluate GraphX on real-world
graphs and workloads and demonstrate that GraphX achieves performance
comparable to specialized graph computation systems, while outperforming them
in end-to-end graph pipelines. Moreover, GraphX achieves a balance between
expressiveness, performance, and ease of use.
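To make the Pregel abstraction mentioned above concrete, here is a minimal single-machine sketch in Python. It is not GraphX code (GraphX is a Spark library with a Scala API); the function `pregel` and its parameters `vprog`, `send_msg`, and `merge_msg` are hypothetical names for this example. As a simplification, every edge is scanned in every superstep rather than only edges next to vertices that received a message.

```python
def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg,
           max_supersteps=50):
    """Run a vertex program in bulk-synchronous supersteps.

    vertices: {vertex_id: attribute}
    edges:    list of (src, dst) pairs
    vprog(v, attr, msg)            -> new attribute
    send_msg(src, a_src, dst, a_dst) -> list of (target, message)
    merge_msg(m1, m2)              -> combined message
    """
    state = {v: vprog(v, attr, initial_msg) for v, attr in vertices.items()}
    for _ in range(max_supersteps):
        inbox = {}
        for src, dst in edges:
            for target, msg in send_msg(src, state[src], dst, state[dst]):
                if target in inbox:
                    inbox[target] = merge_msg(inbox[target], msg)
                else:
                    inbox[target] = msg
        if not inbox:  # no messages sent: the computation has converged
            break
        for v, msg in inbox.items():
            state[v] = vprog(v, state[v], msg)
    return state


def connected_components(vertex_ids, edges):
    """Label each vertex with the smallest vertex id in its component."""
    def send(src, a, dst, b):
        if a < b:
            return [(dst, a)]
        if b < a:
            return [(src, b)]
        return []

    return pregel(
        {v: v for v in vertex_ids},
        edges,
        initial_msg=float("inf"),
        vprog=lambda v, attr, msg: min(attr, msg),
        send_msg=send,
        merge_msg=min,
    )
```

Calling `connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])` labels vertices 1 to 3 with 1 and vertices 4 and 5 with 4. The message-gathering loop is effectively a join between edges and vertex state followed by a group-by on the target, which hints at why such operators can be cast in relational terms.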
Adiabatic quantum algorithm for search engine ranking
We propose an adiabatic quantum algorithm for generating a quantum pure state
encoding of the PageRank vector, the most widely used tool in ranking the
relative importance of internet pages. We present extensive numerical
simulations which provide evidence that this algorithm can prepare the quantum
PageRank state in a time which, on average, scales polylogarithmically in the
number of webpages. We argue that the main topological feature of the
underlying web graph allowing for such a scaling is the out-degree
distribution. The top ranked entries of the quantum PageRank state
can then be estimated with a polynomial quantum speedup. Moreover, the quantum
PageRank state can be used in "q-sampling" protocols for testing properties of
distributions, which require exponentially fewer measurements than all
classical schemes designed for the same task. This can be used to decide
whether to run a classical update of the PageRank.
Comment: 7 pages, 5 figures; closer to published version