11,447 research outputs found
GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding
Learning continuous representations of nodes is attracting growing interest
in both academia and industry recently, due to their simplicity and
effectiveness in a variety of applications. Most of existing node embedding
algorithms and systems are capable of processing networks with hundreds of
thousands or a few millions of nodes. However, how to scale them to networks
that have tens of millions or even hundreds of millions of nodes remains a
challenging problem. In this paper, we propose GraphVite, a high-performance
CPU-GPU hybrid system for training node embeddings, by co-optimizing the
algorithm and the system. On the CPU end, augmented edge samples are parallelly
generated by random walks in an online fashion on the network, and serve as the
training data. On the GPU end, a novel parallel negative sampling is proposed
to leverage multiple GPUs to train node embeddings simultaneously, without much
data transfer and synchronization. Moreover, an efficient collaboration
strategy is proposed to further reduce the synchronization cost between CPUs
and GPUs. Experiments on multiple real-world networks show that GraphVite is
super efficient. It takes only about one minute for a network with 1 million
nodes and 5 million edges on a single machine with 4 GPUs, and takes around 20
hours for a network with 66 million nodes and 1.8 billion edges. Compared to
the current fastest system, GraphVite is about 50 times faster without any
sacrifice on performance.Comment: accepted at WWW 201
High-Performance Reachability Query Processing under Index Size Restrictions
In this paper, we propose a scalable and highly efficient index structure for
the reachability problem over graphs. We build on the well-known node interval
labeling scheme where the set of vertices reachable from a particular node is
compactly encoded as a collection of node identifier ranges. We impose an
explicit bound on the size of the index and flexibly assign approximate
reachability ranges to nodes of the graph such that the number of index probes
to answer a query is minimized. The resulting tunable index structure generates
a better range labeling if the space budget is increased, thus providing a
direct control over the trade off between index size and the query processing
performance. By using a fast recursive querying method in conjunction with our
index structure, we show that in practice, reachability queries can be answered
in the order of microseconds on an off-the-shelf computer - even for the case
of massive-scale real world graphs. Our claims are supported by an extensive
set of experimental results using a multitude of benchmark and real-world
web-scale graph datasets.Comment: 30 page
GraphR: Accelerating Graph Processing Using ReRAM
This paper presents GRAPHR, the first ReRAM-based graph processing
accelerator. GRAPHR follows the principle of near-data processing and explores
the opportunity of performing massive parallel analog operations with low
hardware and energy cost. The analog computation is suit- able for graph
processing because: 1) The algorithms are iterative and could inherently
tolerate the imprecision; 2) Both probability calculation (e.g., PageRank and
Collaborative Filtering) and typical graph algorithms involving integers (e.g.,
BFS/SSSP) are resilient to errors. The key insight of GRAPHR is that if a
vertex program of a graph algorithm can be expressed in sparse matrix vector
multiplication (SpMV), it can be efficiently performed by ReRAM crossbar. We
show that this assumption is generally true for a large set of graph
algorithms. GRAPHR is a novel accelerator architecture consisting of two
components: memory ReRAM and graph engine (GE). The core graph computations are
performed in sparse matrix format in GEs (ReRAM crossbars). The
vector/matrix-based graph computation is not new, but ReRAM offers the unique
opportunity to realize the massive parallelism with unprecedented energy
efficiency and low hardware cost. With small subgraphs processed by GEs, the
gain of performing parallel operations overshadows the wastes due to sparsity.
The experiment results show that GRAPHR achieves a 16.01x (up to 132.67x)
speedup and a 33.82x energy saving on geometric mean compared to a CPU baseline
system. Com- pared to GPU, GRAPHR achieves 1.69x to 2.19x speedup and consumes
4.77x to 8.91x less energy. GRAPHR gains a speedup of 1.16x to 4.12x, and is
3.67x to 10.96x more energy efficiency compared to PIM-based architecture.Comment: Accepted to HPCA 201
The Topology ToolKit
This system paper presents the Topology ToolKit (TTK), a software platform
designed for topological data analysis in scientific visualization. TTK
provides a unified, generic, efficient, and robust implementation of key
algorithms for the topological analysis of scalar data, including: critical
points, integral lines, persistence diagrams, persistence curves, merge trees,
contour trees, Morse-Smale complexes, fiber surfaces, continuous scatterplots,
Jacobi sets, Reeb spaces, and more. TTK is easily accessible to end users due
to a tight integration with ParaView. It is also easily accessible to
developers through a variety of bindings (Python, VTK/C++) for fast prototyping
or through direct, dependence-free, C++, to ease integration into pre-existing
complex systems. While developing TTK, we faced several algorithmic and
software engineering challenges, which we document in this paper. In
particular, we present an algorithm for the construction of a discrete gradient
that complies to the critical points extracted in the piecewise-linear setting.
This algorithm guarantees a combinatorial consistency across the topological
abstractions supported by TTK, and importantly, a unified implementation of
topological data simplification for multi-scale exploration and analysis. We
also present a cached triangulation data structure, that supports time
efficient and generic traversals, which self-adjusts its memory usage on demand
for input simplicial meshes and which implicitly emulates a triangulation for
regular grids with no memory overhead. Finally, we describe an original
software architecture, which guarantees memory efficient and direct accesses to
TTK features, while still allowing for researchers powerful and easy bindings
and extensions. TTK is open source (BSD license) and its code, online
documentation and video tutorials are available on TTK's website
- …