Bringing UMAP Closer to the Speed of Light with GPU Acceleration
The Uniform Manifold Approximation and Projection (UMAP) algorithm has become
widely popular for its ease of use, quality of results, and support for
exploratory, unsupervised, supervised, and semi-supervised learning. While many
algorithms can be ported to a GPU in a simple and direct fashion, such efforts
have resulted in inefficient and inaccurate versions of UMAP. We show a number
of techniques that can be used to make a faster and more faithful GPU version
of UMAP, and obtain speedups of up to 100x in practice. Many of these design
choices and lessons are general-purpose and may inform the conversion of other
graph and manifold learning algorithms to use GPUs. Our implementation has been
made publicly available as part of the open source RAPIDS cuML library
(https://github.com/rapidsai/cuml).
GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding
Learning continuous representations of nodes has recently attracted growing
interest in both academia and industry, due to their simplicity and
effectiveness in a variety of applications. Most existing node embedding
algorithms and systems are capable of processing networks with hundreds of
thousands or a few million nodes. However, scaling them to networks
that have tens of millions or even hundreds of millions of nodes remains a
challenging problem. In this paper, we propose GraphVite, a high-performance
CPU-GPU hybrid system for training node embeddings, by co-optimizing the
algorithm and the system. On the CPU end, augmented edge samples are generated
in parallel by random walks in an online fashion on the network, and serve as the
training data. On the GPU end, a novel parallel negative sampling is proposed
to leverage multiple GPUs to train node embeddings simultaneously, without much
data transfer and synchronization. Moreover, an efficient collaboration
strategy is proposed to further reduce the synchronization cost between CPUs
and GPUs. Experiments on multiple real-world networks show that GraphVite is
highly efficient. It takes only about one minute for a network with 1 million
nodes and 5 million edges on a single machine with 4 GPUs, and around 20
hours for a network with 66 million nodes and 1.8 billion edges. Compared to
the current fastest system, GraphVite is about 50 times faster without any
sacrifice in performance.
Comment: accepted at WWW 201
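The CPU-side sampling described in the GraphVite abstract can be illustrated with a small, single-threaded sketch. This is not GraphVite's implementation: the function names `augmented_edge_samples` and `negative_samples`, the `window` parameter, and the rule that nodes within `window` steps on a walk form a positive pair are all assumptions made for illustration. The abstract only states that edge samples are augmented through random walks and that negative sampling is used during training.

```python
import random


def augmented_edge_samples(adj, walk_length, window, rng):
    """Run one random walk per start node and collect node pairs.

    Assumption (not from the abstract): two nodes that appear within
    `window` steps of each other on a walk count as a positive sample.
    """
    samples = []
    for start in adj:
        walk = [start]
        while len(walk) < walk_length:
            neighbors = adj[walk[-1]]
            if not neighbors:  # dead end: stop this walk early
                break
            walk.append(rng.choice(neighbors))
        # Pair each node with the next `window` nodes on the walk.
        for i, u in enumerate(walk):
            for v in walk[i + 1 : i + 1 + window]:
                samples.append((u, v))
    return samples


def negative_samples(nodes, positive, k, rng):
    """Draw k negative pairs for one positive pair.

    Hypothetical helper: negatives are uniform random nodes that differ
    from both endpoints of the positive pair.
    """
    u, v = positive
    candidates = [n for n in nodes if n != u and n != v]
    return [(u, rng.choice(candidates)) for _ in range(k)]


# Toy undirected graph stored as adjacency lists.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rng = random.Random(0)
positives = augmented_edge_samples(adj, walk_length=5, window=2, rng=rng)
negatives = negative_samples(list(adj), positives[0], k=3, rng=rng)
```

With `window=1` the sketch yields only real edges of the graph; larger windows add pairs of nodes that are close on a walk but not directly connected, which is the sense in which the samples are "augmented" here.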