Get More for Less in Decentralized Learning Systems
Decentralized learning (DL) systems have been gaining popularity because they
avoid raw data sharing by communicating only model parameters, hence preserving
data confidentiality. However, the large size of deep neural networks poses a
significant challenge for decentralized training, since each node needs to
exchange gigabytes of data, overloading the network. In this paper, we address
this challenge with JWINS, a communication-efficient and fully decentralized
learning system that shares only a subset of parameters through sparsification.
JWINS uses a wavelet transform to limit the information loss due to
sparsification, together with a randomized communication cut-off that reduces
communication usage without harming the performance of trained models. We
demonstrate empirically with 96 DL nodes on non-IID datasets that JWINS can
achieve similar accuracies to full-sharing DL while sending up to 64% fewer
bytes. Additionally, under low communication budgets, JWINS outperforms the
state-of-the-art communication-efficient DL algorithm CHOCO-SGD by up to 4x in
terms of network savings and time.
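The wavelet step can be pictured concretely. Below is a minimal, hypothetical sketch, not JWINS's actual implementation: it transforms a flat parameter vector with PyWavelets, keeps only the top-k coefficients by magnitude, and inverts the transform, so most signal energy survives sparsification. The function name, `keep_ratio`, and wavelet family are illustrative assumptions.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_sparsify(params: np.ndarray, keep_ratio: float = 0.1,
                     wavelet: str = "db1") -> np.ndarray:
    """Keep only the largest wavelet coefficients of a flat parameter vector.

    Hypothetical sketch: the wavelet family, keep_ratio, and cut-off
    policy are assumptions, not JWINS's actual choices.
    """
    coeffs = pywt.wavedec(params, wavelet)
    flat, slices = pywt.coeffs_to_array(coeffs)
    k = max(1, int(keep_ratio * flat.size))
    # Zero all but the k largest-magnitude coefficients.
    keep = np.argpartition(np.abs(flat), -k)[-k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[keep] = True
    flat[~mask] = 0.0
    rec = pywt.waverec(
        pywt.array_to_coeffs(flat, slices, output_format="wavedec"), wavelet)
    return rec[: params.size]  # waverec may pad by one sample for odd lengths
```

In an actual system only the k nonzero coefficients and their indices would be transmitted, which is where the bandwidth saving comes from.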
NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization
We study the problem of large-scale network embedding, which aims to learn
latent representations for network mining applications. Previous research shows
that 1) popular network embedding benchmarks, such as DeepWalk, are in essence
implicitly factorizing a matrix with a closed form, and 2) the explicit
factorization of such a matrix generates more powerful embeddings than existing
methods. However, directly constructing and factorizing this matrix, which is
dense, is prohibitively expensive in both time and space, making it unscalable
to large networks.
In this work, we present NetSMF, an algorithm that casts large-scale network
embedding as sparse matrix factorization. NetSMF leverages theories from
spectral sparsification to efficiently sparsify the aforementioned dense matrix,
enabling significantly improved efficiency in embedding learning. The
sparsified matrix is spectrally close to the original dense one with a
theoretically bounded approximation error, which helps maintain the
representation power of the learned embeddings. We conduct experiments on
networks of various scales and types. Results show that among both popular
benchmarks and factorization-based methods, NetSMF is the only method that
achieves both high efficiency and effectiveness. We show that NetSMF requires
only 24 hours to generate effective embeddings for a large-scale academic
collaboration network with tens of millions of nodes, a task that would take
DeepWalk months and that is computationally infeasible for the dense matrix
factorization solution. The source code of NetSMF is publicly available at
https://github.com/xptree/NetSMF.
Comment: 11 pages, in Proceedings of the Web Conference 2019 (WWW '19)
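As a rough illustration of the factorization stage, here is a minimal sketch. Assumptions: NetSMF's random-walk path sampler (the spectral sparsification step) is omitted, `M` stands in for its already-sparsified output, and the element-wise truncated logarithm mirrors common matrix-factorization embedding pipelines rather than NetSMF's exact construction.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

def factorize_sparse(M: sp.csr_matrix, dim: int = 128) -> np.ndarray:
    """Embed nodes via truncated SVD of a sparsified co-occurrence matrix.

    Hypothetical sketch: `M` is assumed to be the sparse, spectrally
    close approximation of the dense matrix described in the abstract.
    """
    M = M.astype(np.float64).copy()
    M.data = np.log(np.maximum(M.data, 1.0))  # element-wise truncated log
    U, S, _ = svds(M, k=dim)                  # sparse truncated SVD
    return U * np.sqrt(S)                     # n x dim embedding matrix
```

Because the SVD only ever touches the sparse matrix, both time and memory scale with the number of nonzeros rather than n^2, which is the point of sparsifying first.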
Rethinking Efficiency and Redundancy in Training Large-scale Graphs
Large-scale graphs are ubiquitous in real-world scenarios and can be trained
by Graph Neural Networks (GNNs) to generate representations for downstream
tasks. Given the abundant information and complex topology of a large-scale
graph, we argue that redundancy exists in such graphs and degrades training
efficiency. Unfortunately, limited model scalability severely restricts the
efficiency of training large-scale graphs via vanilla GNNs. Despite recent
advances in sampling-based training methods, sampling-based GNNs generally
overlook the redundancy issue, and training these models on large-scale graphs
still takes an intolerable amount of time. We therefore propose to drop
redundancy and improve the efficiency of training large-scale graphs with GNNs
by rethinking the inherent characteristics of a graph.
In this paper, we propose a once-for-all method, termed DropReef, to drop the
redundancy in large-scale graphs. Specifically, we first conduct preliminary
experiments to explore potential redundancy in large-scale graphs. Next, we
present a metric to quantify the neighbor heterophily of all nodes in a graph.
Based on both experimental and theoretical analysis, we characterize the
redundancy in a large-scale graph as nodes with high neighbor heterophily and a
great number of neighbors. We then propose DropReef to detect and drop this
redundancy once and for all, reducing training time without sacrificing model
accuracy. To demonstrate the effectiveness of DropReef, we apply it to recent
state-of-the-art sampling-based GNNs for training large-scale graphs, owing to
the high precision of such models. With DropReef, the training efficiency of
these models improves greatly. DropReef is highly compatible and runs offline,
benefiting current and future state-of-the-art sampling-based GNNs to a
significant extent.
Comment: 11 pages
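The neighbor-heterophily metric can be sketched concretely. The snippet below is a hypothetical stand-in, not DropReef's exact metric: it scores each node by the fraction of its neighbors carrying a different label, so nodes with high heterophily and many neighbors, the redundancy the abstract identifies, are easy to flag.

```python
import numpy as np
import scipy.sparse as sp

def neighbor_heterophily(adj: sp.csr_matrix, labels: np.ndarray) -> np.ndarray:
    """Per-node fraction of neighbors whose label differs from the node's own.

    Hypothetical sketch using the common label-disagreement notion of
    heterophily; DropReef's actual metric may be defined differently.
    """
    n = adj.shape[0]
    het = np.zeros(n)
    for u in range(n):
        nbrs = adj.indices[adj.indptr[u]:adj.indptr[u + 1]]
        if nbrs.size:
            het[u] = np.mean(labels[nbrs] != labels[u])
    return het
```

A once-for-all preprocessing pass would then drop nodes with both high `het[u]` and high degree before handing the graph to a sampling-based GNN trainer.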
Multilevel Methods for Sparsification and Linear Arrangement Problems on Networks
The computation of network properties such as diameter, centrality indices, and paths may become a major bottleneck in the analysis of a network if the network is large. Scalable approximation algorithms, heuristics, and structure-preserving network sparsification methods therefore play an important role in modern network analysis. In the first part of this thesis, we develop a robust network sparsification method that enables filtering of so-called long-range edges, short-range edges, or both. Edges are first ranked by their algebraic distances and then sampled. We further combine this method with a multilevel framework to provide a multilevel sparsification framework that can control the sparsification process at different coarse-grained resolutions. Experimental results demonstrate the effectiveness of the proposed methods without significant loss in the quality of computed network properties.

In the second part of the thesis, we introduce asymmetric coarsening schemes for multilevel algorithms developed for linear arrangement problems. The effectiveness of the set of coarse variables and the corresponding interpolation matrix is the central problem in any multigrid algorithm. We push the boundaries of fast maximum weighted matching algorithms for coarsening schemes on graphs by introducing novel ideas for asymmetric coupling between the coarse and fine variables of the problem.
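A minimal sketch of algebraic-distance edge ranking, the quantity used above to separate long- and short-range edges before sampling. Assumptions: a Jacobi-style relaxation over random test vectors, with the iteration count, damping factor `omega`, and max-over-vectors aggregation all illustrative choices rather than the thesis's exact parameters.

```python
import numpy as np
import scipy.sparse as sp

def algebraic_distances(adj: sp.csr_matrix, n_vectors: int = 10,
                        n_iters: int = 20, omega: float = 0.5):
    """Rank edges by algebraic distance after a few relaxation sweeps.

    Hypothetical sketch: start from random node values and repeatedly
    average each node with its neighbors. Endpoints that converge to
    similar values mark short-range edges; large residual differences
    mark long-range edges.
    """
    n = adj.shape[0]
    deg = np.asarray(adj.sum(axis=1)).ravel()
    deg[deg == 0] = 1.0
    X = np.random.rand(n, n_vectors)
    for _ in range(n_iters):
        X = (1 - omega) * X + omega * (adj @ X) / deg[:, None]
    rows, cols = adj.nonzero()
    # Max difference across test vectors approximates the algebraic distance.
    dist = np.max(np.abs(X[rows] - X[cols]), axis=1)
    return rows, cols, dist
```

Sorting edges by `dist` and sampling from either end of the ranking then yields the long-range or short-range filters the first part of the thesis describes.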