A Label-based Edge Partitioning for Multi-Layer Graphs
Social network systems rely on very large underlying graphs. Consequently, to achieve scalability, most data analytics and data mining algorithms are distributed, and graphs are partitioned over a set of servers. In most real-world graphs, the edges and/or vertices have different semantics, and queries largely take these semantics into account. But while several works focus on efficient graph computations on these “multi-semantic” graphs, few are dedicated to their partitioning. In this work, we propose a novel approach to edge partitioning for multi-layer graphs that considers both structural locality and edge-type (label) locality. Our experiments on real-life datasets with benchmark graph applications confirm that our approach can significantly reduce execution time and inter-partition communication.
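To make the trade-off in edge partitioning concrete, the sketch below computes the replication factor (the average number of partitions in which a vertex appears), a standard quality metric for edge partitioning. It is not the method proposed in the abstract: `partition_by_label` is a hypothetical naive baseline that uses label locality only, and the toy edge list and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def replication_factor(edges, assignment):
    """Average number of partitions each vertex is replicated in."""
    parts_of = defaultdict(set)
    for (u, v, _label), part in zip(edges, assignment):
        parts_of[u].add(part)
        parts_of[v].add(part)
    return sum(len(parts) for parts in parts_of.values()) / len(parts_of)


def partition_by_label(edges):
    """Hypothetical baseline: one partition per edge label, ignoring structure."""
    ids = {}
    return [ids.setdefault(label, len(ids)) for _, _, label in edges]


# Toy multi-layer graph: (source, target, edge label).
edges = [
    ("a", "b", "friend"),
    ("b", "c", "friend"),
    ("a", "c", "follows"),
    ("c", "d", "follows"),
]
by_label = partition_by_label(edges)
rf = replication_factor(edges, by_label)
```

A lower replication factor generally means fewer vertex copies to synchronize between servers. A purely label-based split keeps single-label queries local, but it can replicate vertices that carry edges of several types, which is why combining label locality with structural locality is attractive.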
Distributed Graph Embedding with Information-Oriented Random Walks
Graph embedding maps graph nodes to low-dimensional vectors, and is widely
adopted in machine learning tasks. The increasing availability of billion-edge
graphs underscores the importance of learning efficient and effective
embeddings on large graphs, such as link prediction on Twitter with over one
billion edges. Most existing graph embedding methods fall short of reaching
high data scalability. In this paper, we present a general-purpose,
distributed, information-centric random walk-based graph embedding framework,
DistGER, which can scale to embed billion-edge graphs. DistGER incrementally
computes information-centric random walks. It further leverages a
multi-proximity-aware, streaming, parallel graph partitioning strategy,
simultaneously achieving high local partition quality and excellent workload
balancing across machines. DistGER also improves the distributed Skip-Gram
learning model to generate node embeddings by optimizing the access locality,
CPU throughput, and synchronization efficiency. Experiments on real-world
graphs demonstrate that compared to state-of-the-art distributed graph
embedding frameworks, including KnightKing, DistDGL, and Pytorch-BigGraph,
DistGER exhibits 2.33x-129x acceleration, 45% reduction in cross-machine
communication, and > 10% effectiveness improvement in downstream tasks.
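For readers unfamiliar with random walk-based embedding, the following minimal sketch generates plain uniform random walks, the kind of node sequences that a Skip-Gram model is then trained on. It is a single-machine illustration only: it does not implement DistGER's information-centric walks, its partitioning, or its distributed Skip-Gram learner, and the adjacency dictionary and the `random_walks` function are assumptions made for this example.

```python
import random


def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks of at most `walk_length` nodes from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            # Extend the walk until it is long enough or reaches a dead end.
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


# Toy undirected graph as adjacency lists.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adj, walk_length=5, walks_per_node=2)
```

Each walk plays the role of a sentence and each node the role of a word, so nodes that co-occur within a window along the walks end up with similar vectors.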
High-Quality Shared-Memory Graph Partitioning
Partitioning graphs into blocks of roughly equal size such that few edges run
between blocks is a frequently needed operation in processing graphs. Recently, the
size, variety, and structural complexity of these networks have grown
dramatically. Unfortunately, previous approaches to parallel graph partitioning
have problems in this context since they often show a negative trade-off
between speed and quality. We present an approach to multi-level shared-memory
parallel graph partitioning that guarantees balanced solutions, shows high
speed-ups for a variety of large graphs and yields very good quality
independently of the number of cores used. For example, on 31 cores, our
algorithm partitions our largest test instance into 16 blocks cutting fewer than
half as many edges as our main competitor when both algorithms are
given the same amount of time. Important ingredients include parallel label
propagation for both coarsening and improvement, parallel initial partitioning,
a simple yet effective approach to parallel localized local search, and fast
locality-preserving hash tables.
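The label propagation ingredient mentioned above can be illustrated with a small sequential sketch of size-constrained label propagation, as commonly used to cluster nodes before coarsening. This is an assumption-laden simplification, not the authors' parallel shared-memory implementation: the function names, the unweighted toy graph, and the fixed visiting order are all chosen for illustration.

```python
from collections import Counter


def label_propagation(adj, max_size, rounds=5):
    """Sequential size-constrained label propagation on an unweighted graph."""
    label = {v: v for v in adj}          # every node starts in its own cluster
    size = Counter(label.values())       # current size of each cluster
    for _ in range(rounds):
        moved = False
        for v in adj:
            counts = Counter(label[u] for u in adj[v])
            best, best_count = label[v], counts.get(label[v], 0)
            for lab, c in counts.items():
                # Move only to a strictly better cluster that still has room.
                if lab != label[v] and size[lab] < max_size and c > best_count:
                    best, best_count = lab, c
            if best != label[v]:
                size[label[v]] -= 1
                size[best] += 1
                label[v] = best
                moved = True
        if not moved:
            break
    return label


def edge_cut(adj, label):
    """Number of undirected edges whose endpoints lie in different clusters."""
    return sum(label[u] != label[v] for u in adj for v in adj[u]) // 2


# Two triangles joined by the bridge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
clusters = label_propagation(adj, max_size=3)
cut = edge_cut(adj, clusters)
```

On this toy graph the size bound keeps the two triangles apart, so only the bridge edge is cut. In a multi-level scheme, each resulting cluster would be contracted into a single node of the coarser graph.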
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions.