GiViP: A Visual Profiler for Distributed Graph Processing Systems
Analyzing large-scale graphs provides valuable insights in different
application scenarios. While many graph processing systems working on top of
distributed infrastructures have been proposed to deal with big graphs, the
tasks of profiling and debugging their massive computations remain time
consuming and error-prone. This paper presents GiViP, a visual profiler for
distributed graph processing systems based on a Pregel-like computation model.
GiViP captures the huge amount of messages exchanged throughout a computation
and provides an interactive user interface for the visual analysis of the
collected data. We show how to take advantage of GiViP to detect anomalies
related to the computation and to the infrastructure, such as slow computing
units and anomalous message patterns.
Comment: Appears in the Proceedings of the 25th International Symposium on
Graph Drawing and Network Visualization (GD 2017)
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions.
Multi-GPU Graph Analytics
We present a single-node, multi-GPU programmable graph processing library
that allows programmers to easily extend single-GPU graph algorithms to achieve
scalable performance on large graphs with billions of edges. Directly using the
single-GPU implementations, our design only requires programmers to specify a
few algorithm-dependent concerns, hiding most multi-GPU related implementation
details. We analyze the theoretical and practical limits to scalability in the
context of varying graph primitives and datasets. We describe several
optimizations, such as direction-optimizing traversal and a just-enough memory
allocation scheme, for better performance and smaller memory consumption.
Compared to previous work, we achieve best-of-class performance across
operations and datasets, including excellent strong and weak scalability on
most primitives as we increase the number of GPUs in the system.
Comment: 12 pages. Final version submitted to IPDPS 201
A Time-driven Data Placement Strategy for a Scientific Workflow Combining Edge Computing and Cloud Computing
Compared to traditional distributed computing environments such as grids,
cloud computing provides a more cost-effective way to deploy scientific
workflows. Each task of a scientific workflow requires several large datasets
that are located in different datacenters of the cloud computing environment,
resulting in serious data transmission delays. Edge computing reduces these
transmission delays and supports fixed storage of a scientific workflow's
private datasets, but its storage capacity is a bottleneck.
It is a challenge to combine the advantages of both edge computing and cloud
computing to rationalize the data placement of a scientific workflow and
optimize the data transmission time across different datacenters. Traditional
data placement strategies maintain load balancing with a given number of
datacenters, which results in a large data transmission time. In this study, a
self-adaptive discrete particle swarm optimization algorithm with genetic
algorithm operators (GA-DPSO) was proposed to optimize the data transmission
time when placing data for a scientific workflow. This approach considered the
characteristics of data placement combining edge computing and cloud computing.
In addition, it considered the factors affecting transmission delay,
such as the bandwidth between datacenters, the number of edge datacenters, and
the storage capacity of edge datacenters. The crossover operator and mutation
operator of the genetic algorithm were adopted to avoid the premature
convergence of the traditional particle swarm optimization algorithm, which
enhanced the diversity of population evolution and effectively reduced the data
transmission time. The experimental results show that the data placement
strategy based on GA-DPSO can effectively reduce the data transmission time
during workflow execution in a combined edge and cloud computing environment.
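
The idea in this abstract, a discrete particle swarm whose particles are updated with genetic mutation and crossover operators, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each task is pinned to one datacenter, applies every operator on every step instead of using self-adaptive probabilities, and treats a placement that exceeds any storage capacity as having infinite cost. All names (`transmission_time`, `feasible`, `fitness`, `mutate`, `crossover`, `ga_dpso`) and the data layout are hypothetical.

```python
import random

def transmission_time(placement, tasks, sizes, bandwidth):
    """Sum of size / bandwidth for every dataset a task must fetch remotely.
    placement[d] is the datacenter holding dataset d (assumed encoding)."""
    total = 0.0
    for task_dc, needed in tasks:
        for d in needed:
            src = placement[d]
            if src != task_dc:
                total += sizes[d] / bandwidth[src][task_dc]
    return total

def feasible(placement, sizes, capacity):
    """True if no datacenter stores more than its capacity."""
    used = [0.0] * len(capacity)
    for d, dc in enumerate(placement):
        used[dc] += sizes[d]
    return all(u <= c for u, c in zip(used, capacity))

def fitness(placement, tasks, sizes, bandwidth, capacity):
    """Transmission time, or infinity for capacity violations (assumed penalty)."""
    if not feasible(placement, sizes, capacity):
        return float("inf")
    return transmission_time(placement, tasks, sizes, bandwidth)

def mutate(placement, n_dc, rng):
    """Mutation operator: move one randomly chosen dataset to a random datacenter."""
    child = list(placement)
    child[rng.randrange(len(child))] = rng.randrange(n_dc)
    return child

def crossover(placement, guide, rng):
    """Crossover operator: copy a random contiguous segment from the guide."""
    i, j = sorted(rng.sample(range(len(placement) + 1), 2))
    child = list(placement)
    child[i:j] = guide[i:j]
    return child

def ga_dpso(tasks, sizes, bandwidth, capacity, n_particles=20, iters=100, seed=0):
    """Simplified search loop: mutate, then cross with personal and global best."""
    rng = random.Random(seed)
    n_dc, n_data = len(capacity), len(sizes)
    score = lambda p: fitness(p, tasks, sizes, bandwidth, capacity)
    swarm = [[rng.randrange(n_dc) for _ in range(n_data)]
             for _ in range(n_particles)]
    pbest = [list(p) for p in swarm]
    gbest = min(pbest, key=score)
    for _ in range(iters):
        for k, p in enumerate(swarm):
            p = mutate(p, n_dc, rng)          # stands in for the inertia term
            p = crossover(p, pbest[k], rng)   # pull toward personal best
            p = crossover(p, gbest, rng)      # pull toward global best
            swarm[k] = p
            if score(p) < score(pbest[k]):
                pbest[k] = list(p)
        gbest = min(pbest + [gbest], key=score)
    return gbest, score(gbest)

# Toy instance: datacenter 0 is a large cloud site, datacenter 1 a small edge site.
sizes = [4.0, 4.0, 4.0]
capacity = [100.0, 10.0]
bandwidth = [[0.0, 2.0], [2.0, 0.0]]   # bandwidth[src][dst]; diagonal is unused
tasks = [(1, [0, 1]), (0, [2])]        # (datacenter of the task, datasets it needs)
best, best_time = ga_dpso(tasks, sizes, bandwidth, capacity)
```

In this toy instance, storing all three datasets on the edge site would exceed its capacity of 10, so the search has to trade off locality against the storage limit, which is the tension the abstract describes.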