59,305 research outputs found
Construction and Application of an AMR Algorithm for Distributed Memory Computers
While the parallelization of blockstructured adaptive mesh refinement techniques is relatively straight-forward on shared memory architectures, appropriate distribution strategies for the emerging generation of distributed
memory machines are a topic of on-going research. In this paper, a locality-preserving domain decomposition is proposed that partitions the entire AMR hierarchy from the base level on. It is shown that the approach reduces the
communication costs and simplifies the implementation. Emphasis is put on the effective parallelization of the flux correction procedure at coarse-fine boundaries, which is indispensable for conservative finite volume schemes. An
easily reproducible standard benchmark and a highly resolved parallel AMR
simulation of a diffracting hydrogen-oxygen detonation demonstrate the proposed
strategy in practice
Adaptive Partitioning for Large-Scale Dynamic Graphs
AbstractāIn the last years, large-scale graph processing has gained increasing attention, with most recent systems placing particular emphasis on latency. One possible technique to improve runtime performance in a distributed graph processing system is to reduce network communication. The most notable way to achieve this goal is to partition the graph by minimizing the num-ber of edges that connect vertices assigned to different machines, while keeping the load balanced. However, real-world graphs are highly dynamic, with vertices and edges being constantly added and removed. Carefully updating the partitioning of the graph to reflect these changes is necessary to avoid the introduction of an extensive number of cut edges, which would gradually worsen computation performance. In this paper we show that performance degradation in dynamic graph processing systems can be avoided by adapting continuously the graph partitions as the graph changes. We present a novel highly scalable adaptive partitioning strategy, and show a number of refinements that make it work under the constraints of a large-scale distributed system. The partitioning strategy is based on iterative vertex migrations, relying only on local information. We have implemented the technique in a graph processing system, and we show through three real-world scenarios how adapting graph partitioning reduces execution time by over 50 % when compared to commonly used hash-partitioning. I
rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks
Scientific applications often contain large and computationally intensive
parallel loops. Dynamic loop self scheduling (DLS) is used to achieve a
balanced load execution of such applications on high performance computing
(HPC) systems. Large HPC systems are vulnerable to processors or node failures
and perturbations in the availability of resources. Most self-scheduling
approaches do not consider fault-tolerant scheduling or depend on failure or
perturbation detection and react by rescheduling failed tasks. In this work, a
robust dynamic load balancing (rDLB) approach is proposed for the robust self
scheduling of independent tasks. The proposed approach is proactive and does
not depend on failure or perturbation detection. The theoretical analysis of
the proposed approach shows that it is linearly scalable and its cost decrease
quadratically by increasing the system size. rDLB is integrated into an MPI DLS
library to evaluate its performance experimentally with two computationally
intensive scientific applications. Results show that rDLB enables the tolerance
of up to (P minus one) processor failures, where P is the number of processors
executing an application. In the presence of perturbations, rDLB boosted the
robustness of DLS techniques up to 30 times and decreased application execution
time up to 7 times compared to their counterparts without rDLB
- ā¦