9 research outputs found
Optimizing an MPI weather forecasting model via processor virtualization
Abstract: Weather forecasting models are computationally intensive applications. These models are typically executed on parallel machines, and a major obstacle to their scalability is load imbalance. The causes of such imbalance are either static (e.g. topography) or dynamic (e.g. shortwave radiation, moving thunderstorms). Various techniques, often embedded in the application’s source code, have been used to address both sources. However, these techniques are inflexible and hard to use in legacy codes. In this paper, we demonstrate the effectiveness of processor virtualization for dynamically balancing the load in BRAMS, a mesoscale weather forecasting model based on MPI parallelization. We use the Charm++ infrastructure, with its over-decomposition and object-migration capabilities, to move sub-domains across processors during execution of the model. Processor virtualization enables better overlap between computation and communication and improved cache efficiency. Furthermore, by employing an appropriate load balancer, we achieve better processor utilization while requiring minimal changes to the model’s code.
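The idea of over-decomposition combined with a load balancer can be illustrated with a minimal sketch. It assumes the domain has been split into more sub-domains than processors and that a measured load is available for each sub-domain. The function names `greedy_assign` and `proc_loads` are hypothetical, and the greedy heuristic stands in for whichever balancer the paper actually uses; this is not the Charm++ API.

```python
import heapq

def greedy_assign(loads, num_procs):
    """Map each sub-domain to a processor, heaviest sub-domain first,
    always choosing the processor with the smallest load so far."""
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    assignment = {}
    # sorted() is stable, so equal loads keep their original order
    for sub, load in sorted(enumerate(loads), key=lambda item: -item[1]):
        total, proc = heapq.heappop(heap)
        assignment[sub] = proc
        heapq.heappush(heap, (total + load, proc))
    return assignment

def proc_loads(loads, assignment, num_procs):
    """Sum the sub-domain loads that end up on each processor."""
    totals = [0.0] * num_procs
    for sub, proc in assignment.items():
        totals[proc] += loads[sub]
    return totals

# Hypothetical measured loads for 8 sub-domains on 2 processors.
# A plain block split (first four / last four) gives loads 11 and 5.
loads = [8, 1, 1, 1, 1, 1, 1, 2]
balanced = proc_loads(loads, greedy_assign(loads, 2), 2)
```

With a single sub-domain per processor there would be nothing to move. Having several smaller sub-domains per processor is what gives the balancer room to even out the totals.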
Dynamic Load Balancing for Compressible Multiphase Turbulence
CMT-nek is a new scientific application for performing high-fidelity
predictive simulations of particle-laden, explosively dispersed turbulent flows.
CMT-nek involves detailed simulations, is compute-intensive, and is targeted for
deployment on exascale platforms. The moving particles are the main source of
load imbalance when the application is executed on parallel processors. In a
demonstration problem, all the particles are initially in a closed container
until a detonation occurs and the particles move apart. If all processors get
an equal share of the fluid domain, then only some of the processors get
sections of the domain that are initially laden with particles, leading to
disparate load on the processors. To eliminate load imbalance across
processors and to reduce the makespan, we present different load
balancing algorithms for CMT-nek on large-scale multi-core platforms consisting
of hundreds of thousands of cores. The load balancing algorithms are described
in detail. The performance of the different load balancing
algorithms is compared and the associated overheads are analyzed. Evaluations
on the application with and without load balancing are conducted and these show
that with load balancing, simulation time becomes faster by a factor of up to .
Comment: This paper has been accepted by ACM International Conference on
Supercomputing (ICS) 201
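The imbalance described in this abstract, where an equal share of the fluid domain leaves a few processors holding most of the particles, can be made concrete with a small sketch. It assumes a one-dimensional ordering of fluid elements, each with a weight combining fluid and particle work. The function `partition_by_weight` and the cost model are illustrative assumptions, not the algorithms presented in the paper.

```python
from bisect import bisect_left
from itertools import accumulate

def partition_by_weight(weights, num_procs):
    """Split an ordered list of element weights into contiguous chunks of
    roughly equal total weight. Processor k owns cuts[k]:cuts[k + 1]."""
    prefix = list(accumulate(weights))
    total = prefix[-1]
    cuts = [0]
    for k in range(1, num_procs):
        target = total * k / num_procs
        # Cut just after the first element whose prefix sum reaches the target.
        idx = bisect_left(prefix, target) + 1
        cuts.append(max(idx, cuts[-1]))
    cuts.append(len(weights))
    return cuts

# Assumed cost model: 1 unit of fluid work per element plus 1 unit per particle.
particles = [3, 3, 0, 0, 0, 0, 0, 0]   # all particles start in two elements
weights = [1 + n for n in particles]
cuts = partition_by_weight(weights, 2)
chunk_loads = [sum(weights[a:b]) for a, b in zip(cuts, cuts[1:])]
```

In this example an equal-count split would give loads of 10 and 4, while the weighted split gives 8 and 6. As the particles move apart, the weights change and the cuts would have to be recomputed.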
Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers
Abstract: Large parallel machines with hundreds of thousands of processors are being built. Recent studies have shown that ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to yield poor load balance on very large machines. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and the poor solutions of traditional distributed schemes. This is done by creating multiple levels of aggressive load balancing domains which form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We present techniques to deal with the scalability challenges of load balancing at very large scale. We show performance data of the hierarchical load balancing method on up to 16,384 cores of Ranger (at TACC) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on the Blue Gene/P machine at ANL.
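A two-level version of the tree of load balancing domains can be sketched as follows. This is a simplified illustration under stated assumptions: processors are divided into equal-sized groups, the root shifts tasks between groups based on group totals, and each group then balances its own tasks greedily. The names `greedy` and `hierarchical_balance` are hypothetical, and the sketch does not reproduce the Charm++ framework or the paper's actual strategy.

```python
import heapq

def greedy(loads, bins):
    """Place loads, heaviest first, into the currently lightest bin."""
    heap = [(0.0, b) for b in range(bins)]
    heapq.heapify(heap)
    out = [[] for _ in range(bins)]
    for load in sorted(loads, reverse=True):
        total, b = heapq.heappop(heap)
        out[b].append(load)
        heapq.heappush(heap, (total + load, b))
    return out

def hierarchical_balance(proc_tasks, group_size):
    """proc_tasks holds one list of task loads per processor."""
    if len(proc_tasks) % group_size != 0:
        raise ValueError("sketch assumes equal-sized groups")
    # Each group pools the tasks of its own processors only.
    groups = [[t for proc in proc_tasks[i:i + group_size] for t in proc]
              for i in range(0, len(proc_tasks), group_size)]
    # Root level: overloaded groups give away their lightest tasks
    # while they can do so without dropping below the average group load.
    target = sum(sum(g) for g in groups) / len(groups)
    pool = []
    for g in groups:
        g.sort(reverse=True)
        while g and sum(g) - g[-1] >= target:
            pool.append(g.pop())
    for load in sorted(pool, reverse=True):
        min(groups, key=sum).append(load)
    # Leaf level: every group balances independently of the others.
    result = []
    for g in groups:
        result.extend(greedy(g, group_size))
    return result

before = [[4, 4, 4, 4], [], [1], [1]]
after = hierarchical_balance(before, 2)
```

The point of the hierarchy is that no single step handles all tasks at once: the root moves only the surplus between groups, and the leaf-level steps could run in parallel on separate groups.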
Predicting application performance using supervised learning on communication features
Abstract not provided
Lastbalancierungsverfahren für dynamische und heterogene Linked-Cell Molekülsimulation (Load balancing methods for dynamic and heterogeneous linked-cell molecular simulation)
This thesis examines load balancing for molecular and particle simulations with short-range potentials. Load balancing is necessary to simulate inhomogeneous scenarios efficiently on parallel computers over extended periods of time. To this end, the types of load that occur are analyzed and quantified in so-called load models, with a focus on computational and communication loads. The load balancing problem is then described. Various load balancing methods known from the literature are considered, examined, and evaluated. The focus is not on a single application scenario but on the general feasibility, properties, and limitations of the respective methods.
Dynamic load balancing of parallel road traffic simulation
The objective of this research was to investigate, develop and evaluate dynamic load-balancing strategies for parallel execution of microscopic road traffic simulations. Urban road traffic simulation presents an irregular and dynamically varying distributed computational load for a parallel processor system. The dynamic nature of road traffic simulation systems leads to uneven load distribution during simulation, even for a system that starts off with an even load distribution. Load balancing is a potential way of achieving improved performance by reallocating work from highly loaded processors to lightly loaded processors, leading to a reduction in the overall computational time. In dynamic load balancing, workloads are adjusted continually or periodically throughout the computation. In this thesis, load balancing strategies were evaluated and some load balancing policies were developed. A load index and a profitability determination algorithm were developed. These were used to enhance two load balancing algorithms. One of the algorithms exhibits local communications and distributed load evaluation between the neighbour partitions (diffusion algorithm), and the other exhibits both local and global communications while the decision making is centralized (MaS algorithm). The enhanced algorithms were implemented and integrated with a research parallel traffic simulator. The performance of the research parallel traffic simulator, optimized with the two modified dynamic load balancing strategies, was studied.
EThOS - Electronic Theses Online Service, GB, United Kingdo
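The diffusion approach mentioned in this abstract, in which neighbouring partitions exchange load using only local information, can be sketched with a generic first-order diffusion step. This is the textbook scheme, not the enhanced algorithm from the thesis; the function name `diffusion_step`, the ring topology and the coefficient `alpha` are assumptions made for illustration.

```python
def diffusion_step(loads, neighbours, alpha=0.25):
    """One first-order diffusion step: every partition moves a fraction
    `alpha` of its load difference towards each of its neighbours."""
    new = list(loads)
    for i, nbrs in neighbours.items():
        for j in nbrs:
            # Positive difference: i gives load to j; negative: i receives.
            new[i] -= alpha * (loads[i] - loads[j])
    return new

# Hypothetical ring of four partitions; all load starts on partition 0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
loads = [8.0, 0.0, 0.0, 0.0]
for _ in range(3):
    loads = diffusion_step(loads, ring)
```

Because the neighbour relation is symmetric, every amount one partition gives up is received by a neighbour, so the total load stays constant while the distribution flattens over repeated steps. A real simulator would move whole units of work, such as road segments or vehicles, and would need a profitability check like the one the thesis describes before migrating anything.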