Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism
High performance computing using graphics processing units (GPUs) is gaining popularity in the scientific computing field, with many large compute clusters being augmented with multiple GPUs in each node. We investigate hybrid tri-level (MPI-OpenMP-CUDA) parallel implementations to explore the efficiency and scalability of incompressible flow computations on GPU clusters with up to 128 GPUs. This work details some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism using OpenMP for intra-node and MPI for inter-node communication. Comparisons between the tri-level MPI-OpenMP-CUDA and dual-level MPI-CUDA implementations are shown using large-scale computational fluid dynamics (CFD) simulations. Our results demonstrate that a tri-level parallel implementation does not provide a significant performance advantage over the dual-level implementation; however, further research is needed to confirm this conclusion for clusters with a higher density of GPUs per node, or when using software that can exploit OpenMP's fine-grain parallelism more effectively.
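The difference between the two process layouts compared above can be illustrated with a small mapping sketch. It is a hypothetical illustration, not the authors' code: it assumes one CUDA device per MPI rank in the dual-level (MPI-CUDA) layout, one MPI rank per node with one OpenMP thread per GPU in the tri-level (MPI-OpenMP-CUDA) layout, and a 1D slab decomposition in which slab i is owned by worker i. The names `Worker`, `dual_level_layout`, `tri_level_layout` and `count_interfaces` are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Worker:
    mpi_rank: int
    omp_thread: int  # always 0 in the dual-level layout
    node: int
    gpu: int         # local CUDA device index on the node


def dual_level_layout(nodes: int, gpus_per_node: int) -> List[Worker]:
    """MPI-CUDA: one MPI rank drives each GPU."""
    return [Worker(node * gpus_per_node + g, 0, node, g)
            for node in range(nodes) for g in range(gpus_per_node)]


def tri_level_layout(nodes: int, gpus_per_node: int) -> List[Worker]:
    """MPI-OpenMP-CUDA: one MPI rank per node, one OpenMP thread per GPU."""
    return [Worker(node, t, node, t)
            for node in range(nodes) for t in range(gpus_per_node)]


def count_interfaces(workers: List[Worker]) -> Tuple[int, int]:
    """Assume slab i belongs to workers[i] in a 1D decomposition.

    Returns (interfaces crossing an MPI rank boundary,
             interfaces inside one rank, i.e. exchangeable via shared memory).
    """
    mpi = shared = 0
    for a, b in zip(workers, workers[1:]):
        if a.mpi_rank != b.mpi_rank:
            mpi += 1
        else:
            shared += 1
    return mpi, shared


if __name__ == "__main__":
    # Hypothetical cluster: 64 nodes x 2 GPUs = 128 GPUs.
    for layout in (dual_level_layout, tri_level_layout):
        workers = layout(64, 2)
        ranks = len({w.mpi_rank for w in workers})
        print(layout.__name__, ranks, count_interfaces(workers))
```

With 64 nodes of two GPUs each, the dual-level layout uses 128 MPI ranks and all 127 slab interfaces cross an MPI boundary, whereas the tri-level layout uses 64 ranks and turns 64 of the 127 interfaces into intra-node exchanges between threads. Under these assumptions the share of exchanges the OpenMP level can take over grows with the number of GPUs per node, which is one plausible reason why the comparison may change on denser nodes.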
Multi-Level Parallelism for Incompressible Flow Computations on GPU Clusters
We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore the efficiency and scalability of incompressible flow computations using up to 256 GPUs on a problem with approximately 17.2 billion cells. Our work addresses some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism that uses either MPI or MPI-OpenMP for communications. We present three different strategies to overlap computations with communications, and systematically assess their impact on parallel performance on two different GPU clusters. Our results for strong and weak scaling analysis of incompressible flow computations demonstrate that GPU clusters offer significant benefits for large data sets, and that a dual-level MPI-CUDA implementation with maximum overlapping of computation and communication provides substantial performance benefits. We also find that our tri-level MPI-OpenMP-CUDA parallel implementation does not offer a significant performance advantage over the dual-level implementation on GPU clusters with two GPUs per node. However, on clusters with more GPUs per node, or with different domain decomposition strategies, a tri-level implementation may be more efficient than a dual-level one; this needs to be investigated further.
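The strong and weak scaling analyses and the overlap of computation with communication mentioned above rest on a few standard formulas. The sketch below is a generic per-GPU timing model assumed for illustration; it is not one of the three overlap strategies evaluated in the paper, and all timings are made-up numbers.

```python
def strong_scaling_efficiency(t_ref: float, p_ref: int, t_p: float, p: int) -> float:
    """Fixed total problem size: ideally t_p = t_ref * p_ref / p."""
    return (t_ref * p_ref) / (t_p * p)


def weak_scaling_efficiency(t_ref: float, t_p: float) -> float:
    """Fixed problem size per GPU: ideally t_p = t_ref."""
    return t_ref / t_p


def step_time(t_interior: float, t_boundary: float, t_comm: float, overlap: bool) -> float:
    """Idealized time of one step on one GPU.

    Without overlap, the whole subdomain is updated and then halos are exchanged.
    With overlap, boundary cells are updated first, and the halo exchange then runs
    concurrently with the interior update, so only the longer of the two counts.
    """
    if overlap:
        return t_boundary + max(t_interior, t_comm)
    return t_interior + t_boundary + t_comm


if __name__ == "__main__":
    # Made-up timings, for illustration only.
    print(strong_scaling_efficiency(100.0, 8, 15.0, 64))  # about 0.83
    print(weak_scaling_efficiency(10.0, 12.5))            # 0.8
    print(step_time(8.0, 1.0, 3.0, overlap=False))        # 12.0
    print(step_time(8.0, 1.0, 3.0, overlap=True))         # 9.0
```

In this model communication is fully hidden whenever `t_comm <= t_interior`, which is why the benefit of overlapping tends to shrink in strong scaling, where subdomains, and therefore interior work per GPU, get smaller while halo traffic does not shrink as fast.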
Aeronautical engineering: A continuing bibliography with indexes (supplement 317)
This bibliography lists 224 reports, articles, and other documents introduced into the NASA scientific and technical information system in May 1995. Subject coverage includes: design, construction and testing of aircraft and aircraft engines; aircraft components, equipment, and systems; ground support systems; and theoretical and applied aspects of aerodynamics and general fluid dynamics.
Summary of Research 1994
The views expressed in this report are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
This report contains 359 summaries of research projects carried out under funding from the Naval Postgraduate School Research Program. A list of recent publications is also included, consisting of conference presentations and publications, books, contributions to books, published journal papers, and technical reports. The research was conducted in the areas of Aeronautics and Astronautics, Computer Science, Electrical and Computer Engineering, Mathematics, Mechanical Engineering, Meteorology, National Security Affairs, Oceanography, Operations Research, Physics, and Systems Management. It also includes research by the Command, Control and Communications (C3) Academic Group, the Electronic Warfare Academic Group, the Space Systems Academic Group, and the Undersea Warfare Academic Group.
The WWRP Polar Prediction Project (PPP)
Mission statement: “Promote cooperative international research enabling development of improved weather and environmental prediction services for the polar regions, on time scales from hours to seasonal”. Increased economic, transportation and research activities in polar regions are creating growing demand for sustained and improved availability of predictive weather and climate information to support decision-making. However, partly because previous international efforts focused strongly on lower and middle latitudes, many gaps in weather, sub-seasonal and seasonal forecasting in polar regions hamper reliable decision-making in the Arctic and Antarctic, and possibly in the middle latitudes as well.
To advance polar prediction capabilities, the WWRP Polar Prediction Project (PPP) has been established as one of three THORPEX (THe Observing System Research and Predictability EXperiment) legacy activities. The aim of PPP, a ten-year endeavour (2013-2022), is to promote cooperative international research enabling development of improved weather and environmental prediction services for the polar regions, on hourly to seasonal time scales. To achieve its goals, PPP will enhance international and interdisciplinary collaboration through the development of strong linkages with related initiatives; strengthen linkages between academia, research institutions and operational forecasting centres; promote interaction and communication between researchers and stakeholders; and foster education and outreach.
Flagship research activities of PPP include sea ice prediction, polar-lower latitude linkages and the Year of Polar Prediction (YOPP), an intensive observational, coupled modelling, service-oriented research and educational effort running from mid-2017 to mid-2019.
Aeronautical engineering: A continuing bibliography with indexes (supplement 266)
This bibliography lists 645 reports, articles, and other documents introduced into the NASA scientific and technical information system in May 1991. Subject coverage includes: design, construction and testing of aircraft and aircraft engines; aircraft components, equipment and systems; ground support systems; and theoretical and applied aspects of aerodynamics and general fluid dynamics.
DisPar Methods and Their Implementation on a Heterogeneous PC Cluster
This thesis assesses two main areas of advection-diffusion simulation.
The first part is dedicated to numerical studies. It was demonstrated that there is a direct relation between the displacement moments of a pollutant particle and truncation errors. This relation provided the theoretical foundation for a new family of numerical methods, DisPar.
Three methods were introduced and evaluated. The first, DisPar-k, is a 2D semi-Lagrangian method based on particle displacement moments for regular grids; with this method the desired truncation error can be controlled explicitly. The second method, DisParV, is also based on particle displacement moments but is targeted at regular/non-uniform grids; it also showed strong numerical robustness. Unlike DisPar-k and DisParV, the third method is an Eulerian approximation with three particle destination regions. It was designed so that a homogeneous initial concentration profile remains homogeneous regardless of the parameters used. Comparison with DisPar-k in non-linear situations highlighted the strong limitations of numerical advection-diffusion methods in real scenarios.
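The relation between displacement moments and truncation error can be illustrated with a generic 1D example in the spirit of the third method (three destination regions). This is a sketch under stated assumptions, not the DisPar formulation from the thesis: on a uniform periodic grid, each cell's mass is split among cells j-1, j and j+1 so that the mean and variance of the displacement over one step match the exact values c = uΔt/Δx and 2d, with d = DΔt/Δx². All names are hypothetical.

```python
from typing import List, Tuple


def three_point_weights(courant: float, diff_number: float) -> Tuple[float, float, float]:
    """Weights (to j-1, stay in j, to j+1) matching displacement moments in cell units.

    Zeroth moment: weights sum to 1 (mass conservation).
    First moment:  w_right - w_left = c (mean displacement u*dt/dx).
    Second moment: w_right + w_left = 2d + c**2 (variance 2d plus mean squared).
    """
    c, d = courant, diff_number
    m2 = 2.0 * d + c * c
    w_left = 0.5 * (m2 - c)
    w_right = 0.5 * (m2 + c)
    w_stay = 1.0 - m2
    return w_left, w_stay, w_right


def step(conc: List[float], courant: float, diff_number: float) -> List[float]:
    """One transport step on a periodic grid: redistribute each cell's mass."""
    w_left, w_stay, w_right = three_point_weights(courant, diff_number)
    n = len(conc)
    new = [0.0] * n
    for j, mass in enumerate(conc):
        new[(j - 1) % n] += w_left * mass
        new[j] += w_stay * mass
        new[(j + 1) % n] += w_right * mass
    return new


def moments(conc: List[float]) -> Tuple[float, float, float]:
    """Total mass, centre of mass and variance (valid away from the periodic edges)."""
    total = sum(conc)
    mean = sum(j * m for j, m in enumerate(conc)) / total
    var = sum((j - mean) ** 2 * m for j, m in enumerate(conc)) / total
    return total, mean, var


if __name__ == "__main__":
    pulse = [0.0] * 21
    pulse[10] = 1.0
    # One step with c = 0.5, d = 0.1: the centre of mass moves by c, the variance grows by 2d.
    print(moments(step(pulse, 0.5, 0.1)))
```

For d = 0 these weights reduce to those of the Lax-Wendroff advection scheme, and for c = 0 to the explicit centred (FTCS) diffusion scheme. Because only the first two moments are matched, the mismatch in higher moments is, under this view, what appears as truncation error. Note also that `w_left` becomes negative whenever 2d + c² < c, which can produce spurious oscillations.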
The second part of the dissertation is dedicated to the implementation of these methods on a heterogeneous PC cluster. For this purpose, a new partitioning method, AORDA, was developed. The application, Scalable DisPar, was implemented with the Microsoft .Net framework and written entirely in C#. It was tested on the Tagus Estuary, near Lisbon, Portugal.
To overcome the load imbalances caused by tides, several partitioning schemes were implemented: scatter partitioning, dynamic load balancing, and a combination of both. The tests showed that the number of neighboring machines was the main factor limiting the application's scalability, even with asynchronous communications. This was mainly caused by the communication tools: Microsoft .Net remoting 1.0 does not appear to work properly in the concurrent environments created by asynchronous communications. As a result, no conclusions could be drawn about the relative efficiency of the partitioning strategies used. Nevertheless, the results strongly suggest that the best approach is scatter partitioning combined with dynamic load balancing and asynchronous communications. Scatter partitioning mitigates the load imbalances caused by tides, while dynamic load balancing is triggered mainly at the beginning of the simulation to correct possible errors in the predicted processor power.
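The combination recommended above can be sketched as follows. This is a hypothetical illustration, not AORDA or the Scalable DisPar code (which was written in C#): the domain is assumed to be cut into blocks, scatter partitioning is modelled as weighted cyclic dealing of blocks, and dynamic load balancing is modelled as re-estimating each processor's power from its measured step time and repartitioning. Per-block cost is assumed to depend only on the processor; with tides, costs would also vary between blocks, which is what the interleaving is meant to average out. All function names and timings are invented.

```python
from typing import List


def scatter_partition(n_blocks: int, powers: List[float]) -> List[int]:
    """Assign blocks, in domain order, to processors.

    Each block goes to the processor whose load relative to its estimated power
    would be smallest after receiving it (ties go to the lowest index). With equal
    powers this is plain cyclic dealing, so every processor owns blocks spread
    over the whole domain rather than one contiguous region.
    """
    counts = [0] * len(powers)
    owner = []
    for _ in range(n_blocks):
        k = min(range(len(powers)), key=lambda p: (counts[p] + 1) / powers[p])
        counts[k] += 1
        owner.append(k)
    return owner


def step_times(owner: List[int], seconds_per_block: List[float]) -> List[float]:
    """Hypothetical cost model: a processor's step time is the sum of its blocks' costs."""
    times = [0.0] * len(seconds_per_block)
    for k in owner:
        times[k] += seconds_per_block[k]
    return times


def update_powers(owner: List[int], times: List[float]) -> List[float]:
    """Dynamic load balancing: re-estimate power as blocks processed per second.

    Assumes every processor owned at least one block in the measured step.
    """
    counts = [0] * len(times)
    for k in owner:
        counts[k] += 1
    return [counts[k] / times[k] for k in range(len(times))]


if __name__ == "__main__":
    # Initial prediction: both machines equally fast; in reality machine 1 is twice as fast.
    cost = [1.0, 0.5]
    owner = scatter_partition(6, [1.0, 1.0])   # [0, 1, 0, 1, 0, 1]
    times = step_times(owner, cost)            # [3.0, 1.5]: imbalanced
    powers = update_powers(owner, times)       # [1.0, 2.0]
    owner = scatter_partition(6, powers)       # [1, 0, 1, 1, 0, 1]
    print(owner, step_times(owner, cost))      # [1, 0, 1, 1, 0, 1] [2.0, 2.0]
```

In this toy run the wrong initial power prediction is corrected after a single measured step, which mirrors the role the abstract assigns to dynamic load balancing at the start of the simulation.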