24 research outputs found

    Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism

    High-performance computing using graphics processing units (GPUs) is gaining popularity in scientific computing, with many large compute clusters now augmented with multiple GPUs per node. We investigate hybrid tri-level (MPI-OpenMP-CUDA) parallel implementations to explore the efficiency and scalability of incompressible flow computations on GPU clusters of up to 128 GPUs. This work details some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism using OpenMP for intra-node and MPI for inter-node communication. The tri-level MPI-OpenMP-CUDA and dual-level MPI-CUDA implementations are compared on large computational fluid dynamics (CFD) simulations. Our results demonstrate that the tri-level parallel implementation does not provide a significant performance advantage over the dual-level implementation; however, further research is needed to determine whether this conclusion holds for clusters with a high GPU-per-node density or for software that can exploit OpenMP's fine-grain parallelism more effectively.
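To make the dual-level versus tri-level distinction concrete, the sketch below maps each GPU to a node, MPI rank, OpenMP thread and CUDA device under both schemes, then counts how many neighbour exchanges in a chain of subdomains must cross an MPI rank boundary. It is a minimal sketch under stated assumptions, not the paper's code: a 1D slab decomposition with one subdomain per GPU, GPUs numbered consecutively within each node, the common "device = local GPU index" convention, and two GPUs per node. The names `Placement`, `dual_level`, `tri_level` and `mpi_exchanges` are hypothetical. It only counts messages and says nothing about performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    node: int        # compute node hosting the GPU
    mpi_rank: int    # MPI process that owns the subdomain
    omp_thread: int  # OpenMP thread inside that rank (always 0 without OpenMP)
    device: int      # CUDA device index local to the node


def dual_level(num_gpus: int, gpus_per_node: int) -> list[Placement]:
    """MPI-CUDA (assumed layout): one MPI rank per GPU, device = rank % gpus_per_node."""
    return [Placement(g // gpus_per_node, g, 0, g % gpus_per_node)
            for g in range(num_gpus)]


def tri_level(num_gpus: int, gpus_per_node: int) -> list[Placement]:
    """MPI-OpenMP-CUDA (assumed layout): one MPI rank per node, one OpenMP thread per GPU."""
    return [Placement(g // gpus_per_node, g // gpus_per_node,
                      g % gpus_per_node, g % gpus_per_node)
            for g in range(num_gpus)]


def mpi_exchanges(placements: list[Placement]) -> int:
    """Neighbour pairs in a 1D chain of subdomains whose halo exchange needs MPI.

    Pairs owned by the same rank (tri-level, same node) can exchange through
    shared host memory instead of MPI messages.
    """
    return sum(1 for a, b in zip(placements, placements[1:])
               if a.mpi_rank != b.mpi_rank)


if __name__ == "__main__":
    # 128 GPUs as in the study; two GPUs per node is assumed for illustration.
    for name, layout in (("dual", dual_level(128, 2)), ("tri", tri_level(128, 2))):
        ranks = len({p.mpi_rank for p in layout})
        print(name, "ranks:", ranks, "MPI exchanges per step:", mpi_exchanges(layout))
```

In this toy setting the tri-level layout uses 64 ranks instead of 128 and roughly halves the MPI exchanges (63 versus 127); the abstract's finding is that such a reduction did not yield a significant performance advantage on the clusters studied.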

    Multi-Level Parallelism for Incompressible Flow Computations on GPU Clusters

    We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore the efficiency and scalability of incompressible flow computations using up to 256 GPUs on a problem with approximately 17.2 billion cells. Our work addresses some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism that uses either MPI or MPI-OpenMP for communication. We present three different strategies to overlap computation with communication and systematically assess their impact on parallel performance on two different GPU clusters. Our strong and weak scaling analyses of incompressible flow computations demonstrate that GPU clusters offer significant benefits for large data sets, and that a dual-level MPI-CUDA implementation with maximum overlapping of computation and communication provides substantial performance benefits. We also find that our tri-level MPI-OpenMP-CUDA parallel implementation does not offer a significant performance advantage over the dual-level implementation on GPU clusters with two GPUs per node; on clusters with more GPUs per node, or with different domain decomposition strategies, a tri-level implementation may be more efficient than a dual-level one, and this needs further investigation.
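The abstract does not spell out its three overlapping strategies, so the following is only a generic sketch of the idea, not the authors' method. A periodic 1D explicit diffusion stencil is split into subdomains with one ghost cell per side; each step posts the halo exchange asynchronously, updates the interior cells that need no ghost values while the exchange is in flight, waits, and then updates the two boundary cells. A Python worker thread stands in for non-blocking MPI calls and CUDA streams; the stencil, the function names and the coefficient `r` are hypothetical.

```python
from collections.abc import Iterable
from concurrent.futures import Executor, ThreadPoolExecutor


def serial_step(u: list[float], r: float) -> list[float]:
    """Reference: one explicit diffusion step on a periodic 1D grid."""
    n = len(u)
    return [u[i] + r * (u[i - 1] - 2 * u[i] + u[(i + 1) % n]) for i in range(n)]


def split(u: list[float], parts: int) -> list[list[float]]:
    """Decompose the grid into equal subdomains with one ghost cell on each side."""
    size = len(u) // parts
    assert size * parts == len(u), "grid must divide evenly"
    return [[0.0] + u[p * size:(p + 1) * size] + [0.0] for p in range(parts)]


def gather(subs: list[list[float]]) -> list[float]:
    """Reassemble the global grid from the owned (non-ghost) cells."""
    return [x for sub in subs for x in sub[1:-1]]


def exchange_halos(subs: list[list[float]]) -> None:
    """Fill ghost cells from neighbours' edge cells (stands in for MPI send/recv)."""
    parts = len(subs)
    for p, sub in enumerate(subs):
        sub[0] = subs[(p - 1) % parts][-2]  # left neighbour's last owned cell
        sub[-1] = subs[(p + 1) % parts][1]  # right neighbour's first owned cell


def update(sub: list[float], new: list[float], r: float, cells: Iterable[int]) -> None:
    """Apply the stencil to the given local cell indices, writing into `new`."""
    for i in cells:
        new[i] = sub[i] + r * (sub[i - 1] - 2 * sub[i] + sub[i + 1])


def overlapped_step(subs: list[list[float]], r: float, pool: Executor) -> list[list[float]]:
    news = [list(sub) for sub in subs]
    # 1. Post the halo exchange without waiting for it.
    pending = pool.submit(exchange_halos, subs)
    # 2. Interior cells read only owned cells, so they can be updated meanwhile.
    for sub, new in zip(subs, news):
        update(sub, new, r, range(2, len(sub) - 2))
    # 3. Wait for the ghost cells to arrive.
    pending.result()
    # 4. Update the two boundary cells, which read the ghost cells.
    for sub, new in zip(subs, news):
        update(sub, new, r, (1, len(sub) - 2))
    return news


if __name__ == "__main__":
    u0 = [float(i % 5) for i in range(24)]
    subs, ref = split(u0, 4), u0
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(10):
            subs = overlapped_step(subs, 0.25, pool)
            ref = serial_step(ref, 0.25)
    print("max difference from serial:", max(abs(a - b) for a, b in zip(gather(subs), ref)))
```

The overlap is safe because the exchange writes only ghost cells and reads only owned edge cells, while the interior update reads owned cells and writes into separate arrays; the final comparison checks that the decomposed, overlapped update reproduces the serial one.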

    Aeronautical engineering: A continuing bibliography with indexes (supplement 317)

    This bibliography lists 224 reports, articles, and other documents introduced into the NASA scientific and technical information system in May 1995. Subject coverage includes: design, construction and testing of aircraft and aircraft engines; aircraft components, equipment, and systems; ground support systems; and theoretical and applied aspects of aerodynamics and general fluid dynamics.

    Summary of Research 1994

    The views expressed in this report are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government. This report contains 359 summaries of research projects carried out under funding from the Naval Postgraduate School Research Program. A list of recent publications is also included, consisting of conference presentations and publications, books, contributions to books, published journal papers, and technical reports. The research was conducted in the areas of Aeronautics and Astronautics, Computer Science, Electrical and Computer Engineering, Mathematics, Mechanical Engineering, Meteorology, National Security Affairs, Oceanography, Operations Research, Physics, and Systems Management. The report also includes research by the Command, Control and Communications (C3) Academic Group, the Electronic Warfare Academic Group, the Space Systems Academic Group, and the Undersea Warfare Academic Group.

    The WWRP Polar Prediction Project (PPP)

    Mission statement: “Promote cooperative international research enabling development of improved weather and environmental prediction services for the polar regions, on time scales from hours to seasonal”. Increased economic, transportation and research activities in polar regions are leading to more demands for sustained and improved availability of predictive weather and climate information to support decision-making. However, partly because previous international efforts strongly emphasized lower and middle latitudes, many gaps in weather, sub-seasonal and seasonal forecasting in polar regions hamper reliable decision-making in the Arctic, the Antarctic and possibly the middle latitudes as well. To advance polar prediction capabilities, the WWRP Polar Prediction Project (PPP) has been established as one of three THORPEX (THe Observing System Research and Predictability EXperiment) legacy activities. The aim of PPP, a ten-year endeavour (2013-2022), is to promote cooperative international research enabling development of improved weather and environmental prediction services for the polar regions, on hourly to seasonal time scales. To achieve its goals, PPP will enhance international and interdisciplinary collaboration through the development of strong linkages with related initiatives; strengthen linkages between academia, research institutions and operational forecasting centres; promote interactions and communication between research and stakeholders; and foster education and outreach. Flagship research activities of PPP include sea ice prediction, polar-lower latitude linkages and the Year of Polar Prediction (YOPP), an intensive observational, coupled modelling, service-oriented research and educational effort spanning mid-2017 to mid-2019.

    Aeronautical engineering: A continuing bibliography with indexes (supplement 266)

    This bibliography lists 645 reports, articles, and other documents introduced into the NASA scientific and technical information system in May 1991. Subject coverage includes: design, construction and testing of aircraft and aircraft engines; aircraft components, equipment and systems; ground support systems; and theoretical and applied aspects of aerodynamics and general fluid dynamics.

    DisPar Methods and Their Implementation on a Heterogeneous PC Cluster

    This thesis assesses two main areas of advection-diffusion simulation. The first part is dedicated to numerical studies. It was shown that there is a direct relation between the displacement moments of a pollutant particle and truncation errors. This relation provided the theoretical foundation for a new family of numerical methods, DisPar. Three methods were introduced and appraised. The first, DisPar-k, is a 2D semi-Lagrangian method based on particle displacement moments for regular grids; with this method the desired truncation error can be controlled explicitly. The second, DisParV, is also based on particle displacement moments but targets regular/non-uniform grids; it has also shown strong numerical robustness. Unlike DisPar-k and DisParV, the third method follows a Eulerian approach with three particle destination regions. It was developed so that an initially homogeneous concentration profile remains homogeneous regardless of the parameters used. The comparison with DisPar-k in non-linear situations highlighted the strong limitations of numerical methods for advection-diffusion in real scenarios. The second part of the dissertation is dedicated to implementing these methods on a heterogeneous PC cluster. To do so, a new partitioning method, AORDA, was developed. The application, Scalable DisPar, was implemented with the Microsoft .Net framework and written entirely in C#. It was tested on the Tagus Estuary, near Lisbon (Portugal). To overcome the load imbalances caused by tides, several partitioning schemes were implemented: scatter partitioning, dynamic load balancing, and a combination of both. The tests showed that the number of neighboring machines was the main factor limiting scalability, even with asynchronous communications. This was mainly caused by the communication tools used: Microsoft .Net Remoting 1.0 does not seem to work properly in the concurrent environments created by asynchronous communications. As a result, no conclusions could be drawn about the relative scalability of the partitioning strategies used. However, it is strongly suggested that the best approach is scatter partitioning combined with dynamic load balancing and asynchronous communications. Scatter partitioning mitigates the load imbalances caused by tides, while dynamic load balancing is mainly triggered at the beginning of the simulation to correct possible errors in the predicted processing power of each processor.
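Why scatter partitioning helps with tide-induced imbalance can be illustrated on a toy 1D strip of cells in which a falling tide leaves only part of the domain wet. The sketch below is generic and hypothetical: it is not AORDA or the Scalable DisPar code, the wet/dry mask is invented, and it assumes work is proportional to the number of wet cells a processor owns. It compares contiguous block partitioning with scatter (round-robin) partitioning and counts partition boundaries as a rough proxy for inter-processor communication.

```python
def block_partition(n_cells: int, n_procs: int) -> list[int]:
    """Contiguous blocks: owner of cell i is i * n_procs // n_cells."""
    return [i * n_procs // n_cells for i in range(n_cells)]


def scatter_partition(n_cells: int, n_procs: int, chunk: int = 1) -> list[int]:
    """Scatter partitioning: chunks of `chunk` cells dealt round-robin to processors."""
    return [(i // chunk) % n_procs for i in range(n_cells)]


def wet_loads(owner: list[int], wet: list[bool], n_procs: int) -> list[int]:
    """Work per processor, assuming only wet cells cost anything."""
    work = [0] * n_procs
    for p, is_wet in zip(owner, wet):
        if is_wet:
            work[p] += 1
    return work


def imbalance(work: list[int]) -> float:
    """Max/mean load ratio: 1.0 is perfect balance."""
    return max(work) / (sum(work) / len(work))


def cut_edges(owner: list[int]) -> int:
    """Adjacent cell pairs owned by different processors (communication proxy)."""
    return sum(1 for a, b in zip(owner, owner[1:]) if a != b)


if __name__ == "__main__":
    n_cells, n_procs = 100, 4
    wet = [i < 40 for i in range(n_cells)]  # hypothetical low tide: only 40 cells wet
    for name, owner in (("block", block_partition(n_cells, n_procs)),
                        ("scatter", scatter_partition(n_cells, n_procs)),
                        ("scatter/5", scatter_partition(n_cells, n_procs, chunk=5))):
        work = wet_loads(owner, wet, n_procs)
        print(f"{name:10s} loads={work} imbalance={imbalance(work):.2f} cuts={cut_edges(owner)}")
```

With this mask, block partitioning gives loads of 25, 15, 0 and 0 (imbalance 2.50, 3 cuts), while scatter partitioning gives 10 cells to each processor (imbalance 1.00) at the cost of 99 cuts, or 19 with chunks of 5 cells. The chunk size trades load balance against the amount of inter-processor boundary, which is consistent with the thesis's observation that neighbouring machines, rather than raw load, became the limiting factor.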

    Reliability analysis of flood defence structures and systems in Europe
