7 research outputs found

    A Scalable and Modular Software Architecture for Finite Elements on Hierarchical Hybrid Grids

    In this article, a new generic higher-order finite-element framework for massively parallel simulations is presented. The modular software architecture is carefully designed to exploit the resources of modern and future supercomputers. Combining an unstructured topology with structured grid refinement facilitates high geometric adaptability and matrix-free multigrid implementations with excellent performance. Different abstraction levels and fully distributed data structures additionally ensure high flexibility, extensibility, and scalability. The software concepts support sophisticated load balancing and the flexible combination of finite element spaces. Example scenarios with coupled systems of PDEs demonstrate the applicability of the concepts to geophysical simulations.

    Comment: Preprint of an article submitted to the International Journal of Parallel, Emergent and Distributed Systems (Taylor & Francis).
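    The performance-critical idea named in the abstract, matrix-free operator application on the regularly refined grid interior, can be illustrated with a small sketch. The following self-contained C++ example is hypothetical and not taken from the framework (constant stencil weights and a single uniform 2D block are simplifying assumptions); it applies a 5-point Laplacian without ever assembling a sparse matrix and uses it inside a damped Jacobi sweep, the typical smoother of a multigrid cycle:

        #include <cstddef>
        #include <vector>

        // Matrix-free application of a 2D 5-point Laplacian on one uniformly
        // refined n x n block (interior points only, Dirichlet boundary).
        // No sparse matrix is stored; the stencil weights are hard-coded.
        void apply_laplacian(const std::vector<double>& u, std::vector<double>& v,
                             std::size_t n, double h) {
            const double s = 1.0 / (h * h);
            for (std::size_t j = 1; j + 1 < n; ++j)
                for (std::size_t i = 1; i + 1 < n; ++i)
                    v[j * n + i] = s * (4.0 * u[j * n + i]
                                        - u[j * n + i - 1] - u[j * n + i + 1]
                                        - u[(j - 1) * n + i] - u[(j + 1) * n + i]);
        }

        // One damped Jacobi sweep: u <- u + omega * D^{-1} (f - A u),
        // updating interior points only.
        void jacobi_sweep(std::vector<double>& u, const std::vector<double>& f,
                          std::size_t n, double h, double omega = 0.8) {
            const double diag = 4.0 / (h * h);
            std::vector<double> Au(u.size(), 0.0);
            apply_laplacian(u, Au, n, h);
            for (std::size_t j = 1; j + 1 < n; ++j)
                for (std::size_t i = 1; i + 1 < n; ++i) {
                    const std::size_t k = j * n + i;
                    u[k] += omega * (f[k] - Au[k]) / diag;
                }
        }

        int main() {
            const std::size_t n = 65;
            const double h = 1.0 / (n - 1);
            std::vector<double> u(n * n, 0.0), f(n * n, 1.0);
            for (int sweep = 0; sweep < 10; ++sweep) jacobi_sweep(u, f, n, h);
        }

    Because the stencil is hard-coded, memory traffic is limited to the vectors themselves, which is what makes matrix-free multigrid attractive on bandwidth-bound supercomputer nodes.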

    GPU Communication Performance Engineering for the Lattice Boltzmann Method

    Advisor: Prof. Dr. Daniel Weingaertner. Master's dissertation, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defended: Curitiba, 10/08/2016. Includes references: f. 59-62. Area of concentration: computer science.

    Abstract: The increasing importance of GPUs for general-purpose computation on supercomputers makes good GPU support a valuable feature of High-Performance Computing (HPC) software frameworks such as waLBerla. waLBerla is a massively parallel software framework that supports a wide range of physical phenomena. Although it performs well on CPUs, tests have shown that its available GPU communication solutions perform poorly. In this work, we present solutions for improving waLBerla's performance, memory usage efficiency, and usability on GPU-based supercomputers. The proposed communication infrastructure for CUDA-enabled NVIDIA GPUs executed 25 times faster than the GPU communication mechanism previously available in waLBerla. Our solution for improving GPU memory usage efficiency requires only 55% of the memory needed by a naive approach, which makes it possible to run simulations with larger domains or to use fewer GPUs for a given domain size. In addition, since CUDA kernel performance proved to be very sensitive to the way data is accessed in GPU memory and to kernel implementation details, we propose a flexible domain indexing mechanism that allows thread block sizes to be configured. Finally, a Lattice Boltzmann Method (LBM) application with highly optimized CUDA kernels was developed in order to carry out all experiments and to test all solutions proposed for waLBerla.

    Keywords: HPC, GPU, CUDA, Communication, Memory, Lattice Boltzmann Method, waLBerla.
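    To make the flexible domain indexing idea concrete, the sketch below (hypothetical code, not waLBerla's actual API) shows the two index computations such a mechanism has to provide: mapping a thread coordinate, under configurable block dimensions, to the cell it owns, and mapping a cell plus a distribution direction to a linear offset in a structure-of-arrays field, the layout that favors coalesced GPU memory access:

        #include <array>
        #include <cstddef>
        #include <cstdio>

        // Hypothetical sketch: index arithmetic a GPU kernel would perform,
        // written host-side for illustration.
        struct DomainIndexer {
            std::array<std::size_t, 3> domain;  // cells in x, y, z
            std::array<std::size_t, 3> block;   // configurable thread block size

            // Linear offset of distribution direction q at cell (x, y, z) in a
            // structure-of-arrays layout: all values of one direction are
            // stored contiguously.
            std::size_t offset(std::size_t q, std::size_t x, std::size_t y,
                               std::size_t z) const {
                const std::size_t cells = domain[0] * domain[1] * domain[2];
                return q * cells + (z * domain[1] + y) * domain[0] + x;
            }

            // Cell owned by thread (tx, ty, tz) of block (bx, by, bz); this
            // mirrors the blockIdx/threadIdx arithmetic of a CUDA kernel.
            std::array<std::size_t, 3> cell(std::size_t bx, std::size_t by,
                                            std::size_t bz, std::size_t tx,
                                            std::size_t ty, std::size_t tz) const {
                return {bx * block[0] + tx, by * block[1] + ty, bz * block[2] + tz};
            }
        };

        int main() {
            DomainIndexer idx{{128, 128, 128}, {64, 2, 1}};  // tunable block shape
            auto c = idx.cell(1, 0, 0, 3, 1, 0);
            std::printf("cell (%zu, %zu, %zu) -> offset %zu\n",
                        c[0], c[1], c[2], idx.offset(0, c[0], c[1], c[2]));
        }

    Changing the block shape (say, {64, 2, 1} versus {8, 8, 4}) leaves the results unchanged but alters the memory access pattern, which is why exposing it as a tunable parameter matters for kernel performance.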

    Generating and auto-tuning parallel stencil codes

    In this thesis, we present a software framework, Patus, which generates high-performance stencil codes for different types of hardware platforms, including current multicore CPU and graphics processing unit architectures. The ultimate goals of the framework are productivity, portability (of both the code and its performance), and high performance on the target platform. A stencil computation updates every grid point in a structured grid based on the values of its neighboring points. This class of computations occurs frequently in scientific and general-purpose computing (e.g., in partial differential equation solvers or in image processing), justifying the focus on this kind of computation.

    The proposed key ingredients for achieving productivity, portability, and performance are domain-specific languages (DSLs) and the auto-tuning methodology. The Patus stencil specification DSL allows the programmer to express a stencil computation concisely, independently of hardware architecture-specific details. It thus increases programmer productivity by freeing him or her from low-level programming model issues and from manually applying hardware platform-specific code optimizations. The use of domain-specific languages also implies code reusability: once implemented, the same stencil specification can be reused on different hardware platforms, i.e., the specification code is portable across hardware architectures. Gearing the language towards a special purpose makes it amenable to more aggressive optimizations and therefore to potentially higher performance.

    Auto-tuning provides performance and performance portability by automatically adapting implementation-specific parameters to the characteristics of the hardware on which the code will run. Parameter tuning essentially amounts to solving an integer programming problem whose objective function is the code's performance as a function of the parameter configuration; automating this process also makes the system more productive to use than manual fine-tuning. We show performance results for a variety of stencils for which Patus was used to generate the corresponding implementations. The selection includes stencils taken from two real-world applications: a simulation of the temperature within the human body during hyperthermia cancer treatment, and a seismic application. These examples demonstrate the framework's flexibility and its ability to produce high-performance code.
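    The auto-tuning methodology can be illustrated with a deliberately small sketch (not Patus's actual DSL or search strategy; the tile sizes and the exhaustive search are assumptions for illustration). The parameter being tuned changes only the traversal order of a stencil sweep, so the tuner simply evaluates the objective function, measured runtime, at each configuration and keeps the best one:

        #include <algorithm>
        #include <chrono>
        #include <cstddef>
        #include <cstdio>
        #include <vector>

        // A 1D 3-point stencil sweep over n points, traversed in tiles. The
        // tile size does not change the result, only the memory access
        // pattern: exactly the kind of parameter an auto-tuner explores.
        void sweep(const std::vector<double>& u, std::vector<double>& v,
                   std::size_t n, std::size_t tile) {
            for (std::size_t t = 1; t + 1 < n; t += tile) {
                const std::size_t end = std::min(t + tile, n - 1);
                for (std::size_t i = t; i < end; ++i)
                    v[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1];
            }
        }

        int main() {
            const std::size_t n = 1 << 22;
            std::vector<double> u(n, 1.0), v(n, 0.0);

            // Exhaustive search over a tiny parameter space: time each
            // configuration and keep the fastest.
            std::size_t best_tile = 0;
            double best_ms = 1e30;
            for (std::size_t tile : {256, 1024, 4096, 16384, 65536}) {
                const auto t0 = std::chrono::steady_clock::now();
                for (int rep = 0; rep < 20; ++rep) sweep(u, v, n, tile);
                const auto t1 = std::chrono::steady_clock::now();
                const double ms =
                    std::chrono::duration<double, std::milli>(t1 - t0).count();
                std::printf("tile %6zu: %8.2f ms\n", tile, ms);
                if (ms < best_ms) { best_ms = ms; best_tile = tile; }
            }
            std::printf("best tile: %zu\n", best_tile);
        }

    Real auto-tuners replace the exhaustive loop with smarter search heuristics once the parameter space grows, but the structure stays the same: measure, compare, keep the argmin.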

    Surrogate-Based Optimization for Marine Ecosystem Models

    Marine ecosystem models are of great importance for understanding the oceanic uptake of carbon dioxide and for projections of the marine ecosystem's responses to climate change. The applicability of a marine ecosystem model for prognostic simulations crucially depends on its ability to reproduce the actually observed physical and biogeochemical processes. An assessment of the quality of a given model is typically based on its calibration against observed quantities, a process intrinsically linked to the adjustment of typically poorly known model parameters. Straightforward calibration attempts that directly adjust the model parameters using conventional optimization algorithms are often tedious or even beyond the capabilities of modern computing resources, as they normally require a large number of simulations. The computational cost becomes prohibitive in particular when already a single model evaluation involves time-consuming computer simulations. The optimization of coupled hydrodynamical marine ecosystem models simulating biogeochemical processes in the ocean is a representative example: computing times of hours up to several days for a single model evaluation are not uncommon.

    A computationally efficient optimization of such expensive simulation models can be realized using, for example, surrogate-based optimization, in which the expensive, so-called high-fidelity (or fine) model is optimized by means of a surrogate: a fast yet reasonably accurate representation of the fine model. This work investigates and applies surrogate-based optimization methodologies employing physics-based low-fidelity (or coarse) models, with the fundamental aim of a computationally efficient calibration of marine ecosystem models. As a case study, two illustrative marine ecosystem models are considered; coarse models obtained by a coarser temporal resolution and by a truncated model spin-up are investigated. The accuracy of these computationally cheaper coarse models is typically not sufficient to exploit them directly in the optimization loop in lieu of the fine model. I therefore investigate suitable correction techniques to ensure that the corrected coarse model (the surrogate) provides a reliable prediction of the fine-model optimum.

    Firstly, I focus on Aggressive Space Mapping, one of the original Space Mapping approaches. It is shown that this optimization method achieves a reasonable reduction in the optimization costs, provided that the coarse and fine models are sufficiently similar. A multiplicative response correction approach, investigated subsequently, turns out to be very well suited to the considered marine ecosystem models: a reliable surrogate is obtained, and exploiting it in a surrogate-based optimization algorithm yields a computationally cheap yet accurate solution. The optimization costs are reduced significantly compared to the Aggressive Space Mapping algorithm. The proposed methodologies, particularly the multiplicative response correction approach, form the initial parts of a toolset for a computationally efficient calibration of marine ecosystem models. The investigation of further enhancements of the presented algorithms, as well as of other promising approaches in the framework of surrogate-based optimization, will be highly valuable.
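    The multiplicative response correction at the heart of the second approach admits a compact statement. As a sketch of the general technique (the thesis's exact notation and details may differ), at iteration k the coarse-model response c(p) is rescaled componentwise so that the surrogate matches the fine-model response f at the current iterate p_k:

        s_k(p) \;=\; c(p) \odot \bigl( f(p_k) \oslash c(p_k) \bigr), \qquad \text{so that } s_k(p_k) = f(p_k),

    where \odot and \oslash denote componentwise multiplication and division of the model output vectors. The interpolation condition s_k(p_k) = f(p_k) is the zeroth-order consistency that lets the cheap surrogate stand in for the fine model near the current iterate, which is what makes the surrogate-based optimization loop both reliable and inexpensive.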

    Simulation software for supercomputers

    No full text