
    Developing Efficient Discrete Simulations on Multicore and GPU Architectures

    In this paper we show how to efficiently implement parallel discrete simulations on multicore and GPU architectures through a real example of an application: a cellular automata model of laser dynamics. We describe the techniques employed to build and optimize the implementations using the OpenMP and CUDA frameworks. We have evaluated the performance on two hardware platforms representing different target market segments: a high-end platform for scientific computing, using an Intel Xeon Platinum 8259CL server with 48 cores and an NVIDIA Tesla V100 GPU, both running in the Amazon Web Services (AWS) cloud; and a consumer-oriented platform, using an Intel Core i9 9900K CPU and an NVIDIA GeForce GTX 1050 Ti GPU. Performance results were compared and analyzed in detail. We show that excellent performance and scalability can be obtained on both platforms, and we identify some important issues that cause performance degradation on them. We also found that current multicore CPUs with large core counts can deliver performance very close to that of GPUs, and even identical in some cases. Funding: Ministerio de Economía, Industria y Competitividad, Gobierno de España (MINECO), and the Agencia Estatal de Investigación (AEI) of Spain, cofinanced by FEDER funds (EU): TIN2017-89842
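
    The core computational pattern here is a synchronous cell update applied to the whole lattice once per time step. A minimal sketch of how such a step can be mapped onto a CUDA kernel with double buffering is shown below; the two-buffer state layout, the Moore neighbourhood with periodic boundaries and the placeholder transition rule are assumptions made for illustration and are not the authors' laser-dynamics model.

        // Minimal sketch: one synchronous CA update step on the GPU.
        // The state layout and the local rule are illustrative assumptions,
        // not the laser-dynamics model from the paper.
        #include <cuda_runtime.h>

        __global__ void ca_step(const int* __restrict__ state_in,
                                int* __restrict__ state_out,
                                int nx, int ny)
        {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x >= nx || y >= ny) return;

            // Read the Moore neighbourhood with periodic boundaries.
            int sum = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int xn = (x + dx + nx) % nx;
                    int yn = (y + dy + ny) % ny;
                    sum += state_in[yn * nx + xn];
                }

            // Placeholder local rule (stand-in for the real transition rule):
            // next state depends on the neighbourhood sum.
            state_out[y * nx + x] = (sum > 4) ? 1 : 0;
        }

        // Host side: double buffering, one kernel launch per time step.
        void run_ca(int* d_a, int* d_b, int nx, int ny, int steps)
        {
            dim3 block(16, 16);
            dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
            for (int t = 0; t < steps; ++t) {
                ca_step<<<grid, block>>>(d_a, d_b, nx, ny);
                int* tmp = d_a; d_a = d_b; d_b = tmp;  // swap read/write buffers
            }
            cudaDeviceSynchronize();
        }

    An OpenMP version of the same step is structurally identical: the kernel body becomes the body of a loop nest over (x, y) annotated with a parallel-for pragma, which is what makes the CPU and GPU implementations directly comparable.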

    Wall Orientation and Shear Stress in the Lattice Boltzmann Model

    The wall shear stress is a quantity of profound importance for the clinical diagnosis of artery diseases. The lattice Boltzmann method is an easily parallelizable numerical method for solving flow problems, but it suffers from errors in the velocity field near the boundaries, which lead to errors in the wall shear stress and in the normal vectors computed from the velocity. In this work we present a simple formula to calculate the wall shear stress in the lattice Boltzmann model and propose to compute wall normals, which are necessary to compute the wall shear stress, by taking a weighted mean over the boundary facets lying in the vicinity of a wall element. We carry out several tests and observe an increase in the accuracy of the computed normal vectors over other methods in two and three dimensions. Using the scheme we compute the wall shear stress in an inclined and a bent channel fluid flow and show a minor influence of the normal on the numerical error, implying that the main error arises from a corrupted velocity field near the staircase boundary. Finally, we calculate the wall shear stress in the human abdominal aorta under steady conditions using our method and compare the results with a standard finite volume solver and with experimental data available in the literature. Applications of our ideas in a simplified protocol for data preprocessing in medical applications are discussed. Comment: 9 pages, 11 figures
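
    For context, a commonly used local relation in BGK-type lattice Boltzmann models recovers the viscous stress from the non-equilibrium part of the distributions; the wall shear stress is then the tangential part of the traction on the wall, and the weighted-mean normal described above can be written as a normalized weighted sum over nearby boundary facets. This is a generic reconstruction in lattice units, not necessarily the exact formula derived in the paper:

        \[
        \sigma_{\alpha\beta} \simeq -\Bigl(1-\tfrac{1}{2\tau}\Bigr)\sum_i c_{i\alpha} c_{i\beta}\,\bigl(f_i - f_i^{\mathrm{eq}}\bigr),
        \qquad
        \boldsymbol{t} = \sigma\,\hat{\boldsymbol{n}},
        \qquad
        \boldsymbol{\tau}_w = \boldsymbol{t} - (\boldsymbol{t}\cdot\hat{\boldsymbol{n}})\,\hat{\boldsymbol{n}},
        \qquad
        \hat{\boldsymbol{n}} = \frac{\sum_k w_k\,\hat{\boldsymbol{n}}_k}{\bigl\lVert \sum_k w_k\,\hat{\boldsymbol{n}}_k \bigr\rVert}.
        \]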

    Development of a collision table for three dimensional lattice gases

    Bibliography: pages 92-95. A lattice gas is a species of cellular automaton used for numerically simulating fluid flows. TransGas [9], the lattice gas code currently in use at the CSIR, is based on the FHP-I model [5] and is used to perform various two-dimensional flow simulations. In order to broaden the scope of the applications in which lattice gases can be used locally, the development of a three-dimensional lattice gas capability is required. The first major task in setting up a three-dimensional lattice gas is the construction of an efficient collision rule generator which will determine collision outcomes. For suitability to local applications, the collision rules should be chosen in such a way as to maximise the Reynolds coefficient of the flow, while conserving quantities such as mass and momentum. Part of the task thus becomes an optimisation problem. When expanding from two to three dimensions, the number of possible collision rules increases from 64 to 16777216. If a complete collision rule table is used for determining collision outcomes, storage problems are encountered on the available hardware. Selection and optimisation of collision rules cannot be done by hand when there are so many rules to choose from; the selection of rules is thus non-trivial. The work outlined in this thesis provides the CSIR with a 3-D lattice gas collision table which is well suited to the available hardware capabilities. The necessary theoretical background is considered, and a survey of the literature is presented. Based on the findings of this literature study, various methods of collision outcome determination are implemented which are considered suitable to the local needs, while remaining within the constraints set by hardware availability. An isometric collision algorithm and a reduced collision table are generated and tested. The overall efficiency of a lattice gas model is determined by two factors, namely the computational efficiency and the implementation efficiency. In testing a collision table, the first is characterised by the rate at which post-collision states can be determined, and depends on the hardware and programming techniques. The second factor can be expressed by means of a number called the Reynolds coefficient, which is defined and discussed in the following chapters. The higher the Reynolds coefficient of a model, the greater the scope of flow regimes which may be simulated with it; in addition, the simulation time required for a given flow regime decreases as the Reynolds coefficient of the model increases. The overall efficiency of the isometric model is too low to be of practical use, but a significant improvement is obtained by using the method of reduced tables. In the isometric case, the number of collision outcomes that can be determined per second is similar to that of the reduced table, but the Reynolds coefficient is very much lower. Simulating a flow regime with a Reynolds number of about 100, on a lattice of size 128³, over 20 thousand timesteps, using the isometric model would take of the order of a few years to complete on the currently available hardware. Since these simulation parameters are typical of the local requirements for lattice gas simulations, this method is clearly unsatisfactory. The isometric method does, however, serve as a useful introduction to three-dimensional lattice gas collision rule methods.
    The reduced collision table has been constructed so that it maintains semi-detailed balance, and the Boltzmann Reynolds coefficient has been calculated. In the reduced collision table model, the efficiency is higher than in the isometric case with respect to both the rate at which collision outcomes can be determined and the Reynolds coefficient. As a result of these improvements, the simulation time for the case mentioned above would reduce to the order of days on the same hardware. This simulation time is sufficiently low for immediate practical application in the local environment.
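
    To make the storage issue concrete, the sketch below shows the basic structure of table-driven collision handling: each site state is a bit mask over the discrete velocities, and the post-collision state is a single table lookup. The 24-velocity set and the placeholder table contents are assumptions for illustration; the thesis' reduced table additionally folds states together via lattice symmetries, which is not reproduced here.

        // Minimal sketch: collision handling with a precomputed lookup table.
        // A site state is a 24-bit occupation mask (one bit per lattice velocity),
        // so a full table has 2^24 = 16777216 entries.
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        constexpr int NUM_VELOCITIES = 24;                      // e.g. an FCHC-like set
        constexpr std::size_t TABLE_SIZE = 1u << NUM_VELOCITIES;

        // Precompute post-collision states once. The identity mapping here is a
        // placeholder; a real table maps each state to a mass- and
        // momentum-conserving outcome chosen to maximise the Reynolds coefficient.
        std::vector<uint32_t> build_collision_table()
        {
            std::vector<uint32_t> table(TABLE_SIZE);
            for (uint32_t s = 0; s < TABLE_SIZE; ++s)
                table[s] = s;                                   // placeholder outcome
            return table;
        }

        // Collision step: one table lookup per lattice site.
        void collide(std::vector<uint32_t>& sites, const std::vector<uint32_t>& table)
        {
            for (auto& s : sites)
                s = table[s];
        }

    Even with 32-bit entries, the full table occupies 64 MB, which illustrates why a reduced, symmetry-exploiting table was attractive on the hardware of the time.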

    Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations

    Simulation of in vivo cellular processes with the reaction–diffusion master equation (RDME) is a computationally expensive task. Our previous software enabled simulation of inhomogeneous biochemical systems for small bacteria over long time scales using the MPD-RDME method on a single GPU. Simulations of larger eukaryotic systems exceed the on-board memory capacity of individual GPUs, and long time simulations of modest-sized cells such as yeast are impractical on a single GPU. We present a new multi-GPU parallel implementation of the MPD-RDME method based on a spatial decomposition approach that supports dynamic load balancing for workstations containing GPUs of varying performance and memory capacity. We take advantage of high-performance features of CUDA for peer-to-peer GPU memory transfers and evaluate the performance of our algorithms on state-of-the-art GPU devices. We present parallel efficiency and performance results for simulations using multiple GPUs as system size, particle counts, and number of reactions grow. We also demonstrate multi-GPU performance in simulations of the Min protein system in E. coli. Moreover, our multi-GPU decomposition and load balancing approach can be generalized to other lattice-based problems
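
    The key ingredient of such a spatial decomposition is the exchange of sub-volume boundary (halo) data between GPUs. Below is a minimal sketch of a direct peer-to-peer halo transfer with the CUDA runtime API; the buffer names, the slab size and the one-directional transfer are illustrative assumptions rather than the authors' actual implementation.

        // Minimal sketch: one-directional halo transfer from GPU 0 to GPU 1 using
        // CUDA peer-to-peer copies, as employed in spatial-decomposition schemes.
        #include <cuda_runtime.h>
        #include <cstdio>

        void send_halo_gpu0_to_gpu1(const float* d_send_on_gpu0,  // interior slab on GPU 0
                                    float*       d_ghost_on_gpu1, // ghost layer on GPU 1
                                    size_t       halo_bytes)
        {
            int accessible = 0;
            cudaDeviceCanAccessPeer(&accessible, 1, 0);  // can device 1 access device 0?
            if (!accessible) {
                std::fprintf(stderr, "P2P not available, fall back to staged copies\n");
                return;
            }

            // Enable direct access (done once per device pair in a real code).
            cudaSetDevice(1);
            cudaDeviceEnablePeerAccess(0, 0);

            cudaStream_t stream;
            cudaStreamCreate(&stream);

            // Direct device-to-device copy, which can overlap with compute
            // launched on other streams.
            cudaMemcpyPeerAsync(d_ghost_on_gpu1, 1, d_send_on_gpu0, 0, halo_bytes, stream);
            cudaStreamSynchronize(stream);
            cudaStreamDestroy(stream);
        }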

    A new approach to the lattice Boltzmann method for graphics processing units

    Emerging many-core processors, like CUDA-capable nVidia GPUs, are promising platforms for regular parallel algorithms such as the Lattice Boltzmann Method (LBM). Since the global memory of graphics devices shows high latency and LBM is data intensive, the memory access pattern is an important issue for achieving good performance. Whenever possible, global memory loads and stores should be coalescent and aligned, but the propagation phase in LBM can lead to frequent misaligned memory accesses. Most previous CUDA implementations of 3D LBM addressed this problem by using low-latency on-chip shared memory. Instead of this, our CUDA implementation of LBM follows carefully chosen data transfer schemes in global memory. For the 3D lid-driven cavity test case, we obtained up to 86% of the maximal global memory throughput on nVidia's GT200. We show that, as a consequence, highly efficient implementations of LBM on GPUs are possible, even for complex models
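
    A minimal sketch of the access pattern at stake is given below: with a structure-of-arrays layout and a "pull" formulation of the propagation step, each warp writes to contiguous addresses, so misalignment is confined to some of the reads. This is a generic illustration of the coalescing issue, not a reproduction of the paper's specific data transfer schemes.

        // Minimal sketch: "pull"-style propagation with a structure-of-arrays
        // layout (f[q*n + site]), keeping global-memory writes aligned.
        #include <cuda_runtime.h>

        #define Q 19  // number of discrete velocities (D3Q19)

        // Discrete velocities, filled from the host via cudaMemcpyToSymbol.
        __constant__ int c_cx[Q], c_cy[Q], c_cz[Q];

        __global__ void propagate_pull(const float* __restrict__ f_in,
                                       float* __restrict__ f_out,
                                       int nx, int ny, int nz)
        {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            int z = blockIdx.z * blockDim.z + threadIdx.z;
            if (x >= nx || y >= ny || z >= nz) return;

            int n    = nx * ny * nz;
            int site = (z * ny + y) * nx + x;

            for (int q = 0; q < Q; ++q) {
                // Pull the distribution arriving from the upstream neighbour
                // (periodic wrap for simplicity). The write to f_out[q*n + site]
                // is contiguous in x across a warp, hence coalesced; only the
                // reads along x can be misaligned.
                int xs = (x - c_cx[q] + nx) % nx;
                int ys = (y - c_cy[q] + ny) % ny;
                int zs = (z - c_cz[q] + nz) % nz;
                f_out[q * n + site] = f_in[q * n + (zs * ny + ys) * nx + xs];
            }
        }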

    A Language-centered Approach to support environmental modeling with Cellular Automata

    The application of methods and technologies of software engineering to environmental modeling and simulation (EMS) is common, since both areas share basic issues of software development and digital simulation. Recent developments within the context of "Model-driven Engineering" (MDE) aim at supporting the development of software systems on the basis of relatively abstract models as opposed to programming-language code. A basic ingredient of MDE is the development of methods that allow the efficient development of "domain-specific languages" (DSL), in particular on the basis of language metamodels. This thesis shows how MDE, and language metamodeling in particular, may support pragmatic aspects that reflect epistemic and cognitive aspects of scientific investigations. For this, DSLs and language metamodeling are set into the context of "model-based science" and "model-based reasoning". It is shown that the specific properties of metamodel-based DSLs may be used to support those properties, in particular transparency, which are of particular relevance against the background of uncertainty, a characterizing property of EMS. The findings form the basis for the formulation of a corresponding metamodel-based approach for the provision of modeling tools for EMS (Language-centered Approach, LCA), which has been implemented (modeling tool ECA-EMS), including a new DSL for CA modeling for EMS (ECAL). On the basis of this implementation and its use cases, the applicability of the approach is shown

    A Distributed and Agent-Based Approach for Coupled Problems in Computational Engineering (Ein verteilter und agentenbasierter Ansatz für gekoppelte Probleme der rechnergestützten Ingenieurwissenschaften)

    Challenging questions in science and engineering often require decoupling a complex problem and focusing on isolated sub-problems first. The knowledge of those individual solutions can later be combined to obtain the result for the full question. A similar technique is applied in numerical modeling: software solvers for subsets of the coupled problem might already exist and can be used directly. This thesis describes a software environment capable of combining multiple software solvers, the result being a new, combined model. Two design decisions were crucial at the outset: first, every sub-model keeps full control of its execution; second, the source code of a sub-model requires only minimal adaptation. Each sub-model continues to behave as an independent program; to achieve this, it is wrapped in a software agent. The sub-models choose themselves when to issue communication calls, with no outer synchronisation mechanism required. The coupling of heterogeneous hardware is supported, as is the use of homogeneous compute clusters. Furthermore, the coupling framework allows sub-solvers to be written in different programming languages, and each sub-model may operate on its own spatial and temporal scales.
    The next challenge was to allow the potential coupling of thousands of software agents in order to utilise today's petascale hardware. For this purpose, a specific coupling framework was designed and implemented, combining the experience from the previous work with the additions required to cope with the targeted number of coupled sub-models. The large number of interacting models required a much more dynamic approach, in which the agents automatically detect their communication partners at runtime. This eliminates the need to specify the coupling graph explicitly a priori. Agents may enter (and leave) a running simulation at any time, with the coupling graph changing accordingly
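
    To make the sub-model-driven coupling pattern concrete, here is a purely illustrative sketch: the class name, method names and field names are invented for this example and do not correspond to the framework's actual interface; the stubs only log the calls.

        // Illustrative sketch of the coupling pattern: the sub-model keeps its own
        // time loop and decides itself when to issue communication calls.
        #include <cstdio>
        #include <string>
        #include <vector>

        class CouplingAgent {                       // hypothetical stand-in, not the real API
        public:
            explicit CouplingAgent(std::string name) : name_(std::move(name)) {
                std::printf("[%s] registered with coupling runtime\n", name_.c_str());
            }
            // Offer a named data field; in the described framework, partners that
            // announce the same field are matched dynamically at runtime.
            void announce(const std::string& field) {
                std::printf("[%s] announces field '%s'\n", name_.c_str(), field.c_str());
            }
            // Exchange data for a field; stub in place of the real communication call.
            void exchange(const std::string& field, std::vector<double>& data) {
                std::printf("[%s] exchanges %zu values for '%s'\n",
                            name_.c_str(), data.size(), field.c_str());
            }
        private:
            std::string name_;
        };

        int main() {
            CouplingAgent agent("flow_solver");
            std::vector<double> boundary(1024, 0.0);
            agent.announce("interface_temperature");

            for (int t = 0; t < 10; ++t) {
                // ... advance this sub-model by one of its own time steps ...
                agent.exchange("interface_temperature", boundary);  // sub-model-driven coupling
            }
            return 0;
        }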