
    The design and verification of Mumax3

    We report on the design, verification and performance of mumax3, an open-source GPU-accelerated micromagnetic simulation program. This software solves the time- and space-dependent magnetization evolution in nano- to microscale magnets using a finite-difference discretization. Its high performance and low memory requirements allow large-scale simulations to be performed in limited time and on inexpensive hardware. We verified each part of the software by comparing results to analytical values where available and to micromagnetic standard problems. mumax3 also offers specific extensions such as MFM image generation, a moving simulation window, edge charge removal and material grains.
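    The magnetization evolution that such codes integrate is governed by the Landau-Lifshitz equation. A minimal single-macrospin sketch of explicit time stepping (not mumax3's actual solver; gamma, alpha, the field and the step size are illustrative assumptions):

```python
import numpy as np

# Damped precession of one macrospin in a fixed effective field H,
# stepped with explicit Euler and renormalized each step.
gamma = 1.0   # gyromagnetic ratio (illustrative units)
alpha = 0.1   # Gilbert damping (assumed)
H = np.array([0.0, 0.0, 1.0])   # effective field along z (assumed)
m = np.array([1.0, 0.0, 0.0])   # initial unit magnetization
dt = 0.01

def llg_rhs(m):
    # Landau-Lifshitz form: precession plus damping torque
    mxH = np.cross(m, H)
    return -gamma / (1.0 + alpha**2) * (mxH + alpha * np.cross(m, mxH))

for _ in range(5000):
    m = m + dt * llg_rhs(m)
    m /= np.linalg.norm(m)   # keep |m| = 1
```

    With damping, the moment relaxes toward the field direction; production solvers use higher-order adaptive schemes over full finite-difference grids rather than this fixed-step Euler sketch.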

    Beyond ExaBricks: GPU Volume Path Tracing of AMR Data

    Adaptive Mesh Refinement (AMR) is becoming a prevalent data representation for scientific visualization. Resulting from large fluid-mechanics simulations, the data is usually cell-centric, which imposes a number of challenges for high-quality reconstruction at sample positions. While recent work has concentrated on real-time volume and isosurface rendering on GPUs, the rendering methods used still focus on simple lighting models without scattering events and global illumination. As in other areas of rendering, acceleration data structures are key to real-time performance; in this work we analyze the major bottlenecks of data structures originally optimized for camera/primary-ray traversal when they are used with the incoherent ray-tracing workload of a volumetric path tracer, and propose strategies to overcome the challenges that come with this workload.

    Hybrid CPU/GPU implementation for the FE2 multi-scale method for composite problems

    This thesis aims to develop a High-Performance Computing implementation to solve large composite-materials problems through the FE2 multi-scale method. Previous works have not been able to scale the FE2 strategy to real-size problems with mesh resolutions of more than 10K elements at the macro-scale and 100^3 elements at the micro-scale, due to the computational requirements needed to carry out these calculations. This work identifies the most computationally intensive parts of the FE2 algorithm and ports several parts of the micro-scale computations to GPUs. The cases considered assume small deformations and steady-state equilibrium conditions. The work provides a feasible parallel strategy that can be used in real engineering cases to optimize the design of composite-material structures. To this end, it presents a coupling scheme between the MPI multi-physics code Alya (macro-scale) and the CPU/GPU-accelerated code Micropp (micro-scale). The coupled system is designed to work on multi-GPU architectures and to exploit GPU overloading. In addition, a multi-zone coupling methodology combined with weighted partitioning is proposed to reduce the computational cost and to solve the load-balancing problem. The thesis demonstrates that the proposed method scales notably well for the target problems, especially on hybrid architectures with distributed CPU nodes communicating with multiple GPUs. Moreover, it clarifies the advantages achieved by the CPU/GPU-accelerated version with respect to the pure-CPU approach.
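    The FE2 coupling can be sketched schematically: at every macro-scale integration point, a micro-scale RVE computation supplies the homogenized stress, and these mutually independent micro solves are the part offloaded to GPUs. A toy sketch (the linear `micro_solve` stand-in and all names are illustrative assumptions, not Alya's or Micropp's API):

```python
import numpy as np

def micro_solve(strain, C_micro):
    # Stand-in for a micro-scale RVE solve: here just a linear
    # response sigma = C : eps in Voigt-like notation.
    return C_micro @ strain

C_micro = np.eye(3)                    # toy micro stiffness (assumed)
gauss_strains = np.random.rand(8, 3)   # strains at macro integration points

# Each integration point's micro problem is independent, which is why
# this loop is the natural candidate for GPU offloading in FE2.
stresses = np.array([micro_solve(e, C_micro) for e in gauss_strains])
```

    In a real FE2 solver the macro equilibrium iteration would then assemble these stresses (and consistent tangents) back into the macro-scale residual.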

    Massively Parallel Computation Using Graphics Processors with Application to Optimal Experimentation in Dynamic Control

    The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has led to its adoption in many non-graphics applications, including a wide variety of scientific computing fields. At the same time, a number of important dynamic optimal policy problems in economics are starved of computing power to help overcome the dual curses of complexity and dimensionality. We investigate whether computational economics may benefit from these new tools through a case study of an imperfect-information dynamic programming problem with a learning-and-experimentation trade-off, that is, a choice between controlling the policy target and learning the system parameters. Specifically, we use a model of active learning and control of a linear autoregression with unknown slope that has appeared in a variety of macroeconomic policy and other contexts. The endogeneity of posterior beliefs makes the problem difficult in that the value function need not be convex and the policy function need not be continuous. This complication makes the problem a suitable target for massively parallel computation using graphics processors. Our findings are cautiously optimistic: the new tools let us easily achieve a factor-of-15 performance gain relative to an implementation targeting single-core processors, and thus establish a better reference point on the computational speed vs. coding complexity trade-off frontier. While further gains and wider applicability may lie behind a steep learning barrier, we argue that the future of many computations belongs to parallel algorithms anyway.
    Keywords: Graphics Processing Units, CUDA programming, Dynamic programming, Learning, Experimentation
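    The reason such dynamic programming problems map well to massively parallel hardware is that each grid point's Bellman update is independent of the others. A toy vectorized value iteration (quadratic cost, grids and discount factor are assumptions, not the paper's learning-and-experimentation model):

```python
import numpy as np

beta = 0.95                     # discount factor (assumed)
x = np.linspace(-1, 1, 201)     # state grid
u = np.linspace(-1, 1, 101)     # control grid
V = np.zeros_like(x)            # initial value guess

for _ in range(200):
    # Candidate next state x' = x + u for every (state, control) pair.
    xn = np.clip(x[:, None] + u[None, :], -1.0, 1.0)
    cost = x[:, None] ** 2 + 0.1 * u[None, :] ** 2
    Vn = np.interp(xn, x, V)              # interpolated continuation value
    V = (cost + beta * Vn).min(axis=1)    # pointwise Bellman minimization
```

    Every row of the `min` is computed from the previous iterate only, so on a GPU each state can be assigned to its own thread, which is the structure the paper exploits.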

    Plasma propulsion simulation using particles

    This perspective paper gives an overview of particle-in-cell / Monte Carlo collision models applied to different plasma-propulsion configurations and scenarios, from electrostatic (E x B and pulsed-arc) devices to electromagnetic (RF inductive, helicon, electron cyclotron resonance) thrusters, with an emphasis on plasma plumes and their interaction with the satellite. The most important elements related to the modeling of plasma-wall interaction are also presented. Finally, the paper reports new progress in the particle-in-cell computational methodology, in particular regarding accelerating computational techniques for multi-dimensional simulations and plasma-chemistry Monte Carlo modules for molecular and alternative propellants.
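    The particle push at the core of particle-in-cell codes is commonly the Boris scheme, which wraps a magnetic rotation between two half electric kicks. A generic single-particle sketch (field values and the charge-to-mass ratio are illustrative, not a thruster configuration):

```python
import numpy as np

qm = 1.0                       # charge-to-mass ratio (illustrative)
dt = 0.01
E = np.array([0.0, 0.0, 0.0])  # electric field (assumed zero here)
B = np.array([0.0, 0.0, 1.0])  # magnetic field along z (assumed)
x = np.zeros(3)
v = np.array([1.0, 0.0, 0.0])

for _ in range(1000):
    # first half electric kick
    v_minus = v + 0.5 * qm * E * dt
    # magnetic rotation (exactly norm-preserving)
    t = 0.5 * qm * B * dt
    s = 2.0 * t / (1.0 + t @ t)
    v_prime = v_minus + np.cross(v_minus, t)
    v_plus = v_minus + np.cross(v_prime, s)
    # second half electric kick, then position update
    v = v_plus + 0.5 * qm * E * dt
    x = x + v * dt
```

    With E = 0 the particle gyrates on a circle of radius |v|/(q/m |B|) and its speed is conserved to round-off, which is the property that makes the Boris mover the standard choice.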

    A GPU-Accelerated Shallow-Water Scheme for Surface Runoff Simulations

    The capability of a GPU-parallelized numerical scheme to perform accurate and fast simulations of surface runoff in watersheds, exploiting high-resolution digital elevation models (DEMs), was investigated. The numerical computations were carried out using an explicit finite-volume scheme on a recent type of grid called Block-Uniform Quadtree (BUQ), capable of exploiting the computational power of GPUs with negligible overhead. Moreover, stability and zero mass error were ensured, even in the presence of very shallow water depths, by introducing a proper reconstruction of conserved variables at cell interfaces, a specific formulation of the slope source term and an explicit discretization of the friction source term. The 2D shallow-water model was tested against two different literature tests and a recent real event in Italy for which field data are available. The influence of the spatial resolution adopted in different portions of the domain was also investigated for the last test. The achieved low ratio of simulation time to physical time, in some cases less than 1:20, opens new perspectives for flood management strategies. Based on the results of such models, emergency plans can be designed to achieve a significant reduction in the economic losses generated by flood events.
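    The explicit finite-volume update behind such shallow-water schemes can be illustrated in one dimension with a simple Lax-Friedrichs flux (a generic first-order sketch, not the paper's BUQ scheme; grid, time step and dam-break initial state are assumptions):

```python
import numpy as np

g = 9.81
nx, dx, dt = 100, 1.0, 0.05   # grid and step sizes (illustrative; dt meets CFL)
h = np.where(np.arange(nx) < nx // 2, 2.0, 1.0)  # dam-break initial depth
q = np.zeros(nx)                                  # unit discharge h*u

def flux(h, q):
    # physical flux of the 1D shallow-water equations
    u = q / h
    return np.array([q, q * u + 0.5 * g * h * h])

for _ in range(40):
    U = np.array([h, q])
    F = flux(h, q)
    # Lax-Friedrichs numerical flux at each interface i+1/2
    Fi = 0.5 * (F[:, :-1] + F[:, 1:]) - 0.5 * dx / dt * (U[:, 1:] - U[:, :-1])
    U[:, 1:-1] -= dt / dx * (Fi[:, 1:] - Fi[:, :-1])
    h, q = U
```

    Because the update is written in conservative flux-difference form, the total water volume stays constant to round-off until the waves reach the boundary, which is the zero-mass-error property the abstract emphasizes.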