
    GPU Usage for Parallel Functions and Contacts in Modelica

    This thesis investigates two ways of incorporating GPUs in Modelica: first, by automatically generating GPU code for Modelica functions, and second, by using GPU-accelerated external code for a contact-handling package. Automatic parallelization of functions is desirable, as it can potentially accelerate large simulations significantly. Special patterns of nested for-loops in Modelica code are recognized and translated into CUDA kernel functions. Inline integration allows a broader spectrum of models to take advantage of the parallelization by reducing CPU-GPU transfers. The prototype has been tested and achieved a speed-up factor of up to five compared to the CPU. The contact-handling package is capable of handling both complex contact behavior between arbitrarily shaped bodies and large DEM-like simulations, something which Modelica currently lacks. Attempts to accelerate the package with GPUs were made, with partial success for the broad phase. The package uses Morton encoding for the broad phase, and the narrow phase is based on CSG intersection with BSP trees. Contact response is calculated using a volume-dependent method, taking friction, damping and multiple contact points into account. The capability of the package was demonstrated by simulating both complex contact behavior, such as the inversion of the Tippe Top toy, and tens of thousands of colliding spheres.

    One of the key components of our modern society is the ability to simulate. Through simulation, industry can design new cars, phones, aircraft, etc. without having to go through prototype after prototype, giving us the cheap, high-tech products most of us rely on in our day-to-day lives. However, good as simulations are today, there are still severe limitations on what can be simulated. Two of the largest limiting factors are how hard simulations are to design and how long they take to run. The first of these problems is tackled by the Modelica programming language, which is designed for easy set-up of simulations. We have attacked both problems by using GPUs to speed up Modelica simulations more than 5 times, and by extending the capability of Modelica to handle collisions between objects.
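
    The function-parallelization half of the thesis hinges on recognizing data-parallel nested for-loops and emitting CUDA kernels for them. The sketch below is a minimal illustration of that mapping under assumed names and sizes; the Modelica fragment in the comment and the generated kernel are hypothetical examples, not output of the thesis's code generator.

```cuda
// Hypothetical Modelica source pattern (illustrative only, not from the thesis):
//   for i in 1:n loop
//     for j in 1:m loop
//       y[i, j] := 2.0 * x[i, j] + 1.0;
//     end for;
//   end for;
//
// A CUDA kernel generated for such a data-parallel nest could look like this:

#include <cuda_runtime.h>
#include <cstdio>

__global__ void generatedKernel(const double* x, double* y, int n, int m)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;   // outer loop index
    int j = blockIdx.x * blockDim.x + threadIdx.x;   // inner loop index
    if (i < n && j < m)
        y[i * m + j] = 2.0 * x[i * m + j] + 1.0;     // loop body, one thread per (i, j)
}

int main()
{
    const int n = 1024, m = 1024;
    const size_t bytes = n * m * sizeof(double);

    double *x, *y;                                   // unified memory keeps the sketch short;
    cudaMallocManaged(&x, bytes);                    // a code generator would more likely use
    cudaMallocManaged(&y, bytes);                    // explicit transfers to limit CPU-GPU traffic
    for (int k = 0; k < n * m; ++k) x[k] = 1.0;

    dim3 block(16, 16);
    dim3 grid((m + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    generatedKernel<<<grid, block>>>(x, y, n, m);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);                     // expect 3.0
    cudaFree(x); cudaFree(y);
    return 0;
}
```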

    Efficient algorithms for the realistic simulation of fluids

    Nowadays there is great demand for realistic simulations in the computer graphics field. Physically based animations are commonly used, and one of the more complex problems in this field is fluid simulation, all the more so if real-time applications are the goal. Video games, in particular, resort to techniques that, in order to represent fluids, simulate only the consequence and not the cause, using procedural or parametric methods and often disregarding the physical solution. This need motivates the present thesis: the interactive simulation of free-surface flows, usually liquids, which are the feature of interest in most common applications. Due to the complexity of fluid simulation, in order to achieve real-time framerates we have resorted to the high parallelism provided by current consumer-level GPUs. The simulation algorithm, the Lattice Boltzmann Method, was chosen for its efficiency and for its direct mapping to the hardware architecture thanks to its local operations. We have created two free-surface simulations on the GPU: one fully in 3D and another restricted to the upper surface of a large bulk of fluid, limiting the simulation domain to 2D. We have extended the latter to track dry regions and coupled it with obstacles in a geometry-independent fashion. As it is restricted to 2D, the simulation loses some features due to the impossibility of simulating vertical separation of the fluid. To account for this, we have coupled the surface simulation to a generic particle system with breaking-wave conditions; the simulations are totally independent and only the coupling binds the LBM with the chosen particle system. Furthermore, the visualization of both systems is also done in a realistic way within interactive framerates; raycasting techniques are used to provide the expected light-related effects such as refractions, reflections and caustics. Other techniques that improve the overall detail, such as small-scale ripples and surface foam, are also applied.
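
    The choice of the Lattice Boltzmann Method is motivated above by its local operations mapping directly to the GPU. A minimal sketch of that locality is the D2Q9 BGK collision step below, in which each thread updates one cell from its own nine distributions only; this is the generic textbook formulation with an assumed relaxation time and array layout, not the thesis's free-surface implementation (streaming and free-surface handling are omitted).

```cuda
#include <cuda_runtime.h>

// D2Q9 lattice: discrete velocities e_k and weights w_k (standard textbook values).
__constant__ float ex[9] = { 0, 1, 0,-1, 0, 1,-1,-1, 1 };
__constant__ float ey[9] = { 0, 0, 1, 0,-1, 1, 1,-1,-1 };
__constant__ float w [9] = { 4.f/9, 1.f/9, 1.f/9, 1.f/9, 1.f/9,
                             1.f/36, 1.f/36, 1.f/36, 1.f/36 };

// One thread per lattice cell: purely local BGK collision
//   f_k <- f_k - (f_k - f_k^eq) / tau
__global__ void bgkCollide(float* f, int nx, int ny, float tau)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= nx || y >= ny) return;
    int cell   = y * nx + x;
    int stride = nx * ny;                       // structure-of-arrays layout: f[k*stride + cell]

    // Macroscopic density and velocity from the nine distributions of this cell.
    float rho = 0.f, ux = 0.f, uy = 0.f;
    for (int k = 0; k < 9; ++k) {
        float fk = f[k * stride + cell];
        rho += fk;  ux += fk * ex[k];  uy += fk * ey[k];
    }
    ux /= rho;  uy /= rho;
    float usq = ux * ux + uy * uy;

    // Relax each distribution toward its local equilibrium value.
    for (int k = 0; k < 9; ++k) {
        float eu  = ex[k] * ux + ey[k] * uy;
        float feq = w[k] * rho * (1.f + 3.f * eu + 4.5f * eu * eu - 1.5f * usq);
        f[k * stride + cell] -= (f[k * stride + cell] - feq) / tau;
    }
}

int main()
{
    const int nx = 256, ny = 256;
    const float tau = 0.6f;                     // assumed relaxation time
    float* f;
    cudaMallocManaged(&f, 9 * nx * ny * sizeof(float));
    const float w0[9] = { 4.f/9, 1.f/9, 1.f/9, 1.f/9, 1.f/9,
                          1.f/36, 1.f/36, 1.f/36, 1.f/36 };
    for (int k = 0; k < 9; ++k)
        for (int c = 0; c < nx * ny; ++c)
            f[k * nx * ny + c] = w0[k];          // fluid at rest: f_k = w_k, rho = 1

    dim3 block(16, 16), grid((nx + 15) / 16, (ny + 15) / 16);
    bgkCollide<<<grid, block>>>(f, nx, ny, tau);
    cudaDeviceSynchronize();
    cudaFree(f);
    return 0;
}
```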

    High Performance Free Surface LBM on GPUs


    Modelling the 3D Climate of Venus with OASIS

    Flexible 3D models to explore the vast diversity of terrestrial planets and interpret observational data are still in their early stages. In this work, we present OASIS: a novel and flexible 3D virtual planet laboratory. With OASIS we envision a platform that couples self-consistently seven individual modules representing the main physical and chemical processes that shape planetary environments. Additionally, OASIS is capable of producing simulated spectra from different instruments and observational techniques. In this work we focus on the benchmark test of coupling four of the physical modules: fluid dynamics, radiation, turbulence and surface/soil. To test the OASIS platform, we produced 3D simulations of the Venus climate and its atmospheric circulation and studied how the modeled atmosphere changes with various cloud covers, atmospheric heat capacity, and surface friction. 3D simulations of Venus are challenging because they require long integration times with a computationally expensive radiative transfer code. By comparing OASIS results with observational data, we verify that the new model is able to successfully simulate Venus. With simulated spectra produced directly from the 3D simulations, we explore the capabilities of future missions, like LUVOIR, to observe Venus analogs located at a distance of 10 pc. With OASIS, we have taken the first steps to build a sophisticated and very flexible platform capable of studying the environment of terrestrial planets, which will be an essential tool to characterize observed terrestrial planets and plan future observations. Comment: MNRAS published version
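
    The abstract describes OASIS as a set of physical modules coupled self-consistently around a shared planetary state. Purely as an illustration of such an operator-split coupling loop, the host-side sketch below conveys the structure; the module names, state fields, time step, and call order are hypothetical and are not taken from the OASIS code.

```cuda
#include <vector>

// Hypothetical shared state advanced by every module each step.
struct AtmosphereState {
    std::vector<double> temperature, pressure, windU, windV, surfaceT;
};

// Each physical process applies its contribution to the state over dt (operator splitting).
struct Module {
    virtual void step(AtmosphereState& s, double dt) = 0;
    virtual ~Module() = default;
};

struct FluidDynamics : Module { void step(AtmosphereState&, double) override { /* advection, pressure */ } };
struct Radiation     : Module { void step(AtmosphereState&, double) override { /* heating/cooling rates */ } };
struct Turbulence    : Module { void step(AtmosphereState&, double) override { /* boundary-layer mixing */ } };
struct SurfaceSoil   : Module { void step(AtmosphereState&, double) override { /* surface/soil fluxes  */ } };

int main()
{
    AtmosphereState state;               // initialization of the columns omitted
    FluidDynamics dyn; Radiation rad; Turbulence turb; SurfaceSoil soil;
    Module* modules[] = { &dyn, &rad, &turb, &soil };

    const double dt = 300.0;             // assumed time step in seconds
    for (long n = 0; n < 1000; ++n)      // long integrations are what make Venus runs expensive
        for (Module* m : modules)
            m->step(state, dt);          // each process updates the shared state in turn
    return 0;
}
```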

    Dynamics of Laterally Propagating Flames in X-ray Bursts. I. Burning Front Structure

    We investigate the structure of laterally propagating flames through the highly stratified burning layer in an X-ray burst. Two-dimensional hydrodynamics simulations of flame propagation are performed through a rotating plane-parallel atmosphere, exploring the structure of the flame. We discuss the approximations needed to capture the length and time scales at play in an X-ray burst and describe the flame acceleration observed. Our studies complement other multidimensional studies of burning in X-ray bursts. Comment: Submitted to Ap

    High Performance Scientific Computing in Applications with Direct Finite Element Simulation

    Predicting separated flow, including stall, of a full aircraft with Computational Fluid Dynamics (CFD) is considered one of the grand-challenge problems to be solved by 2030, according to NASA [1]. The nonlinear Navier-Stokes equations provide the mathematical formulation for fluid flow in three-dimensional space; however, classical solutions, existence, and uniqueness results are still missing. Brute-force computation is intractable: to perform a predictive simulation of a full aircraft one could use Direct Numerical Simulation (DNS), but it is prohibitively expensive, as it needs to resolve the turbulent scales, of order Re^(9/4). Other methods, such as the statistically averaged Reynolds-Averaged Navier-Stokes (RANS), the spatially averaged Large Eddy Simulation (LES), and the hybrid Detached Eddy Simulation (DES), require fewer degrees of freedom, but all of them have to be tuned to benchmark problems and, moreover, the mesh has to be very fine near the walls to resolve the boundary layers, which makes the computational cost very high. Above all, the results are sensitive to, e.g., explicit parameters in the method, the mesh, etc.

    As a resolution to this challenge, we present here the adaptive time-resolved Direct FEM Solution (DFS) methodology with numerical tripping, a predictive, parameter-free family of methods for turbulent flow. We solved the JAXA Standard Model (JSM) aircraft model at a realistic Reynolds number, presented as part of the 3rd High Lift Prediction Workshop. We predicted lift Cl within 5% error vs. experiment, drag Cd within 10% error, and the stall angle of attack within 1°. The workshop identified a likely experimental error of order 10% for the drag results. The simulation is 10 times faster and cheaper compared to traditional or existing CFD approaches. The efficiency mainly comes from the slip boundary condition that allows coarse meshes near walls, goal-oriented adaptive error control that refines the mesh only where needed, and large time steps using a Schur-type fixed-point iteration method, without compromising the accuracy of the simulation results.

    As a follow-up, we were invited to the Fifth High Order CFD Workshop, where the approach was validated for a tandem-spheres problem (low-Reynolds-number turbulent flow), in which a second sphere is placed a certain distance downstream of the first. The results capture the expected slipstream phenomenon, with approx. 2% error. A comparison with the higher-order frameworks Nek5000 and PyFR was carried out. The PyFR framework has demonstrated high effectiveness on GPUs with unstructured meshes, a hard problem in this field, achieved through an explicit time-stepping approach. Our study showed that our large-time-step approach enabled approx. 3 orders of magnitude larger time steps than the explicit time steps in PyFR, which made our method more effective for solving the whole problem.

    We also presented a generalization of DFS to variable density and validated it against the well-established MARIN benchmark problem. The results show good agreement with the experimental results in the form of pressure-sensor data. Later, we used this methodology to solve two applications involving multiphase flow: a flash rainwater storage tank (Bilbao water consortium) and the design of a nozzle for 3D printing. In the flash rainwater storage tank, we predicted that the water height in the tank has a significant influence on how the flow behaves downstream of the tank door (valve). For the 3D printing application, we developed an efficient design with a focused jet flow to prevent oxidation and heating at the tip of the nozzle during the melting process.

    Finally, we presented parallelism on multiple GPUs and on the embedded Kalray architecture. Almost all supercomputers today have heterogeneous architectures, such as CPU+GPU or other accelerators, and it is therefore essential to develop computational frameworks that take advantage of them. For multiple GPUs, we developed a stencil computation, applied to the simulation of geological folds. We explored halo computation and used CUDA streams to overlap computation and communication. The resulting performance gain was 23% on four GPUs with the Fermi architecture, and the corresponding improvement obtained on four Kepler GPUs was 47%. The Kalray architecture is designed for low energy consumption; here we tested the Jacobi method with different communication strategies. Additionally, visualization is a crucial part of scientific simulation. We developed an automated visualization framework, in which task parallelization proved to be more than 10 times faster than data parallelization. We have also run our DFS in a cloud-computing setting and validated the simulation against the local cluster simulation. Finally, we recommend an easy pre-processing tool to support DFS simulation. La Caixa 201
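
    The multi-GPU stencil work described above rests on overlapping halo transfers with interior computation using CUDA streams. The following is a minimal single-device sketch of that overlap pattern, with assumed grid sizes and a hypothetical Jacobi-style kernel; in the multi-GPU setting the pinned host buffers would be filled by neighbouring devices, and the send-back of updated boundary rows is omitted here. It is not the thesis code.

```cuda
#include <cuda_runtime.h>

// Jacobi-style 5-point stencil on a row-decomposed 2D grid: this device owns `ny` rows
// plus one halo row above and below (rows 0 and ny+1). Rows [rowBegin, rowEnd) are updated.
__global__ void stencilRows(const float* in, float* out, int nx, int rowBegin, int rowEnd)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = rowBegin + blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || x >= nx - 1 || y >= rowEnd) return;
    int idx = y * nx + x;
    out[idx] = 0.25f * (in[idx - 1] + in[idx + 1] + in[idx - nx] + in[idx + nx]);
}

int main()
{
    const int nx = 512, ny = 512;                       // assumed local subdomain size
    const size_t rowBytes = nx * sizeof(float);
    const size_t bytes = (size_t)(ny + 2) * rowBytes;   // +2 halo rows

    float *dIn, *dOut, *hHaloTop, *hHaloBottom;
    cudaMalloc((void**)&dIn, bytes);  cudaMalloc((void**)&dOut, bytes);
    cudaMemset(dIn, 0, bytes);        cudaMemset(dOut, 0, bytes);
    cudaHostAlloc((void**)&hHaloTop,    rowBytes, cudaHostAllocDefault);  // pinned buffers stand in
    cudaHostAlloc((void**)&hHaloBottom, rowBytes, cudaHostAllocDefault);  // for data from neighbours

    cudaStream_t compute, comm;
    cudaStreamCreate(&compute);  cudaStreamCreate(&comm);

    dim3 block(128, 1);
    for (int step = 0; step < 100; ++step) {
        // 1. Start receiving halo rows on the communication stream.
        cudaMemcpyAsync(dIn,                 hHaloTop,    rowBytes, cudaMemcpyHostToDevice, comm);
        cudaMemcpyAsync(dIn + (ny + 1) * nx, hHaloBottom, rowBytes, cudaMemcpyHostToDevice, comm);

        // 2. Meanwhile update the interior rows (2 .. ny-1), which need no halo data.
        dim3 gridInt((nx + 127) / 128, ny - 2);
        stencilRows<<<gridInt, block, 0, compute>>>(dIn, dOut, nx, 2, ny);

        // 3. Once the halos have arrived, update the two boundary rows (1 and ny).
        cudaStreamSynchronize(comm);
        dim3 gridBnd((nx + 127) / 128, 1);
        stencilRows<<<gridBnd, block, 0, compute>>>(dIn, dOut, nx, 1, 2);
        stencilRows<<<gridBnd, block, 0, compute>>>(dIn, dOut, nx, ny, ny + 1);

        cudaStreamSynchronize(compute);
        float* tmp = dIn; dIn = dOut; dOut = tmp;       // swap buffers for the next step
        // (sending the updated boundary rows back to the neighbours is omitted in this sketch)
    }

    cudaStreamDestroy(compute);  cudaStreamDestroy(comm);
    cudaFreeHost(hHaloTop);  cudaFreeHost(hHaloBottom);
    cudaFree(dIn);  cudaFree(dOut);
    return 0;
}
```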

    High performance scientific computing in applications with direct finite element simulation

    xiii, 133 p.
    The prediction of separated flow, including stall, of a full aircraft using Computational Fluid Dynamics (CFD) is considered one of the grand challenges to be solved by 2030, according to NASA. The nonlinear Navier-Stokes equations provide the mathematical formulation for fluid flow in three-dimensional space; however, classical solutions, existence, and uniqueness results are still missing. Since brute-force computation is intractable for performing a predictive simulation of a full aircraft, one can use Direct Numerical Simulation (DNS); however, it is prohibitively expensive, as it needs to resolve the turbulence at scales of order Re^(9/4). Other methods, such as the statistically averaged Reynolds-Averaged Navier-Stokes (RANS), the spatially averaged Large Eddy Simulation (LES), and the hybrid Detached Eddy Simulation (DES), require fewer degrees of freedom, but all of them must be tuned to benchmark problems and, moreover, the mesh has to be very fine near the walls to resolve the boundary layers, which makes the computational cost very high. Above all, the results are sensitive to, for example, explicit parameters in the method, the mesh, etc. As a solution to this challenge, we present here the adaptive time-resolved Direct FEM Solution (DFS) methodology with numerical tripping, a predictive, parameter-free family of methods for turbulent flow. We solved the JAXA Standard Model (JSM) aircraft model at a realistic Reynolds number, presented as part of the 3rd High Lift Prediction Workshop. We predicted lift Cl within 5% error versus experiment, drag Cd within 10% error, and the stall angle of attack within 1°. The workshop identified a likely experimental error of order 10% for the drag results. The simulation is 10 times faster and cheaper compared to traditional or existing CFD approaches. The efficiency mainly comes from the slip boundary condition that allows coarse meshes near the walls, goal-oriented adaptive error control that refines the mesh only where needed, and large time steps using a Schur-type fixed-point iteration method, without compromising the accuracy of the simulation results.
    We also presented a generalization of DFS to variable density and validated it against the well-established MARIN benchmark problem. The results show good agreement with the experimental results in the form of pressure sensors. Later, we used this methodology to solve two applications in multiphase flow problems: a flash rainwater storage tank (Bilbao water consortium) and the design of a nozzle for 3D printing. In the rainwater storage tank, we predicted that the water height in the tank has a significant influence on how the flow behaves downstream of the tank door (valve). For the 3D printing, we developed an efficient design with a focused jet flow to prevent oxidation and heating at the tip of the nozzle during the melting process.
    Finally, we presented parallelism on multiple GPUs and on the embedded Kalray architecture. Almost all supercomputers today have heterogeneous architectures, such as CPU+GPU or other accelerators, and it is therefore essential to develop computational frameworks that take advantage of them. As noted earlier, CFD only began to develop later, in the 1960s, once sufficient computational power was available, so it is essential to use and test these accelerators for CFD computations. GPUs have a different architecture compared to traditional CPUs: technically, a GPU has many more cores than a CPU, which makes the GPU a good option for parallel computation. For multiple GPUs, we developed a stencil computation, applied to the simulation of geological folds. We explored halo computation and used CUDA streams to optimize computation and communication time. The resulting performance gain was 23% for four GPUs with the Fermi architecture, and the corresponding improvement obtained on four Kepler GPUs was 47%.
    This research was carried out at the Basque Center for Applied Mathematics (BCAM) within the CFD Computational Technology (CFDCT) group and also at the School of Electrical Engineering and Computer Science (Royal Institute of Technology, Stockholm, Sweden). It is supported by Fundación Obra Social "la Caixa", Severo Ochoa Excellence research centre 2014-2018 SEV-2013-0323, Severo Ochoa Excellence research centre 2018-2022 SEV-2017-0718, BERC program 2014-2017, BERC program 2018-2021, the MSO4SC European project, and Elkartek. This work has been performed using computing infrastructure from SNIC (Swedish National Infrastructure for Computing).