7 research outputs found

    High Performance Scientific Computing in Applications with Direct Finite Element Simulation

    Get PDF
    To predict separated flow, including stall, of a full aircraft with Computational Fluid Dynamics (CFD) is considered one of the grand challenge problems to be solved by 2030, according to NASA [1]. The nonlinear Navier-Stokes equations provide the mathematical formulation for fluid flow in three-dimensional space. However, classical results on existence and uniqueness of solutions are still missing. Since brute-force computation is intractable, one could perform predictive simulation of a full aircraft with Direct Numerical Simulation (DNS); however, DNS is prohibitively expensive, as it needs to resolve turbulent scales of order Re^(9/4). Other methods, such as the statistically averaged Reynolds-Averaged Navier-Stokes (RANS), the spatially averaged Large Eddy Simulation (LES), and the hybrid Detached Eddy Simulation (DES), require fewer degrees of freedom. All of these methods have to be tuned to benchmark problems and, moreover, near the walls the mesh has to be very fine to resolve the boundary layers, which makes the computational cost very high. Above all, the results are sensitive to, for example, explicit parameters in the method and the mesh. As a resolution to the challenge, we present here the adaptive time-resolved Direct FEM Simulation (DFS) methodology with numerical tripping, as a predictive, parameter-free family of methods for turbulent flow. We solved the JAXA Standard Model (JSM) aircraft at a realistic Reynolds number, presented as part of the High Lift Prediction Workshop 3. We predicted the lift coefficient Cl within 5% error versus experiment, the drag coefficient Cd within 10% error, and stall within 1° of the angle of attack. The workshop identified a likely experimental error on the order of 10% for the drag results. The simulation is about 10 times faster and cheaper than traditional CFD approaches. The efficiency mainly comes from the slip boundary condition that allows coarse meshes near the walls, goal-oriented adaptive error control that refines the mesh only where needed, and large time steps using a Schur-type fixed-point iteration method, without compromising the accuracy of the simulation results. As a follow-up, we were invited to the Fifth High Order CFD Workshop, where the approach was validated on a tandem-sphere problem (low-Reynolds-number turbulent flow) in which a second sphere is placed a certain distance downstream of a first sphere. The results capture the expected slipstream phenomenon with approximately 2% error. A comparison with the higher-order frameworks Nek5000 and PyFR was carried out. The PyFR framework has demonstrated high effectiveness on GPUs with unstructured meshes, which is a hard problem in this field; this is achieved by an explicit time-stepping approach. Our study showed that our large-time-step approach enabled approximately three orders of magnitude larger time steps than the explicit time steps in PyFR, which made our method more effective for solving the whole problem. We also presented a generalization of DFS to variable density and validated it against the well-established MARIN benchmark problem. The results show good agreement with the experimental pressure-sensor measurements. Later, we used this methodology for two multiphase flow applications: one concerns a flash rainwater storage tank (Bilbao water consortium), and the second concerns the design of a nozzle for 3D printing.
In the flash rainwater storage tank, we predicted that the water height in the tank has a significant influence on how the flow behaves downstream of the tank door (valve). For the 3D printing application, we developed an efficient design with a focused jet flow to prevent oxidation and heating at the tip of the nozzle during the melting process. Finally, we present the parallelization on multiple GPUs and on the embedded Kalray architecture. Almost all supercomputers today have heterogeneous architectures, such as CPU+GPU or other accelerators, and it is therefore essential to develop computational frameworks that take advantage of them. For multiple GPUs, we developed a stencil computation applied to the simulation of geological folds. We explored halo computation and used CUDA streams to optimize computation and communication time. The resulting performance gain was 23% for four GPUs with the Fermi architecture, and the corresponding improvement obtained on four Kepler GPUs was 47%. The Kalray architecture is designed for low energy consumption; on it, we tested the Jacobi method with different communication strategies. Visualization is also a crucial part of scientific simulation. We developed an automated visualization framework, in which task parallelization proved more than 10 times faster than data parallelization. We also ran DFS in a cloud computing setting and validated the simulation against a local cluster simulation. Finally, we recommend an easy-to-use pre-processing tool to support DFS simulations.
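
The halo-computation and CUDA-streams optimization mentioned above follows a common overlap pattern: interior cells of a stencil update do not depend on the halo, so they can be computed on one stream while the halo data moves on another. Below is a minimal sketch of that pattern, assuming CuPy on a single CUDA device; the two subdomains stand in for neighbouring GPUs, and all array names, sizes, and the Jacobi kernel are illustrative choices rather than the thesis implementation.

```python
# Minimal sketch of overlapping interior stencil work with halo exchange using
# CUDA streams via CuPy. Two subdomains on one device stand in for two GPUs;
# names and sizes are illustrative, not the thesis code.
import cupy as cp

n = 2048
# each subdomain carries one ghost (halo) row at the top and bottom
a = cp.random.rand(n + 2, n, dtype=cp.float32)
b = cp.random.rand(n + 2, n, dtype=cp.float32)
a_new, b_new = a.copy(), b.copy()

compute = cp.cuda.Stream(non_blocking=True)
comm = cp.cuda.Stream(non_blocking=True)

def jacobi_rows(u, out, lo, hi):
    # 5-point Jacobi update of rows lo..hi-1 (interior columns only)
    out[lo:hi, 1:-1] = 0.25 * (u[lo - 1:hi - 1, 1:-1] + u[lo + 1:hi + 1, 1:-1]
                               + u[lo:hi, :-2] + u[lo:hi, 2:])

with comm:                     # halo exchange: copy boundary rows into ghost rows
    a[-1, :] = b[1, :]         # bottom ghost of A <- first real row of B
    b[0, :] = a[-2, :]         # top ghost of B    <- last real row of A
with compute:                  # overlap: interior rows do not depend on the halos
    jacobi_rows(a, a_new, 2, n)
    jacobi_rows(b, b_new, 2, n)

comm.synchronize()             # halos are now in place
with compute:                  # finish the rows adjacent to the exchanged halos
    jacobi_rows(a, a_new, n, n + 1)
    jacobi_rows(b, b_new, 1, 2)
compute.synchronize()          # (outer boundary rows are omitted for brevity)
```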

    High performance scientific computing in applications with direct finite element simulation

    Get PDF
    xiii, 133 p. Predicting separated flow, including stall, of a full aircraft with Computational Fluid Dynamics (CFD) is considered one of the grand challenges to be solved by 2030, according to NASA. The nonlinear Navier-Stokes equations provide the mathematical formulation for fluid flow in three-dimensional space. However, classical results on existence and uniqueness of solutions are still missing. Since brute-force computation is intractable for predictive simulation of a full aircraft, one could use Direct Numerical Simulation (DNS); however, it is prohibitively expensive, since it needs to resolve turbulent scales of order Re^(9/4). Other methods, such as the statistically averaged Reynolds-Averaged Navier-Stokes (RANS), the spatially averaged Large Eddy Simulation (LES), and the hybrid Detached Eddy Simulation (DES), require fewer degrees of freedom. All of these methods have to be tuned to benchmark problems and, moreover, near the walls the mesh has to be very fine to resolve the boundary layers, which makes the computational cost very high. Above all, the results are sensitive to, for example, explicit parameters in the method and the mesh. As a solution to this challenge, we present here the adaptive time-resolved Direct FEM Simulation (DFS) methodology with numerical tripping, as a predictive, parameter-free family of methods for turbulent flow. We solved the JAXA Standard Model (JSM) aircraft at a realistic Reynolds number, presented as part of the High Lift Prediction Workshop 3. We predicted the lift coefficient Cl within 5% error versus experiment, the drag coefficient Cd within 10% error, and stall within 1° of the angle of attack. The workshop identified a likely experimental error on the order of 10% for the drag results. The simulation is about 10 times faster and cheaper than traditional CFD approaches. The efficiency mainly comes from the slip boundary condition that allows coarse meshes near the walls, goal-oriented adaptive error control that refines the mesh only where needed, and large time steps using a Schur-type fixed-point iteration method, without compromising the accuracy of the simulation results. We also presented a generalization of DFS to variable density and validated it against the well-established MARIN benchmark problem. The results show good agreement with the experimental pressure-sensor measurements. Later, we used this methodology for two multiphase flow applications: a flash rainwater storage tank (Bilbao water consortium) and the design of a nozzle for 3D printing. In the rainwater storage tank, we predicted that the water height in the tank has a significant influence on how the flow behaves downstream of the tank door (valve). For the 3D printing application, we developed an efficient design with a focused jet flow to prevent oxidation and heating at the tip of the nozzle during the melting process. Finally, we present here the parallelism on multiple GPUs and on the embedded Kalray architecture.
Almost all supercomputers today have heterogeneous architectures, such as CPU+GPU or other accelerators, and it is therefore essential to develop computational frameworks that take advantage of them. As noted earlier, CFD only began to develop later, in the 1960s, once sufficient computational power became available; it is therefore essential to use and test these accelerators for CFD computations. GPUs have a different architecture from traditional CPUs: a GPU has many more cores, which makes it a good option for parallel computing. For multiple GPUs, we developed a stencil computation applied to the simulation of geological folds. We explored halo computation and used CUDA streams to optimize computation and communication time. The resulting performance gain was 23% for four GPUs with the Fermi architecture, and the corresponding improvement obtained on four Kepler GPUs was 47%. This research was carried out at the Basque Center for Applied Mathematics (BCAM) within the CFD Computational Technology (CFDCT) group and at the School of Electrical Engineering and Computer Science (Royal Institute of Technology, Stockholm, Sweden). It was supported by Fundacion Obra Social "la Caixa", the Severo Ochoa Excellence research centre 2014-2018 SEV-2013-0323, the Severo Ochoa Excellence research centre 2018-2022 SEV-2017-0718, the BERC program 2014-2017, the BERC program 2018-2021, the MSO4SC European project, and Elkartek. This work has been performed using the computing infrastructure of SNIC (Swedish National Infrastructure for Computing).

    Efficiently simulating Lagrangian particles in large-scale ocean flows — Data structures and their impact on geophysical applications

    Get PDF
    Studying oceanography using Lagrangian simulations has been adopted for a range of scenarios, such as determining the fate of microplastics in the ocean, simulating the origin locations of microplankton used for palaeoceanographic reconstructions, and studying the impact of fish aggregation devices on the migration behaviour of tuna. These simulations are complex and represent a considerable runtime effort to obtain trajectory results, which is the prime motivation for enhancing the performance of Lagrangian particle simulators. This paper assesses established performance-enhancing techniques from Eulerian simulators in light of the computational conditions and demands of Lagrangian simulators. A performance-enhancement strategy specifically targeting physics-based Lagrangian particle simulations is outlined, and techniques for closing the performance gap are presented and implemented. Realistic experiments are derived from three specific oceanographic application scenarios, and the suggested performance-enhancing techniques are benchmarked in detail, so as to allow for a clear attribution of speed-up measurements to individual techniques. The impacts and insights of the performance-enhancement strategy are further discussed for Lagrangian simulations in other geoscience applications. The experiments show that I/O-enhancing techniques, such as dynamic loading and buffering, lead to considerable speed-ups, on par with an idealised parallelisation of the process over 20 nodes. Conversely, while the cache-efficient structure-of-arrays collection yields a visible speed-up, other alternative data structures fail to deliver the theoretically expected performance increase. This insight demonstrates the importance of good data alignment in memory and caches for Lagrangian physics simulations.
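
The cache-alignment point above can be made concrete with a small sketch. Assuming NumPy, the snippet contrasts an array-of-structures particle record with a structure-of-arrays layout for a toy advection update; the field names, the random velocities, and the update itself are illustrative stand-ins, not the simulator's actual data model or API.

```python
# Sketch of array-of-structures (AoS) vs structure-of-arrays (SoA) particle
# collections for Lagrangian advection. Velocities are toy stand-ins for
# interpolated ocean fields; names are illustrative only.
import numpy as np

n = 1_000_000

# AoS: one record per particle; lon/lat/depth/time interleaved in memory
aos = np.zeros(n, dtype=[("lon", "f8"), ("lat", "f8"),
                         ("depth", "f8"), ("time", "f8")])

# SoA: one contiguous array per field
soa = {f: np.zeros(n) for f in ("lon", "lat", "depth", "time")}

def advect_aos(p, u, v, dt):
    # strided access: every field of every particle passes through the cache
    p["lon"] += u * dt
    p["lat"] += v * dt

def advect_soa(p, u, v, dt):
    # contiguous access: only the lon and lat arrays are streamed
    p["lon"] += u * dt
    p["lat"] += v * dt

u = np.random.rand(n)   # toy zonal velocity at the particle positions
v = np.random.rand(n)   # toy meridional velocity
advect_aos(aos, u, v, dt=60.0)
advect_soa(soa, u, v, dt=60.0)
```

In the AoS layout every update drags all interleaved fields through the cache, while the SoA layout streams only the two contiguous arrays it actually needs, which is the effect behind the measured speed-up.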

    Interactive, multi-purpose traffic prediction platform using connected vehicles dataset

    Get PDF
    Traffic congestion is a perennial issue because of increasing traffic demand and a limited budget for maintaining the current transportation infrastructure, let alone expanding it. Many congestion management techniques require timely and accurate traffic estimation and prediction. Examples of such techniques include incident management, real-time routing, and providing accurate trip information based on historical data. In this dissertation, a speech-powered traffic prediction platform is proposed, which deploys a new deep learning algorithm for traffic prediction using Connected Vehicles (CV) data. To speed up traffic forecasting, a Graph Convolution -- Gated Recurrent Unit (GC-GRU) architecture is proposed, and its performance on tabular data is compared to state-of-the-art models. GC-GRU's Mean Absolute Percentage Error (MAPE) was very close to that of the Transformer (3.16 vs. 3.12) while achieving the fastest inference time and a six-fold faster training time than the Transformer, although Long Short-Term Memory (LSTM) was the fastest in training. Such improved performance in traffic prediction with a shorter inference time and competitive training time allows the proposed architecture to better cater to real-time applications. This is the first study to demonstrate the advantage of a multiscale approach that combines CV data with conventional sources such as Waze and probe data. CV data was better at detecting short-duration, jam, and stand-still incidents and detected them earlier than probe data. CV data excelled at detecting minor incidents, with a 90 percent detection rate versus 20 percent for probes, and detected them 3 minutes faster. To process the big CV data faster, a new algorithm is proposed to extract the spatial and temporal features from the CSV files into a Multiscale Data Analysis (MDA) framework. The algorithm also leverages Graphics Processing Units (GPUs) through the Nvidia Rapids framework and a Dask parallel cluster in Python. The results show a seventy-fold speedup in the Extract, Transform, Load (ETL) of the CV data for the State of Missouri for an entire day and all unique CV journeys (reducing the processing time from about 48 hours to 25 minutes). The processed data is then fed into a customized UNet model that learns high-level traffic features from network-level images to predict large-scale, multi-route CV speed and volume. The accuracy and robustness of the proposed model are evaluated across different road types, times of day, and image snippets, and compared against benchmark models. To visually analyze the historical traffic data and the results of the prediction model, an interactive web application powered by speech queries is built to offer accurate and fast insights into traffic performance and thus allow for better positioning of traffic control strategies. The product of this dissertation can be seamlessly deployed by transportation authorities to understand and manage congestion in a timely manner. Includes bibliographical references.
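
As a rough illustration of how graph convolution and a gated recurrent unit can be combined, the sketch below implements a generic GC-GRU cell in PyTorch, in which the GRU's gate transforms operate over a normalized adjacency matrix of the road network. This is a common construction (in the spirit of T-GCN/DCRNN-style models) and a stand-in for, not a reproduction of, the dissertation's architecture; all dimensions and the identity adjacency matrix in the usage example are arbitrary.

```python
# Generic GC-GRU cell: a GRU whose gate transforms are graph convolutions over a
# normalized adjacency matrix a_hat of the road network. Illustrative sketch only.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """H' = a_hat @ H @ W  (one-hop neighbourhood aggregation plus linear map)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, a_hat):          # x: (nodes, in_dim), a_hat: (nodes, nodes)
        return self.lin(a_hat @ x)

class GCGRUCell(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.gc_z = GraphConv(in_dim + hid_dim, hid_dim)  # update gate
        self.gc_r = GraphConv(in_dim + hid_dim, hid_dim)  # reset gate
        self.gc_h = GraphConv(in_dim + hid_dim, hid_dim)  # candidate state

    def forward(self, x, h, a_hat):
        z = torch.sigmoid(self.gc_z(torch.cat([x, h], -1), a_hat))
        r = torch.sigmoid(self.gc_r(torch.cat([x, h], -1), a_hat))
        h_cand = torch.tanh(self.gc_h(torch.cat([x, r * h], -1), a_hat))
        return z * h + (1.0 - z) * h_cand

# toy usage: 200 road segments, 2 input features (speed, volume), 64 hidden units
nodes, in_dim, hid_dim = 200, 2, 64
a_hat = torch.eye(nodes)                  # stands in for a normalized adjacency matrix
cell = GCGRUCell(in_dim, hid_dim)
h = torch.zeros(nodes, hid_dim)
for x_t in torch.rand(12, nodes, in_dim): # 12 time steps of observations
    h = cell(x_t, h, a_hat)
```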

    The Art of Movies

    Get PDF
    Film is considered an important art form; films entertain, educate, enlighten, and inspire audiences. Film is a term that encompasses motion pictures as individual projects, as well as, by metonymy, the field in general. The name comes from the fact that photographic film (also called film stock) has historically been the primary medium for recording and displaying motion pictures. Many other terms exist, such as motion pictures (or just pictures or "picture"), the silver screen, photoplays, the cinema, picture shows, and flicks, with "movies" the most common.

    Efficient large-scale real-world flood simulations using the shallow water equations on GPUs

    No full text
    Climate change is one of the largest challenges humanity has to cope with today. Earth-orbiting satellites and other technological advances have enabled scientists to see the big picture, collecting many different types of information about our planet and its climate on a global scale. Data collected over many years reveals the signals of a changing climate. Europe and the northern hemisphere are warming at a faster pace than the global average. Europe's Atlantic-facing countries are predicted to suffer heavier rainfall, greater flood risk, and more severe storm damage, according to the most comprehensive study of Europe's vulnerability to climate change. The National Aeronautics and Space Administration (NASA) and the National Oceanic and Atmospheric Administration (NOAA) confirmed that 2016 broke the record for the hottest year ever, previously held by 2015, which had itself broken the record held by 2014. According to a climate change report from 2014, global sea level rose about 20 centimetres in the last century; the rate in the last two decades, however, is nearly double that of the last century. Moreover, in recent years a surprisingly large number of major floods have occurred around the world, which suggests that floods may have increased and will continue to increase in the near future. Modern science in combination with the latest simulation technologies can help to understand the causes and the impact of the adverse phenomena related to climate change. Moreover, we can exploit our knowledge and simulation tools to prepare response measures that aim at reducing the risk associated with flood events. Today, a lot of effort is put into making flood simulations faster and more accurate, to increase both the computational efficiency and the fidelity of the results. The aim of this thesis is to provide an efficient and robust simulation tool for large-scale flood simulations that can be used to support decision making. This goal is addressed by developing a new scheme for the shallow water equations (SWE), implementing it efficiently for graphics processing units (GPUs), and validating it on analytic, laboratory, and real-world cases in comparison with other schemes. Chapter I starts with the motivation for this thesis, followed by a general overview of fluid and flood simulations, including an introduction to the SWE and discretization methods for them. The next section gives a short insight into GPU architectures and justifies the suitability of the SWE for parallel computation on these devices. The last section of the chapter explains the main goals of this thesis. In Chapter II, we propose a new two-dimensional numerical scheme, named HWP, to solve the SWE. The HWP scheme is an enhanced version of a scheme by Kurganov and Petrova (KP), which aims to improve the solution in the presence of partially flooded cells. The presented scheme is well-balanced, positivity preserving, and handles dry states. Mass conservation is ensured by using the draining time step (DTS) technique in the time integration process, which guarantees non-negative water depths. Unlike the KP scheme, our technique does not generate high velocities at the dry/wet boundaries, which are responsible for small time step sizes and slow simulation runs. We prove that the new scheme preserves 'lake at rest' steady states and guarantees the positivity of the computed fluid depth in partially flooded cells.
We compare the new scheme, along with the KP scheme, against the analytical solution for a parabolic basin and show the improved simulation performance of the new scheme for two real-world scenarios. Chapter III presents a new GPU implementation of the HWP and KP schemes on Cartesian grids. Previous implementations are not fast enough to evaluate multiple scenarios for robust, uncertainty-aware decision support. To tackle this, we exploit the capabilities of the NVIDIA Kepler architecture and the new shuffle instructions. The KP scheme is simpler but suffers from incorrect high velocities along the wet/dry boundaries, resulting in small time steps and long simulation run-times. The HWP scheme resolves this problem but comprises a more complex algorithm, which represents an extra burden on the GPU. Here, an efficient and novel shuffle-based implementation is presented for both schemes. Moreover, a performance comparison is provided, in which we compare the shuffle-based implementations with pure shared-memory versions. The correctness and performance are validated on real-world scenarios. In Chapter IV, an exhaustive comparison and validation is presented, containing important use cases for developers and practitioners working with flood simulation tools. We discuss three state-of-the-art shallow water schemes: one by Kurganov and Petrova (KP), its successor by Horváth et al. (HWP), and our two-dimensional extension of the scheme by Chen and Noelle (CN). We analyse the advantages and disadvantages of each scheme on an extensive list of scenarios, including several analytical and laboratory cases as well as a representative set of three historical floods. To enable the real-world studies, we address the implementation of the required boundary conditions (BCs), such as wall BCs, discharge BCs, and water-level BCs. Chapter V contains a summary of the findings presented in this thesis, which advance the knowledge of simulating floods using the SWE on GPUs. The new HWP scheme tackles the non-physical velocities that appear along the dry/wet boundaries. This not only improves the numerical accuracy, but also allows for faster simulations, since there are no high-velocity spots that act as a limiting factor on the time step size. Furthermore, an efficient GPU implementation is presented, with a focus on reducing the computational burden introduced by the HWP scheme. Finally, the validation cases give a comprehensive overview of the three SWE schemes and reveal their strengths and weaknesses under various conditions. We observe that the KP and HWP schemes are more accurate than the CN scheme in some cases; however, in other cases they suffer from non-physical oscillations. Overall, good agreement is observed for all case studies, rendering the presented shallow water schemes suitable for flood management applications.
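
The time-step argument above (spurious velocities at wet/dry fronts force small explicit steps) can be seen in even a minimal one-dimensional shallow water update. The sketch below, assuming NumPy, uses a simple Rusanov (local Lax-Friedrichs) flux on a flat bottom as a generic stand-in rather than the KP or HWP central-upwind schemes; the dam-break setup and all parameters are illustrative.

```python
# Minimal 1D shallow water step (flat bottom, no source terms) with a Rusanov
# flux; illustrative only, not the KP or HWP schemes. It shows how the largest
# wave speed sets the stable explicit time step.
import numpy as np

g = 9.81

def swe_step(h, hu, dx, cfl=0.45, h_eps=1e-6):
    # desingularized velocity: treat (near-)dry cells as having zero velocity
    u = np.where(h > h_eps, hu / np.maximum(h, h_eps), 0.0)
    c = np.sqrt(g * np.maximum(h, 0.0))
    f1, f2 = hu, hu * u + 0.5 * g * h * h          # physical fluxes (hu, hu^2 + g h^2/2)
    a = np.maximum(np.abs(u[:-1]) + c[:-1], np.abs(u[1:]) + c[1:])  # interface speeds
    F1 = 0.5 * (f1[:-1] + f1[1:]) - 0.5 * a * (h[1:] - h[:-1])
    F2 = 0.5 * (f2[:-1] + f2[1:]) - 0.5 * a * (hu[1:] - hu[:-1])
    dt = cfl * dx / max(float(a.max()), 1e-12)     # fastest wave limits the step
    h_new, hu_new = h.copy(), hu.copy()
    h_new[1:-1] -= dt / dx * (F1[1:] - F1[:-1])
    hu_new[1:-1] -= dt / dx * (F2[1:] - F2[:-1])
    return h_new, hu_new, dt

# toy dam break on [0, 1): deep water on the left, shallow on the right
x = np.linspace(0.0, 1.0, 400, endpoint=False)
h = np.where(x < 0.5, 2.0, 0.5)
hu = np.zeros_like(h)
t = 0.0
while t < 0.05:
    h, hu, dt = swe_step(h, hu, dx=x[1] - x[0])
    t += dt
```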