58 research outputs found

    Fast and accurate front propagation for simulation of geological folds

    Front propagations described by static Hamilton-Jacobi equations can be used to simulate folded geological structures. Simulations of geological folds are a key ingredient in the Compound Earth Simulator (CES), an industrial software tool used in the exploration of oil and gas. In this thesis, local approximation techniques are investigated with respect to accuracy and efficiency. Several novel algorithms are also introduced, some of which are accelerated by parallel implementations on both multicore CPUs and graphics processing units (GPUs). These algorithms simulate folds in a fraction of the time needed by the CES industry code while retaining the same level of accuracy. Complicated tasks that previously took several minutes to compute can now be performed in a few seconds, significantly improving the CES user experience.

    High performance scientific computing in applications with direct finite element simulation

    xiii, 133 p. Predicting separated flow, including stall, for a full aircraft with computational fluid dynamics (CFD) is considered one of the grand challenges to be solved by 2030, according to NASA. The nonlinear Navier-Stokes equations provide the mathematical formulation for fluid flow in three-dimensional space. However, classical solutions, existence, and uniqueness are still missing. One could in principle use direct numerical simulation (DNS) for predictive simulation of a full aircraft; however, this brute-force computation is intractable, because it must resolve turbulent scales of order Re^(9/4). Other methods, such as the statistically averaged Reynolds-averaged Navier-Stokes (RANS) equations, the spatially averaged large eddy simulation (LES), and the hybrid detached eddy simulation (DES), require fewer degrees of freedom. All of these methods must be tuned to benchmark problems and, moreover, the mesh near walls has to be very fine to resolve boundary layers, which makes the computation very expensive. Above all, the results are sensitive to, for example, explicit parameters in the method and the mesh. As a resolution to this challenge, we present here the adaptive, time-resolved Direct FEM Solution (DFS) methodology with numerical tripping, a predictive, parameter-free family of methods for turbulent flow. We solved the JAXA Standard Model (JSM) aircraft at a realistic Reynolds number, presented as part of the High Lift Prediction Workshop 3. We predicted lift Cl within 5% of experiment, drag Cd within 10%, and stall within 1° of the angle of attack. The workshop identified a likely experimental error of order 10% for the drag results. The simulation is 10 times faster and cheaper than traditional or existing CFD approaches.
The efficiency comes mainly from the slip boundary condition, which allows coarse meshes near walls; goal-oriented adaptive error control, which refines the mesh only where needed; and large time steps using a Schur-type fixed-point iteration, all without compromising the accuracy of the simulation results. We also present a generalization of DFS to variable density, validated against the well-established MARIN benchmark problem. The results show good agreement with experimental pressure-sensor measurements. Later, we used this methodology to solve two multiphase flow applications. One concerns a flash rainwater storage tank (Bilbao water consortium), and the second concerns the design of a nozzle for 3D printing. For the rainwater storage tank, we predicted that the water height in the tank has a significant influence on how the flow behaves downstream of the tank gate (valve). For 3D printing, we developed an efficient design with a focused jet flow to prevent oxidation and heating at the tip of the nozzle during a melting process. Finally, we present parallelism on multiple GPUs and on the embedded Kalray architecture. Almost all of today's supercomputers have heterogeneous architectures, such as CPU+GPU or other accelerators, and it is therefore essential to develop computational frameworks that take advantage of them. As seen earlier, CFD began to be developed in the late 1960s, when computing power became available; it is therefore essential to use and test these accelerators for CFD calculations. GPUs have a different architecture from traditional CPUs.
Technically, a GPU has many more cores than a CPU, which makes it a good option for parallel computing. For multiple GPUs, we developed a stencil computation applied to the simulation of geological folds. We explored halo computation and used CUDA streams to optimize computation and communication time. The resulting performance gain was 23% on four GPUs with the Fermi architecture, and the corresponding improvement on four Kepler GPUs was 47%. This research was carried out at the Basque Center for Applied Mathematics (BCAM) within CFD Computational Technology (CFDCT) and at the School of Electrical Engineering and Computer Science (Royal Institute of Technology, Stockholm, Sweden). It is supported by Fundacion Obra Social “la Caixa”, Severo Ochoa Excellence research centre 2014-2018 SEV-2013-0323, Severo Ochoa Excellence research centre 2018-2022 SEV-2017-0718, BERC program 2014-2017, BERC program 2018-2021, MSO4SC European project, and Elkartek. This work was performed using the computing infrastructure of SNIC (Swedish National Infrastructure for Computing).

    High Performance Scientific Computing in Applications with Direct Finite Element Simulation

    Predicting separated flow, including stall, for a full aircraft with Computational Fluid Dynamics (CFD) is considered one of the grand challenges to be solved by 2030, according to NASA [1]. The nonlinear Navier-Stokes equations provide the mathematical formulation for fluid flow in three-dimensional space. However, classical solutions, existence, and uniqueness are still missing. One could in principle use Direct Numerical Simulation (DNS) for predictive simulation of a full aircraft; however, this brute-force computation is intractable, because it must resolve turbulent scales of order Re^(9/4). Other methods, such as the statistically averaged Reynolds-Averaged Navier-Stokes (RANS) equations, the spatially averaged Large Eddy Simulation (LES), and the hybrid Detached Eddy Simulation (DES), require fewer degrees of freedom. All of these methods have to be tuned to benchmark problems and, moreover, the mesh near walls has to be very fine to resolve boundary layers, which makes the computation very expensive. Above all, the results are sensitive to, e.g., explicit parameters in the method and the mesh. As a resolution to the challenge, we present here the adaptive time-resolved Direct FEM Solution (DFS) methodology with numerical tripping, a predictive, parameter-free family of methods for turbulent flow. We solved the JAXA Standard Model (JSM) aircraft at a realistic Reynolds number, presented as part of the High Lift Prediction Workshop 3. We predicted lift Cl within 5% of experiment, drag Cd within 10%, and stall within 1° of the angle of attack. The workshop identified a likely experimental error of order 10% for the drag results. The simulation is 10 times faster and cheaper than traditional or existing CFD approaches.
The efficiency mainly comes from the slip boundary condition, which allows coarse meshes near walls; goal-oriented adaptive error control, which refines the mesh only where needed; and large time steps using a Schur-type fixed-point iteration method, all without compromising the accuracy of the simulation results. As a follow-up, we were invited to the Fifth High Order CFD Workshop, where the approach was validated for a tandem sphere problem (low Reynolds number turbulent flow), in which a second sphere is placed a certain distance downstream of a first sphere. The results capture the expected slipstream phenomenon with approximately 2% error. A comparison was made with the higher-order frameworks Nek5000 and PyFR. The PyFR framework has demonstrated high effectiveness on GPUs with an unstructured mesh, which is a hard problem in this field; it achieves this with an explicit time-stepping approach. Our study showed that our large-time-step approach enabled time steps approximately 3 orders of magnitude larger than the explicit time steps in PyFR, which made our method more effective for solving the whole problem. We also presented a generalization of DFS to variable density and validated it against the well-established MARIN benchmark problem. The results show good agreement with experimental pressure-sensor measurements. Later, we used this methodology to solve two multiphase flow applications. One concerns a flash rainwater storage tank (Bilbao water consortium), and the second concerns the design of a nozzle for 3D printing. For the flash rainwater storage tank, we predicted that the water height in the tank has a significant influence on how the flow behaves downstream of the tank door (valve). For 3D printing, we developed an efficient design with a focused jet flow to prevent oxidation and heating at the tip of the nozzle during a melting process. Finally, we presented parallelism on multiple GPUs and on the embedded Kalray architecture.
Almost all supercomputers today have heterogeneous architectures, such as CPU+GPU or other accelerators, and it is therefore essential to develop computational frameworks that take advantage of them. For multiple GPUs, we developed a stencil computation applied to the simulation of geological folds. We explored halo computation and used CUDA streams to optimize computation and communication time. The resulting performance gain was 23% on four GPUs with the Fermi architecture, and the corresponding improvement on four Kepler GPUs was 47%. The Kalray architecture is designed for low energy consumption; here we tested the Jacobi method with different communication strategies. Additionally, visualization is a crucial part of scientific simulation. We developed an automated visualization framework, in which task parallelization proved more than 10 times faster than data parallelization. We have also run DFS in a cloud computing setting to validate the simulation against the local cluster simulation. Finally, we recommend an easy pre-processing tool to support DFS simulation. La Caixa 201
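    The halo idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis code: it runs on the CPU with NumPy rather than CUDA streams or Kalray hardware, it assumes a 5-point Laplace stencil, and the function names (`jacobi_step`, `jacobi_single`, `jacobi_two_domains`) are hypothetical. It splits a grid into two subdomains that each carry one halo row, and exchanges those rows after every Jacobi sweep.

```python
import numpy as np


def jacobi_step(u):
    """One Jacobi sweep of the 5-point Laplace stencil; the outer rows/columns stay fixed."""
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (
        u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
    )
    return new


def jacobi_single(u, iters):
    """Reference: Jacobi sweeps on the undivided grid."""
    for _ in range(iters):
        u = jacobi_step(u)
    return u


def jacobi_two_domains(u, iters):
    """Same sweeps on two row-wise subdomains that exchange one halo row per sweep."""
    mid = u.shape[0] // 2
    top = u[: mid + 1].copy()     # global rows 0..mid; last row is a halo owned by `bottom`
    bottom = u[mid - 1:].copy()   # global rows mid-1..end; first row is a halo owned by `top`
    for _ in range(iters):
        top = jacobi_step(top)        # the two sweeps are independent of each other,
        bottom = jacobi_step(bottom)  # so they could run on separate devices
        top[-1] = bottom[1]           # halo exchange: refresh the copy of row mid
        bottom[0] = top[-2]           # halo exchange: refresh the copy of row mid-1
    # Drop the halo rows and stitch the owned rows back together.
    return np.vstack([top[:-1], bottom[1:]])
```

    Because each subdomain only reads its neighbour's data through the halo row, the decomposed run reproduces the single-domain iterates; on real accelerators the exchange step is what one would try to overlap with computation.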

    High-Performance Fast Iterative Methods for Eikonal Equations

    Department of Computer Science and Engineering. The eikonal equation has a wide range of applications related to distances or travel time in space, such as geoscience, computer vision, image processing, path planning, and computer graphics. Recently, research on eikonal equation solvers has focused more on developing efficient parallel algorithms that leverage the computing power of parallel systems, such as multi-core CPUs and graphics processing units (GPUs). However, little literature exists on massively parallel eikonal equation solvers because of the complications related to data and work management. In this dissertation research, I introduce several novel contributions that leverage high-performance, massively parallel computing platforms for a parallel eikonal equation solver. First, I introduce a novel adaptive domain decomposition method for an efficient multi-GPU implementation of the block-based fast iterative method (FIM). The proposed method expands the sub-domain to be processed by each GPU, taking fair load balancing into account as the iterative algorithm proceeds. It also provides a locality-aware clustering algorithm to minimize the communication overhead. With this, I solved the parallel performance problems often encountered in naive multi-GPU extensions that depend on regular domain decomposition, such as task load imbalance and high communication cost. In addition, it includes several optimization techniques, such as hiding the CPU cost using CUDA multi-streams and hiding the data transfer costs between multiple GPUs. Second, I propose an efficient parallel implementation of FIM for a multi-core shared-memory system using a lock-free local queue approach, and provide an in-depth analysis of the parallel performance of the method.
In addition, I propose a new parallel algorithm, the Group-Ordered Fast Iterative Method (GO-FIM), that exploits the causality of grid blocks to reduce redundant computations, which were the main drawback of the original FIM. The proposed GO-FIM method clusters blocks based on the updating order, so that each cluster can be updated in parallel on multi-core parallel architectures. Third, I propose a novel algorithm called the Causality-Ordered Fast Iterative Method (CO-FIM), which exploits the causality dependency at the node level to reduce redundant computations. Moreover, I propose a new parallel algorithm, the Causality and Group-Ordered Fast Iterative Method (CGO-FIM), that integrates GO-FIM and CO-FIM. The proposed CGO-FIM determines the updating order at the block level while minimizing redundant calculation inside each block through a node-level causality dependency. The CGO-FIM method has a condition for using both CO-FIM and FIM interchangeably inside a block, and it is fully compatible with the lock-free local queue approach, so it can be efficiently implemented on multi-core parallel architectures.
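    For orientation, the basic (serial, node-based) fast iterative method that the abstract builds on can be sketched as follows. This is a simplified illustration under stated assumptions, not the dissertation's block-based, GO-FIM, or CO-FIM code: it assumes a uniform 2D grid, a constant slowness `f`, a first-order Godunov upwind update, and in-place updates of the active list; the names `local_update` and `fim` are hypothetical.

```python
import math

INF = float("inf")


def local_update(u, i, j, h, f):
    """First-order Godunov upwind solve of |grad u| = f at node (i, j), grid spacing h."""
    n, m = len(u), len(u[0])
    a = min(u[i - 1][j] if i > 0 else INF, u[i + 1][j] if i < n - 1 else INF)
    b = min(u[i][j - 1] if j > 0 else INF, u[i][j + 1] if j < m - 1 else INF)
    if a == INF and b == INF:
        return INF                      # no upwind information yet
    if abs(a - b) >= f * h:
        return min(a, b) + f * h        # one-sided update
    return 0.5 * (a + b + math.sqrt(2.0 * (f * h) ** 2 - (a - b) ** 2))


def fim(shape, sources, h=1.0, f=1.0, eps=1e-9):
    """Serial fast-iterative-method sketch; returns arrival times as a list of lists."""
    n, m = shape
    sources = set(sources)
    u = [[INF] * m for _ in range(n)]
    for i, j in sources:
        u[i][j] = 0.0

    def neighbours(i, j):
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            p, q = i + di, j + dj
            if 0 <= p < n and 0 <= q < m:
                yield p, q

    # The active list starts as the neighbours of the source nodes.
    active = {nb for s in sources for nb in neighbours(*s) if nb not in sources}
    while active:
        next_active = set()
        for i, j in active:
            old = u[i][j]
            new = min(old, local_update(u, i, j, h, f))
            u[i][j] = new
            if new < old - eps:
                next_active.add((i, j))   # still changing: keep it active
            else:
                # Converged: activate neighbours whose value would still decrease.
                for p, q in neighbours(i, j):
                    if (p, q) not in active and local_update(u, p, q, h, f) < u[p][q] - eps:
                        next_active.add((p, q))
        active = next_active
    return u
```

    Nodes in the active list carry no ordering, which is what makes the method attractive for parallel hardware; it is also why a node may be recomputed several times, the redundancy that the ordered variants described above aim to reduce.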

    Detection and elimination of rock face vegetation from terrestrial LIDAR data using the virtual articulating conical probe algorithm

    A common use of terrestrial lidar is to conduct studies involving change detection of natural or engineered surfaces. Change detection involves many technical steps beyond the initial data acquisition: data structuring, registration, and elimination of data artifacts such as parallax errors, near-field obstructions, and vegetation. Of these, vegetation detection and elimination with terrestrial lidar scanning (TLS) presents a completely different set of issues when compared to vegetation elimination from aerial lidar scanning (ALS). With ALS, the ground footprint of the lidar laser beam is very large, and the data acquisition hardware supports multi-return waveforms. Also, the underlying surface topography is relatively smooth compared to the overlying vegetation which has a high spatial frequency. On the other hand, with most TLS systems, the width of the lidar laser beam is very small, and the data acquisition hardware supports only first-return signals. For the case where vegetation is covering a rock face, the underlying rock surface is not smooth because rock joints and sharp block edges have a high spatial frequency very similar to the overlying vegetation. Traditional ALS approaches to eliminate vegetation take advantage of the contrast in spatial frequency between the underlying ground surface and the overlying vegetation. When the ALS approach is used on vegetated rock faces, the algorithm, as expected, eliminates the vegetation, but also digitally erodes the sharp corners of the underlying rock. A new method that analyzes the slope of a surface along with relative depth and contiguity information is proposed as a way of differentiating high spatial frequency vegetative cover from similar high spatial frequency rock surfaces. 
This method, named the Virtual Articulating Conical Probe (VACP) algorithm, offers a solution for detecting and eliminating rock face vegetation from TLS point cloud data without affecting the geometry of the underlying rock surface. Such a tool could prove invaluable to the geotechnical engineer for quantifying rates of vertical-face rock loss that impact civil infrastructure safety. --Abstract, page iii

    Adaptive finite element simulation of fracture: from plastic deformation to crack propagation

    As engineers and scientists, we have a host of reasons to understand how structural systems fail. We may be able to improve the safety of buildings during natural disasters by designing more fracture-resistant connectors, to lengthen the life span of industrial machinery by designing it to sustain very large deformation at high temperatures, or to prepare evacuation procedures for populated areas in high seismic zones in the event of rupture in the earth's crust. To achieve a better understanding of how any of these structures fail, experimental, theoretical, and computational advances must be made. In this dissertation we focus on computational simulation by means of the finite element method and investigate topological and physical aspects of adaptive remeshing for two types of structural systems: quasi-brittle and ductile. For ductile systems, we are interested in modeling the large deformations that occur before rupture of the material. The deformations can be so large that element distortion causes a lack of numerical convergence. Thus, we present a remeshing and internal state variable mapping technique to enable large deformation modeling and alleviate mesh distortion. We perform detailed studies on the Lie-group interpolation and variational recovery scheme and conclude that the approach results in very limited numerical diffusion and is applicable to modeling systems with significant ductile distortion. For quasi-brittle systems, mesh adaptivity is the central theme, as it is for the work on ductile systems. We investigate two- and three-dimensional problems on CPU and GPU systems with the main goals of improving either the computational efficiency or the fidelity of the final solution.
We investigate quasi-brittle fracture by means of the inter-element extrinsic cohesive zone model approach, in which interface elements capable of separating are adaptively inserted at bulk element facets when and where they are needed throughout the numerical simulation. The inter-element cohesive zone model approach is known to suffer from mesh bias. Thus, we utilize polygonal element meshes with adaptive splitting to improve the capability of the mesh to represent experimentally obtained fracture patterns. Because we utilize efficient linear polygonal elements and apply the adaptive element splitting only where needed, we also achieve improved computational efficiency with this approach. In the last half of the dissertation, we depart from the use of unstructured meshes and focus on the development of hierarchical mesh refinement and coarsening schemes on the structured 4k mesh in two and three dimensions. In three dimensions, the size of the problem increases so rapidly that mesh adaptivity is critical to enable the simulation of large-scale systems. Thus, we develop the topological and physical aspects of the mesh refinement and coarsening scheme. The scheme is rigorously tested on two benchmark problems, both of which show significant speedup over a uniform mesh implementation and demonstrate physically meaningful results. To achieve greater speedup, the adaptive mesh refinement and coarsening scheme on the 2D 4k mesh is mapped to a GPU architecture. Considerations for the numerical implementation on the massively parallel system are detailed. Further, a study on the impact of the parallelization of the dynamic fracture code is performed on a benchmark problem, and a statistical investigation reveals the validity of the approach. Finally, the benchmark example is extended so that the specimen dimensions match those of the original experimental system.
The speedup provided by the GPU allows us to model this large system in a practical amount of time and ultimately allows us to investigate differences between the commonly used reduced-scale model and the actual experimental scale. This dissertation concludes with a summary of contributions and comments on potential future research directions. Appendices featuring scripts and codes are also included for the interested reader.

    Generalized averaged Gaussian quadrature and applications

    A simple numerical method for constructing the optimal generalized averaged Gaussian quadrature formulas will be presented. These formulas exist in many cases in which real positive Gauss-Kronrod formulas do not exist, and can be used as an adequate alternative for estimating the error of a Gaussian rule. We also investigate the conditions under which the optimal averaged Gaussian quadrature formulas and their truncated variants are internal.
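    How an averaged rule estimates the error of a Gaussian rule can be shown with a small sketch. Note the substitution: the code below builds Laurie's classical averaged rule (the mean of the n-point Gauss rule and the (n+1)-point anti-Gaussian rule), not the optimal generalized averaged formulas of the abstract, and it assumes the Legendre weight on [-1, 1]. All function names are hypothetical.

```python
import numpy as np


def legendre_beta(k):
    """Recurrence coefficient beta_k (k >= 1) of the monic Legendre polynomials on [-1, 1]."""
    return k * k / (4.0 * k * k - 1.0)


def rule_from_offdiag(offdiag, mu0=2.0):
    """Golub-Welsch: nodes and weights from a zero-diagonal symmetric Jacobi matrix."""
    jacobi = np.diag(offdiag, 1) + np.diag(offdiag, -1)
    nodes, vecs = np.linalg.eigh(jacobi)
    weights = mu0 * vecs[0, :] ** 2   # mu0 is the integral of the weight function
    return nodes, weights


def gauss(n):
    """n-point Gauss-Legendre rule."""
    return rule_from_offdiag(np.sqrt([legendre_beta(k) for k in range(1, n)]))


def anti_gauss(n):
    """(n+1)-point anti-Gaussian rule: Jacobi matrix J_{n+1} with beta_n doubled."""
    betas = [legendre_beta(k) for k in range(1, n + 1)]
    betas[-1] *= 2.0
    return rule_from_offdiag(np.sqrt(betas))


def apply_rule(rule, f):
    """Evaluate the quadrature sum of f for a (nodes, weights) pair."""
    nodes, weights = rule
    return float(np.dot(weights, f(nodes)))


def averaged(n, f):
    """Mean of the Gauss and anti-Gauss values; exact for polynomials up to degree 2n+1."""
    return 0.5 * (apply_rule(gauss(n), f) + apply_rule(anti_gauss(n), f))


def gauss_error_estimate(n, f):
    """Estimate of the Gauss-rule error as the gap to the averaged rule."""
    return abs(averaged(n, f) - apply_rule(gauss(n), f))
```

    For example, `gauss_error_estimate(3, lambda x: x**6)` compares a 3-point Gauss value with the averaged value at the cost of four extra function evaluations, which is the role Gauss-Kronrod rules usually play when they exist.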

    MS FT-2-2 7 Orthogonal polynomials and quadrature: Theory, computation, and applications

    Quadrature rules find many applications in science and engineering. Their analysis is a classical area of applied mathematics and continues to attract considerable attention. This seminar brings together speakers with expertise in a large variety of quadrature rules. The aim of the seminar is to provide an overview of recent developments in the analysis of quadrature rules. The computation of error estimates and novel applications are also described.