29 research outputs found

    Multi-GPU acceleration of large-scale density-based topology optimization

    This work presents a parallel implementation of density-based topology optimization using distributed GPU computing systems. The use of multiple GPU devices allows us to accelerate the computation and to increase the device memory available for GPU computing. This increase in device memory enables us to address large models that commonly do not fit into one GPU device. The most modern scientific computers incorporate these devices to design energy-efficient, low-cost, and high-computing power systems. However, we should adopt the proper techniques to take advantage of the computational resources of such high-performance many-core computing systems. It is well known that the bottleneck of density-based topology optimization is solving the linear elasticity problem using Finite Element Analysis (FEA) during the topology optimization iterations. We solve the linear system of equations obtained from FEA using a distributed conjugate gradient solver preconditioned by a smooth aggregation-based algebraic multigrid (AMG) using GPU computing with multiple devices. The use of aggregation-based AMG reduces memory requirements and improves the efficiency of the interpolation operation, which is beneficial for GPU computing. We evaluate the performance and scalability of the distributed GPU system using structured and unstructured meshes. We also test the performance using different 3D finite elements and relaxation operators. In addition, we evaluate the use of numerical approaches to increase the topology optimization performance. 
Finally, we present a comparison between the many-core computing instance and one efficient multi-core implementation to highlight the advantages of using GPU computing in large-scale density-based topology optimization problems. This work has been supported by the AEI/FEDER and UE under the contract DPI2016-77538-R, and by the “Fundación Séneca – Agencia de Ciencia y Tecnología de la Región de Murcia” of Spain under the contract 20911/PI/18.
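The solver named in this abstract, a preconditioned conjugate gradient, can be sketched in a few lines. This is a minimal single-process illustration, not the authors' implementation: it swaps the AMG preconditioner for a simple Jacobi (diagonal) preconditioner, runs on the CPU with NumPy, and uses a hypothetical 1D test problem (`demo_system`) in place of a 3D elasticity system.

```python
import numpy as np

def pcg(A, b, apply_prec, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for a symmetric positive definite A.

    apply_prec(r) returns an approximation of A^{-1} r. Here it is a Jacobi
    sweep; the abstract uses AMG, which is NOT implemented in this sketch.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for it in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p   # new search direction
        rz = rz_new
    return x, max_iter

def demo_system(n=50):
    """Hypothetical 1D bar clamped at node 0, with varying element stiffness."""
    k = 1.0 + 9.0 * np.linspace(0.0, 1.0, n)
    A = np.zeros((n, n))
    for e in range(n):
        i, j = e - 1, e            # unknown index of the element's two nodes
        if i >= 0:
            A[i, i] += k[e]
            A[i, j] -= k[e]
            A[j, i] -= k[e]
        A[j, j] += k[e]
    return A, np.ones(n)

A, b = demo_system()
diag = np.diag(A).copy()
x, iters = pcg(A, b, lambda r: r / diag)
```

Only `apply_prec` would change when moving to a stronger preconditioner; the loop itself stays the same.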

    Performance Portable Solid Mechanics via Matrix-Free p-Multigrid

    Finite element analysis of solid mechanics is a foundational tool of modern engineering, with low-order finite element methods and assembled sparse matrices representing the industry standard for implicit analysis. We use performance models and numerical experiments to demonstrate that high-order methods greatly reduce the costs to reach engineering tolerances while enabling effective use of GPUs. We demonstrate the reliability, efficiency, and scalability of matrix-free p-multigrid methods with algebraic multigrid coarse solvers through large deformation hyperelastic simulations of multiscale structures. We investigate accuracy, cost, and execution time on multi-node CPU and GPU systems for moderate to large models using AMD MI250X (OLCF Crusher), NVIDIA A100 (NERSC Perlmutter), and V100 (LLNL Lassen and OLCF Summit), resulting in order of magnitude efficiency improvements over a broad range of model properties and scales. We discuss efficient matrix-free representation of Jacobians and demonstrate how automatic differentiation enables rapid development of nonlinear material models without impacting debuggability and workflows targeting GPUs.

    Physically Based Forehead Modelling and Animation including Wrinkles

    There has been a vast amount of research on the production of realistic facial models and animations, which is one of the most challenging areas of computer graphics. Recently, there has been an increased interest in the use of physically based approaches for facial animation, whereby the effects of muscle contractions are propagated through facial soft-tissue models to automatically deform them in a more realistic and anatomically accurate manner. Presented in this thesis is a fully physically based approach for efficiently producing realistic-looking animations of facial movement, including animation of expressive wrinkles, focussing on the forehead. This is done by modelling more physics-based behaviour than current computer graphics approaches. The presented research has two major components. The first is a novel model creation process to automatically create animatable non-conforming hexahedral finite element (FE) simulation models of facial soft tissue from any surface mesh that contains hole-free volumes. The generated multi-layered voxel-based models are immediately ready for simulation, with skin layers and element material properties, muscle properties, and boundary conditions being automatically computed. The second major component is an advanced optimised GPU-based process to simulate and visualise these models over time using the total Lagrangian explicit dynamic (TLED) formulation of the FE method. An anatomical muscle contraction model computes active and transversely isotropic passive muscle stresses, while advanced boundary conditions enable the sliding effect between the superficial and deep soft-tissue layers to be simulated. Soft-tissue models and animations with varying complexity are presented, from a simple soft-tissue-block model with uniform layers of skin and muscle, to a complex forehead model. 
These demonstrate the flexibility of the animation approach to produce detailed animations of realistic gross- and fine-scale soft-tissue movement, including wrinkles, with different muscle structures and material parameters, for example, to animate different-aged skin. Owing to the detail and accuracy of the models and simulations, the animation approach could also be used for applications outside of computer graphics, such as surgical applications. Furthermore, the animation approach can be used to animate any multi-layered soft body (not just soft tissue)

    Interactively Cutting and Constraining Vertices in Meshes Using Augmented Matrices

    We present a finite-element solution method that is well suited for interactive simulations of cutting meshes in the regime of linear elastic models. Our approach features fast updates to the solution of the stiffness system of equations to account for real-time changes in mesh connectivity and boundary conditions. Updates are accomplished by augmenting the stiffness matrix to keep it consistent with changes to the underlying model, without refactoring the matrix at each step of cutting. The initial stiffness matrix and its Cholesky factors are used to implicitly form and solve a Schur complement system using an iterative solver. As changes accumulate over many simulation timesteps, the augmented solution method slows down due to the size of the augmented matrix. However, by periodically refactoring the stiffness matrix in a concurrent background process, fresh Cholesky factors that incorporate recent model changes can replace the initial factors. This controls the size of the augmented matrices and provides a way to maintain a fast solution rate as the number of changes to a model grows. We exploit sparsity in the stiffness matrix, the right-hand-side vectors and the solution vectors to compute the solutions fast, and show that the time complexity of the update steps is bounded linearly by the size of the Cholesky factor of the initial matrix. Our complexity analysis and experimental results demonstrate that this approach scales well with problem size. Results for cutting and deformation of 3D linear elastic models are reported for meshes representing the brain, eye, and model problems with element counts up to 167,000; these show the potential of this method for real-time interactivity. An application to limbal incisions for surgical correction of astigmatism, for which linear elastic models and small deformations are sufficient, is included
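The idea of constraining vertices by augmenting the stiffness matrix, rather than refactoring it, can be illustrated with a small dense example. This is a sketch under stated assumptions and not the paper's algorithm: it uses dense NumPy arrays instead of sparse factors, solves the Schur complement system directly rather than with an iterative solver, and only pins degrees of freedom to zero (no cutting). The function names are hypothetical.

```python
import numpy as np

def solve_with_factor(L, rhs):
    """Solve A x = rhs given the Cholesky factor L of A (A = L L^T).

    np.linalg.solve is a general solver; a real code would use dedicated
    triangular solves to exploit the structure of L.
    """
    y = np.linalg.solve(L, rhs)
    return np.linalg.solve(L.T, y)

def solve_constrained(L, f, fixed):
    """Pin the listed degrees of freedom to zero without refactoring A.

    Solves the augmented system [[A, E], [E^T, 0]] [u; lam] = [f; 0],
    where each column of E is a unit vector selecting one fixed entry.
    Only the original factor L is reused.
    """
    n = f.shape[0]
    E = np.zeros((n, len(fixed)))
    E[fixed, np.arange(len(fixed))] = 1.0
    AinvE = solve_with_factor(L, E)
    Ainvf = solve_with_factor(L, f)
    S = E.T @ AinvE                        # small Schur complement block
    lam = np.linalg.solve(S, E.T @ Ainvf)  # Lagrange multipliers
    return Ainvf - AinvE @ lam

# Hypothetical test matrix: random symmetric positive definite "stiffness".
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8.0 * np.eye(8)
f = rng.standard_normal(8)
L = np.linalg.cholesky(A)                  # factored once, reused afterwards
u = solve_constrained(L, f, fixed=[1, 4])
```

The Schur block `S` has one row per constraint, so its size grows as changes accumulate. That growth is the slowdown the abstract addresses by periodically refactoring in the background.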

    Dynamic analysis of a needle insertion for soft materials: Arbitrary Lagrangian-Eulerian-based three-dimensional finite element analysis

    Background: Our goal was to develop a three-dimensional finite element model that enables dynamic analysis of needle insertion for soft materials. To demonstrate large deformation and fracture, we used the arbitrary Lagrangian-Eulerian (ALE) method for fluid analysis. We performed ALE-based finite element analysis for 3% agar gel and three types of copper needle with bevel tips. Methods: To evaluate simulation results, we compared the needle deflection and insertion force with corresponding experimental results acquired with a uniaxial manipulator. We studied the shear stress distribution of agar gel on various time scales. Results: For 30°, 45°, and 60°, differences in deflections of each needle between both sets of results were 2.424, 2.981, and 3.737 mm, respectively. For the insertion force, there was no significant difference for mismatching area error (p<0.05) between simulation and experimental results. Conclusions: Our results have the potential to be a stepping stone to develop pre-operative surgical planning to estimate an optimal needle insertion path for MR image-guided microwave coagulation therapy and for analyzing large deformation and fracture in biological tissues. © 2014 Elsevier Ltd. Yamaguchi S., Tsutsui K., Satake K., et al. Dynamic analysis of a needle insertion for soft materials: Arbitrary Lagrangian-Eulerian-based three-dimensional finite element analysis. Computers in Biology and Medicine 53, 42 (2014); https://doi.org/10.1016/j.compbiomed.2014.07.012

    Simit: A Language for Physical Simulation

    Using existing programming tools, writing high-performance simulation code is labor intensive and requires sacrificing readability and portability. The alternative is to prototype simulations in a high-level language like Matlab, thereby sacrificing performance. The Matlab programming model naturally describes the behavior of an entire physical system using the language of linear algebra. However, simulations also manipulate individual geometric elements, which are best represented using linked data structures like meshes. Translating between the linked data structures and linear algebra comes at significant cost, both to the programmer and the machine. High-performance implementations avoid the cost by rephrasing the computation in terms of linked or index data structures, leaving the code complicated and monolithic, often increasing its size by an order of magnitude. In this paper, we present Simit, a new language for physical simulations that lets the programmer view the system both as a linked data structure in the form of a hypergraph, and as a set of global vectors, matrices and tensors depending on what is convenient at any given time. Simit provides a novel assembly construct that makes it conceptually easy and computationally efficient to move between the two abstractions. Using the information provided by the assembly construct, the compiler generates efficient in-place computation on the graph. We demonstrate that Simit is easy to use: a Simit program is typically shorter than a Matlab program; that it is high-performance: a Simit program running sequentially on a CPU performs comparably to hand-optimized simulations; and that it is portable: Simit programs can be compiled for GPUs with no change to the program, delivering 5-25x speedups over our optimized CPU code

    High performance simulation of drug release model and mass transport model by using hybrid platform

    Controlled drug delivery in drug-eluting stents plays an important role in decreasing restenosis in intravascular stenting. These stents are coated with a drug to avoid the re-narrowing of the arterial wall. The drug is directly associated with the original bare metal stents. Drug-eluting stents have the advantage of flexible timed delivery of a curative drug to the neighboring arterial tissue, treating the required injuries efficiently with negligible systemic drug interaction. This thesis aims to develop a mathematical model describing drug distribution from the stent coating and from the arterial wall. For this purpose, a two-phase mathematical model is presented to simulate the transport of drug between the coating and the arterial tissue. This two-phase model explores the impact of non-dimensional parameters, such as the solid-liquid mass transfer rate, the ratio of accessible void volume to solid volume, and the Peclet number, on drug release and mass concentrations from the coating and tissue layers. For better understanding, a 2D mathematical model of a biodurable stent coating is developed, in which the intravascular distribution of drug from an implanted drug-eluting stent in the arterial wall is simulated. The model integrates reversible drug binding and diffusion of drug in the stent coating. The arterial wall and coating drug diffusivities are examined for their impact on arterial drug uptake and drug release in the coating. The diffusion coefficient of the drug, the diffusion coefficients of the wall, and strut embedment play an important role in regulating the drug release. Moreover, a 3D model of mass concentrations and drug release from the cross section of the artery is investigated. The impact of advective and diffusive velocities is explored, and these forces can be used to control the mass concentrations of drug. FEM and FDM are used for spatial and temporal discretization of the model equations. 
Sequential and parallel algorithms are developed for the numerical simulations. Furthermore, the motivation for using GPU accelerators with CUDA to handle the computational complexity is explained. A hybrid CPU/GPU algorithm for the proposed models is designed, and satisfactory results are obtained for parallel performance indicators such as speedup Sp, temporal performance Tp, efficiency Ep and effectiveness Fp. The CN method gives better sequential results because it has a lower RMSE than the GD and BD methods. However, the BD method gives good results for the parallel indicators because it involves less computation than the GD and CN methods. The sequential and parallel performance of the BM method is better than that of the NM and PM methods. The BM method has the lowest RMSE for both sequential and parallel algorithms, and its parallel performance indicators Sp, Tp, Ep and Fp are better than those of the other methods. It is therefore superior to the NM and PM methods. Hybrid algorithms are more efficient in large-scale problem simulations, as shown by the parallel performance results. The governing models in this research provide the basis of a design tool for studying and calculating drug distribution in the coating and arterial wall in stent-based drug delivery. The models proposed in this research are used for monitoring and to determine drug release and mass transport, as well as for visualization and observation. The simulations offer insight into how parameters such as γ1, e1, Pe, Dc, Dw, Dwx, Dwy and strut embedment can affect the efficiency of drug release.

    Abstractions and performance optimisations for finite element methods

    Finding numerical solutions to partial differential equations (PDEs) is an essential task in the discipline of scientific computing. In designing software tools for this task, one of the ultimate goals is to balance the needs for generality, ease of use and high performance. Domain-specific systems based on code generation techniques, such as Firedrake, attempt to address this problem with a design consisting of a hierarchy of abstractions, where the users can specify the mathematical problems via a high-level, descriptive interface, which is progressively lowered through the intermediate abstractions. Well-designed abstraction layers are essential to enable performing code transformations and optimisations robustly and efficiently, generating high-performance code without user intervention. This thesis discusses several topics on the design of the abstraction layers of Firedrake, and presents the benefit of its software architecture by providing examples of various optimising code transformations at the appropriate abstraction layers. In particular, we discuss the advantage of describing the local assembly stage of a finite element solver in an intermediate representation based on symbolic tensor algebra. We successfully lift specific loop optimisations, previously implemented by rewriting ASTs of the local assembly kernels, to this higher-level tensor language, improving the compilation speed and optimisation effectiveness. The global assembly phase involves the application of local assembly kernels on a collection of entities of an unstructured mesh. We redesign the abstraction to express the global assembly loop nests using tools and concepts based on the polyhedral model. This enables us to implement the cross-element vectorisation algorithm that delivers stable vectorisation performance on CPUs automatically. 
This abstraction also improves the portability of Firedrake, as we demonstrate targeting GPU devices transparently from the same software stack.

    Efficient parallelization strategy for real-time FE simulations

    This paper introduces an efficient and generic framework for finite-element simulations under an implicit time integration scheme. Being compatible with generic constitutive models, a fast matrix assembly method exploits the fact that system matrices are created in a deterministic way as long as the mesh topology remains constant. Using the sparsity pattern of the assembled system brings about significant optimizations in the assembly stage. As a result, existing GPU-based parallelization techniques can be applied directly to the assembled system. Moreover, an asynchronous Cholesky preconditioning scheme is used to improve the convergence of the system solver. On this basis, a GPU-based Cholesky preconditioner is developed, significantly reducing the data transfer between the CPU and the GPU during the solving stage. We evaluate the performance of our method with different mesh elements and hyperelastic models and compare it with typical approaches on the CPU and the GPU.
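The observation that assembly is deterministic while the mesh topology stays fixed suggests precomputing the sparsity pattern once and reusing it at every step. The sketch below shows one way to do that with NumPy. It is an assumption-laden illustration, not the paper's method: the function names, the coordinate-style storage, and the 1D two-node test mesh are all hypothetical, and nothing runs on a GPU.

```python
import numpy as np

def build_pattern(elements, n_nodes):
    """Precompute, once per mesh topology, where each local entry lands.

    elements has shape (n_elem, k): the node indices of each element.
    Returns the rows and columns of the unique nonzeros, plus a map from
    every local matrix entry to its slot in the value array.
    """
    k = elements.shape[1]
    rows = np.repeat(elements, k, axis=1).ravel()  # local row index i
    cols = np.tile(elements, (1, k)).ravel()       # local column index j
    linear = rows * n_nodes + cols
    unique, slot = np.unique(linear, return_inverse=True)
    return unique // n_nodes, unique % n_nodes, slot.ravel()

def assemble_values(local, slot, nnz):
    """Scatter-add all local matrices (n_elem, k, k) into the value array."""
    return np.bincount(slot, weights=local.ravel(), minlength=nnz)

# Hypothetical mesh: a chain of 4 two-node elements on 5 nodes.
elements = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])
rows, cols, slot = build_pattern(elements, n_nodes=5)

# Per time step only the values change; the pattern is reused as-is.
stiffness = np.array([1.0, 2.0, 3.0, 4.0])
local = stiffness[:, None, None] * np.array([[1.0, -1.0], [-1.0, 1.0]])
values = assemble_values(local, slot, rows.size)
```

Because `slot` is fixed, each re-assembly reduces to a single scatter-add over a flat array. That kind of operation maps naturally onto parallel hardware, though this sketch stays on the CPU.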