11 research outputs found

    A scalable software framework for solving PDEs on distributed octree meshes using finite element methods

    Tracking particle motion in inertial flows (especially in obstructed geometries) is computationally daunting. This is further complicated by the fact that constructing migration maps for particles (as a function of particle location, flow conditions, and particle size) requires several thousand simulations tracking individual particles. This calls for an efficient, scalable approach to single-particle tracking in fluids. We bring together three distinct elements to accomplish this: (a) a parallel octree-based adaptive mesh generation framework, (b) a variational multiscale (VMS) based treatment that enables flow-condition-agnostic simulations (laminar or turbulent)~\cite{Bazilevs07b}, and (c) a variationally consistent immersed boundary method (IBM) to efficiently track moving particles in a background octree mesh~\cite{Xu:2015ig}. This project builds on our existing codes for adaptive meshing (Dendro) and finite elements (TalyFEM). We present our adaptive meshing framework, which is tailored for the immersed boundary method, and experiments demonstrating the scalability of our code to over 1k compute nodes.

    Low-constant parallel algorithms for finite element simulations using linear octrees

    In this article we propose parallel algorithms for the construction of conforming finite-element discretizations on linear octrees. Existing octree-based discretizations scale to billions of elements, but the complexity constants can be high. In our approach we use several techniques to minimize overhead: a novel bottom-up tree construction and 2:1 balance constraint enforcement; a Golomb-Rice encoding for compression, representing the octree and element connectivity as a Uniquely Decodable Code (UDC); overlapping communication and computation; and byte alignment for cache efficiency. The cost of applying the Laplacian is comparable to that of applying it using a direct-indexing regular-grid discretization with the same number of elements. Our algorithm has scaled up to four billion octants on 4096 processors on a Cray XT3 at the Pittsburgh Supercomputing Center. The overall tree construction time is under a minute, in contrast to previous implementations that required several minutes; the evaluation of the discretization of a variable-coefficient Laplacian takes only a few seconds.
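The Golomb-Rice encoding mentioned in this abstract can be illustrated with a minimal Python sketch. It shows only the generic Rice code (Golomb code with divisor 2**k), not the paper's actual layout for octree and connectivity data, which the abstract does not specify. The function names and the `gaps` sample values are hypothetical.

```python
def rice_encode(value: int, k: int) -> str:
    """Encode a non-negative integer as a Rice code with divisor 2**k.

    The quotient is written in unary ("1" * q followed by "0"),
    the remainder in exactly k binary digits.
    """
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient = value >> k
    remainder = value & ((1 << k) - 1)
    unary = "1" * quotient + "0"
    binary = format(remainder, "b").zfill(k) if k > 0 else ""
    return unary + binary


def rice_decode(bits: str, k: int) -> list:
    """Decode a concatenation of Rice codes back into integers.

    No separators are needed because every codeword is uniquely decodable.
    """
    values = []
    pos = 0
    while pos < len(bits):
        quotient = 0
        while bits[pos] == "1":      # read the unary quotient
            quotient += 1
            pos += 1
        pos += 1                     # skip the terminating "0"
        remainder = int(bits[pos:pos + k], 2) if k > 0 else 0
        pos += k
        values.append((quotient << k) | remainder)
    return values


# Hypothetical small integers (for example, index gaps) to compress.
gaps = [0, 1, 5, 2, 9]
stream = "".join(rice_encode(g, 2) for g in gaps)
decoded = rice_decode(stream, 2)
```

Small values get short codewords, so the scheme pays off when most of the stored integers are small; the parameter `k` trades unary length against the fixed remainder width.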

    A generic finite element framework on parallel tree-based adaptive meshes

    We present highly scalable parallel distributed-memory algorithms and associated data structures for a generic finite element framework that supports h-adaptivity on computational domains represented as multiple connected adaptive trees (a forest-of-trees), thus providing multi-scale resolution on problems governed by partial differential equations. The framework is grounded on a rich representation of the adaptive mesh suitable for generic finite elements that is built on top of a low-level, light-weight forest-of-trees data structure handled by a specialized, highly parallel adaptive meshing engine. Along the way, we have identified the requirements that the forest-of-trees layer must fulfill to be coupled into our framework. Essentially, it must be able to describe neighboring relationships between cells in the adapted mesh (apart from hierarchical relationships) across the lower-dimensional objects at the boundary of the cells. Atop this two-layered mesh representation, we build the rest of the data structures required for the numerical integration and assembly of the discrete system of linear equations. We consider algorithms that are suitable for both subassembled and fully-assembled distributed data layouts of linear system matrices. The proposed framework has been implemented within the FEMPAR scientific software library, using p4est as a practical forest-of-octrees demonstrator. A comprehensive strong scaling study of this implementation when applied to Poisson and Maxwell problems reveals remarkable scalability up to 32.2K CPU cores and 482.2M degrees of freedom. In addition, the implementation in FEMPAR of the proposed approach is up to 2.6 and 3.4 times faster than the state-of-the-art deal.II finite element software in the h-adaptive approximation of a Poisson problem with first- and second-order Lagrangian finite elements, respectively (excluding the linear solver step from the comparison).

    Buoyancy-driven flow and fluid-structure interaction with moving boundaries

    We apply the residual-based variational multiscale (VMS) method, in the sense of large-eddy simulation (LES), within the finite element method to buoyancy-driven flow in enclosures, and consider a wide range of Rayleigh numbers from laminar (10^3) to turbulent (10^{10}) in a 2D benchmark Rayleigh-Bénard problem. 3D simulations for a laminar and a turbulent case are performed, and comparisons of mean profiles and fluctuation profiles with other numerical and experimental results are successfully carried out. A weakly imposed boundary condition method is employed for both velocity and temperature, and it produces reasonable results with a much coarser mesh than the traditional imposition of boundary conditions. This suggests that the VMS framework with weak imposition of boundary conditions is a computationally efficient approach to modeling buoyancy-driven flows in complex indoor environments. In addition to the flow fields, we deploy the immersogeometric analysis (IMGA) method, in the sense of the immersed boundary method (IBM), for objects moving in fluids on an unstructured framework. The finite element formulation is stabilized by the VMS method on an unstructured background mesh. Weak imposition of boundary conditions is used to impose the no-slip boundary condition on the immersed boundary. Adaptively refined quadrature rules are used to better capture the geometry of the immersed boundary and to accurately integrate the background elements that intersect it. Treatment of the freshly cleared nodes is considered. We assess the accuracy of the moving IMGA framework by analyzing object motion in a variety of flow structures, including a freely dropping cylinder/sphere in viscous fluids and particle focusing in (un)obstructed channels. We show that the quantities of interest are in good agreement with other analytical, numerical and experimental solutions. 
The advantages of this moving IMGA framework in computational cost and efficiency are indicated by a comparison with the body-fitted method using commercial computational fluid dynamics (CFD) software. The moving IMGA framework can be deployed in applications of particle control and manipulation in microfluidic channels. The moving IMGA on the unstructured framework is further deployed in a scalable, adaptively refined, octree-based finite element approach for better computational performance when tracking object motion. This enables using a parallel, hierarchically refined octree mesh as the background mesh, with a variationally consistent IMGA formulation on this background mesh. We integrate the unstructured moving IMGA framework into the octree-based framework. We show good scaling results of the coupled framework on Stampede2 at TACC. This illustrates the potential of the moving IMGA on the coupled framework to efficiently track complex particles in flows.

    Die Finite-Elemente-Methode mit dynamisch-adaptiven kartesischen Gittern

    In this thesis, a two-dimensional flow problem described by the Navier-Stokes equations is computed on a dynamically adaptive grid using the finite element method. The complete computational workflow is presented by means of an implementation. Quadtrees are used as the data structure; they can be generated in parallel with a bottom-up algorithm due to Sundar et al. Based on the vorticity, the grid is refined or coarsened during the simulation. Parallel scalability is investigated, and for a regular grid a runtime comparison is carried out against a reference implementation without quadtrees.
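The vorticity-based adaptation criterion in this abstract can be sketched as follows. This is a minimal illustration on a uniform grid, not the thesis's implementation: the central-difference stencil, the two thresholds, and the function names are all assumptions made for the example.

```python
def vorticity(u, v, h):
    """Central-difference vorticity dv/dx - du/dy on a uniform grid.

    u[j][i] and v[j][i] are velocity components at row j (y) and column i (x);
    h is the grid spacing. Boundary points are left at 0.0 for simplicity.
    """
    ny, nx = len(u), len(u[0])
    w = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            dvdx = (v[j][i + 1] - v[j][i - 1]) / (2 * h)
            dudy = (u[j + 1][i] - u[j - 1][i]) / (2 * h)
            w[j][i] = dvdx - dudy
    return w


def mark_cells(w, refine_above, coarsen_below):
    """Flag each point 'refine', 'coarsen' or 'keep' from |vorticity|.

    Both thresholds are hypothetical tuning parameters.
    """
    flags = []
    for row in w:
        flag_row = []
        for value in row:
            if abs(value) > refine_above:
                flag_row.append("refine")
            elif abs(value) < coarsen_below:
                flag_row.append("coarsen")
            else:
                flag_row.append("keep")
        flags.append(flag_row)
    return flags


# Rigid-body rotation u = -y, v = x has constant vorticity 2.
n, h = 4, 0.5
u = [[-(j * h) for i in range(n)] for j in range(n)]
v = [[i * h for i in range(n)] for j in range(n)]
w = vorticity(u, v, h)
flags = mark_cells(w, refine_above=1.0, coarsen_below=0.1)
```

In a quadtree code the flags would drive splitting or merging of leaves, followed by rebalancing; here they are simply returned per grid point.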

    X10 for high-performance scientific computing

    High performance computing is a key technology that enables large-scale physical simulation in modern science. While great advances have been made in methods and algorithms for scientific computing, the most commonly used programming models encourage a fragmented view of computation that maps poorly to the underlying computer architecture. Scientific applications typically manifest physical locality, which means that interactions between entities or events that are nearby in space or time are stronger than more distant interactions. Linear-scaling methods exploit physical locality by approximating distant interactions, to reduce computational complexity so that cost is proportional to system size. In these methods, the computation required for each portion of the system is different depending on that portion's contribution to the overall result. To support productive development, application programmers need programming models that cleanly map aspects of the physical system being simulated to the underlying computer architecture while also supporting the irregular workloads that arise from the fragmentation of a physical system. X10 is a new programming language for high-performance computing that uses the asynchronous partitioned global address space (APGAS) model, which combines explicit representation of locality with asynchronous task parallelism. This thesis argues that the X10 language is well suited to expressing the algorithmic properties of locality and irregular parallelism that are common to many methods for physical simulation. The work reported in this thesis was part of a co-design effort involving researchers at IBM and ANU in which two significant computational chemistry codes were developed in X10, with an aim to improve the expressiveness and performance of the language. The first is a Hartree–Fock electronic structure code, implemented using the novel Resolution of the Coulomb Operator approach. 
The second evaluates electrostatic interactions between point charges, using either the smooth particle mesh Ewald method or the fast multipole method, with the latter used to simulate ion interactions in a Fourier Transform Ion Cyclotron Resonance mass spectrometer. We compare the performance of both X10 applications to state-of-the-art software packages written in other languages. This thesis presents improvements to the X10 language and runtime libraries for managing and visualizing the data locality of parallel tasks, communication using active messages, and efficient implementation of distributed arrays. We evaluate these improvements in the context of computational chemistry application examples. This work demonstrates that X10 can achieve performance comparable to established programming languages when running on a single core. More importantly, X10 programs can achieve high parallel efficiency on a multithreaded architecture, given a divide-and-conquer pattern of parallel tasks and appropriate use of worker-local data. For distributed memory architectures, X10 supports the use of active messages to construct local, asynchronous communication patterns which outperform global, synchronous patterns. Although point-to-point active messages may be implemented efficiently, productive application development also requires collective communications; more work is required to integrate both forms of communication in the X10 language. The exploitation of locality is the key insight in both linear-scaling methods and the APGAS programming model; their combination represents an attractive opportunity for future co-design efforts.

    Large-scale tree-based unfitted finite elements for metal additive manufacturing

    This thesis addresses large-scale numerical simulations of partial differential equations posed on evolving geometries. Our target application is the simulation of metal additive manufacturing (or 3D printing) with powder-bed fusion methods, such as Selective Laser Melting (SLM), Direct Metal Laser Sintering (DMLS) or Electron-Beam Melting (EBM). The simulation of metal additive manufacturing processes is a remarkable computational challenge, because processes are characterised by multiple scales in space and time and multiple complex physics that occur in intricate three-dimensional growing-in-time geometries. Only the synergy of advanced numerical algorithms and high-performance scientific computing tools can fully resolve, in the short run, the simulation needs in the area. The main goal of this Thesis is to design a novel highly-scalable numerical framework with multi-resolution capability in arbitrarily complex evolving geometries. To this end, the framework is built by combining three computational tools: (1) parallel mesh generation and adaptation with forest-of-trees meshes, (2) robust unfitted finite element methods and (3) parallel finite element modelling of the geometry evolution in time. Our numerical research is driven by several limitations and open questions in the state-of-the-art of the three aforementioned areas, which are vital to achieve our main objective. All our developments are deployed with high-end distributed-memory implementations in the large-scale open-source software project FEMPAR. In considering our target application, (4) temporal and spatial model reduction strategies for thermal finite element models are investigated. They are coupled to our new large-scale computational framework to simplify optimisation of the manufacturing process. The contributions of this Thesis span the four ingredients above. 
Current understanding of (1) is substantially improved with rigorous proofs of the computational benefits of the 2:1 k-balance (ease of parallel implementation and high scalability) and of the minimum requirements a parallel tree-based mesh must fulfil to yield correct parallel finite element solvers atop it. Concerning (2), a robust, optimal and scalable formulation of the aggregated unfitted finite element method is proposed on parallel tree-based meshes for elliptic problems with unfitted external contour or unfitted interfaces. To the author's best knowledge, this marks the first time techniques (1) and (2) are brought together. After enhancing (1)+(2) with a novel parallel approach for (3), the resulting framework is able to mitigate a major performance bottleneck in large-scale simulations of metal additive manufacturing processes by powder-bed fusion: scalable adaptive (re)meshing in arbitrarily complex geometries that grow in time. During the development of this Thesis, our application problem (4) is investigated in two joint collaborations with the Monash Centre for Additive Manufacturing and Monash University in Melbourne, Australia. The first contribution is an experimentally-supported thorough numerical assessment of time-lumping methods; the second is a novel experimentally-validated formulation of a new physics-based thermal contact model, accounting for thermal inertia and suitable for model localisation, the so-called virtual domain approximation. By efficiently exploiting high-performance computing resources, our new computational framework enables large-scale finite element analysis of metal additive manufacturing processes, with increased fidelity of predictions and dramatic reductions of computing times. It can also be combined with the proposed model reductions for fast thermal optimisation of the manufacturing process. 
These tools open the path to accelerate the understanding of the process-to-performance link and digital product design and certification in metal additive manufacturing, two milestones that are vital to exploit the technology for mass production. This thesis deals with the large-scale simulation of partial differential equations on evolving geometries. The main application is the simulation of metal additive manufacturing (or 3D printing) processes by powder-bed fusion methods, such as Selective Laser Melting (SLM), Direct Metal Laser Sintering (DMLS) or Electron-Beam Melting (EBM). Simulating these processes is an exceptional computational challenge, because they are characterised by multiple space-time scales and multiple physics that take place on complicated three-dimensional geometries that grow in time. The synergy between advanced numerical algorithms and high-performance scientific computing tools is the only way to fully resolve, in the short term, the simulation needs of this area. The main goal of this thesis is to design a new scalable numerical simulation framework with multi-resolution capability in complex, evolving geometries. The new framework is built by joining three computational tools: (1) parallel adaptive meshing with forest-of-trees meshes, (2) robust unfitted finite element methods and (3) parallel finite element modelling of geometries that grow in time. Several limitations and open problems in the state of the art, which are key to achieving our objective, guide our research. All developments are implemented on distributed-memory architectures with the open-source software FEMPAR. Regarding the application problem, (4) reduced models in space and time for thermal models of the process are investigated. These reduced models are coupled to our computational framework to simplify optimisation of the process. 
The contributions of this thesis span the four points above. The state of the art of (1) is substantially improved with rigorous proofs of the computational benefits of 2:1 balance (easy parallelisation and high scalability), as well as of the minimum requirements that this type of mesh must satisfy to guarantee that the finite element spaces defined on it are well posed. Regarding (2), a robust, optimal and scalable aggregation-based method has been formulated for elliptic problems with unfitted boundaries or interfaces. After augmenting (1)+(2) with a new parallel strategy for (3), the resulting simulation framework effectively mitigates the main bottleneck in the simulation of metal powder-bed additive manufacturing processes: scalable adaptivity and remeshing in complex geometries that grow in time. During the development of the thesis, the application problem is investigated in collaboration with the Monash Centre for Additive Manufacturing and Monash University in Melbourne, Australia. First, an exhaustive experimental and numerical analysis of temporal aggregation methods is carried out. Second, a new thermal contact formulation is proposed and experimentally validated; it accounts for thermal inertia and is suitable for localising the model, the so-called virtual domain approximation. Through efficient use of high-performance computing resources, our new computational framework makes large-scale finite element analysis of metal additive manufacturing processes possible, with increased fidelity of predictions and significant reductions in computing time. It can likewise be combined with the proposed reduced models for thermal optimisation of the manufacturing process. 
These tools help accelerate the understanding of the process-to-performance link and the digitalisation of product design and certification in metal additive manufacturing, two milestones that are crucial to exploit the technology for mass production. Postprint (published version)