
    Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX) in hypre and PETSc

    We describe our recently released software package Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX). BLOPEX is available as a stand-alone serial library, as an external package to PETSc (``Portable, Extensible Toolkit for Scientific Computation'', a general purpose suite of tools for the scalable solution of partial differential equations and related problems developed by Argonne National Laboratory), and is also built into {\it hypre} (``High Performance Preconditioners'', a scalable linear solvers package developed by Lawrence Livermore National Laboratory). The present BLOPEX release includes only one solver: the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method for symmetric eigenvalue problems. {\it hypre} provides users with advanced high-quality parallel preconditioners for linear systems, in particular domain decomposition and multigrid preconditioners. With BLOPEX, the same preconditioners can now be used efficiently for symmetric eigenvalue problems. PETSc facilitates the integration of independently developed application modules with strict attention to component interoperability, and makes BLOPEX extremely easy to compile and use with the preconditioners available via PETSc. We present the LOBPCG algorithm as implemented in BLOPEX for {\it hypre} and PETSc. We demonstrate the scalability of BLOPEX numerically by testing it on a number of distributed and shared memory parallel systems, including a Beowulf system, a SUN Fire 880, an AMD dual-core Opteron workstation, and an IBM BlueGene/L supercomputer, using PETSc domain decomposition and {\it hypre} multigrid preconditioning. We test BLOPEX on a model problem, the standard 7-point finite-difference approximation of the 3-D Laplacian, with problem sizes in the range 10^5-10^8. Comment: Submitted to SIAM Journal on Scientific Computing
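    As a concrete illustration of the method, the following is a minimal LOBPCG sketch in Python using SciPy's scipy.sparse.linalg.lobpcg on the same model problem, the 7-point finite-difference 3-D Laplacian. SciPy stands in for BLOPEX itself (which is a C library), and a simple Jacobi preconditioner stands in for the hypre multigrid and PETSc domain decomposition preconditioners the paper uses; the grid size and block size are illustrative.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import lobpcg

    n = 20                                    # grid points per dimension (small demo)
    I = sp.identity(n)
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    # Standard 7-point finite-difference 3-D Laplacian as a Kronecker sum
    A = (sp.kron(sp.kron(T, I), I)
         + sp.kron(sp.kron(I, T), I)
         + sp.kron(sp.kron(I, I), T)).tocsr()

    # Jacobi preconditioner as a stand-in for hypre/PETSc preconditioners
    M = sp.diags(1.0 / A.diagonal())

    rng = np.random.default_rng(0)
    X = rng.standard_normal((A.shape[0], 4))  # block of 4 initial vectors
    vals, vecs = lobpcg(A, X, M=M, tol=1e-6, maxiter=200, largest=False)
    print(vals)                               # the four smallest eigenvalues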

    An adaptive Cartesian embedded boundary approach for fluid simulations of two- and three-dimensional low temperature plasma filaments in complex geometries

    We review a scalable two- and three-dimensional computer code for low-temperature plasma simulations in multi-material complex geometries. Our approach is based on embedded boundary (EB) finite volume discretizations of the minimal fluid-plasma model on adaptive Cartesian grids, extended to also account for the charging of insulating surfaces. We discuss the spatial and temporal discretization methods and show that the resulting overall method is second order convergent, monotone, and conservative (for smooth solutions). Weak scalability with parallel efficiencies over 70% is demonstrated up to 8192 cores and more than one billion cells. We then demonstrate the use of adaptive mesh refinement in multiple two- and three-dimensional simulation examples at modest core counts. The examples include two-dimensional simulations of surface streamers along insulators with surface roughness; fully three-dimensional simulations of filaments in experimentally realizable pin-plane geometries; and three-dimensional simulations of positive plasma discharges in multi-material complex geometries. The largest computational example uses up to 800 million mesh cells with billions of unknowns on 4096 computing cores. Our use of computer-aided design (CAD) and constructive solid geometry (CSG), combined with capabilities for parallel computing, offers possibilities for performing three-dimensional transient plasma-fluid simulations in multi-material complex geometries at moderate pressures and comparatively large scale. Comment: 40 pages, 21 figures
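    As a toy illustration of the geometry pipeline, the sketch below composes implicit inside/outside functions with min/max, the standard constructive solid geometry (CSG) operations; the shapes, names, and the pin example are hypothetical and do not reflect the paper's actual CAD/CSG interface.

    import numpy as np

    def sphere(center, radius):
        # Implicit function: negative inside the sphere, positive outside
        return lambda p: np.linalg.norm(p - center) - radius

    def box(lo, hi):
        # Implicit function for an axis-aligned box (negative inside)
        return lambda p: np.max(np.maximum(lo - p, p - hi))

    def union(f, g):         # inside if inside either shape
        return lambda p: min(f(p), g(p))

    def difference(f, g):    # inside f but not inside g
        return lambda p: max(f(p), -g(p))

    # Hypothetical pin electrode: a thin vertical box capped by a spherical tip
    pin = union(box(np.array([-0.02, -0.02, 0.1]), np.array([0.02, 0.02, 1.0])),
                sphere(np.array([0.0, 0.0, 0.1]), 0.02))
    print(pin(np.array([0.0, 0.0, 0.05])) > 0)  # True: point lies in the gas phase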

    Enhancing speed and scalability of the ParFlow simulation code

    Regional hydrology studies are often supported by high resolution simulations of subsurface flow that require expensive and extensive computations. Efficient usage of the latest high performance parallel computing systems becomes a necessity. The simulation software ParFlow has been demonstrated to meet this requirement, showing excellent solver scalability for up to 16,384 processes. In the present work we show that the code requires further enhancements in order to fully take advantage of current petascale machines. We identify ParFlow's parallelization of the computational mesh as a central bottleneck. We propose to reorganize this subsystem using fast mesh partitioning algorithms provided by the parallel adaptive mesh refinement library p4est. We realize this in a minimally invasive manner by modifying selected parts of the code to reinterpret the existing mesh data structures. We evaluate the scaling performance of the modified version of ParFlow, demonstrating good weak and strong scaling up to 458k cores of the Juqueen supercomputer, and test an example application at large scale. Comment: The final publication is available at link.springer.com
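    The idea behind p4est-style partitioning can be sketched in a few lines: order the mesh cells along a space-filling curve and split the curve evenly across processes. The Python sketch below uses a Morton (Z-order) curve on a small 2-D grid; it is purely illustrative, as p4est itself is a C library with a different interface.

    def morton2d(x, y, bits=16):
        # Interleave the bits of (x, y) into a Z-order (Morton) index
        z = 0
        for i in range(bits):
            z |= ((x >> i) & 1) << (2 * i)
            z |= ((y >> i) & 1) << (2 * i + 1)
        return z

    def partition(cells, nranks):
        # Assign cells (a list of (x, y) grid coordinates) to ranks by curve order
        ordered = sorted(cells, key=lambda c: morton2d(*c))
        chunk = -(-len(ordered) // nranks)   # ceiling division
        return [ordered[r * chunk:(r + 1) * chunk] for r in range(nranks)]

    cells = [(x, y) for x in range(4) for y in range(4)]
    for rank, owned in enumerate(partition(cells, 4)):
        print(rank, owned)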

    Modelling a permanent magnet synchronous motor in FEniCSx for parallel high-performance simulations

    © 2022 The Authors. Published by Elsevier B.V. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. There are concerns that the extreme requirements of heavy-duty vehicles and aviation will see them left behind in the electrification of the transport sector, becoming the most significant emitters of greenhouse gases. Engineers extensively use the finite element method to analyse and improve the performance of electric machines, but new highly scalable methods with linear (or near-linear) time complexity are required to make extreme-scale models viable. This paper introduces a three-dimensional permanent magnet synchronous motor model using FEniCSx, a finite element platform tailored for efficient computing and data handling at scale. The model demonstrates magnetic flux density distributions comparable to a verification model built in Ansys Maxwell, with a maximum deviation of 7% in the motor’s static regions. Solving the largest mesh, comprising over eight million cells, achieved a speedup of 198 on 512 processes. A preconditioned Krylov subspace method was used to solve the system, requiring 92% less memory than a direct solution. It is expected that advances built on this approach will allow system-level multiphysics simulations to become feasible within electric machine development. This capability could provide the near real-world accuracy needed to bring electric propulsion systems to large vehicles. Peer reviewed
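    The memory trade-off between a direct factorization and a preconditioned Krylov solve can be sketched with SciPy, which here stands in for the PETSc solvers FEniCSx builds on; the matrix is a 2-D Laplacian stand-in for the motor model's linear system, the ILU preconditioner is an assumption (the paper does not name its preconditioner here), and all sizes are illustrative.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n = 200
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    A = (sp.kron(T, sp.identity(n)) + sp.kron(sp.identity(n), T)).tocsc()
    b = np.ones(A.shape[0])                  # stand-in right-hand side

    # Direct solve: factorizes A, paying in memory for fill-in in the factors
    x_direct = spla.spsolve(A, b)

    # Iterative solve: ILU-preconditioned CG keeps memory close to O(nnz(A))
    ilu = spla.spilu(A, drop_tol=1e-4)
    M = spla.LinearOperator(A.shape, matvec=ilu.solve)
    x_cg, info = spla.cg(A, b, M=M)
    print(info, np.linalg.norm(x_cg - x_direct))  # info == 0 means converged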