
    Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX) in hypre and PETSc

    We describe our recently released software package Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX). BLOPEX is available as a stand-alone serial library, as an external package to PETSc (``Portable, Extensible Toolkit for Scientific Computation'', a general purpose suite of tools for the scalable solution of partial differential equations and related problems developed by Argonne National Laboratory), and is also built into {\it hypre} (``High Performance Preconditioners'', a scalable linear solvers package developed by Lawrence Livermore National Laboratory). The present BLOPEX release includes only one solver: the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method for symmetric eigenvalue problems. {\it hypre} provides users with advanced high-quality parallel preconditioners for linear systems, in particular domain decomposition and multigrid preconditioners. With BLOPEX, the same preconditioners can now be used efficiently for symmetric eigenvalue problems. PETSc facilitates the integration of independently developed application modules with strict attention to component interoperability, and makes BLOPEX extremely easy to compile and use with the preconditioners available via PETSc. We present the LOBPCG algorithm as implemented in BLOPEX for {\it hypre} and PETSc. We demonstrate the scalability of BLOPEX numerically by testing it on a number of distributed and shared memory parallel systems, including a Beowulf system, a SUN Fire 880, an AMD dual-core Opteron workstation, and an IBM BlueGene/L supercomputer, using PETSc domain decomposition and {\it hypre} multigrid preconditioning. We test BLOPEX on a model problem, the standard 7-point finite-difference approximation of the 3-D Laplacian, with problem sizes in the range 10^5-10^8. Comment: Submitted to SIAM Journal on Scientific Computing
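    As a concrete illustration of the method, the following is a minimal LOBPCG sketch in Python using SciPy's scipy.sparse.linalg.lobpcg on the same model problem, the 7-point finite-difference 3-D Laplacian. SciPy stands in for BLOPEX itself (which is a C library), and a simple Jacobi preconditioner stands in for the hypre multigrid and PETSc domain decomposition preconditioners the paper uses; the grid size and block size are illustrative.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import lobpcg

    n = 20                                    # grid points per dimension (small demo)
    I = sp.identity(n)
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    # Standard 7-point finite-difference 3-D Laplacian as a Kronecker sum
    A = (sp.kron(sp.kron(T, I), I)
         + sp.kron(sp.kron(I, T), I)
         + sp.kron(sp.kron(I, I), T)).tocsr()

    # Jacobi preconditioner as a stand-in for hypre/PETSc preconditioners
    M = sp.diags(1.0 / A.diagonal())

    rng = np.random.default_rng(0)
    X = rng.standard_normal((A.shape[0], 4))  # block of 4 initial vectors
    vals, vecs = lobpcg(A, X, M=M, tol=1e-6, maxiter=200, largest=False)
    print(vals)                               # the four smallest eigenvalues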

    An adaptive Cartesian embedded boundary approach for fluid simulations of two- and three-dimensional low temperature plasma filaments in complex geometries

    We review a scalable two- and three-dimensional computer code for low-temperature plasma simulations in multi-material complex geometries. Our approach is based on embedded boundary (EB) finite volume discretizations of the minimal fluid-plasma model on adaptive Cartesian grids, extended to also account for the charging of insulating surfaces. We discuss the spatial and temporal discretization methods and show that the resulting overall method is second order convergent, monotone, and conservative (for smooth solutions). Weak scalability with parallel efficiencies over 70% is demonstrated up to 8192 cores and more than one billion cells. We then demonstrate the use of adaptive mesh refinement in multiple two- and three-dimensional simulation examples at modest core counts. The examples include two-dimensional simulations of surface streamers along insulators with surface roughness; fully three-dimensional simulations of filaments in experimentally realizable pin-plane geometries; and three-dimensional simulations of positive plasma discharges in multi-material complex geometries. The largest computational example uses up to 800 million mesh cells with billions of unknowns on 4096 computing cores. Our use of computer-aided design (CAD) and constructive solid geometry (CSG), combined with capabilities for parallel computing, offers possibilities for performing three-dimensional transient plasma-fluid simulations in multi-material complex geometries at moderate pressures and comparatively large scale. Comment: 40 pages, 21 figures
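    As a toy illustration of the geometry pipeline, the sketch below composes implicit inside/outside functions with min/max, the standard constructive solid geometry (CSG) operations; the shapes, names, and the pin example are hypothetical and do not reflect the paper's actual CAD/CSG interface.

    import numpy as np

    def sphere(center, radius):
        # Implicit function: negative inside the sphere, positive outside
        return lambda p: np.linalg.norm(p - center) - radius

    def box(lo, hi):
        # Implicit function for an axis-aligned box (negative inside)
        return lambda p: np.max(np.maximum(lo - p, p - hi))

    def union(f, g):         # inside if inside either shape
        return lambda p: min(f(p), g(p))

    def difference(f, g):    # inside f but not inside g
        return lambda p: max(f(p), -g(p))

    # Hypothetical pin electrode: a thin vertical box capped by a spherical tip
    pin = union(box(np.array([-0.02, -0.02, 0.1]), np.array([0.02, 0.02, 1.0])),
                sphere(np.array([0.0, 0.0, 0.1]), 0.02))
    print(pin(np.array([0.0, 0.0, 0.05])) > 0)  # True: point lies in the gas phase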

    Enhancing speed and scalability of the ParFlow simulation code

    Regional hydrology studies are often supported by high resolution simulations of subsurface flow that require expensive and extensive computations. Efficient usage of the latest high performance parallel computing systems becomes a necessity. The simulation software ParFlow has been demonstrated to meet this requirement, showing excellent solver scalability for up to 16,384 processes. In the present work we show that the code requires further enhancements in order to fully take advantage of current petascale machines. We identify ParFlow's parallelization of the computational mesh as a central bottleneck. We propose to reorganize this subsystem using fast mesh partitioning algorithms provided by the parallel adaptive mesh refinement library p4est. We realize this in a minimally invasive manner by modifying selected parts of the code to reinterpret the existing mesh data structures. We evaluate the scaling performance of the modified version of ParFlow, demonstrating good weak and strong scaling up to 458k cores of the Juqueen supercomputer, and test an example application at large scale. Comment: The final publication is available at link.springer.com
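    The idea behind p4est-style partitioning can be sketched in a few lines: order the mesh cells along a space-filling curve and split the curve evenly across processes. The Python sketch below uses a Morton (Z-order) curve on a small 2-D grid; it is purely illustrative, as p4est itself is a C library with a different interface.

    def morton2d(x, y, bits=16):
        # Interleave the bits of (x, y) into a Z-order (Morton) index
        z = 0
        for i in range(bits):
            z |= ((x >> i) & 1) << (2 * i)
            z |= ((y >> i) & 1) << (2 * i + 1)
        return z

    def partition(cells, nranks):
        # Assign cells (a list of (x, y) grid coordinates) to ranks by curve order
        ordered = sorted(cells, key=lambda c: morton2d(*c))
        chunk = -(-len(ordered) // nranks)   # ceiling division
        return [ordered[r * chunk:(r + 1) * chunk] for r in range(nranks)]

    cells = [(x, y) for x in range(4) for y in range(4)]
    for rank, owned in enumerate(partition(cells, 4)):
        print(rank, owned)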

    Modelling a permanent magnet synchronous motor in FEniCSx for parallel high-performance simulations

    © 2022 The Authors. Published by Elsevier B.V. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. There are concerns that the extreme requirements of heavy-duty vehicles and aviation will see them left behind in the electrification of the transport sector, becoming the most significant emitters of greenhouse gases. Engineers extensively use the finite element method to analyse and improve the performance of electric machines, but new highly scalable methods with linear (or near-linear) time complexity are required to make extreme-scale models viable. This paper introduces a three-dimensional permanent magnet synchronous motor model using FEniCSx, a finite element platform tailored for efficient computing and data handling at scale. The model demonstrates magnetic flux density distributions comparable to a verification model built in Ansys Maxwell, with a maximum deviation of 7% in the motor’s static regions. Solving the largest mesh, comprising over eight million cells, achieved a speedup of 198 on 512 processes. A preconditioned Krylov subspace method was used to solve the system, requiring 92% less memory than a direct solution. It is expected that advances built on this approach will allow system-level multiphysics simulations to become feasible within electric machine development. This capability could provide the near real-world accuracy needed to bring electric propulsion systems to large vehicles. Peer reviewed
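    The memory trade-off between a direct factorization and a preconditioned Krylov solve can be sketched with SciPy, which here stands in for the PETSc solvers FEniCSx builds on; the matrix is a 2-D Laplacian stand-in for the motor model's linear system, the ILU preconditioner is an assumption (the paper does not name its preconditioner here), and all sizes are illustrative.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n = 200
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    A = (sp.kron(T, sp.identity(n)) + sp.kron(sp.identity(n), T)).tocsc()
    b = np.ones(A.shape[0])                  # stand-in right-hand side

    # Direct solve: factorizes A, paying in memory for fill-in in the factors
    x_direct = spla.spsolve(A, b)

    # Iterative solve: ILU-preconditioned CG keeps memory close to O(nnz(A))
    ilu = spla.spilu(A, drop_tol=1e-4)
    M = spla.LinearOperator(A.shape, matvec=ilu.solve)
    x_cg, info = spla.cg(A, b, M=M)
    print(info, np.linalg.norm(x_cg - x_direct))  # info == 0 means converged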