46 research outputs found

    nSharma: Numerical Simulation Heterogeneity Aware Runtime Manager for OpenFOAM

    Get PDF
    CFD simulations are a fundamental engineering application implying huge workloads, often with dynamic behaviour due to run-time mesh refinement. Parallel processing over heterogeneous distributed-memory clusters is often used to process such workloads. When a static uniform load distribution is used, the execution of dynamic workloads over a set of heterogeneous resources leads to load imbalances that severely impact execution time. This paper proposes applying dynamic, heterogeneity-aware load balancing techniques within CFD simulations. nSharma, a software package that fully integrates with OpenFOAM, is presented and assessed. Performance gains are demonstrated, achieved by reducing the standard deviation of busy times among resources, i.e. heterogeneous computing resources are kept busy with useful work due to an effective workload distribution. To the best of the authors' knowledge, nSharma is the first implementation and integration of heterogeneity-aware load balancing in OpenFOAM, and it will be made publicly available in order to foster its adoption by the large community of OpenFOAM users.
    The authors would like to thank FEDER for the financial funding through the COMPETE 2020 Program, and the National Funds through FCT under project UID/CTM/50025/2013. The first author was partially funded by the PT-FLAD Chair on Smart Cities & Smart Governance and also by the School of Engineering, University of Minho, within the project Performance Portability on Scalable Heterogeneous Computing Systems. The authors also wish to thank Kyle Mooney for making available his code supporting migration of dynamically refined meshes, and to acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources
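The core idea of heterogeneity-aware load balancing — redistributing work in proportion to each resource's observed speed — can be sketched independently of nSharma's actual implementation. The function name and interface below are hypothetical illustrations, not taken from the paper:

```python
def rebalance(cell_counts, busy_times):
    """Propose a new mesh-cell distribution proportional to each rank's
    observed processing speed (cells handled per second of busy time)."""
    speeds = [c / t for c, t in zip(cell_counts, busy_times)]
    total_cells = sum(cell_counts)
    total_speed = sum(speeds)
    # each rank's ideal share is proportional to its measured speed
    return [round(total_cells * s / total_speed) for s in speeds]
```

With two equally loaded ranks where one finished in half the time, the sketch shifts roughly a third of the cells to the faster rank, which is exactly the effect of reducing the standard deviation of busy times described above.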

    A Communication Scheme for Parallel CG Methods for Solving Systems of Equations with Sparse Coefficient Matrices from FE Applications

    No full text
    Discretizing ordinary or partial differential equations yields, depending on the discretization method, systems of equations with sparse coefficient matrices of different sparsity patterns; with finite element (FE) methods, the resulting systems of equations are largely unstructured. CG methods with various preconditioners are mostly used as iterative solvers for such systems. The main work in each iteration of the CG method is the computation of the matrix-vector product. On parallel computers with distributed memory, the data distribution and communication model — depending on the data structures used — is decisive for the efficient computation of this operation. In the investigations presented here, a communication scheme is proposed that is based on the analysis of the column indices of the non-zero elements of the matrix. The timing behaviour of the developed parallel CG method was examined on the distributed-memory parallel computer INTEL iPSC/860 of the Research Centre Jülich with systems of equations from FE models
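The analysis step described in this entry — deriving each rank's communication pattern from the column indices of its locally stored nonzero elements — can be illustrated with a small Python sketch. The names `recv_sets` and `owner` are illustrative, not taken from the report:

```python
def recv_sets(col_indices, owner, my_rank):
    """From the column indices of the locally stored matrix rows, determine
    which vector entries must be received from which remote rank before a
    matrix-vector product. `owner(j)` maps a global index to its rank."""
    needed = {}
    for j in set(col_indices):
        r = owner(j)
        if r != my_rank:
            needed.setdefault(r, set()).add(j)
    return {r: sorted(js) for r, js in needed.items()}
```

Because the sparsity structure is fixed, this analysis runs once in a preprocessing phase and the resulting schedule is reused in every CG iteration.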

    Conjugate Gradient and Lanczos Methods for Sparse Matrices on Distributed Memory Multiprocessors

    No full text
    Conjugate gradient methods for solving sparse systems of linear equations and Lanczos algorithms for sparse symmetric eigenvalue problems play an important role in numerical methods for solving discretized partial differential equations. When these iterative solvers are parallelized on a multiprocessor system with distributed memory, the data distribution and the communication scheme—depending on the data structures used for the sparse coefficient matrices—are crucial for efficient execution. Here, data distribution and communication schemes are presented that are based on the analysis of the indices of the nonzero matrix elements. On an Intel PARAGON XP/S 10 with 140 processors, the developed parallel variants of the solvers show good scaling behavior for matrices with different sparsity patterns stemming from real finite element applications
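For context, the Lanczos algorithm named in this entry reduces a sparse symmetric matrix to tridiagonal form using only matrix-vector products, which is precisely why the data distribution and communication scheme dominate parallel performance. A minimal dense NumPy sketch, without the reorthogonalization a production solver would need:

```python
import numpy as np

def lanczos(A, v0, m):
    """Basic Lanczos tridiagonalization: returns the diagonal (alphas) and
    off-diagonal (betas) of the m-by-m tridiagonal matrix T."""
    n = len(v0)
    alphas, betas = [], []
    q_prev = np.zeros(n)
    q = v0 / np.linalg.norm(v0)
    beta = 0.0
    for _ in range(m):
        w = A @ q - beta * q_prev       # the only use of A: a matvec
        alpha = q @ w
        w -= alpha * q
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        betas.append(beta)
        if beta == 0.0:                  # invariant subspace found
            break
        q_prev, q = q, w / beta
    return np.array(alphas), np.array(betas[:-1])
```

The eigenvalues of the small tridiagonal matrix T approximate those of A, so only the matvec inside the loop needs to be distributed.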

    Data distribution and communication schemes for solving sparse systems of linear equations from FE applications by parallel CG methods

    Get PDF
    For the solution of discretized ordinary or partial differential equations it is necessary to solve systems of equations with coefficient matrices of different sparsity patterns, depending on the discretization method; using the finite element (FE) method results in largely unstructured systems of equations. Iterative solvers for such systems mainly consist of matrix-vector products and vector-vector operations. A frequently used iterative solver is the method of conjugate gradients (CG) with different preconditioners. For parallelizing this method on a multiprocessor system with distributed memory, the data distribution and the communication scheme, which depend on the data structure used for the sparse matrices, are of the greatest importance for efficient execution. These schemes can be determined before the execution of the solver by preprocessing the symbolic structure of the sparse matrix and can be exploited in each iteration. In this report, data distribution and communication schemes are presented which are based on the analysis of the column indices of the non-zero matrix elements. Performance tests of the developed parallel CG algorithms have been carried out on the distributed memory system INTEL iPSC/860 of the Research Centre Jülich with sparse matrices from FE models. These methods have performed well for matrices with very different sparsity patterns
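A minimal, unpreconditioned CG iteration (a generic textbook sketch in Python, not the report's parallel implementation) makes it clear why one matrix-vector product and a few vector-vector operations account for essentially all of the work per iteration:

```python
import numpy as np

def cg(A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned conjugate gradient method for SPD systems A x = b."""
    x = np.zeros_like(b)
    r = b - A @ x                 # initial residual
    p = r.copy()                  # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p                # the matrix-vector product: the main cost
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

In a distributed-memory setting, `A @ p` is the step that requires the precomputed communication schedule, while the dot products need only a global reduction.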

    A parallel algorithm for determining all eigenvalues of large real symmetric tridiagonal matrices

    No full text
    A method for determining all eigenvalues of large real symmetric tridiagonal matrices on multiprocessor systems with vector facilities is presented. For finding the eigenvalues of a tridiagonal matrix, the Sturm sequence method is a standard approach. The method first uses bisection to isolate all eigenvalues and then extracts the eigenvalues to a predefined accuracy. For the extraction, bisection is accelerated by a superlinearly convergent zero finder, the Pegasus method. The evaluation of the Sturm sequence is the central component of both isolation and extraction. Some new ideas are presented, such as a method for weighting the values of the characteristic polynomial to avoid under- or overflow, a method for combining the Pegasus method with preceding bisection steps, and a vectorization and parallelization strategy over intervals. The method was implemented and the results were measured on a SUPRENUM multiprocessor system with 16 processors and on a CRAY Y-MP8/832 with 8 processors. On the latter machine, both the sequential and parallel execution times of our algorithm ALLEV (ALL Eigen Values) presented in this paper are considerably shorter than the execution times of the vectorized EISPACK routine TQL1, which uses the QL method
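The Sturm sequence count at the heart of both isolation and extraction can be sketched as follows. This is a generic textbook formulation, not the ALLEV code; in particular, the weighting scheme against under- and overflow described in the abstract is omitted:

```python
def sturm_count(d, e, x):
    """Number of eigenvalues of the symmetric tridiagonal matrix with
    diagonal d and off-diagonal e that are smaller than x, obtained from
    the signs of the Sturm sequence, evaluated as a quotient recurrence."""
    count = 0
    q = 1.0
    for i in range(len(d)):
        off = e[i - 1] ** 2 if i > 0 else 0.0
        q = d[i] - x - off / q
        if q == 0.0:
            q = -1e-30   # perturb: x hit an eigenvalue of a leading minor
        if q < 0.0:
            count += 1
    return count
```

Bisection brackets each eigenvalue by comparing counts at interval endpoints; once an eigenvalue is isolated in its own interval, a faster zero finder such as the Pegasus method can take over the extraction.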

    Bio-numerical simulations with SimBio

    Get PDF
    The central objective of the SimBio project is the improvement of clinical and medical practices by the use of large-scale numerical simulation for bio-medical problems. SimBio provides a generic simulation environment running on parallel and distributed computing systems. An innovative key feature is the input of patient-specific data to the modelling and simulation process. While future SimBio users will be able to develop application-specific tools to improve practices in many areas, the project evaluation and validation will demonstrate improvements in non-invasive diagnosis, pre-operative planning, and the design of prostheses. The SimBio environment consists of components for the discrete representation of the physical problem, the numerical solution system, inverse problem solving, optimization, and visualization. The core of the environment is the numerical solution system, comprising parallel finite element solvers and advanced numerical library routines. The compute-intensive components are implemented on high performance computing (HPC) platforms. The following article explains the HPC requirements of the bio-medical project applications and presents the SimBio solutions for the project validation examples: electromagnetic source localization within the human brain, bio-mechanical simulations of the human head, and the design of knee joint menisci replacements. Results include performance measurements of the parallel solvers in the SimBio environment. The paper concludes with an outlook on future Grid-computing activities based on SimBio developments