9 research outputs found

    Optimal configuration of FETI solvers in HPC

    The main objective of this thesis was performance prediction for the Total FETI and Hybrid Total FETI methods implemented in the ESPRESO library. The secondary objective was to investigate the energy requirements of ESPRESO in preparation for a planned consumption model. The performance model was built by generalized linear regression, with the software for processing the measured data implemented in R. Energy consumption was investigated using the MERIC and RADAR tools developed within the READEX project of the Horizon 2020 programme. The performance model is useful for estimating optimal settings, although its fit is not very precise for larger values. MERIC and RADAR were used to evaluate energy savings for multiple hardware and application parameters. The performance model will be implemented in the ESPRESO library so that it can estimate optimal settings for minimal run-time without prior test runs. Similarly, the consumption model will be assembled from results obtained with MERIC and RADAR and later implemented in ESPRESO as well, enabling it to estimate optimal settings not only for minimal run-time but also for minimal energy consumption.
    470 - Katedra aplikované matematiky

    Scalability Improvement of the Projected Conjugate Gradient Method used in FETI Domain Decomposition Algorithms

    This report summarizes scalability improvements to the algorithms used in Total FETI (TFETI). It presents a performance evaluation of two new techniques: (1) a novel pipelined implementation of the CG method in PETSc and (2) a MAGMA LU solver running on the following many-core accelerators: the Nvidia Tesla K20m GPU and the Intel Xeon Phi 5110P (MIC).

    MERIC and RADAR generator: tools for energy evaluation and runtime tuning of HPC applications

    This paper introduces two tools for manual energy evaluation and runtime tuning developed at IT4Innovations in the READEX project. The MERIC library can be used for manual instrumentation and analysis of any application in terms of energy and time consumption. Besides tracing, MERIC can also change environment and hardware parameters during application runtime, which leads to energy savings. MERIC stores large amounts of data that are difficult for a human to read. The RADAR generator analyses the MERIC output files to find the best settings of the evaluated parameters for each instrumented region. It generates a report and a MERIC configuration file for application production runs.

    Domain Knowledge Specification for Energy Tuning

    The European Horizon 2020 project READEX is developing a tool suite for dynamic energy tuning of HPC applications. While the tool suite supports an automatic approach, domain knowledge can significantly help in the analysis and runtime tuning phases. This paper presents the means available in READEX for application experts to provide their knowledge to the tool suite.

    Parallelizations of TFETI-1 coarse problem

    FETI-based methods, used for the solution of elliptic partial differential equations, form a highly successful class of domain decomposition methods used to parallelize well-known finite element methods. In FETI methods, the original problem is partitioned into smaller problems defined on subdomains. Since the subdomains are non-overlapping, the smaller problems can naturally be solved independently in parallel. Increasing the number of subdomains makes the smaller problems faster to solve, but it also increases the size of the coarse problem. Moreover, for complex problems, the coarse problem may have to be solved many times. It is therefore important to solve the coarse problem as efficiently as possible. This thesis deals with parallelization strategies for the TFETI-1 coarse problem.
    470 - Katedra aplikované matematiky

    Parallel harmonic balance method for analysis of nonlinear mechanical systems

    Mechanical vibration analysis and modelling are essential tools in the design of mechanical components and structures. In turbine engine design specifically, the ability to accurately predict the vibration of various parts is crucial to ensure their safe operation while maintaining efficiency. As designs become increasingly complex and margins for error get smaller, high-fidelity numerical vibration models are necessary for their analysis. Research on parallel algorithms has progressed significantly in recent decades, thanks to the exponential growth of the world's available computational resources. This work explores parallel implementations for solving large-scale nonlinear vibration problems. A C++ code using MPI was developed to validate these implementations in practice. The harmonic balance method is used in combination with finite element discretisation and applied to an elastic body with the Green-Lagrange nonlinear model for large deformations. A parameter continuation scheme using a predictor-corrector approach is included to compute frequency response functions. A Newton-Raphson solver is used to solve the bordered nonlinear system of equations in the frequency domain. Three parallel algorithms for solving the linearised problem in each Newton iteration are analysed: a sparse direct solver (using the MUMPS library), GMRES (using the PETSc library), and an in-house implementation of FETI. The performance of the solvers is analysed using beam test cases and a fan blade geometry. The scalability of MUMPS and the FETI solver is assessed. Full nonlinear frequency response functions with turning points are also computed. The use of an artificial coarse space and preconditioning in FETI is discussed, as it greatly impacts the convergence properties of the solver. The presented parallel linear solvers show promising scalability results and an ability to solve nonlinear systems with several million degrees of freedom.

    Toward highly parallel loading of unstructured meshes

    This paper presents an algorithm for highly parallel loading and processing of unstructured mesh databases in the distributed memory environment of large HPC clusters, without collecting data into a single process. The algorithm is proven effective, with linear speedup in the large dataset limit. Using Ansys CDB, EnSight, VTK Legacy, and XDMF databases, we show that it is possible to efficiently reconstruct meshes with 800 million nodes and 500 million elements in several seconds on thousands of processors, even from databases that were not designed to be read in parallel. The algorithm is implemented in our MESIO library, which can be used as (i) an efficient parallel loader (e.g. for numerical physical solvers) or (ii) a high-performance parallel converter between mesh databases.
    Web of Science, 166, art. no. 10310

    Adaptive FETI-DP and BDDC methods for highly heterogeneous elliptic finite element problems in three dimensions

    Numerical methods are often well suited to the solution of (elliptic) partial differential equations (PDEs) modeling naturally occurring processes. Many different solvers can be applied to the systems obtained after discretization by the finite element method. Parallel architectures in modern computers facilitate the efficient use of diverse divide-and-conquer strategies. The intuitive approach of dividing a large (global) problem into subproblems, which are then solved in parallel, can significantly reduce the solution time. The solvers on the local subproblems should then deliver the contributions of the global solution restricted to the subdomains of the computational region. The class of domain decomposition methods provides widely used iterative algorithms for the parallel solution of implicit finite element problems. Often, an additional coarse space, which introduces a coupling between the subdomains, is used to ensure global transport of information across the entire domain. The FETI-DP and BDDC domain decomposition methods are highly scalable parallel algorithms. However, when the parameter or coefficient distribution in the underlying partial differential equation becomes highly heterogeneous, classical methods with a priori chosen coarse spaces might not converge in a limited number of iterations. A remedy is offered by problem-dependent coarse spaces. These coarse spaces can be provided by adaptive methods, which can improve convergence at the cost of additional constraints. In this thesis, we introduce robust FETI-DP and BDDC methods for three-dimensional problems. These methods incorporate into the coarse space constraints that are computed from local eigenvalue problems on faces and edges between subdomains. The constraints are implemented by a deflation or balancing approach, or by partial finite element assembly after a transformation of basis. For the latter, we introduce the generalized transformation-of-basis approach and show its correspondence to a deflation or balancing approach. An efficient parallel implementation of adaptive FETI-DP is discussed in the last part of this thesis. We provide weak and strong parallel scalability results for our adaptive algorithm executed on the supercomputer magnitUDE of the University of Duisburg-Essen. For weak scaling, we show very good results up to 4,096 cores. We also present very good strong scaling results up to 864 cores.

    A massively parallel and memory-efficient FEM toolbox with a hybrid total FETI solver with accelerator support

    In this article, we present the ExaScale PaRallel finite element tearing and interconnecting SOlver (ESPRESO) finite element method (FEM) library, which includes an FEM toolbox with interfaces to professional and open-source simulation tools, and a massively parallel hybrid total finite element tearing and interconnecting (HTFETI) solver that can fully utilize the Oak Ridge Leadership Computing Facility Titan supercomputer and achieve superlinear scaling. The article presents several new techniques for finite element tearing and interconnecting (FETI) solvers designed for efficient utilization of supercomputers, with a focus on (i) performance: we present a fivefold reduction of solver runtime for the Laplace equation, obtained by redesigning the FETI solver and offloading the key workload to the accelerator, and we compare Intel Xeon Phi 7120p and Tesla K80 and P100 accelerators to Intel Xeon E5-2680v3 and Xeon Phi 7210 central processing units; and (ii) memory efficiency: we present two techniques that increase the efficiency of the HTFETI solver 1.8 times and push the largest problem that ESPRESO can solve from 124 to 223 billion unknowns for problems with unstructured meshes. Finally, we show that by dynamically tuning hardware parameters, we can reduce energy consumption by up to 33%.