21 research outputs found

    The hybrid parallelization of TFETI-1 method

    The thesis deals with the hybrid parallelization of the TFETI-1 method as implemented in the PermonFLLOP library. It first presents various parallel programming models and approaches to hybrid parallelization, and lists the possible benefits of hybrid parallelization compared with pure MPI parallelization. It then analyzes the packages providing direct solvers, which play a key role in the TFETI-1 implementation, in terms of their suitability for hybrid parallelization. The new implementation extends the existing pure MPI implementation so that the data of several subdomains can be held by a single MPI process, which makes it possible to exploit the good numerical scalability of FETI methods. Numerical experiments test the numerical scalability and the impact of hybrid parallelization. Imported 22/07/2015. 470 - Katedra aplikované matematiky (Department of Applied Mathematics). Grade: excellent.

    The energy consumption optimization of the BLAS routines

    The paper evaluates the energy consumption of selected Sparse and Dense BLAS Level 1, 2 and 3 routines. The authors employed the AXPY, Sparse Matrix-Vector, Sparse Matrix-Matrix, Dense Matrix-Vector, Dense Matrix-Matrix and Sparse Matrix-Dense Matrix multiplication routines from the Intel Math Kernel Library (MKL). The measured characteristics illustrate how the energy consumption of BLAS routines differs, as some operations are memory-bound and others are compute-bound. Based on the authors' recommendations, one can explore dynamic frequency switching to achieve significant energy savings of up to 23%.

    Globalization, regional integration and economic development

    The paper evaluates the energy consumption of solvers of linear systems based on the Finite Element Tearing and Interconnecting (FETI) method, an established method for solving real-world engineering problems. The authors evaluated the effect of the CPU frequency on the energy consumption of the FETI solver using a synthetic 3D linear elasticity cube benchmark. For this problem, the effect of frequency tuning on the energy consumption of the essential processing kernels of the FETI method was evaluated. The paper provides results for two types of frequency tuning: (1) static tuning and (2) dynamic tuning. In the static tuning experiments, the frequency is set before execution and kept constant during the runtime. In dynamic tuning, the frequency is changed during program execution to adapt the system to the actual needs of the application. The paper shows that static tuning brings up to 12% energy savings compared with the default CPU settings (the highest clock rate). Dynamic tuning improves this further by up to 3%.
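    The static tuning described above amounts to choosing the single CPU frequency that minimizes energy (runtime times average power) and comparing it with the run at the highest clock rate. The following is a minimal sketch of that arithmetic only; the function names and all numbers are hypothetical placeholders, not measurements or tooling from the paper.

```python
def energy_joules(runtime_s, avg_power_w):
    """Energy consumed by one run: runtime multiplied by average power."""
    return runtime_s * avg_power_w


def best_static_frequency(measurements):
    """Pick the frequency with the lowest energy.

    measurements maps frequency (GHz) -> (runtime in s, average power in W).
    Returns (best frequency, relative saving versus the highest frequency).
    """
    energies = {f: energy_joules(t, p) for f, (t, p) in measurements.items()}
    baseline = energies[max(energies)]          # default: highest clock rate
    best = min(energies, key=energies.get)      # lowest-energy frequency
    saving = 1.0 - energies[best] / baseline
    return best, saving


# Hypothetical, made-up measurements purely for illustration.
example = {
    2.5: (10.0, 100.0),
    2.0: (11.0, 82.0),
    1.5: (14.0, 70.0),
}
best_freq, rel_saving = best_static_frequency(example)
print(f"best static frequency: {best_freq} GHz, saving: {rel_saving:.1%}")
```

    Dynamic tuning would apply the same selection separately to each processing kernel instead of once for the whole run.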

    Parallel implementation of matrix orthogonalization

    This bachelor thesis deals with the parallel implementation of matrix orthogonalization. Orthogonality is used in many applications and can simplify the solution of complex engineering problems. The thesis first describes several methods of matrix orthogonalization and then focuses on the parallelization of different variants of the Gram-Schmidt orthogonalization process. It describes an implementation using the PETSc library, which is used for the parallelization of scientific computations. The parallel scalability and the stability of the individual algorithms are compared in numerical experiments. Imported 26/06/2013. 470 - Katedra aplikované matematiky (Department of Applied Mathematics). Grade: excellent.
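    As an illustration of one Gram-Schmidt variant mentioned above, the following is a minimal serial sketch of the modified Gram-Schmidt process in plain Python. It is not the thesis's PETSc implementation; the function names and the dependence tolerance are assumptions made for this example.

```python
import math


def dot(a, b):
    """Euclidean inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def modified_gram_schmidt(columns, tol=1e-12):
    """Orthonormalize a list of vectors (each a list of floats).

    The "modified" variant subtracts each projection from the already
    updated vector, which is numerically more stable than the classical
    variant that projects the original vector onto every basis vector.
    """
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            coeff = dot(q, w)                      # uses the updated w
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm < tol:                             # tol is an assumed threshold
            raise ValueError("vectors are numerically linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


q_vectors = modified_gram_schmidt([[1.0, 1.0, 0.0],
                                   [1.0, 0.0, 1.0],
                                   [0.0, 1.0, 1.0]])
```

    In a distributed setting, each inner product becomes a global reduction across processes, which is why the variants differ in parallel scalability as well as in stability.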

    PDF Enhancements Tools for a Digital Library

    This paper describes several innovative PDF document enhancements and tools that can be used when building a digital library. The main result presented is the PDF re-compression tool pdfJbIm, developed using the jbig2enc encoder. This tool reduces the size of the original bitonal PDFs by one third on average. Some modifications to the jbig2enc encoder that increase the compression ratio even further are also described. Together with another program, pdfsizeopt.py by Péter Szabó, we have decreased PDF storage size to such an extent that the transmission needs of a digital library were significantly reduced. We report the storage savings achieved on the Czech Digital Mathematics Library DML-CZ, where we have downsized the PDF corpus to 43% of its original size. We also describe the pdfsign tool for batch digital signature stamping of PDF documents.

    On the efficient reconstruction of displacements in FETI methods for contact problems

    The final step in the solution of contact problems of elasticity by FETI-based domain decomposition methods is the reconstruction of the displacements corresponding to the Lagrange multipliers that enforce the "gluing" of subdomains and the non-penetration conditions. The rigid body component of the displacements is usually obtained by means of a well-known but rather complex formula, whose application requires reassembling and factorizing some large matrices. Here we propose a simple formula that is applicable to many variants of FETI-based algorithms for contact problems. The method takes negligible time and avoids reassembling or factorizing any matrices.

    The impact of enabling multiple subdomains per MPI process in the TFETI domain decomposition method

    The paper deals with handling multiple subdomains per computational core in the PERMON toolbox, namely in the PermonFLLOP module, to fully exploit the potential of the Total Finite Element Tearing and Interconnecting (TFETI) domain decomposition method (DDM). Most authors researching FETI methods present weak parallel scalability with one subdomain assigned to each computational core, and call it simply parallel scalability. Here we present an extension in which each MPI process holds the data of more than one subdomain. Numerical experiments demonstrate the theoretically supported fact that, for a given problem size and number of processors, an increased number of subdomains leads to better conditioning of the system operator and hence faster convergence. Moreover, numerical, memory, strong parallel, and weak parallel scalability are reported, and optimal numbers of subdomains per core are examined. Finally, new PETSc matrix types dealing with the aforementioned extension are introduced. Web of Science 31959758
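    Holding several subdomains per MPI process requires some rule that maps subdomain indices to process ranks. The following is a minimal sketch of one common choice, a contiguous block distribution; it is an assumption made for illustration, not the scheme or API used in PermonFLLOP, and the function name is hypothetical.

```python
def subdomains_for_rank(n_subdomains, n_ranks, rank):
    """Return the contiguous range of subdomain indices owned by one rank.

    The first (n_subdomains % n_ranks) ranks receive one extra subdomain,
    so the load differs by at most one subdomain between ranks.
    """
    if not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(n_subdomains, n_ranks)
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return range(start, start + count)


# Example: 10 subdomains distributed over 4 processes.
for r in range(4):
    print(r, list(subdomains_for_rank(10, 4, r)))
```

    With such a mapping the number of subdomains can be varied independently of the number of processes, which is what allows the conditioning of the system operator to be tuned for a fixed problem size.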