5 research outputs found

    SANComSim: A scalable, adaptive and non-intrusive framework to optimize performance in computational science applications

    Parallel processing has become the most common approach for developing and executing scientific computing applications, and obtaining good performance requires exploiting parallelism in both processing and communication. Although the study of computational performance has historically focused on CPU power, the CPU is no longer the only factor in overall performance. Because of the way parallel applications are designed, communication networks play a very important role in computational science. Even though the networks used in multicore clusters are fast and have low latency, the volume of transferred data can create a bottleneck in the communication system, since communication-intensive parallel applications spend a significant fraction of their total execution time exchanging data between processes. Moreover, in most cases several users execute different parallel applications on the cluster at the same time. In this paper we present SANComSim, a Scalable, Adaptive and Non-intrusive framework, based on simulation techniques, for optimizing the performance of the network system when executing complex applications. Its main objective is to apply run-time compression to reduce the data sent through the network and thereby increase overall system performance. The main features of SANComSim are: adaptability, as it dynamically adapts to the current state of the system; portability, as the framework is tied to neither a specific programming language nor a platform; non-intrusiveness, as the framework is based on simulation techniques and does not require exclusive access to the entire cluster; and scalability, as any parallel application, independently of the number of processes and computing nodes, can use the framework to improve performance on cluster systems.
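
    The abstract above centres on one technique: compressing message payloads at run time so that less data crosses the network. The sketch below is not SANComSim itself, only a minimal illustration of that idea in Python, assuming mpi4py, NumPy and zlib; the send_compressed/recv_compressed helpers and the 64 KB threshold are hypothetical choices made for this example.

        # Illustrative sketch of run-time message compression (not SANComSim itself).
        # Assumes mpi4py, NumPy and zlib are available; names and the threshold are
        # hypothetical choices for this example.
        import zlib
        import numpy as np
        from mpi4py import MPI

        COMPRESS_THRESHOLD = 64 * 1024  # hypothetical cutoff, in bytes

        def send_compressed(comm, array, dest, tag=0):
            """Compress the raw buffer before sending when it is large enough."""
            raw = array.tobytes()
            if len(raw) >= COMPRESS_THRESHOLD:
                payload, compressed = zlib.compress(raw), True
            else:
                payload, compressed = raw, False
            comm.send((compressed, array.dtype.str, array.shape, payload),
                      dest=dest, tag=tag)

        def recv_compressed(comm, source, tag=0):
            """Receive and, if needed, decompress the buffer back into an array."""
            compressed, dtype, shape, payload = comm.recv(source=source, tag=tag)
            raw = zlib.decompress(payload) if compressed else payload
            return np.frombuffer(raw, dtype=dtype).reshape(shape)

        if __name__ == "__main__":
            comm = MPI.COMM_WORLD
            if comm.Get_rank() == 0:
                send_compressed(comm, np.zeros((512, 512)), dest=1)
            elif comm.Get_rank() == 1:
                data = recv_compressed(comm, source=0)

    A genuinely adaptive framework would base the compression decision on the observed state of the network and CPU load rather than on a fixed size threshold.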

    Dynamically Balanced Synchronization-Avoiding LU Factorization with Multicore and GPUs


    Multi-GPU Implementation of LU Factorization

    LU factorization is the most computationally intensive step in solving systems of linear equations. By first obtaining the LU factorization of the coefficient matrix, we can then readily solve the system using forward and backward substitution. The computational cost of LU factorization, in terms of floating-point operations, is cubic. There have been various efforts to improve the performance of LU factorization. We propose a multi-core multi-GPU hybrid LU factorization algorithm that leverages the strengths of both multiple CPUs and multiple GPUs. Our algorithm uses some of the CPU cores for panel factorization, and the rest of the CPU cores together with all the available GPUs for trailing submatrix updates. It employs both dynamic scheduling and static scheduling. Experiments show that our approach reaches 1134 Gflop/s with 4 Fermi GPU boards combined with a total of 48 AMD CPU cores. This is the first time such a level of performance has been reported in a shared-memory environment. Execution traces show that our code also achieves good load balance and high system utilization.
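
    The abstract describes a blocked LU algorithm in which some resources perform panel factorization while others apply the GEMM-heavy trailing submatrix updates. The sketch below is a sequential, unpivoted NumPy illustration of that panel/update structure, not the authors' hybrid CPU-GPU code; the block size of 64 and the omission of pivoting and of any CPU/GPU scheduling are simplifying assumptions.

        # Illustrative sketch only: sequential, unpivoted blocked LU factorization
        # showing the panel-factorization / trailing-submatrix-update split the
        # abstract describes. The actual algorithm distributes these steps across
        # CPU cores and GPUs; the block size and lack of pivoting are assumptions.
        import numpy as np

        def blocked_lu(A, block=64):
            """In-place blocked LU (unit lower L and upper U stored in A), no pivoting."""
            n = A.shape[0]
            for k in range(0, n, block):
                b = min(block, n - k)
                # Panel factorization: unblocked LU of the current column panel.
                for j in range(k, k + b):
                    A[j+1:, j] /= A[j, j]
                    A[j+1:, j+1:k+b] -= np.outer(A[j+1:, j], A[j, j+1:k+b])
                if k + b < n:
                    # Triangular solve for the U block to the right of the panel.
                    L_kk = np.tril(A[k:k+b, k:k+b], -1) + np.eye(b)
                    A[k:k+b, k+b:] = np.linalg.solve(L_kk, A[k:k+b, k+b:])
                    # Trailing submatrix update: the GEMM-heavy, GPU-friendly part.
                    A[k+b:, k+b:] -= A[k+b:, k:k+b] @ A[k:k+b, k+b:]
            return A

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            n = 256
            A = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonally dominant
            LU = blocked_lu(A.copy())
            L = np.tril(LU, -1) + np.eye(n)
            U = np.triu(LU)
            print(np.allclose(L @ U, A))  # sanity check: should print True

    Once L and U are available, solving a system amounts to a forward substitution with L followed by a backward substitution with U.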
