
    Empirically Tuning LAPACK’s Blocking Factor for Increased Performance

    Abstract—LAPACK (Linear Algebra PACKage) is a statically cache-blocked library, where the blocking factor (NB) is determined by the service routine ILAENV. Users are encouraged to tune NB to maximize performance on their platform/BLAS combination (the BLAS are LAPACK's computational engine), but in practice very few users do so, both because it is hard and because its importance is not widely understood. In this paper we (1) discuss our empirical tuning framework for discovering good NB settings, (2) quantify the performance boost that tuning NB can achieve on several LAPACK routines across multiple architectures and BLAS implementations, (3) compare the best performance of LAPACK's statically blocked routines against state-of-the-art recursively blocked routines and vendor-optimized LAPACK implementations, to see how much performance loss is mandated by LAPACK's present static blocking strategy, and finally (4) use these results to determine how best to block nonsquare matrices once good square blocking factors are discovered.
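    The mechanism the abstract describes is concrete: ILAENV with ISPEC=1 returns the NB that a routine such as DGETRF will use, so an empirical sweep needs a way to vary that value. The sketch below shows one common approach (not necessarily the paper's framework): shadowing the library's ILAENV at link time by listing this object file before the LAPACK library. It assumes a gfortran-style Fortran ABI (trailing underscore, arguments by reference, hidden string-length arguments appended by value; declared as int here and left unused), and the LAPACK_NB environment variable is a hypothetical tuning knob.

    /*
     * A minimal sketch of a link-time ILAENV override for sweeping NB.
     * Compile and list this object before -llapack so the linker resolves
     * ilaenv_ here. ABI details above are assumptions, not guarantees.
     */
    #include <stdlib.h>

    int ilaenv_(const int *ispec, const char *name, const char *opts,
                const int *n1, const int *n2, const int *n3, const int *n4,
                int name_len, int opts_len)
    {
        (void)name; (void)opts; (void)n1; (void)n2; (void)n3; (void)n4;
        (void)name_len; (void)opts_len;

        switch (*ispec) {
        case 1: {                            /* ISPEC=1: blocking factor NB */
            const char *env = getenv("LAPACK_NB");
            int nb = env ? atoi(env) : 0;
            return nb > 0 ? nb : 64;         /* swept value, else a default */
        }
        case 2: return 2;                    /* NBMIN: minimum block size */
        case 3: return 32;                   /* NX: blocked/unblocked crossover */
        default:
            /* A real replacement should reproduce the reference ILAENV for
               the remaining ISPEC values rather than return a constant. */
            return 1;
        }
    }

    A tuning harness can then time the routine of interest under each candidate (e.g. run a benchmark with LAPACK_NB=32, then LAPACK_NB=64, and so on) and keep the fastest setting; this mimics by hand what an empirical framework like the paper's automates.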

    Installing and Testing the BLACS v1.1

    This report covers the installation and testing of the BLACS [3]. The sections on BLACS installation will usually apply only to the BLACS obtained from netlib. The BLACS tester, however, should be run on any version of the BLACS in order to verify that they are working correctly. There are now several vendors supporting BLACS implementations on their machines. With the BLACS being produced by many different groups, it becomes more important than ever to ensure that all versions are both syntactically and semantically correct. The BLACS tester has been written to perform at least some of these checks. This tester calls every standard BLACS routine, so a successful link ensures that all standard routines at least exist in the BLACS implementation being tested. The point-to-point, broadcast, and combine routines may be tested as extensively as the user desires using input files (a minimal sketch of such a point-to-point exchange follows this outline). The remaining routines are lumped into the "auxiliary" tests. More information on these various tests is given in the relevant sections. The outline for installing and testing the BLACS is given below; the following sections expand on this outline.
    1. Download the BLACS, their tester, and the related papers (see Sections 2.2-2.3 for details).
    2. Select a Bmake.inc example from the BLACS/BMAKES directory to serve as your starting point for a Bmake.inc, and copy it to BLACS/Bmake.inc. For example, if you are compiling the PVM BLACS on an Alpha machine, from the BLACS/ directory you would type cp BMAKES/Bmake.PVM-ALPHA Bmake.inc (see Sections 2.1 and 2.4 for details).
    3. Edit this file to fit your system (see Section 2.4 for details).
    4. Compile the BLACS (see Section 2.6 for details).
    5. Compile the BLACS tester (see Section 2.7 for details).
    6. Test the BLACS (see Section 3 for details).
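    For readers unfamiliar with what the tester exercises, here is a minimal sketch (not part of the report's tester) of a BLACS point-to-point exchange, written against the C interface of the netlib BLACS. The Cblacs_* / Cdgesd2d / Cdgerv2d names follow that interface; the prototypes are declared by hand below because header provision varies between BLACS distributions, and vendor BLACS may differ.

    #include <stdio.h>

    void Cblacs_pinfo(int *mypnum, int *nprocs);
    void Cblacs_get(int ctxt, int what, int *val);
    void Cblacs_gridinit(int *ctxt, const char *order, int nprow, int npcol);
    void Cblacs_gridinfo(int ctxt, int *nprow, int *npcol, int *myrow, int *mycol);
    void Cdgesd2d(int ctxt, int m, int n, const double *a, int lda, int rdest, int cdest);
    void Cdgerv2d(int ctxt, int m, int n, double *a, int lda, int rsrc, int csrc);
    void Cblacs_gridexit(int ctxt);
    void Cblacs_exit(int notdone);

    int main(void)
    {
        int iam, nprocs, ctxt, nprow, npcol, myrow, mycol;
        double x = 0.0;

        Cblacs_pinfo(&iam, &nprocs);      /* my process id and process count */
        Cblacs_get(-1, 0, &ctxt);         /* obtain the default system context */
        Cblacs_gridinit(&ctxt, "Row", 1, nprocs);  /* 1 x nprocs process grid */
        Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

        if (mycol == 0 && npcol > 1) {
            x = 3.14;
            Cdgesd2d(ctxt, 1, 1, &x, 1, 0, 1);     /* send 1x1 block to (0,1) */
        } else if (mycol == 1) {
            Cdgerv2d(ctxt, 1, 1, &x, 1, 0, 0);     /* receive from (0,0) */
            printf("process (%d,%d) received %g\n", myrow, mycol, x);
        }

        Cblacs_gridexit(ctxt);            /* release the grid... */
        Cblacs_exit(0);                   /* ...and shut the BLACS down */
        return 0;
    }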

    Using BLACS and MPI in ScaLAPACK

    The definition and implementation of the MPI standard has naturally led to the idea of replacing ScaLAPACK's message passing layer, the BLACS, with direct calls to MPI. In this paper we discuss why we feel this step is unnecessary, and indeed perhaps counterproductive. The paper assumes a working knowledge of the BLACS [3, 6, 7, 8] and MPI [4]. The reasons given for such a move are usually a variation of one of the following:
    1. Well-known MPI calls would yield greater readability.
    2. It would eliminate the need for the BLACS and their resulting code.
    3. The wide range of calls MPI offers would provide much greater functionality than present using onl..
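    For context only (this sketch is illustrative and not taken from the paper): a BLACS broadcast is a send-side/receive-side pair (e.g. Cdgebs2d / Cdgebr2d) scoped to a row, a column, or the whole grid. The raw-MPI analogue over all processes is a single MPI_Bcast, as below, but the row/column scoping that the BLACS derive from the process grid must be rebuilt by hand with sub-communicators, which is part of the paper's argument for keeping the layer.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        double x = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)
            x = 2.718;                    /* the root owns the value */

        /* One call replaces the BLACS broadcast/receive pair at "all" scope. */
        MPI_Bcast(&x, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        printf("rank %d of %d has x = %g\n", rank, size, x);
        MPI_Finalize();
        return 0;
    }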