Basic linear algebra subprograms for FORTRAN usage

Author: Hanson R. J.
Kincaid D. R.
Krogh F. T.
Lawson C. L.

A package of 38 low-level subprograms for many of the basic operations of numerical linear algebra is presented. The package is intended to be used with FORTRAN. The operations in the package are dot products, elementary vector operations, Givens transformations, vector copy and swap, vector norms, vector scaling, and the indices of components of largest magnitude. The subprograms and a test driver are available in portable FORTRAN. Versions of the subprograms are also provided in assembly language for the IBM 360/67, the CDC 6600 and CDC 7600, and the Univac 1108.
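The kinds of operations listed in this abstract can be illustrated with a minimal sketch. It is written in Python rather than FORTRAN, and the function names (`dot`, `axpy`, `nrm2`, `iamax`, `givens`) are hypothetical stand-ins chosen for this example, not the package's actual routines or calling sequences. The Givens helper in particular uses a simplified sign convention and omits the stride arguments a real BLAS routine takes.

```python
import math

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def axpy(alpha, x, y):
    """Elementary vector operation alpha*x + y, returned as a new list."""
    return [alpha * a + b for a, b in zip(x, y)]

def nrm2(x):
    """Euclidean norm, scaled by the largest magnitude to limit overflow."""
    scale = max((abs(a) for a in x), default=0.0)
    if scale == 0.0:
        return 0.0
    return scale * math.sqrt(sum((a / scale) ** 2 for a in x))

def iamax(x):
    """Zero-based index of the first component of largest magnitude."""
    return max(range(len(x)), key=lambda i: abs(x[i]))

def givens(a, b):
    """Return (c, s, r) such that c*a + s*b = r and -s*a + c*b = 0."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0, 0.0  # nothing to rotate
    return a / r, b / r, r
```

For example, `givens(3.0, 4.0)` yields a rotation that maps the pair (3, 4) to (5, 0), which is the elementary step used to zero out one matrix entry at a time.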

NASA Technical Reports Server

Real-Time, Dynamic Hardware Accelerators for BLAS Computation

Author: Raymond J. Weber, Brock J. LaMeres, Justin A. Hogan
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/01/2017

This paper presents an approach to increasing the capability of scientific computing through the use of real-time, partially reconfigurable hardware accelerators that implement basic linear algebra subprograms (BLAS). The use of reconfigurable hardware accelerators for computing linear algebra functions has the potential to increase floating point computation while at the same time providing an architecture that minimizes data movement latency and increases power efficiency. While there has been significant work by the computing community to optimize BLAS routines at the software level, optimizing these routines in hardware using reconfigurable fabrics is in its infancy. This paper begins with a comprehensive overview of the history and evolution of BLAS for use in scientific computing. It reviews current successes in using reconfigurable computing architectures to achieve acceleration. It then presents an investigation of an accelerator approach with a granularity at the logic circuit level through real-time, partial reconfiguration of a programmable fabric with static accelerator cache memory to minimize data movement. Empirical data is presented for a study on a single-FPGA

International Journal on Recent and Innovation Trends in Computing and Communication

PB-BLAS: a set of parallel block basic linear algebra subprograms

Author: David W. Walker
Jack J. Dongarra
Jaeyoung Choi
Publication venue: 'Wiley'
Publication date: 01/01/2005

Crossref

Translation of algorithm 539: basic linear algebra subprograms for FORTRAN usage in FORTRAN 200 for the Cyber 205

Author: Nool M. (Margreet)
Publication venue: CWI
Publication date: 01/01/1987

CWI's Institutional Repository

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

Author: Camille Coti
Emmanuel Agullo
Jack Dongarra
Julien Langou
Thomas Herault
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/12/2009

Previous studies have reported that common dense linear algebra operations do not achieve speedup by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers are still strongly predominant in high-performance computing and the use of grids for speeding up large-scale scientific problems is limited to applications exhibiting parallelism at a higher level. We have identified two performance bottlenecks in the distributed memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear algebra library. First, because ScaLAPACK assumes a homogeneous communication network, the implementations of ScaLAPACK algorithms lack locality in their communication pattern. Second, the number of messages sent in the ScaLAPACK algorithms is significantly greater than in other algorithms that trade flops for communication. In this paper, we present a new approach for computing a QR factorization -- one of the main dense linear algebra kernels -- of tall and skinny matrices in a grid computing environment that overcomes these two bottlenecks. Our contribution is to articulate a recently proposed algorithm (Communication Avoiding QR) with a topology-aware middleware (QCG-OMPI) in order to confine intensive communications (ScaLAPACK calls) within the different geographical sites. An experimental study conducted on the Grid'5000 platform shows that the resulting performance increases linearly with the number of geographical sites on large-scale problems (and is in particular consistently higher than ScaLAPACK's). Comment: Accepted at IPDPS10 (IEEE International Parallel & Distributed Processing Symposium 2010 in Atlanta, GA, USA).
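The reduction idea behind a communication-avoiding QR of a tall and skinny matrix can be sketched in a few lines. This is a single-process illustration only, not the paper's implementation: it assumes a flat one-level reduction, uses Givens rotations for the local factorizations, and computes only the R factor. The names `r_factor` and `tsqr_r` are hypothetical. Each row block stands in for the data held at one site, so only the small per-block R factors would need to be exchanged.

```python
import math

def r_factor(rows):
    """Upper-triangular R of a dense matrix (list of rows) via Givens rotations."""
    a = [[float(v) for v in row] for row in rows]
    m, n = len(a), len(a[0])
    for j in range(min(m, n)):
        for i in range(j + 1, m):
            r = math.hypot(a[j][j], a[i][j])
            if r == 0.0:
                continue  # both entries already zero
            c, s = a[j][j] / r, a[i][j] / r
            for k in range(j, n):
                top, low = a[j][k], a[i][k]
                a[j][k] = c * top + s * low
                a[i][k] = -s * top + c * low
            a[i][j] = 0.0  # clear rounding residue below the diagonal
    return a[:n]

def tsqr_r(rows, block_rows):
    """R factor of a tall-skinny matrix from independent per-block factorizations."""
    blocks = [rows[i:i + block_rows] for i in range(0, len(rows), block_rows)]
    local_rs = [r_factor(block) for block in blocks]  # independent local work
    stacked = [row for r in local_rs for row in r]    # gather the small R factors
    return r_factor(stacked)                          # one final small factorization
```

For a full-rank matrix with more rows than columns, `tsqr_r(a, block_rows)` should agree with `r_factor(a)` up to rounding, since both produce an R factor with a nonnegative diagonal.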

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Rennes 1

Portability and Adaptation of Linear Algebra Software for Distributed Memory Systems (Portabilität und Adaption von Software der linearen Algebra für Distributed Memory Systeme)

Author: Hebermehl Georg
Hübner Friedrich-Karl
Publication date: 01/01/1996

By using established building blocks for elementary linear algebra operations and for communication routines, together with the customary block-cyclic data distributions, higher-level algorithms can be adapted to distributed memory computers in a largely portable and optimal way. In particular, the provision of the BLACS communication library for PARSYTEC computers is reported.
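A block-cyclic data distribution of the kind mentioned in this abstract can be sketched for the one-dimensional case. The helper names below are hypothetical, indices are zero-based, and the first block is assumed to live on process 0. Real libraries apply the same mapping independently along rows and columns of a two-dimensional process grid.

```python
def block_cyclic_owner(g, nb, nprocs):
    """Process that owns global index g when blocks of size nb are dealt out cyclically."""
    return (g // nb) % nprocs

def block_cyclic_local(g, nb, nprocs):
    """Position of global index g within its owner's local storage."""
    return (g // (nb * nprocs)) * nb + g % nb

def distribute(n, nb, nprocs):
    """Global indices 0..n-1 grouped by owning process, in local storage order."""
    layout = [[] for _ in range(nprocs)]
    for g in range(n):
        layout[block_cyclic_owner(g, nb, nprocs)].append(g)
    return layout
```

With 8 elements, block size 2, and 3 processes, process 0 holds global indices 0, 1, 6, 7, process 1 holds 2, 3, and process 2 holds 4, 5.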

Crossref

Publications Server of the Weierstrass Institute for Applied Analysis and Stochastics

An Implementation of Bayesian Adaptive Regression Splines (BARS) in C with S and R Wrappers

Author: Garrick Wallstrom
Jeffrey Liebner
Robert E. Kass

BARS (DiMatteo, Genovese, and Kass 2001) uses the powerful reversible-jump MCMC engine to perform spline-based generalized nonparametric regression. It has been shown to work well in terms of having small mean-squared error in many examples (smaller than known competitors), as well as producing visually-appealing fits that are smooth (filtering out high-frequency noise) while adapting to sudden changes (retaining high-frequency signal). However, BARS is computationally intensive. The original implementation in S was too slow to be practical in certain situations, and was found to handle some data sets incorrectly. We have implemented BARS in C for the normal and Poisson cases, the latter being important in neurophysiological and other point-process applications. The C implementation includes all needed subroutines for fitting Poisson regression, manipulating B-splines (using code created by Bates and Venables), and finding starting values for Poisson regression (using code for density estimation created by Kooperberg). The code utilizes only freely-available external libraries (LAPACK and BLAS) and is otherwise self-contained. We have also provided wrappers so that BARS can be used easily within S or R.

Research Papers in Economics

Developing numerical libraries in Java

Author: Boisvert Ronald F.
Dongarra Jack J.
Pozo Roldan
Remington Karin
Stewart G. W.
Publication date: 01/01/1998

The rapid and widespread adoption of Java has created a demand for reliable and reusable mathematical software components to support the growing number of compute-intensive applications now under development, particularly in science and engineering. In this paper we address practical issues of the Java language and environment which have an effect on numerical library design and development. Benchmarks which illustrate the current levels of performance of key numerical kernels on a variety of Java platforms are presented. Finally, a strategy for the development of a fundamental numerical toolkit for Java is proposed and its current status is described. Comment: 11 pages. Revised version of paper presented to the 1998 ACM Conference on Java for High Performance Network Computing. To appear in Concurrency: Practice and Experience.

arXiv.org e-Print Archive

CiteSeerX

The University of Manchester - Institutional Repository