
    Basic linear algebra subprograms for FORTRAN usage

    A package of 38 low-level subprograms for many of the basic operations of numerical linear algebra is presented. The package is intended to be used with FORTRAN. The operations in the package are dot products, elementary vector operations, Givens transformations, vector copy and swap, vector norms, vector scaling, and the indices of components of largest magnitude. The subprograms and a test driver are available in portable FORTRAN. Versions of the subprograms are also provided in assembly language for the IBM 360/67, the CDC 6600 and CDC 7600, and the Univac 1108.
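
    As a rough illustration of the kinds of operations this abstract lists, the following Python sketch mimics a few of them on plain lists. The function names (`dot`, `axpy`, `nrm2`, `iamax`, `givens`) are chosen here for readability and are assumptions, not the package's FORTRAN interfaces; the Givens helper uses a simplified sign convention and indices are 0-based, unlike FORTRAN.

```python
import math

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def axpy(alpha, x, y):
    """Elementary vector operation alpha*x + y, returned as a new list."""
    return [alpha * a + b for a, b in zip(x, y)]

def nrm2(x):
    """Euclidean norm, scaled by the largest magnitude to limit overflow."""
    scale = max((abs(a) for a in x), default=0.0)
    if scale == 0.0:
        return 0.0
    return scale * math.sqrt(sum((a / scale) ** 2 for a in x))

def iamax(x):
    """0-based index of the first component of largest magnitude (x non-empty)."""
    return max(range(len(x)), key=lambda i: abs(x[i]))

def givens(a, b):
    """Return (c, s, r) such that c*a + s*b = r and -s*a + c*b = 0.

    Simplified convention (r >= 0); not the exact sign rules of any library.
    """
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0, 0.0
    return a / r, b / r, r

if __name__ == "__main__":
    c, s, r = givens(3.0, 4.0)
    # Applying the rotation to (3, 4) zeroes the second component.
    print(c * 3.0 + s * 4.0, -s * 3.0 + c * 4.0)
```

    Each helper returns a new value rather than overwriting its arguments, which keeps the sketch easy to test; an in-place variant would be closer to how such subprograms are usually called.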

    Real-Time, Dynamic Hardware Accelerators for BLAS Computation

    This paper presents an approach to increasing the capability of scientific computing through the use of real-time, partially reconfigurable hardware accelerators that implement basic linear algebra subprograms (BLAS). The use of reconfigurable hardware accelerators for computing linear algebra functions has the potential to increase floating point computation while at the same time providing an architecture that minimizes data movement latency and increases power efficiency. While there has been significant work by the computing community to optimize BLAS routines at the software level, optimizing these routines in hardware using reconfigurable fabrics is in its infancy. This paper begins with a comprehensive overview of the history and evolution of BLAS for use in scientific computing. It then reviews current successes in using reconfigurable computing architectures to achieve acceleration. It then presents an investigation of an accelerator approach with a granularity at the logic circuit level through real-time, partial reconfiguration of a programmable fabric with static accelerator cache memory to minimize data movement. Empirical data is presented for a study on a single-FPGA

    QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

    Previous studies have reported that common dense linear algebra operations do not achieve speedup by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers are still strongly predominant in high-performance computing and the use of grids for speeding up large-scale scientific problems is limited to applications exhibiting parallelism at a higher level. We have identified two performance bottlenecks in the distributed memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear algebra library. First, because ScaLAPACK assumes a homogeneous communication network, the implementations of ScaLAPACK algorithms lack locality in their communication pattern. Second, the number of messages sent in the ScaLAPACK algorithms is significantly greater than in other algorithms that trade flops for communication. In this paper, we present a new approach for computing a QR factorization -- one of the main dense linear algebra kernels -- of tall and skinny matrices in a grid computing environment that overcomes these two bottlenecks. Our contribution is to articulate a recently proposed algorithm (Communication Avoiding QR) with a topology-aware middleware (QCG-OMPI) in order to confine intensive communications (ScaLAPACK calls) within the different geographical sites. An experimental study conducted on the Grid'5000 platform shows that the resulting performance increases linearly with the number of geographical sites on large-scale problems (and is in particular consistently higher than ScaLAPACK's). Comment: Accepted at IPDPS10 (IEEE International Parallel & Distributed Processing Symposium 2010 in Atlanta, GA, USA).

    Portability and Adaptation of Linear Algebra Software for Distributed Memory Systems

    By using recognized building blocks for elementary linear algebra operations and for communication routines, together with the usual block-cyclic data distributions, higher-level algorithms can be adapted to distributed memory computers in a largely portable and optimal way. In particular, the provision of the BLACS communication library for PARSYTEC computers is reported.
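
    The block-cyclic data distribution mentioned in this abstract can be illustrated with a small Python sketch of the usual one-dimensional mapping from a global index to an owning process and a local index. The function names and the parameters `nb` (block size) and `nprocs` (number of processes) are hypothetical and are not taken from BLACS or from the paper; real libraries typically apply the same idea independently along rows and columns of a process grid.

```python
def owner_and_local(g, nb, nprocs):
    """Map 0-based global index g to (owning process, local index).

    Blocks of nb consecutive indices are dealt out to processes
    round-robin, which is the 1D block-cyclic scheme.
    """
    block = g // nb                      # which global block g falls in
    owner = block % nprocs               # blocks are assigned cyclically
    local = (block // nprocs) * nb + g % nb
    return owner, local

def layout(n, nb, nprocs):
    """Owning process for each of n global indices."""
    return [owner_and_local(g, nb, nprocs)[0] for g in range(n)]

if __name__ == "__main__":
    # 10 indices, block size 2, 3 processes.
    print(layout(10, 2, 3))
```

    With a block size of 1 this reduces to a purely cyclic distribution, and with a block size of at least `n / nprocs` (rounded up) it reduces to a plain block distribution.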

    An Implementation of Bayesian Adaptive Regression Splines (BARS) in C with S and R Wrappers

    BARS (DiMatteo, Genovese, and Kass 2001) uses the powerful reversible-jump MCMC engine to perform spline-based generalized nonparametric regression. It has been shown to work well in terms of having small mean-squared error in many examples (smaller than known competitors), as well as producing visually-appealing fits that are smooth (filtering out high-frequency noise) while adapting to sudden changes (retaining high-frequency signal). However, BARS is computationally intensive. The original implementation in S was too slow to be practical in certain situations, and was found to handle some data sets incorrectly. We have implemented BARS in C for the normal and Poisson cases, the latter being important in neurophysiological and other point-process applications. The C implementation includes all needed subroutines for fitting Poisson regression, manipulating B-splines (using code created by Bates and Venables), and finding starting values for Poisson regression (using code for density estimation created by Kooperberg). The code utilizes only freely-available external libraries (LAPACK and BLAS) and is otherwise self-contained. We have also provided wrappers so that BARS can be used easily within S or R.

    Developing numerical libraries in Java

    The rapid and widespread adoption of Java has created a demand for reliable and reusable mathematical software components to support the growing number of compute-intensive applications now under development, particularly in science and engineering. In this paper we address practical issues of the Java language and environment which have an effect on numerical library design and development. Benchmarks which illustrate the current levels of performance of key numerical kernels on a variety of Java platforms are presented. Finally, a strategy for the development of a fundamental numerical toolkit for Java is proposed and its current status is described. Comment: 11 pages. Revised version of paper presented to the 1998 ACM Conference on Java for High Performance Network Computing. To appear in Concurrency: Practice and Experience.