410 research outputs found
Developing numerical libraries in Java
The rapid and widespread adoption of Java has created a demand for reliable
and reusable mathematical software components to support the growing number of
compute-intensive applications now under development, particularly in science
and engineering. In this paper we address practical issues of the Java language
and environment which have an effect on numerical library design and
development. Benchmarks which illustrate the current levels of performance of
key numerical kernels on a variety of Java platforms are presented. Finally, a
strategy for the development of a fundamental numerical toolkit for Java is
proposed and its current status is described.Comment: 11 pages. Revised version of paper presented to the 1998 ACM
Conference on Java for High Performance Network Computing. To appear in
Concurrency: Practice and Experienc
Summary of the functions and capabilities of the structural analysis system computer program
Functions and operations of structural analysis system computer progra
Algorithm Architecture Co-design for Dense and Sparse Matrix Computations
abstract: With the end of Dennard scaling and Moore's law, architects have moved towards
heterogeneous designs consisting of specialized cores to achieve higher performance
and energy efficiency for a target application domain. Applications of linear algebra
are ubiquitous in the field of scientific computing, machine learning, statistics,
etc. with matrix computations being fundamental to these linear algebra based solutions.
Design of multiple dense (or sparse) matrix computation routines on the
same platform is quite challenging. Added to the complexity is the fact that dense
and sparse matrix computations have large differences in their storage and access
patterns and are difficult to optimize on the same architecture. This thesis addresses
this challenge and introduces a reconfigurable accelerator that supports both dense
and sparse matrix computations efficiently.
The reconfigurable architecture has been optimized to execute the following linear
algebra routines: GEMV (Dense General Matrix Vector Multiplication), GEMM
(Dense General Matrix Matrix Multiplication), TRSM (Triangular Matrix Solver),
LU Decomposition, Matrix Inverse, SpMV (Sparse Matrix Vector Multiplication),
SpMM (Sparse Matrix Matrix Multiplication). It is a multicore architecture where
each core consists of a 2D array of processing elements (PE).
The 2D array of PEs is of size 4x4 and is scheduled to perform 4x4 sized matrix
updates efficiently. A sequence of such updates is used to solve a larger problem inside
a core. A novel partitioned block compressed sparse data structure (PBCSC/PBCSR)
is used to perform sparse kernel updates. Scalable partitioning and mapping schemes
are presented that map input matrices of any given size to the multicore architecture.
Design trade-offs related to the PE array dimension, size of local memory inside a core
and the bandwidth between on-chip memories and the cores have been presented. An
optimal core configuration is developed from this analysis. Synthesis results using a 7nm PDK show that the proposed accelerator can achieve a performance of upto
32 GOPS using a single core.Dissertation/ThesisMasters Thesis Computer Engineering 201
Performance Improvements of Common Sparse Numerical Linear Algebra Computations
Manufacturers of computer hardware are able to continuously sustain an unprecedented pace of progress in computing speed of their products, partially due to increased clock rates but also because of ever more complicated chip designs. With new processor families appearing every few years, it is increasingly harder to achieve high performance rates in sparse matrix computations. This research proposes new methods for sparse matrix factorizations and applies in an iterative code generalizations of known concepts from related disciplines. The proposed solutions and extensions are implemented in ways that tend to deliver efficiency while retaining ease of use of existing solutions. The implementations are thoroughly timed and analyzed using a commonly accepted set of test matrices. The tests were conducted on modern processors that seem to have gained an appreciable level of popularity and are fairly representative for a wider range of processor types that are available on the market now or in the near future. The new factorization technique formally introduced in the early chapters is later on proven to be quite competitive with state of the art software currently available. Although not totally superior in all cases (as probably no single approach could possibly be), the new factorization algorithm exhibits a few promising features. In addition, an all-embracing optimization effort is applied to an iterative algorithm that stands out for its robustness. This also gives satisfactory results on the tested computing platforms in terms of performance improvement. The same set of test matrices is used to enable an easy comparison between both investigated techniques, even though they are customarily treated separately in the literature. Possible extensions of the presented work are discussed. They range from easily conceivable merging with existing solutions to rather more evolved schemes dependent on hard to predict progress in theoretical and algorithmic research
The Linear Algebra Mapping Problem
We observe a disconnect between the developers and the end users of linear
algebra libraries. On the one hand, the numerical linear algebra and the
high-performance communities invest significant effort in the development and
optimization of highly sophisticated numerical kernels and libraries, aiming at
the maximum exploitation of both the properties of the input matrices, and the
architectural features of the target computing platform. On the other hand, end
users are progressively less likely to go through the error-prone and time
consuming process of directly using said libraries by writing their code in C
or Fortran; instead, languages and libraries such as Matlab, Julia, Eigen and
Armadillo, which offer a higher level of abstraction, are becoming more and
more popular. Users are given the opportunity to code matrix computations with
a syntax that closely resembles the mathematical description; it is then a
compiler or an interpreter that internally maps the input program to lower
level kernels, as provided by libraries such as BLAS and LAPACK. Unfortunately,
our experience suggests that in terms of performance, this translation is
typically vastly suboptimal.
In this paper, we first introduce the Linear Algebra Mapping Problem, and
then investigate how effectively a benchmark of test problems is solved by
popular high-level programming languages. Specifically, we consider Matlab,
Octave, Julia, R, Armadillo (C++), Eigen (C++), and NumPy (Python); the
benchmark is meant to test both standard compiler optimizations such as common
subexpression elimination and loop-invariant code motion, as well as linear
algebra specific optimizations such as optimal parenthesization of a matrix
product and kernel selection for matrices with properties. The aim of this
study is to give concrete guidelines for the development of languages and
libraries that support linear algebra computations
- …