
    Sparse matrix product implementation on field programmable gate arrays (FPGAs)

    If dense matrix multiplication algorithms are applied to sparse matrices, they perform a large number of redundant calculations, because many elements of a sparse matrix are zero valued; available resources and time are therefore wasted. The algorithm discussed here exploits the sparseness of the matrices by multiplying only nonzero elements. The NIOS development board from Altera is used to implement this algorithm. First, a sequential C program is downloaded onto the FPGA and run by a single NIOS soft-processor. The same board is then used for a parallel implementation of the algorithm using three NIOS soft-processors within the same FPGA. Such economy is critical because current FPGAs do not contain enough resources to solve large problems; for example, large memory systems cannot be built within an FPGA, so algorithms with limited memory requirements are needed. The proposed sparse matrix multiplication algorithm uses the available memory space sparingly and also achieves good execution times; the reported performance results confirm this.
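
    As a plain-software illustration of the multiply-only-nonzeros idea (not the authors' NIOS implementation), the C sketch below multiplies two matrices stored in compressed sparse row (CSR) form and accumulates into a dense result. The csr_t layout, field names, and function name are assumptions introduced for this example.

    /* Illustrative sketch only: multiply two sparse matrices stored in
     * compressed sparse row (CSR) form, touching only nonzero entries.
     * The csr_t layout and names are assumptions for this example. */
    typedef struct {
        int rows, cols, nnz;
        const int    *row_ptr;   /* length rows + 1 */
        const int    *col_idx;   /* length nnz      */
        const double *val;       /* length nnz      */
    } csr_t;

    /* C (dense, row-major, A->rows x B->cols, pre-zeroed) += A * B */
    static void spgemm_csr(const csr_t *A, const csr_t *B, double *C)
    {
        for (int i = 0; i < A->rows; ++i) {
            for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p) {
                int k = A->col_idx[p];               /* A(i,k) is nonzero */
                double a = A->val[p];
                for (int q = B->row_ptr[k]; q < B->row_ptr[k + 1]; ++q)
                    C[i * B->cols + B->col_idx[q]] += a * B->val[q];
            }
        }
    }

    Only the nonzero entries of A and B are ever read, so the work and memory traffic scale with the number of nonzeros rather than with the full matrix dimensions, which is the memory economy the abstract emphasizes.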

    High Performance Reconfigurable Computing for Linear Algebra: Design and Performance Analysis

    Field Programmable Gate Arrays (FPGAs) enable powerful performance acceleration for scientific computations because of their intrinsic parallelism, pipelining capability, and flexible architecture. This dissertation explores the computational power of FPGAs for an important scientific application: linear algebra. First, optimized linear algebra subroutines are presented, based on enhancements to both the algorithms and the hardware architectures; compared to microprocessors, these routines achieve significant speedup. Second, computing with mixed-precision data on FPGAs is proposed for higher performance. Experimental analysis shows that mixed-precision algorithms on FPGAs can achieve the high performance of lower-precision arithmetic while retaining higher-precision accuracy when solving linear systems. Third, an execution time model is built for reconfigurable computers (RC), which plays an important role in performance analysis and optimal resource utilization of FPGAs. The accuracy and efficiency of parallel computing performance models often depend on mean maximum computations, yet despite significant prior work there have been no adequate mathematical tools for this important calculation. This work presents an Effective Mean Maximum Approximation method that is more general, accurate, and efficient than previous methods. Together, these results help show how linear algebra applications can perform better on high performance reconfigurable computing architectures.
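
    The mixed-precision idea can be illustrated in software (this is not the dissertation's FPGA design) by classical iterative refinement for Ax = b: a cheap low-precision solve supplies corrections while residuals and the solution are kept in double precision. In the C sketch below, solve_single() stands in for a hypothetical single-precision solver (for example, a triangular solve against a precomputed single-precision LU factorization); all names and signatures are assumptions for this example.

    #include <math.h>
    #include <stdlib.h>

    /* Hypothetical single-precision solver, e.g. a triangular solve against a
     * precomputed single-precision factorization of A (assumption). */
    extern void solve_single(int n, const float *A32, const float *rhs, float *sol);

    /* Mixed-precision iterative refinement: corrections come from the cheap
     * single-precision solve; residuals and the solution stay in double. */
    void refine(int n, const double *A, const float *A32,
                const double *b, double *x, int max_iter, double tol)
    {
        float *rf = malloc(n * sizeof *rf);    /* residual, cast to float */
        float *dx = malloc(n * sizeof *dx);    /* correction from solver  */

        for (int it = 0; it < max_iter; ++it) {
            double norm = 0.0;
            for (int i = 0; i < n; ++i) {          /* r = b - A*x in double */
                double s = b[i];
                for (int j = 0; j < n; ++j)
                    s -= A[i * n + j] * x[j];
                rf[i] = (float)s;
                norm  = fmax(norm, fabs(s));
            }
            if (norm < tol)
                break;
            solve_single(n, A32, rf, dx);          /* low-precision solve   */
            for (int i = 0; i < n; ++i)
                x[i] += (double)dx[i];             /* high-precision update */
        }
        free(rf);
        free(dx);
    }

    For reasonably well-conditioned systems this style of refinement converges to near double-precision accuracy even though most of the arithmetic runs at single precision, which is the effect the abstract describes.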

    Sparse Matrix Sparse Vector Multiplication using Parallel and Reconfigurable Computing

    The purpose of this thesis is to provide analysis and insight into the implementation of sparse matrix sparse vector multiplication on a reconfigurable parallel computing platform. Today, sparse matrix sparse vector multiplication is commonly carried out either on single (unary) processors or on parallel platforms. Unary processor implementations are limited by their sequential solution of the problem, while parallel implementations suffer from communication delays and load-balancing issues when preprocessing techniques are unavailable or unused. By exploiting the deficiencies of sparse matrix sparse vector multiplication on a typical unary processor as a strength of parallelism on a Field Programmable Gate Array (FPGA), the potential performance improvements and tradeoffs of shifting the operation to a hardware-assisted implementation are evaluated. This is accomplished through multiple collaborating processes designed on an FPGA.
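
    As a plain-software illustration of the operation being moved to hardware (not the thesis's FPGA process design), the C sketch below computes y = A*x with A in compressed sparse column (CSC) form and x stored as index/value pairs, so only the columns of A that match a nonzero of x are visited. The csc_t structure and all names are assumptions for this example.

    typedef struct {
        int rows, cols, nnz;
        const int    *col_ptr;   /* length cols + 1 */
        const int    *row_idx;   /* length nnz      */
        const double *val;       /* length nnz      */
    } csc_t;

    /* y (dense, length A->rows, pre-zeroed) = A * x, where x is a sparse
     * vector given as x_nnz (index, value) pairs in x_idx / x_val. */
    static void spmspv(const csc_t *A, int x_nnz,
                       const int *x_idx, const double *x_val, double *y)
    {
        for (int t = 0; t < x_nnz; ++t) {
            int j = x_idx[t];                      /* nonzero column of x */
            for (int p = A->col_ptr[j]; p < A->col_ptr[j + 1]; ++p)
                y[A->row_idx[p]] += A->val[p] * x_val[t];
        }
    }

    Each column gather is independent of the others, which is the kind of parallelism the abstract proposes to exploit across collaborating FPGA processes.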