4,241 research outputs found

Optimising Sparse Matrix Vector multiplication for large scale FEM problems on FPGA

Authors: Burovskiy P; Grigoras P; Luk W; Sherwin S
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/08/2016

Sparse Matrix Vector multiplication (SpMV) is an important kernel in many scientific applications. In this work we propose an architecture and an automated customisation method to detect and optimise the architecture for block diagonal sparse matrices. We evaluate the proposed approach in the context of the spectral/hp Finite Element Method, using the local matrix assembly approach. This problem leads to a large sparse system of linear equations with a block diagonal matrix, which is typically solved using an iterative method such as the Preconditioned Conjugate Gradient method. The efficiency of the proposed architecture, combined with the effectiveness of the proposed customisation method, reduces BRAM resource utilisation by as much as 10 times, while matching the throughput of existing state-of-the-art designs and requiring minimal development effort from the end user. In the context of the Finite Element Method, our approach enables the solution of larger problems than previously possible, making FPGAs applicable to more interesting HPC problems.
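To make the block diagonal structure mentioned in this abstract concrete, here is a minimal software sketch of SpMV for a block diagonal matrix. It is not the FPGA architecture from the paper: the function name `block_diag_spmv`, the list-of-dense-square-blocks storage, and the example values are assumptions chosen only for illustration.

```python
def block_diag_spmv(blocks, x):
    """Compute y = A @ x where A is block diagonal.

    `blocks` is a list of dense square blocks (lists of rows) lying on
    the diagonal of A. Only the blocks are stored, so the zero entries
    outside them cost no memory and no arithmetic.
    This storage layout is an illustrative assumption.
    """
    y = []
    offset = 0  # index in x where the current block's columns start
    for block in blocks:
        n = len(block)  # block is n x n
        segment = x[offset:offset + n]
        if len(segment) != n:
            raise ValueError("vector is shorter than the matrix dimension")
        for row in block:
            # Each output entry depends only on its own block's slice of x.
            y.append(sum(a * b for a, b in zip(row, segment)))
        offset += n
    if offset != len(x):
        raise ValueError("vector is longer than the matrix dimension")
    return y


# A 3x3 matrix made of one 2x2 block and one 1x1 block.
example_blocks = [[[1, 2], [3, 4]], [[5]]]
example_result = block_diag_spmv(example_blocks, [1, 1, 2])
```

Because every block touches only its own slice of the input vector, the blocks can be processed independently, which is one reason this structure suits parallel hardware.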

Spiral - Imperial College Digital Repository

Non-parametric linear time-invariant system identification by discrete wavelet transforms

Authors: Damper R. I.; Luk R. W.-P.
Publication date: 01/01/2006

We describe the use of the discrete wavelet transform (DWT) for non-parametric linear time-invariant system identification. Identification is achieved by using a test excitation to the system under test (SUT) that also acts as the analyzing function for the DWT of the SUT's output, so as to recover the impulse response. The method uses as excitation any signal that gives an orthogonal inner product in the DWT at some step size (that cannot be 1). We favor wavelet scaling coefficients as excitations, with a step size of 2. However, the system impulse or frequency response can then only be estimated at half the available number of points of the sampled output sequence, introducing a multirate problem that means we have to 'oversample' the SUT output. The method has several advantages over existing techniques, e.g., it uses a simple, easy-to-generate excitation, and avoids the singularity problems and the (unbounded) accumulation of round-off errors that can occur with standard techniques. In extensive simulations, identification of a variety of finite and infinite impulse response systems is shown to be considerably better than with conventional system identification methods.

The Hong Kong Polytechnic University Pao Yue-kong Library

Southampton (e-Prints Soton)

An efficient sparse conjugate gradient solver using a Beneš permutation network

Authors: Burovskiy PA; Chow G; Grigoras P; Luk W
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/09/2014

© 2014 Technical University of Munich (TUM). The conjugate gradient (CG) method is one of the most widely used iterative methods for solving systems of linear equations. However, parallelizing CG for large sparse systems is difficult due to the inherent irregularity of the memory access pattern. We propose a novel processor architecture for the sparse conjugate gradient method. The architecture consists of multiple processing elements and memory banks, and efficiently computes both sparse matrix-vector multiplication and dense vector operations. A Beneš permutation network with an optimised control scheme is introduced to reduce memory bank conflicts without expensive logic. We describe a heuristic for offline scheduling, the effect of which is captured in a parametric model for estimating the performance of designs generated from our approach.
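As a rough illustration of the kernel mix this abstract refers to (one sparse matrix-vector product plus a few dense vector operations per iteration), the following is a textbook, unpreconditioned CG sketch in plain Python over a CSR matrix. It does not model the proposed processor architecture or the Beneš network. The names `csr_spmv`, `dot`, and `conjugate_gradient`, the tolerance, and the 2x2 example system are all assumptions for illustration.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A @ x with A in CSR format."""
    return [
        # x is read through col_idx, which is the irregular access pattern.
        sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


def dot(u, v):
    """Dense inner product."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = csr_spmv(values, col_idx, row_ptr, p)   # the only sparse kernel
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Example system: A = [[4, 1], [1, 3]], b = [1, 2].
cg_solution = conjugate_gradient([4.0, 1.0, 1.0, 3.0], [0, 1, 0, 1], [0, 2, 4], [1.0, 2.0])
```

The indirect read `x[col_idx[k]]` in `csr_spmv` is the source of the irregular memory accesses that make bank conflicts a concern in a multi-bank hardware design.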

Crossref

Spiral - Imperial College Digital Repository

CASK - Open-source custom architectures for sparse kernels

Authors: Burovskiy P; Grigoras P; Luk W
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/02/2016

© 2016 ACM. Sparse matrix vector multiplication (SpMV) is an important kernel in many scientific applications. To improve the performance and applicability of FPGA-based SpMV, we propose an approach for exploiting properties of the input matrix to generate optimised custom architectures. The architectures generated by our approach are between 3.8 and 48 times faster than the worst-case architectures for each matrix, showing the benefits of instance-specific design for SpMV.

Spiral - Imperial College Digital Repository

Directional optical switching and transistor functionality using optical parametric oscillation in a spinor polariton fluid

Authors: Binder Rolf; Chan Chris K. P.; Kwong N. H.; Leung P. T.; Lewandowski Przemyslaw; Luk Samuel M. H.; Schumacher Stefan
Publication venue: 'The Optical Society'
Publication date: 16/07/2017

Over the past decade, spontaneously emerging patterns in the density of polaritons in semiconductor microcavities were found to be a promising candidate for all-optical switching. But recent approaches were mostly restricted to scalar fields, did not benefit from the polariton's unique spin-dependent properties, and utilized switching based on hexagon far-field patterns with 60° beam switching (i.e. in the far field the beam propagation direction is switched by 60°). Since hexagon far-field patterns are challenging, we present here an approach for a linearly polarized spinor field, that allows for a transistor-like (e.g., crucial for cascadability) orthogonal beam switching, i.e. in the far field the beam is switched by 90°. We show that switching specifications such as amplification and speed can be adjusted using only optical means.

arXiv.org e-Print Archive

Crossref

The University of Arizona

FLiMS: a fast lightweight 2-way merger for sorting

Authors: Brooks C; Luk W; Papaphilippou P
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022

In this paper, we present FLiMS, a highly efficient and simple parallel algorithm for merging two sorted lists residing in banked and/or wide memory. On FPGAs, its implementation uses fewer hardware resources than the state-of-the-art alternatives, due to the reduced number of comparators and the elimination of redundant logic found in prior attempts. In combination with the distributed nature of the selector stage, a higher performance is achieved for the same amount of parallelism or higher. This is useful in many applications such as in parallel merge trees to achieve high-throughput sorting, where the resource utilisation of the merger is critical for building larger trees and internalising the workload for faster computation. Also presented are efficient variations of FLiMS for optimizing throughput for skewed datasets, achieving stable sorting or using fewer dequeue signals. FLiMS is also shown to perform well as conventional software on modern CPUs supporting single-instruction multiple-data (SIMD) instructions, surpassing the performance of some standard libraries for sorting.
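For orientation, the operation that FLiMS parallelises is the merge of two sorted lists. The sketch below is the ordinary sequential two-pointer merge, which emits one element per step. It is a baseline for comparison, not the FLiMS algorithm, and the function name `merge_sorted` and the example inputs are assumptions for illustration.

```python
def merge_sorted(left, right):
    """Merge two ascending lists into one ascending list.

    Sequential baseline: one comparison and one output element per step.
    Ties are taken from `left` first, so the merge is stable.
    """
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these tails is non-empty.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


merged_example = merge_sorted([1, 4, 9], [2, 3, 10, 11])
```

A hardware merger that outputs several elements per cycle has to perform the equivalent of many such comparisons at once, which is why the comparator count mentioned in the abstract matters for resource usage.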

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository