Efficient architectures and power modelling of multiresolution analysis algorithms on FPGA
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. In the past two decades, there has been a huge amount of interest in Multiresolution Analysis Algorithms (MAAs) and their applications. Some of these applications, such as medical imaging, are computationally intensive, power hungry and require large amounts of memory, which creates a strong demand for efficient algorithm implementation, low-power architectures and acceleration. Recently, MAAs such as the Finite Ridgelet Transform (FRIT) and the Haar Wavelet Transform (HWT) have become very popular, and they are suitable for a number of image processing applications such as detection of line singularities and contiguous edges, edge detection (useful for compression and feature detection), and medical image denoising and segmentation. Efficient hardware implementation and acceleration of these algorithms, particularly when addressing large problems, is very challenging and consumes a lot of power, which leads to a number of issues including mobility and reliability concerns. To overcome the computation problems, Field Programmable Gate Arrays (FPGAs) are the technology of choice for accelerating computationally intensive applications due to their high performance. Addressing the power issue requires optimisation and awareness at all levels of abstraction in the design flow.
The most important achievements of the work presented in this thesis are summarised
here.
Two factorisation methodologies for the HWT, called HWT Factorisation Method 1 (HWTFM1) and HWT Factorisation Method 2 (HWTFM2), have been explored to increase the number of zeros and reduce hardware resources. In addition, two novel, efficient and optimised architectures for the proposed methodologies, based on Distributed Arithmetic (DA) principles, have been proposed. The architectural evaluation has shown that the proposed architectures reduce the arithmetic operations (additions/subtractions) by 33% and 25% respectively compared to a direct implementation of the HWT, and outperform existing results. The proposed HWTFM2 is implemented on advanced and low-power FPGA devices using the Handel-C language. The FPGA implementation results outperform other existing results in terms of area and maximum frequency. In addition, a novel efficient architecture for the Finite Radon Transform (FRAT) has also been proposed. This architecture is integrated with the developed HWT architecture to build an optimised architecture for the FRIT. Strategies such as parallelism and pipelining have been deployed at the architectural level for efficient implementation on different FPGA devices. The performance of the proposed FRIT architecture has been evaluated and the results outperform other existing architectures. Both the FRAT and FRIT architectures have been implemented on FPGAs using the Handel-C language. The evaluation of both architectures has shown that the obtained results outperform existing results by almost 10% in terms of frequency and area. The proposed architectures are also applied to image data (256 × 256) and their Peak Signal to Noise Ratio (PSNR) is evaluated for quality purposes.
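As background for the factorisation results above, a single level of the 1-D Haar transform can be sketched in plain Python. This is only an illustrative sketch of the underlying average/difference operation (unnormalised), not the DA-based HWTFM1/HWTFM2 factorisations themselves:

```python
def haar_level(x):
    """One level of the unnormalised 1-D Haar transform.

    Splits an even-length signal into approximation coefficients
    (pairwise sums) and detail coefficients (pairwise differences).
    Each pair costs one addition and one subtraction, which is the
    arithmetic the factorisation methods aim to reduce.
    """
    assert len(x) % 2 == 0
    approx = [x[2 * i] + x[2 * i + 1] for i in range(len(x) // 2)]
    detail = [x[2 * i] - x[2 * i + 1] for i in range(len(x) // 2)]
    return approx, detail

a, d = haar_level([4, 2, 6, 6])
# a = [6, 12], d = [2, 0]
```

Note how constant regions of a signal (here the pair 6, 6) produce zero detail coefficients; increasing the number of such zeros is exactly what the factorisation methodologies exploit to save hardware.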
Two architectures for cyclic convolution based on systolic arrays, using parallelism and pipelining, which can serve as the main building block of the proposed FRIT architecture, have been proposed. The first is a linear systolic array with a pipelined process, and the second is a systolic array with a parallel process. The second architecture reduces the number of registers by 42% compared to the first, and both architectures outperform other existing results. The proposed pipelined architecture has been implemented on different FPGA devices with vector sizes (N) of 4, 8, 16 and 32 and a word length of W = 8. The implementation results show a significant improvement over other existing results.
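For reference, the operation these systolic arrays compute is N-point cyclic (circular) convolution. A direct O(N²) software sketch of that operation (not the systolic hardware itself, which pipelines or parallelises these multiply-accumulates) is:

```python
def cyclic_convolution(x, h):
    """Direct N-point cyclic convolution:
    y[n] = sum_k x[k] * h[(n - k) mod N].

    A plain software reference for the operation the systolic
    arrays accelerate in hardware.
    """
    n = len(x)
    assert len(h) == n
    return [sum(x[k] * h[(i - k) % n] for k in range(n)) for i in range(n)]

y = cyclic_convolution([1, 2, 3, 4], [1, 0, 0, 1])
# y = [3, 5, 7, 5]
```

Each output element is an N-term multiply-accumulate; a linear systolic array maps those N accumulations onto N processing elements, which is where the pipelining and register counts discussed above come from.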
Ultimately, an in-depth evaluation of a high-level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called the functional-level power modelling approach, has been presented. The mathematical techniques that form the basis of the proposed power model have been validated on a range of custom IP cores. The proposed power modelling is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating functional-level power modelling with commercially available design tools for systematic optimisation of IP cores has also been developed. The in-depth evaluation of this tool enables us to observe the behaviour of different custom IP cores in terms of power consumption and accuracy using different design methodologies and arithmetic techniques on various FPGA platforms. Based on the results achieved, the proposed model is almost 99% accurate for the Dynamic Power (DP) components of all IP cores.
Distributed video coding for wireless video sensor networks: a review of the state-of-the-art architectures
Distributed video coding (DVC) is a relatively new video coding architecture originating from two fundamental theorems, namely Slepian–Wolf and Wyner–Ziv. Recent research developments have made DVC attractive for applications in the emerging domain of wireless video sensor networks (WVSNs). This paper reviews the state-of-the-art DVC architectures with a focus on understanding their opportunities and gaps in addressing the operational requirements and application needs of WVSNs.
Attention-free Spikformer: Mixing Spike Sequences with Simple Linear Transforms
By integrating the self-attention capability and the biological properties of
Spiking Neural Networks (SNNs), Spikformer applies the flourishing Transformer
architecture to SNN design. It introduces a Spiking Self-Attention (SSA)
module to mix sparse visual features using spike-form Query, Key, and Value,
resulting in State-Of-The-Art (SOTA) performance on numerous datasets
compared to previous SNN-like frameworks. In this paper, we demonstrate that
the Spikformer architecture can be accelerated by replacing the SSA with an
unparameterized Linear Transform (LT) such as Fourier and Wavelet transforms.
These transforms are utilized to mix spike sequences, reducing the quadratic
time complexity to log-linear time complexity. They alternate between the
frequency and time domains to extract sparse visual features, showcasing
powerful performance and efficiency. We conduct extensive experiments on image
classification using both neuromorphic and static datasets. The results
indicate that compared to the SOTA Spikformer with SSA, Spikformer with LT
achieves higher Top-1 accuracy on neuromorphic datasets (i.e., CIFAR10-DVS and
DVS128 Gesture) and comparable Top-1 accuracy on static datasets (i.e.,
CIFAR-10 and CIFAR-100). Furthermore, Spikformer with LT achieves approximately
29-51% improvement in training speed, 61-70% improvement in inference speed,
and reduces memory usage by 4-26% due to not requiring learnable parameters.
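The idea of replacing attention with an unparameterized linear transform can be illustrated with a small FNet-style sketch. Assumptions of this illustration (not taken from the paper): mixing happens along the token dimension with a radix-2 FFT, and only the real part is kept; the paper's exact LT module may differ.

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    This recursion is what gives the log-linear mixing cost."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fourier_mix(tokens):
    """Mix a (binary) spike sequence along the token dimension with an
    FFT, keeping the real part -- an unparameterized O(N log N)
    stand-in for self-attention; an illustrative sketch only."""
    n_tokens = len(tokens)
    dim = len(tokens[0])
    mixed = [[0.0] * dim for _ in range(n_tokens)]
    for d in range(dim):
        col = fft([tokens[t][d] for t in range(n_tokens)])
        for t in range(n_tokens):
            mixed[t][d] = col[t].real
    return mixed
```

Because the transform has no learnable parameters, there are no attention weight matrices to store or train, which is where the memory and speed savings reported above come from.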
Volumetric Medical Images Visualization on Mobile Devices
Volumetric medical image visualization is an important tool in the diagnosis and treatment of diseases. Throughout history, one of the most difficult tasks for medical specialists has been the accurate location of broken bones and of damaged tissues during chemotherapy treatment, among other applications, such as techniques used in neurological studies. These situations underline the need for visualization in medicine. New technologies, the improvement and development of new hardware and software, and the updating of older graphics applications have resulted in specialized systems for medical visualization. However, the use of these techniques on mobile devices has been limited by their low performance. In our work, we propose a client-server scheme, where the model is compressed on the server side and reconstructed on a final thin-client device. The technique restricts the natural density values to achieve good bone visualization in medical models, transforming the rest of the data to zero. Our proposal uses a three-dimensional Haar wavelet function applied locally inside unit blocks of 16×16×16, similar to the Wavelet-Based 3D Compression Scheme for Interactive Visualization of Very Large Volume Data approach. We also implement a quantization algorithm which handles error coefficients according to their frequency distributions. Finally, we evaluate the volume visualization on current mobile devices and present the specifications for an implementation of our technique on the Nokia N900 mobile phone.
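The two preprocessing ideas in this scheme (zeroing densities outside the bone range, then discarding near-zero wavelet coefficients) can be sketched as follows. The density window [400, 2000] and the threshold are hypothetical values chosen for illustration, not the paper's settings:

```python
def bone_window(volume, lo=400, hi=2000):
    """Keep only voxel densities in an assumed bone range [lo, hi],
    zeroing everything else -- the pre-transform restriction step.
    The thresholds here are illustrative, not the paper's values."""
    return [[[v if lo <= v <= hi else 0 for v in row]
             for row in slab]
            for slab in volume]

def threshold(coeffs, eps):
    """Zero out wavelet coefficients with magnitude below eps,
    mimicking the quantization of small error coefficients."""
    return [c if abs(c) >= eps else 0 for c in coeffs]
```

Because windowing and thresholding both produce long runs of zeros, the per-block wavelet representation compresses well, which is what makes transmission to a thin mobile client feasible.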
Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP
With ever-increasing volumes of scientific data produced by HPC applications,
significantly reducing data size is critical because of limited capacity of
storage space and potential bottlenecks on I/O or networks in writing/reading
or transferring data. SZ and ZFP are the two leading lossy compressors
available to compress scientific data sets. However, their performance is not
consistent across different data sets and across different fields of some data
sets: for some fields SZ provides better compression performance, while other
fields are better compressed with ZFP. This situation raises the need for an
automatic online (during compression) selection between SZ and ZFP, with a
minimal overhead. In this paper, the automatic selection optimizes the
rate-distortion, an important statistical quality metric based on the
signal-to-noise ratio. To optimize for rate-distortion, we investigate the
principles of SZ and ZFP. We then propose an efficient online, low-overhead
selection algorithm that predicts the compression quality accurately for two
compressors in early processing stages and selects the best-fit compressor for
each data field. We implement the selection algorithm into an open-source
library, and we evaluate the effectiveness of our proposed solution against
plain SZ and ZFP in a parallel environment with 1,024 cores. Evaluation results
on three data sets representing about 100 fields show that our selection
algorithm improves the compression ratio up to 70% with the same level of data
distortion because of very accurate selection (around 99%) of the best-fit
compressor, with little overhead (less than 7% in the experiments).Comment: 14 pages, 9 figures, first revisio
Frequency-modulated continuous-wave LiDAR compressive depth-mapping
We present an inexpensive architecture for converting a frequency-modulated
continuous-wave LiDAR system into a compressive-sensing based depth-mapping
camera. Instead of raster scanning to obtain depth-maps, compressive sensing is
used to significantly reduce the number of measurements. Ideally, our approach requires two difference detectors. Due to the large flux entering the detectors, the signal amplification from heterodyne detection, and the effects of background subtraction from compressive sensing, the system can obtain higher signal-to-noise ratios than detector-array based schemes while scanning a scene faster than is possible through raster scanning. Moreover, by efficiently storing only data points from measurements of an pixel scene, we can easily extract depths by solving only two linear equations with efficient convex-optimization methods.