208 research outputs found
Recommended from our members
Efficient architectures and power modelling of multiresolution analysis algorithms on FPGA
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.In the past two decades, there has been huge amount of interest in Multiresolution Analysis Algorithms (MAAs) and their applications. Processing some of their applications such as medical imaging are computationally intensive, power hungry and requires large amount of memory which cause a high demand for efficient algorithm implementation, low power architecture and acceleration. Recently, some MAAs such as Finite Ridgelet Transform (FRIT) Haar Wavelet Transform (HWT) are became very popular and they are suitable for a number of image processing applications such as detection of line singularities and contiguous edges, edge detection (useful for compression and feature detection), medical image denoising and segmentation. Efficient hardware implementation and acceleration of these algorithms particularly when addressing large problems are becoming very chal-lenging and consume lot of power which leads to a number of issues including mobility, reliability concerns. To overcome the computation problems, Field Programmable Gate Arrays (FPGAs) are the technology of choice for accelerating computationally intensive applications due to their high performance. Addressing the power issue requires optimi- sation and awareness at all level of abstractions in the design flow.
The most important achievements of the work presented in this thesis are summarised
here.
Two factorisation methodologies for HWT which are called HWT Factorisation Method1 and (HWTFM1) and HWT Factorasation Method2 (HWTFM2) have been explored to increase number of zeros and reduce hardware resources. In addition, two novel efficient and optimised architectures for proposed methodologies based on Distributed Arithmetic (DA) principles have been proposed. The evaluation of the architectural results have shown that the proposed architectures results have reduced the arithmetics calculation (additions/subtractions) by 33% and 25% respectively compared to direct implementa-tion of HWT and outperformed existing results in place. The proposed HWTFM2 is implemented on advanced and low power FPGA devices using Handel-C language. The FPGAs implementation results have outperformed other existing results in terms of area and maximum frequency. In addition, a novel efficient architecture for Finite Radon Trans-form (FRAT) has also been proposed. The proposed architecture is integrated with the developed HWT architecture to build an optimised architecture for FRIT. Strategies such as parallelism and pipelining have been deployed at the architectural level for efficient im-plementation on different FPGA devices. The proposed FRIT architecture performance has been evaluated and the results outperformed some other existing architecture in place. Both FRAT and FRIT architectures have been implemented on FPGAs using Handel-C language. The evaluation of both architectures have shown that the obtained results out-performed existing results in place by almost 10% in terms of frequency and area. The proposed architectures are also applied on image data (256 Ā£ 256) and their Peak Signal to Noise Ratio (PSNR) is evaluated for quality purposes.
Two architectures for cyclic convolution based on systolic array using parallelism and pipelining which can be used as the main building block for the proposed FRIT architec-ture have been proposed. The first proposed architecture is a linear systolic array with pipelining process and the second architecture is a systolic array with parallel process. The second architecture reduces the number of registers by 42% compare to first architec-ture and both architectures outperformed other existing results in place. The proposed pipelined architecture has been implemented on different FPGA devices with vector size (N) 4,8,16,32 and word-length (W=8). The implementation results have shown a signifi-cant improvement and outperformed other existing results in place.
Ultimately, an in-depth evaluation of a high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called func-tional level power modelling approach have been presented. The mathematical techniques that form the basis of the proposed power modeling has been validated by a range of custom IP cores. The proposed power modelling is scalable, platform independent and compares favorably with existing approaches. A hybrid, top-down design flow paradigm integrating functional level power modelling with commercially available design tools for systematic optimisation of IP cores has also been developed. The in-depth evaluation of this tool enables us to observe the behavior of different custom IP cores in terms of power consumption and accuracy using different design methodologies and arithmetic techniques on virous FPGA platforms. Based on the results achieved, the proposed model accuracy is almost 99% true for all IP core's Dynamic Power (DP) components.Thomas Gerald Gray Charitable Trus
A 2D DWT architecture suitable for the Embedded Zerotree Wavelet Algorithm
Digital Imaging has had an enormous impact on industrial applications such as the Internet and video-phone systems. However, demand for industrial applications is growing enormously. In particular, internet application users are, growing at a near exponential rate. The sharp increase in applications using digital images has caused much emphasis on the fields of image coding, storage, processing and communications. New techniques are continuously developed with the main aim of increasing efficiency. Image coding is in particular a field of great commercial interest. A digital image requires a large amount of data to be created. This large amount of data causes many problems when storing, transmitting or processing the image. Reducing the amount of data that can be used to represent an image is the main objective of image coding. Since the main objective is to reduce the amount of data that represents an image, various techniques have been developed and are continuously developed to increase efficiency. The JPEG image coding standard has enjoyed widespread acceptance, and the industry continues to explore its various implementation issues. However, recent research indicates multiresolution based image coding is a far superior alternative.
A recent development in the field of image coding is the use of Embedded Zerotree Wavelet (EZW) as the technique to achieve image compression. One of The aims of this theses is to explain how this technique is superior to other current coding standards. It will be seen that an essential part orthis method of image coding is the use of multi resolution analysis, a subband system whereby the subbands arc logarithmically spaced in frequency and represent an octave band decomposition. The block structure that implements this function is termed the two dimensional Discrete Wavelet Transform (2D-DWT). The 20 DWT is achieved by several architectures and these are analysed in order to choose the best suitable architecture for the EZW coder. Finally, this architecture is implemented and verified using the Synopsys Behavioural Compiler and recommendations are made based on experimental findings
Recommended from our members
Efficient FPGA implementation and power modelling of image and signal processing IP cores
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Field Programmable Gate Arrays (FPGAs) are the technology of choice in a number ofimage
and signal processing application areas such as consumer electronics, instrumentation,
medical data processing and avionics due to their reasonable energy consumption, high performance, security, low design-turnaround time and reconfigurability. Low power FPGA
devices are also emerging as competitive solutions for mobile and thermally constrained platforms. Most computationally intensive image and signal processing algorithms also consume a lot of power leading to a number of issues including reduced mobility, reliability concerns and increased design cost among others. Power dissipation has become one of the most important challenges, particularly for FPGAs. Addressing this problem requires optimisation and awareness at all levels in the design flow. The key achievements of the
work presented in this thesis are summarised here. Behavioural level optimisation strategies have been used for implementing matrix product and inner product through the use of mathematical techniques such as Distributed Arithmetic (DA) and its variations including offset binary coding, sparse factorisation and novel vector level transformations. Applications to test the impact of these algorithmic and arithmetic transformations include the fast Hadamard/Walsh transforms and Gaussian mixture models. Complete design space exploration has been performed on these cores, and where appropriate, they have been shown to clearly outperform comparable existing implementations. At the architectural level, strategies such as parallelism, pipelining and systolisation have been successfully applied for the design and optimisation of a number of
cores including colour space conversion, finite Radon transform, finite ridgelet transform and circular convolution. A pioneering study into the influence of supply voltage scaling for FPGA based designs, used in conjunction with performance enhancing strategies such as parallelism and pipelining has been performed. Initial results are very promising and indicated significant potential for future research in this area.
A key contribution of this work includes the development of a novel high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called Functional Level Power Analysis and Modelling (FLPAM). FLPAM
is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating FLPAM with commercially available design tools for systematic optimisation of IP cores has also been developed
Discrete wavelet transform realisation using run-time reconfiguration of field programmable gate array (FPGA)s
Abstract: Designing a universal embedded hardware architecture for discrete wavelet transform is a challenging problem because of the diversity among wavelet kernel filters. In this work, the authors present three different hardware architectures for implementing multiple wavelet kernels. The first scheme utilises fixed, parallel hardware for all the required wavelet kernels, whereas the second scheme employs a processing element (PE)-based datapath that can be configured for multiple wavelet filters during run-time. The third scheme makes use of partial run-time configuration of FPGA units for dynamically programming any desired wavelet filter. As a case study, the authors present FPGA synthesis results for simultaneous implementation of six different wavelets for the proposed methods. Performance analysis and comparison of area, timing and power results are presented for the Virtex-II Pro FPGA implementations
High-Speed Pipeline VLSI Architectures for Discrete Wavelet Transforms
The discrete wavelet transform (DWT) has been widely used in many fields, such as image compression, speech analysis and pattern recognition, because of its capability of decomposing a signal at multiple resolution levels. Due to the intensive computations involved with this transform, the design of efficient VLSI architectures for a fast computation of the transforms have become essential, especially for real-time applications and those requiring processing of high-speed data. The objective of this thesis is to develop a scheme for the design of hardware resource-efficient high-speed pipeline architectures for the computation of the DWT. The goal of high speed is achieved by maximizing the operating frequency and minimizing the number of clock cycles required for the DWT computation with little or no overhead on the hardware resources. In this thesis, an attempt is made to reach this goal by enhancing the inter-stage and intra-stage parallelisms through a systematic exploitation of the characteristics inherent in discrete wavelet transforms.
In order to enhance the inter-stage parallelism, a study is undertaken for determining the number of pipeline stages required for the DWT computation so as to synchronize their operations and utilize their hardware resources efficiently. This is achieved by optimally distributing the computational load associated with the various resolution levels to an optimum number of stages of the pipeline. This study has determined that employment of two pipeline stages with the first one performing the task of the first resolution level and the second one that of all the other resolution levels of the 1-D DWT computation, and employment of three pipeline stages with the first and second ones performing the tasks of the first and second resolution levels and the third one performing that of the remaining resolution levels of the 2-D DWT computation, are the optimum choices for the development of 1-D and 2-D pipeline architectures, respectively. The enhancement of the intra-stage parallelism is based on two main ideas. The first idea, which stems from the fact that in each consecutive resolution level the input data are decimated by a factor of two along each dimension, is to decompose the filtering operation into subtasks that can be performed in parallel by operating on even- and odd-numbered samples along each dimension of the data. It is shown that each subtask, which is essentially a set of multiply-accumulate operations, can be performed by employing a MAC-cell network consisting of a two-dimensional array of bit-wise adders. The second idea in enhancing the intra-stage parallelism is to maximally extend the bit-wise addition operations of this network horizontally through a suitable arrangement of bit-wise adders so as to minimize the delay of its critical path.
In order to validate the proposed scheme, design and implementation of two specific examples of pipeline architectures for the 1-D and 2-D DWT computations are considered. The simulation results show that the pipeline architectures designed using the proposed scheme are able to operate at high clock frequencies, and their performances, in terms of the processing speed and area-time product, are superior to those of the architectures designed based on other schemes and utilizing similar or higher amount of hardware resources. Finally, the two pipeline architectures designed using the proposed scheme are implemented in FPGA. The test results of the FPGA implementations validate the feasibility and effectiveness of the proposed scheme for designing DWT pipeline architectures
Dynamically Reconfigurable Systolic Array Accelerators: A Case Study with Extended Kalman Filter and Discrete Wavelet Transform Algorithms
Field programmable grid arrays (FPGA) are increasingly being adopted as the primary on-board computing system for autonomous deep space vehicles. There is a need to support several complex applications for navigation and image processing in a rapidly responsive on-board FPGA-based computer. This requires exploring and combining several design concepts such as systolic arrays, hardware-software partitioning, and partial dynamic reconfiguration. A microprocessor/co-processor design that can accelerate two single precision oating-point algorithms, extended Kalman lter and a discrete wavelet transform, is presented. This research makes three key contributions. (i) A polymorphic systolic array framework comprising of recofigurable partial region-based sockets to accelerate algorithms amenable to being mapped onto linear systolic arrays. When implemented on a low end Xilinx Virtex4 SX35 FPGA the design provides a speedup of at least 4.18x and 6.61x over a state of the art microprocessor used in spacecraft systems for the extended Kalman lter and discrete wavelet transform algorithms, respectively. (ii) Switchboxes to enable communication between static and partial reconfigurable regions and a simple protocol to enable schedule changes when a socket\u27s contents are dynamically reconfigured to alter the concurrency of the participating systolic arrays. (iii) A hybrid partial dynamic reconfiguration method that combines Xilinx early access partial reconfiguration, on-chip bitstream decompression, and bitstream relocation to enable fast scaling of systolic arrays on the PolySAF. This technique provided a 2.7x improvement in reconfiguration time compared to an o-chip partial reconfiguration technique that used a Flash card on the FPGA board, and a 44% improvement in BRAM usage compared to not using compression
Design and FPGA Implementation of High Speed DWT-IDWT Architecture with Pipelined SPIHT Architecture for Image Compression
Image compression demands high speed architectures for transformation and encoding process Medical image compression demands lossless compression schemes and faster architectures A trade-off between speed and area decides the complexity of image compression algorithms In this work a high speed DWT architecture and pipelined SPIHT architecture is designed modeled and implemented on FPGA platform DWT computation is performed using matrix multiplication operation and is implemented on Virtex-5 FPGA that consumes less than 1 of the hardware resource The SPIHT algorithm that is performed using pipelined architecture and hence achieves higher throughput and latency The SPIHT algorithm operates at a frequency of 260 MHz and occupies area less than 15 of the resources The architecture designed is suitable for high speed image compression application
- ā¦