29 research outputs found

    ASIC Design of Radix-2, 8-Point FFT Processor

    In split-radix architectures, large Fast Fourier Transforms (FFTs) are decomposed into small independent computations to reduce the storage burden. The radix-2, 8-point FFT is one of the popular choices in split radix for such small independent computations. The authors propose an FFT processor architecture for this small independent computation, i.e. the radix-2, 8-point FFT. This paper briefly describes the architecture, which comprises a Butterfly Unit (BU), a register set and a controller. The novelty of this architecture is that it replaces the series of Processing Elements (PEs) with a single BU. The BU computes two halves of the computations concurrently. Arithmetic computations are performed in floating-point form to overcome nonlinearities. All computations are controlled by a tailored instruction set. All instructions are the same size and have the same execution time. Twiddle constants are implicitly available in the instructions. Results of internal computations are stored in the register set to avoid load and store operations with memory. The mean square error of the computation is reduced by 41.95% and 55.76% in magnitude and phase, respectively, compared with computations performed by rounding the twiddle constants. The FFT processor is synthesized, placed and routed in 45 nm technology using the Nangate Open Cell Library. The BU of this architecture is 18.89% smaller and 5.13% faster than the smallest and fastest BUs reported previously. The hardware cost metric of the proposed processor, Dp mm2 ns2 mW, is 1.37, which is 32.51% less than that of the previous work
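As a software illustration of the computation this abstract refers to, the following is a minimal sketch of a radix-2, 8-point decimation-in-time FFT. It is not the processor architecture from the paper: the function names `fft8_radix2` and `dft_reference` are hypothetical, the twiddle factors are computed at run time rather than embedded in instructions, and Python's double-precision complex type stands in for the paper's floating-point format. The one point of contact is that a single butterfly routine is reused for all twelve butterflies across the three stages.

```python
import cmath


def fft8_radix2(samples):
    """Radix-2 decimation-in-time FFT for exactly 8 samples (illustrative sketch)."""
    if len(samples) != 8:
        raise ValueError("expected exactly 8 samples")
    # Bit-reversed input order for 3 address bits.
    order = [0, 4, 2, 6, 1, 5, 3, 7]
    a = [complex(samples[i]) for i in order]
    # Three stages with butterfly spans of 2, 4 and 8 samples.
    span = 2
    while span <= 8:
        half = span // 2
        for start in range(0, 8, span):
            for k in range(half):
                # Twiddle factor W_span^k = exp(-2*pi*i*k/span).
                w = cmath.exp(-2j * cmath.pi * k / span)
                top = a[start + k]
                bottom = a[start + k + half] * w
                # One butterfly: a sum and a difference.
                a[start + k] = top + bottom
                a[start + k + half] = top - bottom
        span *= 2
    return a


def dft_reference(samples):
    """Direct O(n^2) DFT, used only to check the FFT result."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

Comparing `fft8_radix2(x)` against `dft_reference(x)` for a few inputs is a quick way to confirm that the stage ordering and twiddle indexing are right.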

    ASIC Design of Radix-2, 8-point FFT Processor

    In split-radix architectures, large Fast Fourier Transforms (FFTs) are decomposed into small independent computations to reduce the storage burden. The radix-2, 8-point FFT is one of the popular choices in split radix for such small independent computations. The author proposes an FFT processor architecture for this small independent computation, i.e. the radix-2, 8-point FFT. This paper briefly describes the architecture, which comprises a Butterfly Unit (BU), a register set and a controller. The novelty of this architecture is that it replaces the series of Processing Elements (PEs) with a single BU. The BU computes two halves of the computations concurrently. Arithmetic computations are performed in floating-point form to overcome nonlinearities. All computations are controlled by a tailored instruction set. All instructions are the same size and have the same execution time. Twiddle constants are implicitly available in the instructions. Results of internal computations are stored in the register set to avoid load and store operations with memory. The mean square error of the computation is reduced by 41.95% and 55.76% in magnitude and phase, respectively, compared with computations performed by rounding the twiddle constants. The FFT processor is synthesized, placed and routed in 45 nm technology using the Nangate Open Cell Library. The BU of this architecture is 18% smaller and 5% faster than the smallest and fastest BUs reported previously. The hardware cost metric of the proposed processor, Dp mm2 ns2 mW, is 1.37, which is 32.51% less than that of the previous work

    Efficient FPGA implementation and power modelling of image and signal processing IP cores

    Field Programmable Gate Arrays (FPGAs) are the technology of choice in a number of image and signal processing application areas such as consumer electronics, instrumentation, medical data processing and avionics due to their reasonable energy consumption, high performance, security, low design-turnaround time and reconfigurability. Low power FPGA devices are also emerging as competitive solutions for mobile and thermally constrained platforms. Most computationally intensive image and signal processing algorithms also consume a lot of power leading to a number of issues including reduced mobility, reliability concerns and increased design cost among others. Power dissipation has become one of the most important challenges, particularly for FPGAs. Addressing this problem requires optimisation and awareness at all levels in the design flow. The key achievements of the work presented in this thesis are summarised here. Behavioural level optimisation strategies have been used for implementing matrix product and inner product through the use of mathematical techniques such as Distributed Arithmetic (DA) and its variations including offset binary coding, sparse factorisation and novel vector level transformations. Applications to test the impact of these algorithmic and arithmetic transformations include the fast Hadamard/Walsh transforms and Gaussian mixture models. Complete design space exploration has been performed on these cores, and where appropriate, they have been shown to clearly outperform comparable existing implementations. At the architectural level, strategies such as parallelism, pipelining and systolisation have been successfully applied for the design and optimisation of a number of cores including colour space conversion, finite Radon transform, finite ridgelet transform and circular convolution.
A pioneering study into the influence of supply voltage scaling for FPGA-based designs, used in conjunction with performance-enhancing strategies such as parallelism and pipelining, has been performed. Initial results are very promising and indicate significant potential for future research in this area. A key contribution of this work is the development of a novel high-level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called Functional Level Power Analysis and Modelling (FLPAM). FLPAM is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating FLPAM with commercially available design tools for systematic optimisation of IP cores has also been developed. EThOS - Electronic Theses Online Service, GB, United Kingdom

    Efficient implementation of video processing algorithms on FPGA

    The work contained in this portfolio thesis was carried out as part of an Engineering Doctorate (Eng.D) programme from the Institute for System Level Integration. The work was sponsored by Thales Optronics, and focuses on issues surrounding the implementation of video processing algorithms on field programmable gate arrays (FPGA). A description is given of FPGA technology and the currently dominant methods of designing and verifying firmware. The problems of translating a description of behaviour into one of structure are discussed, and some of the latest methodologies for tackling this problem are introduced. A number of algorithms are then looked at, including methods of contrast enhancement, deconvolution, and image fusion. Algorithms are characterised according to the nature of their execution flow, and this is used as justification for some of the design choices that are made. An efficient method of performing large two-dimensional convolutions is also described. The portfolio also contains a discussion of an FPGA implementation of a PID control algorithm, an overview of FPGA dynamic reconfigurability, and the development of a demonstration platform for rapid deployment of video processing algorithms in FPGA hardware

    Parallel computing 2011, ParCo 2011: book of abstracts

    This book contains the abstracts of the presentations at the conference Parallel Computing 2011, 30 August - 2 September 2011, Ghent, Belgium

    Efficient architectures for multidimensional discrete transforms in image and video processing applications

    PhD Thesis. This thesis introduces new image compression algorithms, their related architectures and data transform architectures. The proposed architectures consider current hardware architecture concerns, such as power consumption, hardware usage, memory requirement, computation time and output accuracy. These concerns are crucial in multidimensional image and video processing applications. This research is divided into three image and video processing related topics: low complexity non-transform-based image compression algorithms and their architectures; architectures for the multidimensional Discrete Cosine Transform (DCT); and architectures for the multidimensional Discrete Wavelet Transform (DWT). The proposed architectures are parameterised in terms of wordlength, pipelining and input data size. Taking such parameterisation into account, efficient non-transform-based and low complexity image compression algorithms for better rate distortion performance are proposed. The proposed algorithms are based on the Adaptive Quantisation Coding (AQC) algorithm, and they achieve a controllable output bit rate and accuracy by considering the intensity variation of each image block. Their high speed, low hardware usage and low power consumption architectures are also introduced and implemented on Xilinx devices. Furthermore, efficient hardware architectures for multidimensional DCT based on the 1-D DCT Radix-2 and 3-D DCT Vector Radix (3-D DCT VR) fast algorithms have been proposed. These architectures attain fast and accurate 3-D DCT computation and provide high processing speed and power consumption reduction. In addition, this research also introduces two low hardware usage 3-D DCT VR architectures. Such architectures perform the computation of butterfly and post addition stages without using block memory for data transposition, which in turn reduces the hardware usage and improves the performance of the proposed architectures.
Moreover, parallel and multiplierless lifting-based architectures for the 1-D, 2-D and 3-D Cohen-Daubechies-Feauveau 9/7 (CDF 9/7) DWT computation are also introduced. The presented architectures represent an efficient multiplierless and low memory requirement CDF 9/7 DWT computation scheme using the separable approach. Furthermore, the proposed architectures have been implemented and tested using Xilinx FPGA devices. The evaluation results have revealed that a speed of up to 315 MHz can be achieved in the proposed AQC-based architectures. Further, a speed of up to 330 MHz and low utilisation rate of 722 to 1235 can be achieved in the proposed 3-D DCT VR architectures. In addition, in the proposed 3-D DWT architecture, the computation time of the 3-D DWT for a data size of 144×176×8 pixels is less than 0.33 ms. Also, a power consumption of 102 mW at 50 MHz clock frequency using a 256×256-pixel frame size is achieved. The accuracy tests for all architectures have revealed that an infinite PSNR can be attained
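The closing accuracy claim can be made concrete with a short sketch: PSNR is infinite exactly when the mean squared error between the reference and the reconstructed data is zero, that is, when the output is bit-exact. The function name `psnr` and the default peak value of 255 (8-bit samples) are assumptions for illustration and are not taken from the thesis.

```python
import math


def psnr(reference, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB; `peak` is the assumed maximum sample value."""
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        # Identical signals: no noise at all, so the ratio is unbounded.
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)
```

Any nonzero reconstruction error yields a finite value, so an infinite PSNR is a statement of exact reconstruction rather than of merely high quality.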

    Serial-data computation in VLSI
