
    Efficient VLSI Architectures for Image Compression Algorithms

    An image, in its original form, contains a huge amount of data, which not only demands a large amount of memory for storage but also makes transmission over limited-bandwidth channels inconvenient. Image compression reduces this data in either a lossless or a lossy way. While lossless compression recovers the original image data exactly, it achieves only low compression ratios; lossy techniques compress the image data by a variable amount depending on the quality required by the target application. Compression is performed in steps such as image transformation, quantization, and entropy coding. JPEG is one of the most widely used image compression standards; it uses the discrete cosine transform (DCT) to take the image from the spatial to the frequency domain. An image carries little visual information in its high frequencies, so these can be quantized heavily to reduce the size of the transformed representation. Entropy coding follows to further reduce the redundancy in the transformed and quantized image data. Real-time data processing requires high speed, which makes dedicated hardware implementation the preferred choice. A hardware design is favored when it is low-cost and low-power; these two factors are also the most important requirements for battery-powered portable devices such as digital cameras. The image transform is computationally demanding, and a complete compression system is realized through various intermediate stages between the transform and the final bit-stream, each of which needs memory to store intermediate results. The cost and power of the design can therefore be reduced both by implementing the transforms efficiently and by reducing or removing the intermediate stages. The proposed research work focuses on the efficient hardware implementation of transform-based image compression algorithms by optimizing the architecture of the system. Distributed arithmetic (DA) is an efficient approach to implementing digital signal processing algorithms. DA can be realized in two ways: one stores precomputed values in ROMs, the other has no ROM requirement; ROM-free DA is the more efficient. For the image transform, architectures for the one-dimensional discrete Hartley transform (1-D DHT) and the one-dimensional DCT (1-D DCT) have been optimized using the ROM-free DA technique. Further, 2-D separable DHT (SDHT) and 2-D DCT architectures have been implemented in the row-column approach using two 1-D DHT and two 1-D DCT units, respectively. A finite state machine (FSM) based architecture from DCT to quantization has been proposed, using a modified quantization matrix for JPEG image compression, which requires no memory to store the quantization table or the DCT coefficients. In addition, quantization is realized without multipliers, which take more area and are power-hungry. For the entropy encoding, Huffman coding is more hardware-efficient than arithmetic coding, and the use of a Huffman code table further simplifies the implementation. Strategies have been applied that significantly reduce the memory bits needed to store the Huffman code table, and the complete Huffman coding architecture encodes the transformed coefficients at one bit per clock cycle. A direct implementation algorithm for the DCT has the advantage of being free of the transposition memory that stores intermediate 1-D DCT results.
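
    To make the distributed-arithmetic step concrete, the following minimal Python sketch (a behavioural model, not the thesis's RTL) evaluates an inner product bit-serially: each iteration consumes one bit plane of all inputs, and the per-plane partial sum is exactly what a ROM-based DA would read from a lookup table and a ROM-free DA would form with an adder network.

```python
def da_inner_product(coeffs, xs, bits=8):
    """Bit-serial distributed arithmetic: y = sum(c_k * x_k), no multipliers.

    xs are `bits`-bit two's-complement integers. Each pass handles one bit
    plane of all inputs; the partial sum over the active coefficients is the
    quantity a ROM-based DA looks up and a ROM-free DA builds with adders.
    """
    acc = 0
    for b in range(bits):
        plane = [(x >> b) & 1 for x in xs]                 # bit b of each input
        partial = sum(c for c, bit in zip(coeffs, plane) if bit)
        weight = -(1 << b) if b == bits - 1 else (1 << b)  # MSB weighs negative
        acc += weight * partial
    return acc

assert da_inner_product([3, 5], [2, -1]) == 3 * 2 + 5 * (-1)
```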
    Although recursive algorithms have been a preferred method, they have low accuracy, resulting in image quality degradation. A non-recursive equation for the direct computation of DCT coefficients has therefore been proposed and implemented both in a 0.18 µm ASIC library and on an FPGA. It can compute DCT coefficients in any order, and all intermediate computations are free of fractions, so very high image quality is obtained in terms of PSNR. In addition, increasing the accuracy only requires changing the bit-width of one multiplier and one register, a very low hardware overhead. The architecture has been implemented to produce zig-zag ordered DCT coefficients. Comparison results show that this implementation has a smaller area in terms of gate count and lower power consumption than existing DCT implementations. Using this architecture, the complete JPEG image compression system has been implemented, with the Huffman coding module, one multiplier, and one register as the only additional modules. The intermediate stages (DCT to Huffman encoding) are free of memory, so an efficient architecture is obtained.
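
    The zig-zag ordering mentioned above is the standard JPEG scan; as a reference, the following sketch (a software restatement, not the proposed architecture) generates that order by walking the anti-diagonals of the block in alternating directions.

```python
def zigzag_order(n=8):
    """(row, col) visiting order of the standard JPEG zig-zag scan."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],                      # anti-diagonal
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def zigzag_scan(block):
    """Flatten an n x n coefficient block into zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

assert zigzag_order(3)[:6] == [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```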

    Efficient reconfigurable architectures for 3D medical image compression

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Recently, the more widespread use of three-dimensional (3-D) imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US), has generated a massive amount of volumetric data. This has provided an impetus to the development of other applications, in particular telemedicine and teleradiology. In these fields, medical image compression is important, since both efficient storage and transmission of data through high-bandwidth digital communication lines are crucial. Despite their advantages, most 3-D medical imaging algorithms are computationally intensive, with matrix transformation as the most fundamental operation involved in the transform-based methods. There is therefore a real need for high-performance systems, whilst keeping architectures flexible to allow for quick upgradeability with real-time applications. Moreover, to obtain efficient solutions for large volumes of medical data, an efficient implementation of these operations is of significant importance. Reconfigurable hardware, in the form of field programmable gate arrays (FPGAs), has been proposed as a viable system building block in the construction of high-performance systems at an economical price. FPGAs thus seem an ideal candidate, given inherent advantages such as massive parallelism, multimillion-gate counts, and special low-power packages. The key achievements of the work presented in this thesis are summarised as follows. Two architectures for the 3-D Haar wavelet transform (HWT) have been proposed, based on transpose-based computation and partial reconfiguration, suitable for 3-D medical imaging applications. These applications require continuous hardware servicing, and as a result dynamic partial reconfiguration (DPR) has been introduced. A comparative study of non-partial and partial reconfiguration implementations has shown that DPR offers many advantages and leads to a compelling solution for computationally intensive applications such as 3-D medical image compression. Using DPR, several large systems are mapped to small hardware resources, and the area, power consumption, and maximum frequency are optimised and improved. Moreover, an FPGA-based architecture of the finite Radon transform (FRAT) with three design strategies has been proposed: direct implementation of the pseudo-code with a sequential or a pipelined description, and a block random access memory (BRAM)-based method. An analysis with various medical imaging modalities has been carried out. Image de-noising using FRAT shows promising results in reducing Gaussian white noise in medical images; in terms of hardware implementation, promising trade-offs between maximum frequency, throughput, and area are also achieved. Furthermore, a novel hardware implementation of a 3-D medical image compression system with context-based adaptive variable length coding (CAVLC) has been proposed. An evaluation of the 3-D integer transform (IT) and the discrete wavelet transform (DWT) with lifting scheme (LS) for the transform blocks reveals that the 3-D IT has lower computational complexity than the 3-D DWT, whilst the 3-D DWT with LS provides lossless compression, which is significantly useful for medical image compression.
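
    Since the 3-D HWT is separable, one analysis level amounts to applying the same pairwise average/difference filter along each of the three axes in turn. The sketch below is a minimal NumPy model of that computation (assuming even dimensions and the unnormalised averaging form of Haar), not the transpose-based or DPR hardware itself.

```python
import numpy as np

def haar_1d(a, axis):
    """One Haar analysis level along `axis`: averages, then details."""
    a = np.moveaxis(a, axis, 0)
    lo = (a[0::2] + a[1::2]) / 2.0        # lowpass half
    hi = (a[0::2] - a[1::2]) / 2.0        # highpass half
    return np.moveaxis(np.concatenate([lo, hi]), 0, axis)

def haar_3d(volume):
    """Separable one-level 3-D HWT: filter along x, then y, then z."""
    out = volume.astype(float)
    for axis in range(3):
        out = haar_1d(out, axis)
    return out
```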
    Additionally, an architecture for CAVLC that is capable of compressing high-definition (HD) images in real time, without any buffer between the quantiser and the entropy coder, is proposed. Through judicious parallelisation, promising results have been obtained with limited resources. In summary, this research tackles the issues of massive 3-D medical data volumes, which require compression, as well as the hardware implementation needed to accelerate the slowest operations in the system. The results also reveal significant achievements in terms of architectural efficiency and application performance. Funded by the Ministry of Higher Education Malaysia (MOHE), Universiti Tun Hussein Onn Malaysia (UTHM), and the British Council.
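
    For reference, the finite Radon transform named above has a compact arithmetic definition (the Do-Vetterli form) that each of the three design strategies maps to hardware differently. A direct Python sketch, ignoring the 1/sqrt(p) normalisation and assuming a p x p input with p prime:

```python
import numpy as np

def frat(f):
    """Finite Radon transform of a p x p image, p prime (unnormalised).

    Projection k < p sums the pixels on the modular line j = (k*i + l) mod p;
    the final projection sums the rows i = l that no slope k covers.
    """
    p = f.shape[0]
    r = np.zeros((p + 1, p))
    for k in range(p):
        for l in range(p):
            r[k, l] = sum(f[i, (k * i + l) % p] for i in range(p))
    r[p] = f.sum(axis=1)                  # lines of constant first index
    return r
```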

    FPGA Implementation of DHT Algorithms for Image Compression

    Digital image processing is the use of computer algorithms to perform image processing on digital images. A simple digital camera converts light energy to electrical energy, converts that energy to digital form, and then applies a compression algorithm to reduce the memory required to store the image. This compression algorithm is invoked every time an image is captured and stored, which motivates the development of an efficient compression algorithm that gives the same result as existing algorithms with lower power consumption. Compression is useful as it reduces the usage of expensive resources, such as memory (hard disks) or transmission bandwidth. On the downside, compression techniques can introduce distortion (in lossy compression schemes), and additional computational resources are required for compressing and decompressing the data. Reducing these resources by comparing different algorithms for the discrete Hartley transform (DHT) is therefore required. FPGA implementations of different algorithms for the 1-D DHT, described in VHDL, are carried out, and their comparison identifies the optimum technique for compression. Finally, the 2-D DHT is implemented using the optimum 1-D technique for an 8x8 matrix input. The results obtained are discussed, and improvements are suggested to further optimize the design.
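
    As a behavioural reference for the transform being compared, the 1-D DHT kernel is cas(2*pi*k*m/N) with cas(t) = cos(t) + sin(t), and the separable 2-D DHT of an 8x8 block is the 1-D transform applied to rows and then columns. A minimal floating-point sketch (not the fixed-point FPGA design):

```python
import numpy as np

def dht_matrix(n=8):
    """1-D DHT matrix: H[k, m] = cas(2*pi*k*m/n), cas(t) = cos(t) + sin(t)."""
    k, m = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    t = 2 * np.pi * k * m / n
    return np.cos(t) + np.sin(t)

def dht_2d(block):
    """Separable 2-D DHT via the row-column method: two 1-D passes."""
    h = dht_matrix(block.shape[0])
    return h @ block @ h.T

# The DHT is involutive up to scale (H @ H == n * I), so the same
# datapath also computes the inverse transform.
assert np.allclose(dht_matrix(8) @ dht_matrix(8), 8 * np.eye(8))
```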

    Energy efficient hardware acceleration of multimedia processing tools

    The world of mobile devices is experiencing an ongoing trend of feature enhancement and general-purpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks. Based on the survey that this thesis presents of modern video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at the algorithmic level in order to design re-usable, optimised hardware acceleration cores. To prove these conclusions, the work in this thesis focuses on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high-level techniques such as redundant computation elimination, parallelism, and low-switching computation structures. Both architectures compare favourably against the relevant prior art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation: both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution-search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration; another is an early-exit mechanism that achieves large search-space reductions. Results show an improvement on state-of-the-art algorithms, with future potential for even greater savings.
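
    To illustrate the CMM computation that the proposed genetic-programming search optimises: every product with a constant matrix can be expanded into shifts and additions alone, and the optimiser's job is to share common subexpressions between the outputs. A naive, sharing-free Python sketch (assuming non-negative integer constants, purely illustrative):

```python
def const_mul(c, x):
    """c * x using shifts and adds only: one term per set bit of c."""
    return sum(x << i for i in range(c.bit_length()) if (c >> i) & 1)

def cmm(matrix, xs):
    """y = C @ x as a pure shift-add network with no term sharing.
    A CSE/GP optimiser would reuse identical shifted sums across rows."""
    return [sum(const_mul(c, x) for c, x in zip(row, xs)) for row in matrix]

# Row 2 is twice row 1: an obvious sharing opportunity for an optimiser.
assert cmm([[3, 5], [6, 10]], [2, -1]) == [1, 2]
```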

    Investigation of a Novel Common Subexpression Elimination Method for Low Power and Area Efficient DCT Architecture

    Wide interest has been observed in finding a low-power and area-efficient hardware design for the discrete cosine transform (DCT) algorithm. This research work proposes a novel Common Subexpression Elimination (CSE) based pipelined architecture for the DCT, aimed at reducing the cost metrics of power and area while maintaining high speed and accuracy in DCT applications. The proposed design combines Canonical Signed Digit (CSD) representation with CSE to implement a multiplier-less method for the fixed constant multiplications by the DCT coefficients. Furthermore, symmetry in the DCT coefficient matrix is exploited together with CSE to further decrease the number of arithmetic operations. The architecture needs only a single-port memory to feed the inputs, instead of a multiport memory, which reduces hardware cost and area. Analysis of the experimental results and performance comparisons shows that the proposed scheme uses minimal logic, a mere 340 slices and 22 adders. Moreover, the design meets the real-time constraints of different video/image coders and peak signal-to-noise ratio (PSNR) requirements. The proposed technique also has significant advantages over recent well-known methods, improving power reduction, silicon area usage, and maximum operating frequency by 41%, 15%, and 15%, respectively, while preserving accuracy.
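
    For concreteness, the CSD half of the method works as follows: each constant is recoded into digits {-1, 0, +1} with no two adjacent non-zeros, which minimises the number of shifted add/subtract terms, and CSE then shares terms that recur across the DCT constants. A small sketch of standard CSD recoding (not the paper's exact pipeline):

```python
def csd_digits(c):
    """Canonical signed-digit recoding of an integer, LSB first.
    Digits are in {-1, 0, +1} with no two adjacent non-zeros."""
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)   # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def csd_multiply(c, x):
    """Multiplier-less c * x: one shifted add/subtract per non-zero digit."""
    return sum(d * (x << i) for i, d in enumerate(csd_digits(c)))

# 7 recodes as +8 - 1: two terms instead of three plain binary adds.
assert csd_digits(7) == [-1, 0, 0, 1] and csd_multiply(7, 13) == 91
```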

    Efficient FPGA implementation and power modelling of image and signal processing IP cores

    Field Programmable Gate Arrays (FPGAs) are the technology of choice in a number of image and signal processing application areas, such as consumer electronics, instrumentation, medical data processing, and avionics, due to their reasonable energy consumption, high performance, security, low design-turnaround time, and reconfigurability. Low-power FPGA devices are also emerging as competitive solutions for mobile and thermally constrained platforms. Most computationally intensive image and signal processing algorithms also consume a lot of power, leading to a number of issues including reduced mobility, reliability concerns, and increased design cost. Power dissipation has become one of the most important challenges, particularly for FPGAs, and addressing it requires optimisation and awareness at all levels of the design flow. The key achievements of the work presented in this thesis are summarised here. Behavioural-level optimisation strategies have been used for implementing matrix products and inner products through mathematical techniques such as Distributed Arithmetic (DA) and its variations, including offset binary coding, sparse factorisation, and novel vector-level transformations. Applications used to test the impact of these algorithmic and arithmetic transformations include the fast Hadamard/Walsh transforms and Gaussian mixture models. Complete design space exploration has been performed on these cores and, where appropriate, they have been shown to clearly outperform comparable existing implementations. At the architectural level, strategies such as parallelism, pipelining, and systolisation have been successfully applied to the design and optimisation of a number of cores, including colour space conversion, the finite Radon transform, the finite ridgelet transform, and circular convolution. A pioneering study into the influence of supply voltage scaling for FPGA-based designs, used in conjunction with performance-enhancing strategies such as parallelism and pipelining, has been performed. Initial results are very promising and indicate significant potential for future research in this area. A key contribution of this work is the development of a novel high-level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called Functional Level Power Analysis and Modelling (FLPAM). FLPAM is scalable, platform independent, and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating FLPAM with commercially available design tools for systematic optimisation of IP cores has also been developed.
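
    As an example of the kind of kernel these behavioural-level optimisations target, the fast Hadamard/Walsh transform reduces to log2(N) stages of add/subtract butterflies with no multiplications at all, which is why DA-style treatments suit it well. A minimal sketch (unnormalised, natural ordering, length a power of two):

```python
def fwht(a):
    """Fast Walsh-Hadamard transform: pure add/subtract butterflies."""
    a = list(a)
    h = 1
    while h < len(a):                     # log2(N) butterfly stages
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

assert fwht([1, 0, 1, 0]) == [2, 2, 0, 0]
```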