2,429 research outputs found

    Design and FPGA Implementation of CORDIC-based 8-point 1D DCT Processor

    Get PDF
    CORDIC or CO-ordinate Rotation DIgital Computer is a fast, simple, efficient and powerful algorithm used for diverse Digital Signal Processing applications. Primarily developed for real-time airborne computations, it uses a unique computing technique which is especially suitable for solving the trigonometric relationships involved in plane co-ordinate rotation and conversion from rectangular to polar form. It comprises a special serial arithmetic unit having three shift registers, three adders/subtractors, Look-Up table and special interconnections. Using a prescribed sequence of conditional additions or subtractions the CORDIC arithmetic unit can be controlled to solve either of the following equations: Y’=K (Ycos λ+ Xsin λ) X’=K (Xcos λ - Ysin λ); where K is a constant In this project: • A CORDIC-based processor for sine/cosine calculation was designed using VHDL programming in Xilinx ISE 10.1. The CORDIC module was tested for its functionality and correctness by test-bench analysis. Subsequently, FPGA implementation of the CORDIC core followed by ChipScopePro analysis of the output logic waveforms was performed. • Using this CORDIC core a DCT processor was designed to calculate the 8-point 1D DCT. The functionality and operational correctness of this processor was tested, first on the test-bench and then via ChipScopePro analysis, post FPGA implementation. The output obtained in both the cases was compared with the actual values to test for consistency and the percentage of accuracy was established. Power consumption and FPGA resource utilization were observed. The results obtained were discussed

    Hardware implementation of daubechies wavelet transforms using folded AIQ mapping

    Get PDF
    The Discrete Wavelet Transform (DWT) is a popular tool in the field of image and video compression applications. Because of its multi-resolution representation capability, the DWT has been used effectively in applications such as transient signal analysis, computer vision, texture analysis, cell detection, and image compression. Daubechies wavelets are one of the popular transforms in the wavelet family. Daubechies filters provide excellent spatial and spectral locality-properties which make them useful in image compression. In this thesis, we present an efficient implementation of a shared hardware core to compute two 8-point Daubechies wavelet transforms. The architecture is based on a new two-level folded mapping technique, an improved version of the Algebraic Integer Quantization (AIQ). The scheme is developed on the factorization and decomposition of the transform coefficients that exploits the symmetrical and wrapping structure of the matrices. The proposed architecture is parallel, pipelined, and multiplexed. Compared to existing designs, the proposed scheme reduces significantly the hardware cost, critical path delay and power consumption with a higher throughput rate. Later, we have briefly presented a new mapping scheme to error-freely compute the Daubechies-8 tap wavelet transform, which is the next transform of Daubechies-6 in the Daubechies wavelet series. The multidimensional technique maps the irrational transformation basis coefficients with integers and results in considerable reduction in hardware and power consumption, and significant improvement in image reconstruction quality

    Energy efficient hardware acceleration of multimedia processing tools

    Get PDF
    The world of mobile devices is experiencing an ongoing trend of feature enhancement and generalpurpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks Based on the survey that this thesis presents on modem video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work m this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant pnor art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions .Results show an improvement on state of the art algorithms with future potential for even greater savings

    Low power VLSI implementation schemes for DCT-based image compression

    Get PDF

    Systolic VLSI chip for implementing orthogonal transforms, A

    Get PDF
    Includes bibliographical references.This paper describes the design of a systolic VLSI chip for the implementation of signal processing algorithms that may be decomposed into a product of simple real rotations. These include orthogonal transformations. Applications of this chip include projections, discrete Fourier and cosine transforms, and geometrical transformations. Large transforms may be computed by "tiling" together many chips for increased throughput. A CMOS VLSI chip containing 138 000 transistors in a 5x3 array of rotators has been designed, fabricated, and tested. The chip has a 32-MHz clock and performs real rotations at a rate of 30 MHz. The systolic nature of the chip makes use of fully synchronous bit-serial interconnect and a very regular structure at the rotator and bit levels. A distributed arithmetic scheme is used to implement the matrix-vector multiplication of the rotation.This work was supported by Ball Aerospace, Boulder, CO, and by the Office of Naval Research, Electronics Branch, Arlington, VA, under Contract ONR 85-K-0693

    07361 Abstracts Collection -- Programming Models for Ubiquitous Parallelism

    Get PDF
    From 02.09. to 07.09.2007, the Dagstuhl Seminar 07361 ``Programming Models for Ubiquitous Parallelism\u27\u27 was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available
    corecore