444 research outputs found

    Efficient hardware architectures for MPEG-4 core profile

    Get PDF
    Efficient hardware acceleration architectures are proposed for the most demandingMPEG-4 core profile algorithms, namely; texture motion estimation (TME), binary motion estimation (BME)and the shape adaptive discrete cosine transform (SA-DCT). The proposed ME designs may also be used for H.264, since both architectures can handle variable block sizes. Both ME architectures employ early termination techniques that reduce latency and save needless memory accesses and power consumption. They also use a pixel subsampling technique to facilitate parallelism, while balancing the computational load. The BME datapath also saves operations by using Run Length Coded (RLC) pixel addressing. The SA-DCT module has a re-configuring multiplier-less serial datapath using adders and multiplexers only to improve area and power. The SA-DCT packing steps are done using a minimal switching addressing scheme with guarded evaluation. All three modules have been synthesised targeting the WildCard-II FPGA benchmarking platform adopted by the MPEG-4 Part9 reference hardware group

    Hardware acceleration architectures for MPEG-Based mobile video platforms: a brief overview

    Get PDF
    This paper presents a brief overview of past and current hardware acceleration (HwA) approaches that have been proposed for the most computationally intensive compression tools of the MPEG-4 standard. These approaches are classified based on their historical evolution and architectural approach. An analysis of both evolutionary and functional classifications is carried out in order to speculate on the possible trends of the HwA architectures to be employed in mobile video platforms

    An improved architecture for the adaptive discrete cosine transform

    Get PDF

    Energy-efficient acceleration of MPEG-4 compression tools

    Get PDF
    We propose novel hardware accelerator architectures for the most computationally demanding algorithms of the MPEG-4 video compression standard-motion estimation, binary motion estimation (for shape coding), and the forward/inverse discrete cosine transforms (incorporating shape adaptive modes). These accelerators have been designed using general low-energy design philosophies at the algorithmic/architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09 μm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art

    Low power techniques for video compression

    Get PDF
    This paper gives an overview of low-power techniques proposed in the literature for mobile multimedia and Internet applications. Exploitable aspects are discussed in the behavior of different video compression tools. These power-efficient solutions are then classified by synthesis domain and level of abstraction. As this paper is meant to be a starting point for further research in the area, a lowpower hardware & software co-design methodology is outlined in the end as a possible scenario for video-codec-on-a-chip implementations on future mobile multimedia platforms

    Design of 2D discrete cosine transform using CORDIC architectures in VHDL

    Get PDF
    The Discrete Cosine Transform is one of the most widely transform techniques in digital signal processing. In addition, this is also most computationally intensive transforms which require many multiplications and additions. Real time data processing necessitates the use of special purpose hardware which involves hardware efficiency as well as high throughput. Many DCT algorithms were proposed in order to achieve high speed DCT. Those architectures which involves multipliers, for example Chen’s algorithm has less regular architecture due to complex routing and requires large silicon area. On the other hand, the DCT architecture based on distributed arithmetic (DA) which is also a multiplier less architecture has the inherent disadvantage of less throughputs because of the ROM access time and the need of accumulator. Also this DA algorithm requires large silicon area if it requires large ROM size. Systolic array architecture for the real-time DCT computation may have the large number of gates and clock skew problem. The other ways of implementation of DCT which involves in multiplierless, thus power efficient and which results in regular architecture and less complicated routing, consequently less area, simultaneously lead to high throughput. So for that purpose CORDIC seems to be a best solution. CORDIC offers a unified iterative formulation to efficiently evaluate the rotation operation. This thesis presents the implementation of 2D Discrete Cosine Transform (DCT) using the Angle Recoded (AR) Cordic algorithm, the new scaling less CORDIC algorithm and the conventional Chen’s algorithm which is multiplier dependant algorithm. The 2D DCT is implemented by exploiting the Separability property of 2D Discrete Cosine Transform. Here first one dimensional DCT is designed first and later a transpose buffer which consists of 64 memory elements, fully pipelined is designed. Later all these blocks are joined with the help of a controller element which is a mealy type FSM which produces some status signals also. The three resulting architectures are all well synthesized in Xilinx 9.1ise, simulated in Modelsim 5.6f and the power is calculated in Xilinx Xpower. Results prove that AR Cordic algorithm is better than Chen’s algorithm, even the new scaling less CORDIC algorithm
    corecore