Search CORE

387 research outputs found

Energy-efficient acceleration of MPEG-4 compression tools

Author: Kinane Andrew
Larkin Daniel
O'Connor Noel E.
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2007
Field of study

We propose novel hardware accelerator architectures for the most computationally demanding algorithms of the MPEG-4 video compression standard-motion estimation, binary motion estimation (for shape coding), and the forward/inverse discrete cosine transforms (incorporating shape adaptive modes). These accelerators have been designed using general low-energy design philosophies at the algorithmic/architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09 μm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Irish Universities

DCU Online Research Access Service

A distributed arithmetic based CORDIC algorithm and its use in the FPGA implementation of the 2-D IDCT

Author: Yang Yi
Publication venue
Publication date: 01/01/2002
Field of study

The discrete cosine transform (DCT) based image compression techniques play an important role in today's digital applications. A video codec chip requires an integration of high-speed DCT and inverse DCT (IDCT) hardware units in a limited silicon space. This thesis presents a distributed arithmetic based CORDIC algorithm for the computation of the 1-D IDCT and an FPGA implementation of a cost-effective architecture for a 2-D IDCT processor using the proposed algorithm. The processor consisting of two 1-D IDCT cores, a transpose memory and a control logic block performs the 2-D IDCT computation by using the row-column decomposition approach. The basis of the proposed scheme is a combined use of the distributed arithmetic and the CORDIC algorithm in order to provide a small access time to the lookup tables and a reduced complexity for its architecture. In the proposed design, the deep pipeline structure of an existing CORDIC based architecture is replaced by much smaller DA-based ROM accumulators. In the proposed design, a bit-level digit-serial structure based on the redundant number system using an on-line algorithm is employed

Concordia University Research Repository

A New RTL Design Approach for a DCT/IDCT-Based Image Compression Architecture using the mCBE Algorithm

Author: null null
Nurfitri Anbarsanti
Rachmad Vidya Wicaksana Putra
Rella Mareta
Trio Adiono
Publication venue: LPPM ITBis Lembah Dempo
Publication date: 01/01/2012
Field of study

In the literature, several approaches of designing a DCT/IDCT-based image compression system have been proposed. In this paper, we present a new RTL design approach with as main focus developing a DCT/IDCT-based image compression architecture using a self-created algorithm. This algorithm can efficiently minimize the amount of shifter -adders to substitute multiplier s. We call this new algorithm the multiplication from Common Binary Expression (mCBE) Algorithm. Besides this algorithm, we propose alternative quantization numbers, which can be implemented simply as shifters in digital hardware. Mostly, these numbers can retain a good compressed-image quality compared to JPEG recommendations. These ideas lead to our design being small in circuit area, multiplierless, and low in complexity. The proposed 8-point 1D-DCT design has only six stages, while the 8-point 1D-IDCT design has only seven stages (one stage being defined as equal to the delay of one shifter or 2-input adder). By using the pipelining method, we can achieve a high-speed architecture with latency as a trade-off consideration. The design has been synthesized and can reach a speed of up to 1.41ns critical path delay (709.22MHz).

Crossref

Journal of ICT Research and Applications

Directory of Open Access Journals

ITB Journal

Towards adaptive balanced computing (ABC) using reconfigurable functional caches (RFCs)

Author: Kim Hue-Sung
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2001
Field of study

The general-purpose computing processor performs a wide range of functions. Although the performance of general-purpose processors has been steadily increasing, certain software technologies like multimedia and digital signal processing applications demand ever more computing power. Reconfigurable computing has emerged to combine the versatility of general-purpose processors with the customization ability of ASICs. The basic premise of reconfigurability is to provide better performance and higher computing density than fixed configuration processors. Most of the research in reconfigurable computing is dedicated to on-chip functional logic. If computing resources are adaptable to the computing requirement, the maximum performance can be achieved. To overcome the gap between processor and memory technology, the size of on-chip cache memory has been consistently increasing. The larger cache memory capacity, though beneficial in general, does not guarantee a higher performance for all the applications as they may not utilize all of the cache efficiently. To utilize on-chip resources effectively and to accelerate the performance of multimedia applications specifically, we propose a new architecture---Adaptive Balanced Computing (ABC). ABC uses dynamic resource configuration of on-chip cache memory by integrating Reconfigurable Functional Caches (RFC). RFC can work as a conventional cache or as a specialized computing unit when necessary. In order to convert a cache memory to a computing unit, we include additional logic to embed multi-bit output LUTs into the cache structure. We add the reconfigurability of cache memory to a conventional processor with minimal modification to the load/store microarchitecture and with minimal compiler assistance. ABC architecture utilizes resources more efficiently by reconfiguring the cache memory to computing units dynamically. The area penalty for this reconfiguration is about 50--60% of the memory cell cache array-only area with faster cache access time. In a base array cache (parallel decoding caches), the area penalty is 10--20% of the data array with 1--2% increase in the cache access time. However, we save 27% for FIR and 44% for DCT/IDCT in area with respect to memory cell array cache and about 80% for both applications with respect to base array cache if we were to implement all these units separately (such as ASICs). The simulations with multimedia and DSP applications (DCT/IDCT and FIR/IIR) show that the resource configuration with the RFC speedups ranging from 1.04X to 3.94X in overall applications and from 2.61X to 27.4X in the core computations. The simulations with various parameters indicate that the impact of reconfiguration can be minimized if an appropriate cache organization is selected

Digital Repository @ Iowa State University (ISU)

Low power techniques for video compression

Author: Marlow Seán
McGrath Stephen
Muresan Valentin
Murphy Noel
O'Connor Noel E.
Publication venue
Publication date: 01/01/2002
Field of study

This paper gives an overview of low-power techniques proposed in the literature for mobile multimedia and Internet applications. Exploitable aspects are discussed in the behavior of different video compression tools. These power-efficient solutions are then classified by synthesis domain and level of abstraction. As this paper is meant to be a starting point for further research in the area, a lowpower hardware & software co-design methodology is outlined in the end as a possible scenario for video-codec-on-a-chip implementations on future mobile multimedia platforms

CiteSeerX

Irish Universities

DCU Online Research Access Service

A Comparative Study of Scheduling Techniques for Multimedia Applications on SIMD Pipelines

Author: Arslan Mehmet Ali
Gruian Flavius
Kuchcinski Krzysztof
Publication venue
Publication date: 01/01/2015
Field of study

Parallel architectures are essential in order to take advantage of the parallelism inherent in streaming applications. One particular branch of these employ hardware SIMD pipelines. In this paper, we analyse several scheduling techniques, namely ad hoc overlapped execution, modulo scheduling and modulo scheduling with unrolling, all of which aim to efficiently utilize the special architecture design. Our investigation focuses on improving throughput while analysing other metrics that are important for streaming applications, such as register pressure, buffer sizes and code size. Through experiments conducted on several media benchmarks, we present and discuss trade-offs involved when selecting any one of these scheduling techniques.Comment: Presented at DATE Friday Workshop on Heterogeneous Architectures and Design Methods for Embedded Image Systems (HIS 2015) (arXiv:1502.07241

arXiv.org e-Print Archive

Lund University Publications

Synthesis of application specific processor architectures for ultra-low energy consumption

Author: Kazmierski T J
Leech Charles
Publication venue
Publication date: 12/02/2014
Field of study

In this paper we suggest that further energy savings can be achieved by a new approach to synthesis of embedded processor cores, where the architecture is tailored to the algorithms that the core executes. In the context of embedded processor synthesis, both single-core and many-core, the types of algorithms and demands on the execution efficiency are usually known at the chip design time. This knowledge can be utilised at the design stage to synthesise architectures optimised for energy consumption. Firstly, we present an overview of both traditional energy saving techniques and new developments in architectural approaches to energy-efficient processing. Secondly, we propose a picoMIPS architecture that serves as an architectural template for energy-efficient synthesis. As a case study, we show how the picoMIPS architecture can be tailored to an energy efficient execution of the DCT algorithm

Southampton (e-Prints Soton)