Search CORE

2,209 research outputs found

A VLSI Approach for Cache Compression in Microprocessor

Author: Kurian M Z
M N Sharada Guptha
Pradeep H. S.
Publication venue: Institute for Project Management Pvt. Ltd
Publication date: 25/08/2020
Field of study

Speed is one of the important issues that generally customers consider for selecting any electronic component in the market. Speed of a microprocessor based system mainly depends on the speed of the microprocessor which in turn depends on the memory access time. Accessing on chip memory takes more time than accessing off-chip memory. Because of these, designers of memory system may find cache compression as an advantageous method to increase speed of a microprocessor based system, as it increases cache capacity and off-chip bandwidth. The However, most past work, and all work on cache compression, has made unsubstantiated assumptions about the performance, power consumption, and area overheads of the proposed compression algorithms and hardware. It is not possible to determine whether compression at levels of the memory hierarchy closest to the processor is beneficial without understanding its costs. Proposed hardware compression algorithms fall into the dictionary-based category, which depend on building a dictionary and using its entries to encode repeated data values. Proposed algorithm has number of novel features like including combining pairs of compressed lines into one cache line and allowing parallel compression of multiple words while using a single dictionary and without degradation in compression ratio

Interscience Research Network

Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

Author: Gujarathi Hemal S
McDonald-Maier Klaus D
Qadri Muhammad Yasir
Publication venue: 'Academy Publisher'
Publication date: 01/01/2009
Field of study

The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

University of Essex Research Repository

CiteSeerX

Crossref

Dynamically Reconfigurable Systolic Array Accelerators: A Case Study with Extended Kalman Filter and Discrete Wavelet Transform Algorithms

Author: Barnes Robert C
Publication venue: DigitalCommons@USU
Publication date: 01/05/2009
Field of study

Field programmable grid arrays (FPGA) are increasingly being adopted as the primary on-board computing system for autonomous deep space vehicles. There is a need to support several complex applications for navigation and image processing in a rapidly responsive on-board FPGA-based computer. This requires exploring and combining several design concepts such as systolic arrays, hardware-software partitioning, and partial dynamic reconfiguration. A microprocessor/co-processor design that can accelerate two single precision oating-point algorithms, extended Kalman lter and a discrete wavelet transform, is presented. This research makes three key contributions. (i) A polymorphic systolic array framework comprising of recofigurable partial region-based sockets to accelerate algorithms amenable to being mapped onto linear systolic arrays. When implemented on a low end Xilinx Virtex4 SX35 FPGA the design provides a speedup of at least 4.18x and 6.61x over a state of the art microprocessor used in spacecraft systems for the extended Kalman lter and discrete wavelet transform algorithms, respectively. (ii) Switchboxes to enable communication between static and partial reconfigurable regions and a simple protocol to enable schedule changes when a socket\u27s contents are dynamically reconfigured to alter the concurrency of the participating systolic arrays. (iii) A hybrid partial dynamic reconfiguration method that combines Xilinx early access partial reconfiguration, on-chip bitstream decompression, and bitstream relocation to enable fast scaling of systolic arrays on the PolySAF. This technique provided a 2.7x improvement in reconfiguration time compared to an o-chip partial reconfiguration technique that used a Flash card on the FPGA board, and a 44% improvement in BRAM usage compared to not using compression

DigitalCommons@USU

The Design of a System Architecture for Mobile Multimedia Computers

Author: Havinga Paul Johannes Mattheus
Publication venue: University of Twente
Publication date: 01/01/2000
Field of study

This chapter discusses the system architecture of a portable computer, called Mobile Digital Companion, which provides support for handling multimedia applications energy efficiently. Because battery life is limited and battery weight is an important factor for the size and the weight of the Mobile Digital Companion, energy management plays a crucial role in the architecture. As the Companion must remain usable in a variety of environments, it has to be flexible and adaptable to various operating conditions. The Mobile Digital Companion has an unconventional architecture that saves energy by using system decomposition at different levels of the architecture and exploits locality of reference with dedicated, optimised modules. The approach is based on dedicated functionality and the extensive use of energy reduction techniques at all levels of system design. The system has an architecture with a general-purpose processor accompanied by a set of heterogeneous autonomous programmable modules, each providing an energy efficient implementation of dedicated tasks. A reconfigurable internal communication network switch exploits locality of reference and eliminates wasteful data copies

CiteSeerX

University of Twente Research Information

Synthesis of application specific processor architectures for ultra-low energy consumption

Author: Kazmierski T J
Leech Charles
Publication venue
Publication date: 12/02/2014
Field of study

In this paper we suggest that further energy savings can be achieved by a new approach to synthesis of embedded processor cores, where the architecture is tailored to the algorithms that the core executes. In the context of embedded processor synthesis, both single-core and many-core, the types of algorithms and demands on the execution efficiency are usually known at the chip design time. This knowledge can be utilised at the design stage to synthesise architectures optimised for energy consumption. Firstly, we present an overview of both traditional energy saving techniques and new developments in architectural approaches to energy-efficient processing. Secondly, we propose a picoMIPS architecture that serves as an architectural template for energy-efficient synthesis. As a case study, we show how the picoMIPS architecture can be tailored to an energy efficient execution of the DCT algorithm

Southampton (e-Prints Soton)

Heracles: Fully Synthesizable Parameterized MIPS-Based Multicore System

Author: Kinsy Michel
Pellauer Michael
Publication venue
Publication date: 08/12/2010
Field of study

Heracles is an open-source complete multicore system written in Verilog. It is fully parameterized and can be reconfigured and synthesized into different topologies and sizes. Each processing node has a 7-stage pipeline, fully bypassed, microprocessor running the MIPS-III ISA, a 4-stage input-buffer, virtual-channel router, and a local variable-size shared memory. Our design is highly modular with clear interfaces between the core, the memory hierarchy, and the on-chip network. In the baseline design, the microprocessor is attached to two caches, one instruction cache and one data cache, which are oblivious to the global memory organization. The memory system in Heracles can be configured as one single global shared memory (SM), or distributed shared memory (DSM), or any combination thereof. Each core is connected to the rest of the network of processors by a parameterized, realistic, wormhole router. We show different topology configurations of the system, and their synthesis results on the Xilinx Virtex-5 LX330T FPGA board. We also provide a small MIPS cross-compiler toolchain to assist in developing software for Heracles

DSpace@MIT

Implementation of soft processor based SOC for JPEG compression on FPGA

Author: Raju Y. David Solomon
Swarna K.S.V.
Publication venue: 'ICT Academy'
Publication date: 01/02/2015
Field of study

With the advent of semiconductor process and EDA tools technology, IC designers can integrate more functions. However, to reduce the demand of time-to-market and tackle the increasing complexity of SoC, the need of fast prototyping and testing is growing. Taking advantage of deep submicron technology, modern FPGAs provide a fast and low-cost prototyping with large logic resources and high performance. So the hardware is mapped onto an emulation platform based on FPGA that mimics the behaviour of SOC. In this paper we use FPGA as a system on chip which is then used for image compression by 2-D DCT respectively and proposed SoC for image compression using soft core Microblaze. The JPEG standard defines compression techniques for image data. As a consequence, it allows to store and transfer image data with considerably reduced demand for storage space and bandwidth. From the four processes provided in the JPEG standard, only one, the baseline process is widely used. Proposed SoC for JPEG compression has been implemented on FPGA Spartan-6 SP605 evaluation board using Xilinx platform studio, because field programmable gate array have reconfigurable hardware architecture. Hence the JPEG image with high speed and reduced size can be obtained at low risk and low power consumption of about 0.699W. The proposed SoC for image compression is evaluated at 83.33MHz on Xilinx Spartan-6 FPGA

Deakin Research Online