114 research outputs found

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Technological evolution has significantly increased the number of transistors for a given die area and raised switching speeds from a few MHz to the GHz range. This simultaneous shrinking of feature size and boost in performance demands lower supply voltages and effective power dissipation in chips containing millions of transistors, which has triggered a substantial amount of research into power reduction techniques across almost every aspect of the chip, particularly the processor cores it contains. This paper presents an overview of techniques for achieving power efficiency, mainly at the processor core level, but also visits related domains such as buses and memories. Various processor parameters and features, such as supply voltage, clock frequency, caches and pipelining, can be optimized to reduce the power consumption of the processor; this paper discusses ways in which these parameters can be optimized. Emerging power-efficient processor architectures and ongoing research activities are also surveyed, which should help the reader identify how these factors contribute to a processor's power consumption. Some of these concepts are already well established, whereas others are still active research areas. © 2009 ACADEMY PUBLISHER
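
    As a minimal illustration of why supply voltage and clock frequency are the dominant knobs in surveys like this one, the sketch below estimates dynamic switching power with the standard P ≈ α·C·V²·f model; the capacitance, activity factor and operating points are made-up example values, not figures from the paper.

        # Rough dynamic-power estimate, P_dyn ~ alpha * C * V^2 * f.
        # All numbers below are illustrative placeholders, not data from the paper.

        def dynamic_power(alpha: float, c_farads: float, vdd: float, freq_hz: float) -> float:
            """Dynamic switching power of a CMOS circuit (watts)."""
            return alpha * c_farads * vdd ** 2 * freq_hz

        nominal = dynamic_power(alpha=0.2, c_farads=1e-9, vdd=1.2, freq_hz=1e9)   # full speed
        scaled  = dynamic_power(alpha=0.2, c_farads=1e-9, vdd=0.9, freq_hz=600e6) # DVFS point

        print(f"nominal: {nominal*1e3:.1f} mW, scaled: {scaled*1e3:.1f} mW "
              f"({100 * (1 - scaled / nominal):.0f}% dynamic-power reduction)")

    Lowering voltage and frequency together is what makes dynamic voltage and frequency scaling so effective: the quadratic voltage term dominates the savings.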

    Using Partial Reconfiguration for SoC Design and Implementation

    Most reconfigurable systems rely on FPGA technology. Among these, those that permit dynamic and partial reconfiguration offer added benefits in flexibility, in-field device upgrades, improved design and manufacturing time, and, in some cases, even reductions in power consumption. However, dynamic reconfiguration is a complex task, and the real benefits of its use in practical applications have often been questioned. This paper presents an overview of the application of partial reconfiguration, along with four original applications. The main goal of these applications is to test several architectures with different degrees of flexibility and to search for the partial reconfiguration "killer application", that is, the application that best demonstrates the benefits of today's reconfigurable systems based on commercial FPGAs. The presented applications are therefore a proof of concept rather than fully operative, closed systems. First, a brief introduction to partially reconfigurable systems and their applications is included. After that, the created reconfigurable systems are described: first, an on-chip communications emulation framework; second, an on-chip debugging system; third, a reconfigurable wireless sensor network node; and finally, a remote reconfigurable client-server device. Each application is described in a separate section of the paper along with tests and results. General conclusions are included at the end of the paper.
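
    For readers unfamiliar with how a partial bitstream is actually applied at run time, the sketch below shows one common mechanism, the Linux FPGA Manager sysfs interface available on recent SoC FPGAs; the paper itself does not prescribe this flow, and the device paths and bitstream file name used here are assumptions for illustration.

        # Minimal sketch: load a partial bitstream through the Linux FPGA Manager
        # sysfs interface (kernels with the fpga_manager framework enabled).
        # Paths and the bitstream file name are illustrative assumptions.
        import shutil

        FPGA_MGR = "/sys/class/fpga_manager/fpga0"
        PARTIAL_BITSTREAM = "partial_region.bin"   # hypothetical file from the vendor tools

        def load_partial(bitstream: str) -> None:
            # The kernel looks up firmware names under /lib/firmware.
            shutil.copy(bitstream, "/lib/firmware/" + bitstream)
            # Bit 0 of 'flags' marks the next load as a partial reconfiguration.
            with open(FPGA_MGR + "/flags", "w") as f:
                f.write("1")
            # Writing the file name to 'firmware' triggers the actual reconfiguration.
            with open(FPGA_MGR + "/firmware", "w") as f:
                f.write(bitstream)

        if __name__ == "__main__":
            load_partial(PARTIAL_BITSTREAM)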

    Domain-specific and reconfigurable instruction cells based architectures for low-power SoC


    Analysis of a Pipelined Architecture for Sparse DNNs on Embedded Systems

    Deep neural networks (DNNs) are increasing their presence in a wide range of applications, and their computationally intensive and memory-demanding nature poses challenges, especially for embedded systems. Pruning techniques make DNN models sparse by setting most weights to zero, offering optimization opportunities if specific support is included. We propose a novel pipelined architecture for DNNs that avoids all useless operations during the inference process. It has been implemented on a field-programmable gate array (FPGA), and its performance, energy efficiency, and area have been characterized. Exploiting sparsity yields remarkable speedups but also produces area overheads. We have evaluated this tradeoff in order to identify in which scenarios it is better to use that area to exploit sparsity or to include more computational resources in a conventional DNN architecture. We have also explored different arithmetic bitwidths. Our sparse architecture is clearly superior for 32-bit arithmetic or highly sparse networks. However, for 8-bit arithmetic or networks with low sparsity, it is more profitable to deploy a dense architecture with more arithmetic resources than to include support for sparsity. We consider FPGAs the natural target for sparse DNN accelerators, since they can be loaded at run time with the best-fitting accelerator.
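
    The sketch below illustrates, in software, the basic idea the abstract refers to as avoiding useless operations: a fully connected layer traversed in a compressed sparse fashion so that only nonzero weights generate multiply-accumulates. This is a simplified, illustrative model, not the paper's hardware pipeline; the layer shape and sparsity level are assumptions.

        # Illustrative dense-vs-sparse fully connected layer (not the paper's design).
        import numpy as np

        rng = np.random.default_rng(0)
        weights = rng.standard_normal((128, 256))
        weights[rng.random(weights.shape) < 0.9] = 0.0   # ~90% pruned, i.e. highly sparse
        x = rng.standard_normal(256)

        def dense_fc(w, x):
            # Every weight produces a multiply-accumulate, including the zeros.
            return w @ x

        def sparse_fc(w, x):
            # CSR-style traversal: only nonzero weights produce work.
            y = np.zeros(w.shape[0])
            for row in range(w.shape[0]):
                cols = np.nonzero(w[row])[0]          # column indices of surviving weights
                y[row] = np.dot(w[row, cols], x[cols])
            return y

        assert np.allclose(dense_fc(weights, x), sparse_fc(weights, x))
        nnz = np.count_nonzero(weights)
        print(f"MACs: dense={weights.size}, sparse={nnz} ({100*nnz/weights.size:.1f}% of dense)")

    The tradeoff the paper evaluates comes from the index handling visible here: skipping zeros saves arithmetic but costs storage and control logic for the nonzero positions.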

    A Hybrid Hardware/Software Architecture That Combines a 4-wide Very Long Instruction Word Software Processor (VLIW) with Application-specific Super-complex Instruction Set Hardware Functions

    Application-driven processor designs are becoming increasingly feasible. Today, advances in field-programmable gate array (FPGA) technology are opening the door to fast, highly feasible hardware/software co-designed architectures. Over 100,000 FPGA logic array blocks and nearly 100 ASIC multiply-accumulate cores combine with extensible CPU cores to foster the design of configurable, application-driven hybrid processors. This thesis proposes a hardware/software co-designed architecture targeted at an FPGA. The architecture is a very long instruction word (VLIW) processor coupled with super-complex instruction set (SuperCISC) hardware co-processors. Results for the VLIW/SuperCISC show performance speedups over a single-issue processor of 9x to 332x, and entire-application speedups from 4x to 127x. Contributions of this research include a 4-way VLIW designed from the ground up, a zero-overhead implementation of a hardware/software interface, an evaluation of the scalability of shared data stores, examples of application-specific hardware accelerators, a SystemC simulator, and an evaluation of shared memory configurations.
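
    As a rough illustration of the kind of hardware/software interface the abstract mentions, the sketch below drives a hypothetical memory-mapped hardware function from software: operands are written to device registers and the result is read back. The base address, register map and device node are invented for illustration and are not the thesis's SuperCISC interface, which is tightly coupled to the VLIW datapath.

        # Hypothetical memory-mapped hardware-function call (illustrative only).
        import mmap, os, struct

        ACCEL_BASE = 0x43C00000                        # assumed physical base address
        REG_A, REG_B, REG_RESULT = 0x00, 0x04, 0x08    # assumed register offsets

        def call_hw_function(a: int, b: int) -> int:
            fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
            try:
                regs = mmap.mmap(fd, 4096, offset=ACCEL_BASE)
                regs[REG_A:REG_A + 4] = struct.pack("<I", a)   # write operands
                regs[REG_B:REG_B + 4] = struct.pack("<I", b)
                (result,) = struct.unpack("<I", regs[REG_RESULT:REG_RESULT + 4])
                regs.close()
                return result
            finally:
                os.close(fd)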

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC'10 - May 17-19, 2010, Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

    ReCoSoC is intended to be a periodic annual meeting at which gathered expertise and state-of-the-art research on SoC-related topics are exposed and discussed through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow's challenges in the multibillion-transistor era, taking into account emerging techniques and architectures that explore the synergy between flexible on-chip communication and system reconfigurability.

    Application-Specific Number Representation

    Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application-specific number representations. Well-known number formats include fixed-point, floating-point, the logarithmic number system (LNS), and the residue number system (RNS). Such different number representations lead to different arithmetic designs and error behaviours, thus producing implementations with different performance, accuracy, and cost. To investigate the design options in number representations, the first part of this thesis presents a platform that enables automated exploration of the number representation design space. The second part of the thesis shows case studies that optimise designs for area, latency or throughput from the perspective of number representations. Automated design space exploration in the first part addresses two major issues. First, automation requires arithmetic unit generation: this thesis provides optimised arithmetic library generators for logarithmic and residue arithmetic units, which support a wide range of bit widths and achieve significant improvement over previous designs. Second, generation of arithmetic units requires specifying the bit widths for each variable: this thesis describes an automatic bit-width optimisation tool called R-Tool, which combines dynamic and static analysis methods and supports different number systems (fixed-point, floating-point, and LNS numbers). Putting it all together, the second part explores the effects of application-specific number representation on practical benchmarks, such as radiative Monte Carlo simulation and seismic imaging computations. Experimental results show that customising the number representations brings benefits to hardware implementations: by selecting a more appropriate number format, we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%. On the performance side, hardware implementations with customised number formats achieve 5 to potentially over 40 times speedup over software implementations.
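
    To make the abstract's point about representation-dependent arithmetic concrete, the sketch below contrasts a fixed-point quantisation with an LNS-style encoding, in which multiplication reduces to adding logarithms; the bit widths and operand values are arbitrary examples, not configurations from the thesis.

        # Illustrative comparison of fixed-point and LNS-style representations.
        import math

        def to_fixed(x: float, frac_bits: int) -> int:
            """Quantise to a signed fixed-point integer with 'frac_bits' fractional bits."""
            return round(x * (1 << frac_bits))

        def fixed_mul(a: int, b: int, frac_bits: int) -> int:
            # Fixed-point multiply needs a full integer multiply plus a rescaling shift.
            return (a * b) >> frac_bits

        def lns_mul(log_a: float, log_b: float) -> float:
            # In a logarithmic number system, multiplication is just exponent addition.
            return log_a + log_b

        x, y, frac = 1.375, 2.5, 8
        fx, fy = to_fixed(x, frac), to_fixed(y, frac)
        print("fixed-point:", fixed_mul(fx, fy, frac) / (1 << frac))       # 3.4375
        print("LNS        :", 2 ** lns_mul(math.log2(x), math.log2(y)))    # 3.4375

    The two encodings give the same product here, but their hardware costs and error behaviour differ, which is exactly the design space the thesis's exploration platform targets.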