
    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    Technological evolution has significantly increased the number of transistors for a given die area and raised switching speeds from a few MHz to the GHz range. This simultaneous shrinking of devices and boost in performance demands lower supply voltages and effective power dissipation in chips with millions of transistors, and it has triggered a substantial amount of research into power reduction techniques for almost every aspect of the chip, particularly the processor cores it contains. This paper presents an overview of techniques for achieving power efficiency mainly at the processor core level, but it also visits related domains such as buses and memories. Various processor parameters and features, such as supply voltage, clock frequency, caches, and pipelining, can be optimized to reduce the processor's power consumption; this paper discusses the ways in which each can be optimized. Emerging power-efficient processor architectures are also overviewed, and research activities are discussed that should help the reader identify how these factors contribute to a processor's power consumption. Some of these concepts are already well established, whereas others remain active research areas. © 2009 ACADEMY PUBLISHER
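
    The parameters named above interact through the standard CMOS dynamic-power relation, P_dyn = α·C·V²·f. As a hedged illustration only (the constants below are made up, not from the paper), a few lines of Python show why lowering supply voltage together with frequency pays off quadratically:

```python
# Dynamic CMOS power: P_dyn = alpha * C_eff * Vdd^2 * f.
# All constants below are illustrative, not taken from the paper.

def dynamic_power(alpha: float, c_eff: float, vdd: float, freq: float) -> float:
    """Dynamic power (W) from activity factor, effective switched
    capacitance (F), supply voltage (V), and clock frequency (Hz)."""
    return alpha * c_eff * vdd ** 2 * freq

nominal = dynamic_power(alpha=0.2, c_eff=1e-9, vdd=1.2, freq=1e9)    # 1.2 V @ 1 GHz
scaled = dynamic_power(alpha=0.2, c_eff=1e-9, vdd=0.9, freq=0.6e9)   # DVFS point

print(f"nominal {nominal:.3f} W, scaled {scaled:.3f} W "
      f"({100 * (1 - scaled / nominal):.0f}% lower)")   # ~66% lower
```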

    Spiker+: a framework for the generation of efficient Spiking Neural Networks FPGA accelerators for inference at the edge

    Full text link
    Including Artificial Neural Networks in embedded systems at the edge allows applications to exploit Artificial Intelligence capabilities directly within devices operating at the network periphery. This paper introduces Spiker+, a comprehensive framework for generating efficient, low-power, and low-area customized Spiking Neural Network (SNN) accelerators on FPGA for inference at the edge. Spiker+ provides a configurable multi-layer hardware SNN, a library of highly efficient neuron architectures, and a design framework that enables the development of complex neural network accelerators with a few lines of Python code. Spiker+ is tested on two benchmark datasets, MNIST and the Spiking Heidelberg Digits (SHD). On MNIST, it demonstrates competitive performance compared to state-of-the-art SNN accelerators. It outperforms them in resource allocation, requiring 7,612 logic cells and 18 Block RAMs (BRAMs), which makes it fit in very small FPGAs, and in power consumption, drawing only 180 mW for a complete inference on an input image. The latency is comparable to that observed in the state of the art, at 780 µs/img. To the authors' knowledge, Spiker+ is the first SNN accelerator tested on the SHD. In this case, the accelerator requires 18,268 logic cells and 51 BRAMs, with an overall power consumption of 430 mW and a latency of 54 µs for a complete inference on input data. This underscores the significance of Spiker+ in the hardware-accelerated SNN landscape, making it an excellent solution for deploying configurable and tunable SNN architectures in resource- and power-constrained edge applications.
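
    The neuron models behind such accelerators are typically hardware-friendly variants of the leaky integrate-and-fire (LIF) neuron. The sketch below is a generic LIF layer update in Python, not Spiker+'s actual implementation; the decay, threshold, and shapes are illustrative assumptions:

```python
import numpy as np

def lif_step(v, spikes_in, weights, decay=0.9, threshold=1.0):
    """One timestep of a leaky integrate-and-fire layer: leak the membrane
    potential, integrate weighted input spikes, fire and reset on threshold."""
    v = decay * v + weights @ spikes_in      # leak, then integrate inputs
    fired = v >= threshold                   # spike where threshold is crossed
    v = np.where(fired, 0.0, v)              # reset the neurons that fired
    return v, fired.astype(np.float32)

# Toy usage: a layer of 4 neurons driven by 3 input spike lines for 10 steps.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.5, size=(4, 3))
v = np.zeros(4)
for _ in range(10):
    v, out = lif_step(v, rng.integers(0, 2, size=3).astype(float), w)
```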

    Power Management for Deep Submicron Microprocessors

    Get PDF
    As VLSI technology scales, the enhanced performance of smaller transistors comes at the expense of increased power consumption. In addition to the dynamic power consumed by the circuits, there is a tremendous increase in leakage power consumption, which is further exacerbated by increasing operating temperatures. The total power consumption of modern processors is distributed between the processor core, memory, and interconnects. In this research, two novel power management techniques are presented, targeting the functional units and the global interconnects. First, since most leakage control schemes for processor functional units are based on circuit-level techniques, such schemes inherently lack information about the operational profile of higher-level components of the system. This is a barrier to the pivotal task of predicting standby time; without this prediction, it is extremely difficult to assess the value of any leakage control scheme. Consequently, a methodology that can predict standby time is highly beneficial in bridging the gap between the information available at the application level and the circuit implementations. In this work, a novel Dynamic Sleep Signal Generator (DSSG) is presented. It utilizes usage traces extracted from cycle-accurate simulations of benchmark programs to predict the long standby periods associated with the various functional units. The DSSG bases its decisions on the current and previous standby state of the functional units to accurately predict the length of the next standby period. The DSSG presents an alternative to Static Sleep Signal Generation (SSSG), which is based on static counters that trigger the sleep signal when a functional unit idles for a prespecified number of cycles. The test results of the DSSG are obtained using a modified RISC superscalar processor implemented in SimpleScalar, the most widely accepted open-source vehicle for architectural analysis, and are further verified with a Simultaneous Multithreading simulator implemented in SMTSIM. Leakage saving results show an increase of up to 146% in leakage savings using the DSSG versus the SSSG, with an accuracy of 60-80% in predicting long standby periods. Second, chip designers, in their effort to achieve timing closure, have focused on achieving the lowest possible interconnect delay through buffer insertion and routing techniques. This approach, though, taxes the power budget of modern ICs, especially those intended for wireless applications. Also, in order to achieve more functionality, die sizes are constantly increasing. This trend is leading to an increase in the average global interconnect length, which, in turn, requires more buffers to achieve timing closure. Unconstrained buffering is bound to adversely affect overall chip performance if power consumption is added as a major performance metric; in fact, the number of global interconnect buffers is expected to reach hundreds of thousands to achieve appropriate timing closure. To mitigate the impact of the power consumed by the interconnect buffers, a power-efficient multi-pin routing technique is proposed in this research. The technique is based on a graph representation of the routing possibilities, including buffer insertion, and on identifying the least-power path between the interconnect source and the set of sinks.
The novel multi-pin routing technique is tested by applying it to the ISPD and IBM benchmarks to verify its accuracy, complexity, and solution quality. The results indicate that average power savings as high as 32% are achieved for the 130-nm technology with no impact on the maximum chip frequency.
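
    To make the DSSG/SSSG contrast concrete, here is a hedged Python sketch of both policies. The history-based predictor below (sleep early when the current and previous standby periods both look long) is an illustrative assumption, not the thesis's exact algorithm, and all thresholds are invented:

```python
class StaticSleepSignalGenerator:
    """SSSG: assert the sleep signal after a fixed number of idle cycles."""
    def __init__(self, idle_threshold: int = 64):
        self.idle_threshold = idle_threshold
        self.idle_cycles = 0

    def step(self, unit_idle: bool) -> bool:
        self.idle_cycles = self.idle_cycles + 1 if unit_idle else 0
        return self.idle_cycles >= self.idle_threshold


class DynamicSleepSignalGenerator:
    """DSSG sketch: track the lengths of the current and previous standby
    periods and assert sleep early when both suggest a long standby.
    Illustrative predictor only, not the thesis's exact one."""
    def __init__(self, long_standby: int = 256):
        self.long_standby = long_standby
        self.prev_len = 0
        self.cur_len = 0
        self.was_idle = False

    def step(self, unit_idle: bool) -> bool:
        if unit_idle:
            self.cur_len += 1
        elif self.was_idle:                      # a standby period just ended
            self.prev_len, self.cur_len = self.cur_len, 0
        self.was_idle = unit_idle
        predicted_long = min(self.prev_len, self.cur_len) >= self.long_standby // 4
        return unit_idle and predicted_long
```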

    Hardware acceleration for power efficient deep packet inspection

    Get PDF
    The rapid growth of the Internet has led to a massive spread of malicious attacks such as viruses and malware, making the safety of online activity a major concern. The use of Network Intrusion Detection Systems (NIDS) is an effective method to safeguard the Internet, and one key procedure in NIDS is Deep Packet Inspection (DPI). DPI examines the contents of a packet and takes actions on it based on predefined rules. In this thesis, DPI is mainly discussed in the context of security applications, although it can also be used for bandwidth management and network surveillance. Because DPI inspects the whole packet payload, and because of the complexity of the inspection rules, DPI algorithms consume significant amounts of resources, including time, memory, and energy. The aim of this thesis is to design hardware-accelerated methods for memory- and energy-efficient high-speed DPI. The patterns in packet payloads, especially complex ones, can be efficiently represented by regular expressions, which can be translated into Deterministic Finite Automata (DFA). DFA algorithms are fast but consume very large amounts of memory for certain kinds of regular expressions; this thesis therefore proposes memory-efficient algorithms based on transition compression of the DFAs. In this work, Bloom filters are used to implement DPI on an FPGA for hardware acceleration, with the design of a parallel architecture. Furthermore, aiming at a balance of power and performance, an energy-efficient adaptive Bloom filter is designed that adjusts the number of active hash functions according to the current workload, and a method is given for implementation on both two-stage and multi-stage platforms. Nevertheless, false positives still prevent the Bloom filter from being used more widely, so a cache-based counting Bloom filter is presented in this work to eliminate false positives for fast and precise matching. Finally, as future work, models will be built for routers and DPI in order to estimate the effect of power savings and to analyze the latency impact of dynamically adapting frequency to current traffic, and a low-power DPI system will be designed with a single or multiple DPI engines. Results and evaluation of the low-power DPI model and system will be produced in the future.
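
    The adaptive idea is easy to state in code. The sketch below is a generic Python illustration, not the thesis's implementation: insertions always set all k hash positions, so lookups can safely drop to fewer active hashes under heavy load (no false negatives, just a higher false-positive rate in exchange for less hashing work per lookup). The hashing scheme and load threshold are assumptions:

```python
import hashlib

class AdaptiveBloomFilter:
    """Bloom filter whose number of *active* lookup hashes can be lowered
    under heavy load to save work (and energy) per query. Illustrative
    sketch only; parameters are not from the thesis."""
    def __init__(self, m_bits: int = 1 << 16, k: int = 8):
        self.m, self.k, self.active_k = m_bits, k, k
        self.bits = bytearray(m_bits)

    def _indices(self, item: str, k: int):
        for i in range(k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for idx in self._indices(item, self.k):   # always set all k bits
            self.bits[idx] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[i] for i in self._indices(item, self.active_k))

    def set_load(self, utilization: float) -> None:
        # Fewer active hashes when busy: cheaper lookups, more false
        # positives. The 0.7 threshold is arbitrary, for illustration.
        self.active_k = self.k if utilization < 0.7 else max(2, self.k // 2)
```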

    Energy and quality scalable wireless communication

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (p. 165-171).
    Nodes for emerging, high-density wireless networks will face the dual challenges of continuous, multi-year operation and diverse, demanding operating conditions. The wireless communication subsystem, a substantial consumer of energy, must therefore be designed with unprecedented energy efficiency. To meet this challenge, inefficiencies once overlooked must be addressed, and the system must be designed for energy scalability: the use of graceful energy vs. quality trade-offs in response to continuous variations in operational conditions. Using a comprehensive model framework that unifies cross-disciplinary models for energy consumption and communication performance, this work explores multi-dimensional trade-offs of energy and quality for wireless communication at all levels of the system hierarchy. The circuit-level "knob" of dynamic voltage scaling is implemented on a commercial microprocessor and integrated into a power-aware prototype microsensor node. Power-aware abstractions encourage collaboration between the hardware, which fundamentally dissipates the energy, and the software, which controls how the hardware behaves. Accurate models of hardware energy consumption reveal inefficiencies of routing techniques such as multihop, and the models are fused with information-theoretic limits on code performance to bound the energy scalability of the hardware platform. An application-specific protocol for microsensor networks is evaluated with a new, interactive Java simulation tool created expressly for energy-conscious, high-density wireless networks. Close collaboration between software and hardware layers, and across the research disciplines that compose wireless communication itself, is a crucial enabler of energy-efficient wireless communication.
    by Rex Kee Min. Ph.D.
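
    Why multihop routing can be less efficient than it looks follows directly from a hardware-aware energy model. The sketch below uses the common first-order radio model (per-bit electronics energy plus a distance-dependent amplifier term); the constants and the path-loss exponent are illustrative assumptions, not the thesis's measured values:

```python
# First-order radio model: every hop pays fixed TX/RX electronics energy,
# so splitting a path into many short hops stops paying off once the
# electronics term dominates. Constants are illustrative only.

E_ELEC = 50e-9     # J/bit in TX or RX electronics
EPS_AMP = 100e-12  # J/bit/m^2 amplifier energy (path-loss exponent n = 2)

def tx_energy(bits: int, d: float) -> float:
    return E_ELEC * bits + EPS_AMP * bits * d ** 2

def rx_energy(bits: int) -> float:
    return E_ELEC * bits

def multihop_energy(bits: int, d: float, hops: int) -> float:
    """Energy to move `bits` over distance d via equal-length hops;
    every intermediate node pays RX electronics as well as TX."""
    return hops * tx_energy(bits, d / hops) + (hops - 1) * rx_energy(bits)

for hops in (1, 2, 4, 8):   # 1000 bits over 200 m
    print(hops, f"{multihop_energy(1000, 200.0, hops) * 1e6:.0f} uJ")
# Prints 4050, 2150, 1350, 1250 uJ: strongly diminishing returns per hop.
```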

    Unreliable Silicon: Circuit through System-Level Techniques for Mitigating the Adverse Effects of Process Variation, Device Degradation and Environmental Conditions.

    Full text link
    Designing and manufacturing integrated circuits in advanced, highly scaled processing technologies that meet stringent specification sets is an increasingly unreliable proposition. Dimensional processing variations, time- and stress-dependent device degradation, and potentially varying environmental conditions exacerbate deviations in the performance, power, and even functionality of integrated circuits. This work explores a system-level adaptive design philosophy intended to mitigate the power and performance impact of unreliable silicon devices, and presents enabling circuits for SRAM variation mitigation and in-situ measurement of device degradation in 130nm and 45nm processing technologies. An adaptation of RAZOR-based DVS designed for on-chip memory power reduction and reliability lifetime improvement enables the elimination of 250 mV of voltage margin in a 1.8V design, with up to 500 mV of reduction when allowing 5% of memory operations to use multiple cycles. A novel PID-controlled dynamic reliability management (DRM) system is presented, allowing a user-specified circuit lifetime to be dynamically managed via dynamic voltage and frequency scaling. A peak performance improvement of 20-35% is achievable in typical processing systems by allowing brief periods of elevated-voltage operation through the real-time DRM system, while minimizing voltage during non-critical periods of operation to maximize circuit lifetime. A probabilistic analysis of oxide breakdown using the percolation model indicates the need for 1000-2000 integrated in-situ sensors to achieve an oxide lifetime prediction error at or under 10%. The conclusions from the oxide analysis are used to guide the design of a series of novel on-chip reliability monitoring circuits for use in a real-time DRM system. The 130nm in-situ oxide breakdown measurement sensor presented is the first published design of an oxide-breakdown-oriented circuit and is compatible with the standard-cell style automatic “place and route” design flows used in the majority of application-specific integrated circuit designs. Measured results show increases in gate oxide leakage of 14-35% after accelerated stress testing. A second-generation design of the on-chip oxide degradation sensor is presented that reduces stress-mode power consumption by 111,785X over the initial design while providing an ideal 1:1 mapping of gate leakage to output frequency in extracted simulations.
    Ph.D. Electrical Engineering. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/60701/1/ekarl_1.pd
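
    The control loop behind such a DRM scheme can be sketched in a few lines. The following Python is a hedged illustration of PID-controlled lifetime management (hold a target wear rate by trimming Vdd); the gains, the wear-rate signal, and the Vdd clamp are invented for the example, not taken from the dissertation:

```python
class PIDDynamicReliabilityManager:
    """PID sketch: hold the measured wear rate (consumed lifetime per unit
    time) at a user-chosen target by adjusting supply voltage. Positive
    error (wearing too fast) lowers Vdd; negative error allows a brief
    boost for performance. All constants are illustrative."""
    def __init__(self, target_wear_rate: float,
                 kp: float = 0.05, ki: float = 0.01, kd: float = 0.02):
        self.target = target_wear_rate
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, measured_wear_rate: float, vdd: float) -> float:
        """One control interval: return the next Vdd setting."""
        err = measured_wear_rate - self.target
        self.integral += err
        deriv = err - self.prev_err
        self.prev_err = err
        vdd -= self.kp * err + self.ki * self.integral + self.kd * deriv
        return min(max(vdd, 0.8), 1.3)   # clamp to a plausible Vdd window
```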

    Non-invasive power gating techniques for bursty computation workloads using micro-electro-mechanical relays

    Get PDF
    PhD Thesis.
    Electrostatically-actuated Micro-Electro-Mechanical/Nano-Electro-Mechanical (MEM/NEM) relays are promising devices for overcoming the energy-efficiency limitations of CMOS transistors. Many exploratory research projects are currently under way investigating the mechanical, electrical, and logical characteristics of MEM/NEM relays. One particular issue that this work addresses is the need for a scalable and accurate physical model of MEM/NEM switches that can be plugged into standard EDA software. The existing models are accurate and detailed, but they suffer from convergence problems; this requires finding ad-hoc workarounds and significantly impacts the designer's productivity. In this thesis, we propose a new, simplified Verilog-AMS model. To test the scalability of the proposed model, we cross-checked it against our analysis of a range of benchmark circuits. Results show that, compared to standard models, the proposed model is sufficiently accurate, with an average error of 6%, and can handle larger designs without divergence. This thesis also investigates the modelling, design, and optimization of various MEM/NEM switches using 3D Finite Element Analysis (FEA) performed with the COMSOL multiphysics simulation tool. An extensive parametric sweep simulation is performed to study the energy-latency trade-offs of MEM/NEM relays. To accurately simulate MEMS/NEMS-based digital circuits, a Verilog-AMS model is proposed based on the parameters evaluated with the multiphysics simulation tool. This allows an accurate calibration of the MEM/NEM relays with a significant reduction in simulation time compared to 3D FEA in COMSOL. The effectiveness of two power gating approaches in asynchronous micropipelines is also investigated, using MEM/NEM switches and sleep transistors to reduce idle power dissipation at a particular target throughput. Sleep transistors are traditionally used to power gate idle circuits; however, these transistors have fundamental limitations in their effectiveness. Alternatively, MEM/NEM relays, with zero leakage current, can achieve greater energy savings under certain data rates and design architectures. An asynchronous FIR filter with a 4-phase bundled-data handshake protocol is presented; the implementation targets a 90nm technology node, and simulations are exercised at various data rates and design complexities. It is demonstrated that our proposed approach offers a 69% energy improvement at a data rate of 1 kHz, compared to 39% for previous work. The current trends towards greater heterogeneity in future Systems-on-Chip (SoC) concern not only their functionality but also their timing and power aspects. The increasing diversity of timing and power supply conditions, and the associated concurrently operating modes within an SoC, call for more efficient power delivery networks (PDN) for battery-operated devices. This is especially important for systems with mixed duty cycling, where some parts are required to work regularly with low throughput while other parts are activated spontaneously, i.e., in bursts. To improve their reaction time vs. energy efficiency, this work proposes to incorporate a power-switching network based on MEM relays that switches the SoC power-performance state (PPS) into an active mode while eliminating leakage current when it is idle.
Results show that even with today's large devices and high pull-in voltages, a MEM-relay-based power switching network (PSN) can achieve a 1000x savings in energy compared to its CMOS counterpart at low duty cycles. A simple case of optimising an on-chip charge pump required to switch on the relay has been investigated, and its energy-latency overhead has been evaluated. Heterogeneous many-core systems are increasingly being employed in modern embedded platforms for their high throughput at low energy cost. The applications they run typically exhibit bursty workloads that provide opportunities to minimize system energy. CMOS-based power gating circuitry, typically consisting of sleep transistors, is used as an effective technique for idle energy reduction in such applications; however, these transistors contribute high leakage current when driving large capacitive loads, making effective energy minimization challenging. This thesis proposes a novel MEMS-based idle energy control approach. Core to this approach is integrated sleep mode management based on the performance-energy states and the bursty workloads indicated by the performance counters. A number of PARSEC benchmark applications are used as case studies of bursty workloads, including CPU- and memory-intensive ones. These applications are exercised on an Exynos 5422 heterogeneous many-core platform, engineered with performance counter facilities, showing 55.5% energy savings compared with an on-demand governor. Furthermore, an extensive trade-off analysis demonstrates the comparative advantages of the MEMS-based controller, including zero leakage current and a non-invasive implementation suitable for commercial off-the-shelf systems.
    Higher Committee for Education Development in Iraq (HCED)
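
    The duty-cycle argument reduces to a simple break-even calculation: a sleep transistor leaks for the whole idle interval, while a MEM relay pays a fixed mechanical actuation cost per sleep/wake cycle and then leaks nothing. The Python sketch below illustrates this with invented constants (they are not the thesis's measured numbers):

```python
# Idle-energy break-even between a CMOS sleep transistor and a MEM relay.
# Constants are illustrative assumptions, not measurements from the thesis.

SLEEP_FET_LEAKAGE = 1e-6    # W leaking through the gated sleep transistor
RELAY_SWITCH_ENERGY = 1e-8  # J per relay actuation (incl. driver overhead)

def idle_energy_fet(t_idle_s: float) -> float:
    return SLEEP_FET_LEAKAGE * t_idle_s          # grows with idle time

def idle_energy_relay() -> float:
    return 2 * RELAY_SWITCH_ENERGY               # open + close, fixed cost

for t_idle in (1e-3, 1e-2, 1e-1, 1.0):           # seconds between bursts
    print(f"idle {t_idle:>6} s: FET {idle_energy_fet(t_idle):.1e} J, "
          f"relay {idle_energy_relay():.1e} J")
# With these numbers the relay wins past ~0.02 s of idle time, i.e. at
# low duty cycles, matching the qualitative claim above.
```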

    HPCCP/CAS Workshop Proceedings 1998

    Get PDF
    This publication is a collection of extended abstracts of presentations given at the HPCCP/CAS (High Performance Computing and Communications Program/Computational Aerosciences Project) Workshop held on August 24-26, 1998, at NASA Ames Research Center, Moffett Field, California. The objective of the Workshop was to bring together the aerospace high performance computing community, consisting of airframe and propulsion companies, independent software vendors, university researchers, and government scientists and engineers. The Workshop was sponsored by the HPCCP Office at NASA Ames Research Center. The Workshop consisted of over 40 presentations, including an overview of NASA's High Performance Computing and Communications Program and the Computational Aerosciences Project; ten sessions of papers representative of the high performance computing research conducted within the Program by the aerospace industry, academia, NASA, and other government laboratories; two panel sessions; and a special presentation by Mr. James Bailey

    A Cross-level Verification Methodology for Digital IPs Augmented with Embedded Timing Monitors

    Get PDF
    Smart systems are characterized by the integration in a single device of subsystems from different technological domains, namely analog, digital, discrete and power devices, MEMS, and power sources. The challenges emerging from the heterogeneous nature of the whole system, combined with the traditional challenges of digital design, directly impact the performance and propagation delay of the digital components. This article proposes a design approach that enhances the RTL model of a given digital component for integration in smart systems through the automatic insertion of delay sensors, which can detect and correct timing failures. The article then proposes a methodology to verify these added features at the system level. The augmented model is abstracted to SystemC TLM and automatically injected with mutants (i.e., code mutations) to emulate delays and timing failures. The resulting TLM model is finally simulated to identify timing failures and to verify the correctness of the inserted delay monitors. Experimental results demonstrate the applicability of the proposed design and verification methodology, thanks to an efficient sensor-aware abstraction methodology, by applying the flow to three complex case studies.
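
    The mutation step is the part that generalizes most readily. As a hedged sketch of the principle only (the actual flow targets SystemC TLM models, not Python, and the names below are hypothetical), a delay mutant wraps a transaction-level operation to inflate its reported latency, and an emulated monitor must flag the violation:

```python
import functools

def delay_mutant(extra_delay_ns: int):
    """Mutation operator: wrap a TLM-style call so its reported latency
    is inflated, emulating a timing failure."""
    def wrap(fn):
        @functools.wraps(fn)
        def mutated(*args, **kwargs):
            result, latency_ns = fn(*args, **kwargs)
            return result, latency_ns + extra_delay_ns
        return mutated
    return wrap

def timing_monitor(latency_ns: int, deadline_ns: int) -> bool:
    """Emulated delay sensor: report a failure when the deadline is missed."""
    return latency_ns > deadline_ns

def mac_transaction(a: int, b: int, acc: int):
    """Hypothetical transaction returning (result, nominal latency in ns)."""
    return acc + a * b, 5

mutated_mac = delay_mutant(extra_delay_ns=4)(mac_transaction)

_, lat = mac_transaction(2, 3, 0)
assert not timing_monitor(lat, deadline_ns=8)   # nominal run: no failure
_, lat = mutated_mac(2, 3, 0)
assert timing_monitor(lat, deadline_ns=8)       # mutant: failure is caught
```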