11 research outputs found

    Exploiting Natural On-chip Redundancy for Energy Efficient Memory and Computing

    Power density is currently the primary design constraint across most computing segments and the main performance-limiting factor. For years, industry kept power density constant while increasing frequency and lowering transistor supply (Vdd) and threshold (Vth) voltages. However, Vth scaling has stopped because leakage current grows exponentially as Vth decreases. Transistor count and integration density keep doubling every process generation (Moore’s Law), but the power budget caps the amount of hardware that can be active at the same time, leading to dark silicon: with each new generation there are more resources available, yet we cannot fully exploit their performance potential. In recent years, different research trends have explored how to cope with dark silicon and unlock the energy efficiency of chips, including Near-Threshold voltage Computing (NTC) and approximate computing. NTC aggressively lowers Vdd to values near Vth. This allows a substantial reduction in power, as dynamic power scales quadratically with supply voltage, and the resulting power reduction could be used to activate more chip resources and potentially achieve performance improvements. Unfortunately, Vdd scaling is limited by the tight functionality margins of on-chip SRAM transistors: when Vdd is scaled down to near-threshold values, manufacturing-induced parameter variations affect the functionality of SRAM cells, which eventually become unreliable. A large number of emerging applications, on the other hand, feature an intrinsic error-resilience property, tolerating a certain amount of noise. In this context, approximate computing takes advantage of this observation and exploits the gap between the level of accuracy required by the application and the level of accuracy delivered by the computation, provided that reducing the accuracy translates into an energy gain. However, deciding which instructions and data, and which techniques, are best suited for approximation still poses a major challenge.

    This dissertation contributes in these two directions. First, it proposes a new approach to mitigate the impact of SRAM failures due to parameter variation, enabling effective operation at ultra-low voltages. We identify two levels of natural on-chip redundancy: cache level and content level. The first arises from the replication of blocks in multi-level cache hierarchies. We exploit this redundancy with a cache management policy that allocates blocks to entries taking into account the nature of the cache entry and the use pattern of the block. This policy obtains performance improvements between 2% and 34% with respect to block disabling, a technique of similar complexity, while incurring no additional storage overhead. The latter, content-level redundancy, arises from the redundancy of data in real-world applications. We exploit this redundancy by compressing cache blocks to fit them into partially functional cache entries. At the cost of a slight overhead increase, we can obtain performance within 2% of that obtained when the cache is built with fault-free cells, even if more than 90% of the cache entries have at least one faulty cell.

    Then, we analyze how the intrinsic noise tolerance of emerging applications can be exploited to design an approximate Instruction Set Architecture (ISA). Exploiting the ISA redundancy, we explore a set of techniques to approximate the execution of instructions across a set of emerging applications, pointing out the potential of reducing the complexity of the ISA and the trade-offs of the approach. In a proof-of-concept implementation, the ISA is shrunk along two dimensions: breadth (i.e., simplifying instructions) and depth (i.e., dropping instructions). This proof of concept shows that energy can be reduced by 20.6% on average at around 14.9% accuracy loss.
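
    The content-level scheme can be pictured as a simple fit test: a block is only placed in a partially faulty entry if its compressed form fits within the entry's fault-free capacity. The sketch below is an illustrative approximation of that idea, not the dissertation's actual allocation policy; names such as usable_bytes and can_allocate, and all sizes, are hypothetical.

```python
# Illustrative sketch (not the dissertation's actual policy): place a cache
# block in a partially faulty entry only if its compressed form fits within
# the entry's fault-free bytes. All names and sizes are hypothetical.

def usable_bytes(fault_map, block_size=64):
    """Bytes of a cache entry not covered by faulty SRAM cells."""
    return block_size - sum(fault_map)  # fault_map holds 1 per faulty byte

def can_allocate(compressed_size, fault_map):
    """A compressed block fits if it needs no more than the usable capacity."""
    return compressed_size <= usable_bytes(fault_map)

# Example: a 64-byte block compressed to 40 bytes still fits in an entry with
# 20 faulty bytes (44 usable), so the entry need not be disabled.
print(can_allocate(40, [1] * 20 + [0] * 44))  # True
```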

    Dependable Embedded Systems

    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. The book introduces the most prominent reliability concerns from today’s point of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level, such as the circuit level or the system level alone, its focus is on dealing with reliability challenges across different levels, from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, process variation, temperature effects, and soft errors. It provides readers with the latest insights into novel, cross-layer methods and models with respect to the dependability of embedded systems; describes cross-layer approaches that can leverage reliability through techniques that are proactively designed with respect to techniques at other layers; and explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many-core systems.

    Improving the Reliability of Microprocessors under BTI and TDDB Degradations

    Reliability is a fundamental challenge for current and future microprocessors built with advanced nanoscale technologies. With smaller gates, thinner dielectrics, and higher temperatures, microprocessors are vulnerable to aging mechanisms such as Bias Temperature Instability (BTI) and Time Dependent Dielectric Breakdown (TDDB). Under continuous stress, both parametric and functional errors occur, compromising microprocessor lifetime. In this thesis, based on a thorough study of the BTI and TDDB mechanisms, solutions are proposed to mitigate the aging processes in memory-based and random-logic structures in modern out-of-order microprocessors. A large area of the processor core is occupied by memory-based structures that are vulnerable to BTI-induced errors. The problem is exacerbated when PBTI degradation in NMOS is as severe as NBTI in PMOS in high-k metal gate technology. Hence, a novel design is proposed to recover the 4 internal gates within an SRAM cell simultaneously to mitigate both NBTI and PBTI effects. This technique is applied to both the L2 cache banks and the busy functional units with storage cells in the out-of-order pipeline, in two different ways. For the L2 cache banks, a redundant cache bank is added exclusively for proactive recovery rotation. For the critical and busy functional units in out-of-order pipelines, idle cycles are exploited at the per-buffer-entry level. Unlike memory-based structures, combinational logic structures such as the functional units in the execution stage cannot use low-overhead redundancy to tolerate errors, due to their irregular structure. A design framework that aims to improve the reliability of the vulnerable functional units of a processor core is designed and implemented. The approach is to design a generic functional unit (GFU) that can be reconfigured to replace a particular functional unit (FU) while that unit is being recovered for improved lifetime. Although flexible, the GFU is slower than the original target FUs, so the GFU is carefully designed to minimize the performance loss while it is in use. Further schemes are also designed to avoid using the GFU on performance-critical paths of a program execution.
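
    As a rough illustration of the GFU idea (the policy and names below are assumptions for exposition, not the thesis design): while a functional unit is offline for recovery, its operations can be steered to the slower generic unit, except on performance-critical paths.

```python
# Rough illustration of GFU-based recovery (a hypothetical policy, not the
# thesis design): while a functional unit (FU) is recovering, steer its
# operations to the slower, reconfigurable generic functional unit (GFU),
# unless the operation sits on a performance-critical path.

def route_op(op_type, recovering_fus, on_critical_path):
    """Pick an execution target for an operation of the given type."""
    if op_type not in recovering_fus:
        return op_type                   # the dedicated FU is available
    if on_critical_path:
        return f"wait_for_{op_type}"     # avoid the slower GFU on critical paths
    return "GFU"                         # reconfigure the GFU to emulate the FU

print(route_op("multiplier", {"multiplier"}, on_critical_path=False))  # GFU
print(route_op("multiplier", {"multiplier"}, on_critical_path=True))   # wait_for_multiplier
```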

    Low energy digital circuit design using sub-threshold operation

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2006. Includes bibliographical references (p. 189-202).

    Scaling of process technologies to deep sub-micron dimensions has made power management a significant concern for circuit designers. For emerging low power applications such as distributed micro-sensor networks or medical applications, low energy operation is the primary concern instead of speed, with the eventual goal of harvesting energy from the environment. Sub-threshold operation offers a promising solution for ultra-low-energy applications because it often achieves the minimum energy per operation. While initial explorations into sub-threshold circuits demonstrate its promise, sub-threshold circuit design remains in its infancy. This thesis makes several contributions that make sub-threshold design more accessible to circuit designers. First, a model for energy consumption in sub-threshold provides an analytical solution for the optimum VDD to minimize energy. Fitting this model to a generic circuit allows easy estimation of the impact of processing and environmental parameters on the minimum energy point. Second, analysis of device sizing for sub-threshold circuits shows the trade-offs between sizing for minimum energy and for minimum voltage operation. A programmable FIR filter test chip fabricated in 0.18 µm bulk CMOS provides measurements to confirm the model and the sizing analysis. Third, a low-overhead method for integrating sub-threshold operation with high performance applications extends dynamic voltage scaling across orders of magnitude of frequency and provides energy scalability down to the minimum energy point. A 90nm bulk CMOS test chip confirms the range of operation for ultra-dynamic voltage scaling. Finally, sub-threshold operation is extended to memories. Analysis of traditional SRAM bitcells and architectures leads to development of a new bitcell for robust sub-threshold SRAM operation. The sub-threshold SRAM is analyzed experimentally in a 65nm bulk CMOS test chip.

    By Benton H. Calhoun. Ph.D.
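
    The minimum-energy point exists because switching energy falls quadratically with VDD while leakage energy per operation grows as sub-threshold delay increases exponentially. The sketch below illustrates this trade-off with a generic first-order model and made-up constants; it is not the calibrated model from the thesis.

```python
# Generic first-order sketch of the sub-threshold minimum-energy point
# (made-up constants, not the thesis's fitted model).
import math

C_EFF = 1e-12         # effective switched capacitance per operation (F), assumed
I_ON0 = 1e-9          # sub-threshold current scale factor (A), assumed
VTH   = 0.35          # threshold voltage (V), assumed
N_VT  = 1.5 * 0.026   # sub-threshold slope factor times thermal voltage (V)
DEPTH = 100           # logic depth in characteristic gate delays, assumed

def energy_per_op(vdd):
    # Delay per operation: grows exponentially as VDD drops below VTH.
    delay = DEPTH * (C_EFF * vdd) / (I_ON0 * math.exp((vdd - VTH) / N_VT))
    e_active = C_EFF * vdd ** 2                # switching (dynamic) energy
    i_leak = I_ON0 * math.exp(-VTH / N_VT)     # idle leakage current
    e_leak = i_leak * vdd * delay              # leakage integrated over one operation
    return e_active + e_leak

# Sweep VDD from 0.15 V to 0.9 V and report the minimum-energy supply voltage.
vdds = [v / 1000 for v in range(150, 901, 10)]
v_opt = min(vdds, key=energy_per_op)
print(f"minimum-energy VDD ~ {v_opt:.2f} V under these assumed constants")
```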

    An ultra-low voltage FFT processor using energy-aware techniques

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2004. Includes bibliographical references (p. 165-169).

    In a number of emerging applications such as wireless sensor networks, system lifetime depends on the energy efficiency of computation and communication. The key metric in such applications is the energy dissipated per function rather than traditional ones such as clock speed or silicon area. Hardware designs are shifting focus toward enabling energy-awareness, allowing the processor to be energy-efficient for a variety of operating scenarios. This is in contrast to conventional low-power design, which optimizes for the worst-case scenario. Here, three energy-quality scalable hooks are designed into a real-valued FFT processor: variable FFT length (N=128 to 1024 points), variable bit precision (8 or 16 bit), and variable voltage supply with variable clock frequency (VDD=180mV to 0.9V, and f=164Hz to 6MHz). A variable-bit-precision and variable-FFT-length scalable FFT ASIC using an off-the-shelf standard-cell logic library and memory only scales down to 1V operation. Further energy savings are achieved through ultra-low voltage-supply operation. As performance requirements are relaxed, the operating voltage supply is scaled down, possibly even below the threshold voltage into the subthreshold region. When lower frequencies cause leakage energy dissipation to exceed the active energy dissipation, there is an optimal operating point for minimizing energy consumption. Logic and memory design techniques allowing ultra-low voltage operation are employed to study the optimal frequency/voltage operating point for the FFT. A full-custom implementation, with circuit techniques optimized for deep voltage scaling into the subthreshold regime, is fabricated using a standard CMOS 0.18 µm logic process and functions down to 180mV. At the optimal operating point, where the voltage supply is 350mV, the FFT processor dissipates 155nJ/FFT. The custom FFT is 8x more energy-efficient than the ASIC implementation and 350x more energy-efficient than a low-power microprocessor implementation.

    By Alice Wang. Ph.D.
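
    Read at face value, the quoted ratios imply roughly the following energies per FFT (a back-of-envelope reading of the numbers above, not figures stated in the thesis):

```latex
% Back-of-envelope reading of the reported ratios (not figures stated in the thesis):
E_{\text{custom}} \approx 155\,\text{nJ/FFT}, \qquad
E_{\text{ASIC}} \approx 8 \times 155\,\text{nJ} \approx 1.2\,\mu\text{J/FFT}, \qquad
E_{\mu\text{P}} \approx 350 \times 155\,\text{nJ} \approx 54\,\mu\text{J/FFT}.
```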

    SpiNNaker - A Spiking Neural Network Architecture

    20 years in conception and 15 in construction, the SpiNNaker project has delivered the world’s largest neuromorphic computing platform incorporating over a million ARM mobile phone processors and capable of modelling spiking neural networks of the scale of a mouse brain in biological real time. This machine, hosted at the University of Manchester in the UK, is freely available under the auspices of the EU Flagship Human Brain Project. This book tells the story of the origins of the machine, its development and its deployment, and the immense software development effort that has gone into making it openly available and accessible to researchers and students the world over. It also presents exemplar applications from ‘Talk’, a SpiNNaker-controlled robotic exhibit at the Manchester Art Gallery as part of ‘The Imitation Game’, a set of works commissioned in 2016 in honour of Alan Turing, through to a way to solve hard computing problems using stochastic neural networks. The book concludes with a look to the future, and the SpiNNaker-2 machine which is yet to come.

    Resistance switching devices based on amorphous insulator-metal thin films

    Nanometallic devices based on amorphous insulator-metal thin films are developed to provide a novel non-volatile resistance-switching random-access memory (RRAM). In these devices, data recording is controlled by a bipolar voltage, which tunes the electron localization length, and thus the resistivity, through electron trapping/detrapping. The low-resistance state is a metallic state while the high-resistance state is an insulating state, as established by conductivity studies from 2K to 300K. The material is exemplified by a Si3N4 thin film with randomly dispersed Pt or Cr. It has been extended to other materials, spanning a large library of oxide and nitride insulator films dispersed with transition and main-group metal atoms. Nanometallic RRAMs have superior properties that set them apart from other RRAMs. The critical switching voltage is independent of the film thickness, device area, temperature, and switching speed. Trapped electrons are relaxed by electron-phonon interaction, adding stability that enables long-term memory retention. As the electron-phonon interaction is mechanically altered, trapped electrons can be destabilized, and sub-picosecond switching has been demonstrated using an electromagnetically generated stress pulse. AC impedance spectroscopy confirms the resistance state is spatially uniform, providing a capacitance that scales linearly with area and inversely with thickness. The spatial uniformity is also manifested in the outstanding uniformity of the switching properties. Device degradation, due to moisture, electrode oxidation, and dielectrophoresis, is minimal when dense thin films are used or when a hermetic seal is provided. The potential for low-power operation, multi-bit storage, and complementary stacking has been demonstrated in various RRAM configurations.

    Comment: 523 pages, 215 figures, 10 chapters.
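
    The quoted geometric scaling of the capacitance is what a spatially uniform film behaving as a parallel-plate capacitor would give (a standard electrostatics relation, not a formula from the abstract):

```latex
% Parallel-plate relation consistent with the reported scaling (standard
% electrostatics, not a formula quoted in the abstract): capacitance grows
% linearly with device area A and inversely with film thickness d.
C = \frac{\varepsilon_0 \varepsilon_r A}{d}
```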

    Topical Workshop on Electronics for Particle Physics

    The purpose of the workshop was to present results and original concepts for electronics research and development relevant to particle physics experiments as well as accelerator and beam instrumentation at future facilities; to review the status of electronics for the LHC experiments; to identify and encourage common efforts for the development of electronics; and to promote information exchange and collaboration in the relevant engineering and physics communities.

    GSI Scientific Report 2011 [GSI Report 2012-1]
