699 research outputs found

    Doctor of Philosophy

    Get PDF
    dissertationThe embedded system space is characterized by a rapid evolution in the complexity and functionality of applications. In addition, the short time-to-market nature of the business motivates the use of programmable devices capable of meeting the conflicting constraints of low-energy, high-performance, and short design times. The keys to achieving these conflicting constraints are specialization and maximally extracting available application parallelism. General purpose processors are flexible but are either too power hungry or lack the necessary performance. Application-specific integrated circuits (ASICS) efficiently meet the performance and power needs but are inflexible. Programmable domain-specific architectures (DSAs) are an attractive middle ground, but their design requires significant time, resources, and expertise in a variety of specialties, which range from application algorithms to architecture and ultimately, circuit design. This dissertation presents CoGenE, a design framework that automates the design of energy-performance-optimal DSAs for embedded systems. For a given application domain and a user-chosen initial architectural specification, CoGenE consists of a a Compiler to generate execution binary, a simulator Generator to collect performance/energy statistics, and an Explorer that modifies the current architecture to improve energy-performance-area characteristics. The above process repeats automatically until the user-specified constraints are achieved. This removes or alleviates the time needed to understand the application, manually design the DSA, and generate object code for the DSA. Thus, CoGenE is a new design methodology that represents a significant improvement in performance, energy dissipation, design time, and resources. This dissertation employs the face recognition domain to showcase a flexible architectural design methodology that creates "ASIC-like" DSAs. The DSAs are instruction set architecture (ISA)-independent and achieve good energy-performance characteristics by coscheduling the often conflicting constraints of data access, data movement, and computation through a flexible interconnect. This represents a significant increase in programming complexity and code generation time. To address this problem, the CoGenE compiler employs integer linear programming (ILP)-based 'interconnect-aware' scheduling techniques for automatic code generation. The CoGenE explorer employs an iterative technique to search the complete design space and select a set of energy-performance-optimal candidates. When compared to manual designs, results demonstrate that CoGenE produces superior designs for three application domains: face recognition, speech recognition and wireless telephony. While CoGenE is well suited to applications that exhibit a streaming behavior, multithreaded applications like ray tracing present a different but important challenge. To demonstrate its generality, CoGenE is evaluated in designing a novel multicore N-wide SIMD architecture, known as StreamRay, for the ray tracing domain. CoGenE is used to synthesize the SIMD execution cores, the compiler that generates the application binary, and the interconnection subsystem. Further, separating address and data computations in space reduces data movement and contention for resources, thereby significantly improving performance compared to existing ray tracing approaches

    Always-On 674uW @ 4GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node

    Full text link
    Binary Neural Networks (BNNs) have been shown to be robust to random bit-level noise, making aggressive voltage scaling attractive as a power-saving technique for both logic and SRAMs. In this work, we introduce the first fully programmable IoT end-node system-on-chip (SoC) capable of executing software-defined, hardware-accelerated BNNs at ultra-low voltage. Our SoC exploits a hybrid memory scheme where error-vulnerable SRAMs are complemented by reliable standard-cell memories to safely store critical data under aggressive voltage scaling. On a prototype in 22nm FDX technology, we demonstrate that both the logic and SRAM voltage can be dropped to 0.5Vwithout any accuracy penalty on a BNN trained for the CIFAR-10 dataset, improving energy efficiency by 2.2X w.r.t. nominal conditions. Furthermore, we show that the supply voltage can be dropped to 0.42V (50% of nominal) while keeping more than99% of the nominal accuracy (with a bit error rate ~1/1000). In this operating point, our prototype performs 4Gop/s (15.4Inference/s on the CIFAR-10 dataset) by computing up to 13binary ops per pJ, achieving 22.8 Inference/s/mW while keeping within a peak power envelope of 674uW - low enough to enable always-on operation in ultra-low power smart cameras, long-lifetime environmental sensors, and insect-sized pico-drones.Comment: Submitted to ISICAS2020 journal special issu

    FPGA‐SRAM Soft Error Radiation Hardening

    Get PDF
    Due to integrated circuit technology scaling, a type of radiation effects called single event upsets (SEUs) has become a major concern for static random access memories (SRAMs) and thus for SRAM‐based field programmable gate arrays (FPGAs). These radiation effects are characterized by altering data stored in SRAM cells without permanently damaging them. However, SEUs can lead to unpredictable behavior in SRAM‐based FPGAs. A new hardening technique compatible with the current FPGA design workflows is presented. The technique works at the cell design level, and it is based on the modulation of cell transistor channel width. Experimental results show that to properly harden an SRAM cell, only some transistors have to be increased in size, while others need to be minimum sized. So, with this technique, area can be used in the most efficient way to harden SRAMs against radiation. Experimental results on a 65‐nm complementary metal‐oxide‐semiconductor (CMOS) SRAM demonstrate that the number of SEU events can be roughly reduced to 50% with adequate transitory sizing, while area is kept constant or slightly increased

    A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)

    Full text link
    Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multi-core neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.Comment: 17 pages, 14 figure

    Standby Supply Voltage Minimization for Reliable Nanoscale SRAMs

    Get PDF

    Low-power digital processor for wireless sensor networks

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.Includes bibliographical references (p. 69-72).In order to make sensor networks cost-effective and practical, the electronic components of a wireless sensor node need to run for months to years on the same battery. This thesis explores the design of a low-power digital processor for these sensor nodes, employing techniques such as hardwired algorithms, lowered supply voltages, clock gating and subsystem shutdown. Prototypes were built on both a FPGA and ASIC platform, in order to verify functionality and characterize power consumption. The resulting 0.18[micro]m silicon fabricated in National Semiconductor Corporation's process was operational for supply voltages ranging from 0.5V to 1.8V. At the lowest operating voltage of 0.5V and a frequency of 100KHz, the chip performs 8 full-accuracy FFT computations per second and draws 1.2nJ of total energy per cycle. Although this energy/cycle metric does not surpass existing low-energy processors demonstrated in literature or commercial products, several low-power techniques are suggested that could drastically improve the energy metrics of a future implementation.by Daniel Frederic Finchelstein.S.M

    Improved Generation of Identifiers, Secret Keys, and Random Numbers From SRAMs

    Get PDF
    This paper presents a method to simultaneously improve the quality of the identifiers, secret keys, and random numbers that can be generated from the start-up values of standard static random access memories (SRAMs). The method is based on classifying memory cells after evaluating their start-up values at multiple measurements in a registration phase. The registration can be done without unplugging the device from its application context, and with no need for a complex laboratory setup. The method has been validated experimentally with standard low-power SRAM modules in two different application specific integrated circuits (ASICs) fabricated with the 90-nm TSMC technology. The results show that with a simple registration the length of the identifiers can be reduced by 45%, the worst case bit error probability (which defines the complexity of the error correcting code needed to recover a secret key) can be reduced by 64%, and the worst case minimum entropy value is improved, thus reducing the number of bits that have to be processed to obtain full entropy by 81%. The method can be applied to standard digital designs by controlling the external power supply to the SRAM using software or by incorporating simple circuitry in the design. In the latter case, a module for implementing the method in an ASIC designed in the 90-nm TSMC technology occupies an active area of 42, $025~mu text{m}^{mathrm {mathbf {2}}}

    Voltage stacking for near/sub-threshold operation

    Get PDF
    • 

    corecore