14 research outputs found

    CIRCUITS AND ARCHITECTURE FOR BIO-INSPIRED AI ACCELERATORS

    Get PDF
    Technological advances in microelectronics envisioned through Moore’s law have led to powerful processors that can handle complex and computationally intensive tasks. Nonetheless, these advancements through technology scaling have come at an unfavorable cost of significantly larger power consumption, which has posed challenges for data processing centers and computers at scale. Moreover, with the emergence of mobile computing platforms constrained by power and bandwidth for distributed computing, the necessity for more energy-efficient scalable local processing has become more significant. Unconventional Compute-in-Memory architectures such as the analog winner-takes-all associative-memory and the Charge-Injection Device processor have been proposed as alternatives. Unconventional charge-based computation has been employed for neural network accelerators in the past, where impressive energy efficiency per operation has been attained in 1-bit vector-vector multiplications, and in recent work, multi-bit vector-vector multiplications. In the latter, computation was carried out by counting quanta of charge at the thermal noise limit, using packets of about 1000 electrons. These systems are neither analog nor digital in the traditional sense but employ mixed-signal circuits to count the packets of charge and hence we call them Quasi-Digital. By amortizing the energy costs of the mixed-signal encoding/decoding over compute-vectors with many elements, high energy efficiencies can be achieved. In this dissertation, I present a design framework for AI accelerators using scalable compute-in-memory architectures. On the device level, two primitive elements are designed and characterized as target computational technologies: (i) a multilevel non-volatile cell and (ii) a pseudo Dynamic Random-Access Memory (pseudo-DRAM) bit-cell. At the level of circuit description, compute-in-memory crossbars and mixed-signal circuits were designed, allowing seamless connectivity to digital controllers. At the level of data representation, both binary and stochastic-unary coding are used to compute Vector-Vector Multiplications (VMMs) at the array level. Finally, on the architectural level, two AI accelerator for data-center processing and edge computing are discussed. Both designs are scalable multi-core Systems-on-Chip (SoCs), where vector-processor arrays are tiled on a 2-layer Network-on-Chip (NoC), enabling neighbor communication and flexible compute vs. memory trade-off. General purpose Arm/RISCV co-processors provide adequate bootstrapping and system-housekeeping and a high-speed interface fabric facilitates Input/Output to main memory

    Achieving Reliable and Sustainable Next-Generation Memories

    Get PDF
    Conventional memory technology scaling has introduced reliability challenges due to dysfunctional, improperly formed cells and crosstalk from increased cell proximity. Furthermore, as the manufacturing effort becomes increasingly complex due to these deeply scaled technologies, holistic sustainability is negatively impacted. The development of new memory technologies can help overcome the capacitor scaling limitations of DRAM. However, these technologies have their own reliability concerns, such as limited write endurance in the case of Phase Change Memories (PCM). Moreover, emerging system requirements, such as in-memory encryption to protect sensitive or private data and operation in harsh environments create additional challenges that must be addressed in the context of reliability and sustainability. This dissertation provides new multifactor and ultimately unified solutions to address many of these concerns in the same system. In particular, my contributions toward mitigating these issues are as follows. I present GreenChip and GreenAsic, which together provide the first tools to holistically evaluate new computer architecture, chip, and memory design concepts for sustainability. These tools provide detailed estimates of manufacturing and operational-phase metrics for different computing workloads and deployment scenarios. Using GreenChip, I examined existing DRAM reliability techniques in the context of their holistic sustainability impact, including my own technique to mitigate bitline crosstalk. For PCM, I provided a new reliability technique with no additional storage overhead that substantially increases the lifetime of an encrypted memory system. To provide bit-level error correction, I developed compact linked-list and Bloom-filter-based bit-level fault map structures, that provide unprecedented levels of error tabulation, combined with my own novel error correction and lifetime extension approaches based on these maps for less area than traditional ECC. In particular, FaME, can correct N faults using N bits when utilizing a bit-level fault map. For operation in harsh environments, I created a triple modular redundancy (TMR) pointer-based fault map, HOTH, which specifically protects cells shown to be weak to radiation. Finally, to combine the analyses of holistic sustainability and memory lifetime, I created the LARS technique, which adjusts the GreenChip indifference analysis to account for the additional sustainability benefit provided by increased reliability and lifetime

    COMPUTE-IN-MEMORY WITH EMERGING NON-VOLATILE MEMORIES FOR ACCELERATING DEEP NEURAL NETWORKS

    Get PDF
    The objective of this research is to accelerate deep neural networks (DNNs) with emerging non-volatile memories (eNVMs) based compute-in-memory (CIM) architecture. The research first focuses on the inference acceleration and proposes a resistive random access memory (RRAM) based CIM architecture. Two generations of RRAM testchips which monolithically integrate the RRAM memory array and CMOS peripheral circuits are designed and fabricated using Winbond 90 nm and TSMC 40 nm commercial embedded RRAM process respectively. The first generation of testchip named XNOR-RRAM is dedicated for binary neural networks (BNNs) and the second generation named Flex-RRAM features 1bit-to-8bit run-time configurable precision and leverages the input sparsity of the DNN model to improve the throughput and energy efficiency. However, the non-ideal characteristics of eNVM devices, especially when utilized as multi-level analog synaptic weights, may incur a notable accuracy degradation for both training and inference. This research develops a PyTorch based framework that incorporates the device characteristics into the DNN model to evaluate the impact of the eNVM nonidealities on training/inference accuracy. The results suggest that it is challenging to directly use eNVMs for in-situ training and resistance drift remains as a critical challenge to maintain a high inference accuracy. Furthermore, to overcome the challenges posed by the asymmetric conductance tuning behavior of typical eNVMs, which is found to be the most critical nonideality that prevents the model from achieving software equivalent training accuracy, this research proposes a novel 2-transistor-1-FeFET (ferroelectric field effect transistor) based synaptic weight cell that exploits hybrid precision for in situ training and inference, which achieves near-software classification accuracy on MNIST and CIFAR-10 dataset.Ph.D

    Embedding Logic and Non-volatile Devices in CMOS Digital Circuits for Improving Energy Efficiency

    Get PDF
    abstract: Static CMOS logic has remained the dominant design style of digital systems for more than four decades due to its robustness and near zero standby current. Static CMOS logic circuits consist of a network of combinational logic cells and clocked sequential elements, such as latches and flip-flops that are used for sequencing computations over time. The majority of the digital design techniques to reduce power, area, and leakage over the past four decades have focused almost entirely on optimizing the combinational logic. This work explores alternate architectures for the flip-flops for improving the overall circuit performance, power and area. It consists of three main sections. First, is the design of a multi-input configurable flip-flop structure with embedded logic. A conventional D-type flip-flop may be viewed as realizing an identity function, in which the output is simply the value of the input sampled at the clock edge. In contrast, the proposed multi-input flip-flop, named PNAND, can be configured to realize one of a family of Boolean functions called threshold functions. In essence, the PNAND is a circuit implementation of the well-known binary perceptron. Unlike other reconfigurable circuits, a PNAND can be configured by simply changing the assignment of signals to its inputs. Using a standard cell library of such gates, a technology mapping algorithm can be applied to transform a given netlist into one with an optimal mixture of conventional logic gates and threshold gates. This approach was used to fabricate a 32-bit Wallace Tree multiplier and a 32-bit booth multiplier in 65nm LP technology. Simulation and chip measurements show more than 30% improvement in dynamic power and more than 20% reduction in core area. The functional yield of the PNAND reduces with geometry and voltage scaling. The second part of this research investigates the use of two mechanisms to improve the robustness of the PNAND circuit architecture. One is the use of forward and reverse body biases to change the device threshold and the other is the use of RRAM devices for low voltage operation. The third part of this research focused on the design of flip-flops with non-volatile storage. Spin-transfer torque magnetic tunnel junctions (STT-MTJ) are integrated with both conventional D-flipflop and the PNAND circuits to implement non-volatile logic (NVL). These non-volatile storage enhanced flip-flops are able to save the state of system locally when a power interruption occurs. However, manufacturing variations in the STT-MTJs and in the CMOS transistors significantly reduce the yield, leading to an overly pessimistic design and consequently, higher energy consumption. A detailed analysis of the design trade-offs in the driver circuitry for performing backup and restore, and a novel method to design the energy optimal driver for a given yield is presented. Efficient designs of two nonvolatile flip-flop (NVFF) circuits are presented, in which the backup time is determined on a per-chip basis, resulting in minimizing the energy wastage and satisfying the yield constraint. To achieve a yield of 98%, the conventional approach would have to expend nearly 5X more energy than the minimum required, whereas the proposed tunable approach expends only 26% more energy than the minimum. A non-volatile threshold gate architecture NV-TLFF are designed with the same backup and restore circuitry in 65nm technology. The embedded logic in NV-TLFF compensates performance overhead of NVL. This leads to the possibility of zero-overhead non-volatile datapath circuits. An 8-bit multiply-and- accumulate (MAC) unit is designed to demonstrate the performance benefits of the proposed architecture. Based on the results of HSPICE simulations, the MAC circuit with the proposed NV-TLFF cells is shown to consume at least 20% less power and area as compared to the circuit designed with conventional DFFs, without sacrificing any performance.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

    Solid State Circuits Technologies

    Get PDF
    The evolution of solid-state circuit technology has a long history within a relatively short period of time. This technology has lead to the modern information society that connects us and tools, a large market, and many types of products and applications. The solid-state circuit technology continuously evolves via breakthroughs and improvements every year. This book is devoted to review and present novel approaches for some of the main issues involved in this exciting and vigorous technology. The book is composed of 22 chapters, written by authors coming from 30 different institutions located in 12 different countries throughout the Americas, Asia and Europe. Thus, reflecting the wide international contribution to the book. The broad range of subjects presented in the book offers a general overview of the main issues in modern solid-state circuit technology. Furthermore, the book offers an in depth analysis on specific subjects for specialists. We believe the book is of great scientific and educational value for many readers. I am profoundly indebted to the support provided by all of those involved in the work. First and foremost I would like to acknowledge and thank the authors who worked hard and generously agreed to share their results and knowledge. Second I would like to express my gratitude to the Intech team that invited me to edit the book and give me their full support and a fruitful experience while working together to combine this book

    Millimeter-scale RF Integrated Circuits and Antennas for Energy-efficient Wireless Sensor Nodes

    Full text link
    Recently there has been increased demand for a millimeter-scale wireless sensor node for applications such as biomedical devices, defense, and surveillance. This form-factor is driven by a desire to be vanishingly small, injectable through a needle, or implantable through a minimally-invasive surgical procedure. Wireless communication is a necessity, but there are several challenges at the millimeter-scale wireless sensor node. One of the main challenges is external components like crystal reference and antenna become the bottleneck of realizing the mm-scale wireless sensor node device. A second challenge is power consumption of the electronics. At mm-scale, the micro-battery has limited capacity and small peak current. Moreover, the RF front-end circuits that operates at the highest frequency in the system will consume most of the power from the battery. Finally, as node volume reduces, there is a challenge of integrating the entire system together, in particular for the RF performance, because all components, including the battery and ICs, need to be placed in close proximity of the antenna. This research explores ways to implement low-power integrated circuits in an energy-constrained and volume constrained application. Three different prototypes are mainly conducted in the proposal. The first is a fully-encapsulated, autonomous, complete wireless sensor node with UWB transmitter in 10.6mm3 volume. It is the first time to demonstrate a full and stand-alone wireless sensing functionality with such a tiny integrated system. The second prototype is a low power GPS front-end receiver that supports burst-mode. A double super-heterodyne topology enables the reception of the three public GPS bands, L1, L2 and L5 simultaneously. The third prototype is an integrated rectangular slot loop antenna in a standard 0.13-μm BiCMOS technology. The antenna is efficiently designed to cover the bandwidth at 60 GHz band and easily satisfy the metal density rules and can be integrated with other circuitry in a standard process.PHDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/143972/1/hskims_1.pd

    Simulation of FinFET Structures

    Get PDF
    The intensive downscaling of MOS transistors has been the major driving force behind the aggressive increases in transistor density and performance, leading to more chip functionality at higher speeds. While on the other side the reduction in MOSFET dimensions leads to the close proximity between source and drain, which in turn reduces the ability of the gate electrode to control the potential distribution and current flow in the channel region and also results in some undesirable effects called the short-channel effects. These limitations associated with downscaling of MOSFET device geometries have lead device designers and researchers to number of innovative techniques which include the use of different device structures, different channel materials, different gate-oxide materials, different processes such as shallow trench isolation, source/drain silicidation, lightly doped extensions etc. to enable controlled device scaling to smaller dimensions. A lot of research and development works have been done in these and related fields and more remains to be carried out in order to exploit these devices for the wider applications
    corecore