120 research outputs found

    Bio-inspired learning and hardware acceleration with emerging memories

    Get PDF
    Machine Learning has permeated many aspects of engineering, ranging from the Internet of Things (IoT) applications to big data analytics. While computing resources available to implement these algorithms have become more powerful, both in terms of the complexity of problems that can be solved and the overall computing speed, the huge energy costs involved remains a significant challenge. The human brain, which has evolved over millions of years, is widely accepted as the most efficient control and cognitive processing platform. Neuro-biological studies have established that information processing in the human brain relies on impulse like signals emitted by neurons called action potentials. Motivated by these facts, the Spiking Neural Networks (SNNs), which are a bio-plausible version of neural networks have been proposed as an alternative computing paradigm where the timing of spikes generated by artificial neurons is central to its learning and inference capabilities. This dissertation demonstrates the computational power of the SNNs using conventional CMOS and emerging nanoscale hardware platforms. The first half of this dissertation presents an SNN architecture which is trained using a supervised spike-based learning algorithm for the handwritten digit classification problem. This network achieves an accuracy of 98.17% on the MNIST test data-set, with about 4X fewer parameters compared to the state-of-the-art neural networks achieving over 99% accuracy. In addition, a scheme for parallelizing and speeding up the SNN simulation on a GPU platform is presented. The second half of this dissertation presents an optimal hardware design for accelerating SNN inference and training with SRAM (Static Random Access Memory) and nanoscale non-volatile memory (NVM) crossbar arrays. Three prominent NVM devices are studied for realizing hardware accelerators for SNNs: Phase Change Memory (PCM), Spin Transfer Torque RAM (STT-RAM) and Resistive RAM (RRAM). The analysis shows that a spike-based inference engine with crossbar arrays of STT-RAM bit-cells is 2X and 5X more efficient compared to PCM and RRAM memories, respectively. Furthermore, the STT-RAM design has nearly 6X higher throughput per unit Watt per unit area than that of an equivalent SRAM-based (Static Random Access Memory) design. A hardware accelerator with on-chip learning on an STT-RAM memory array is also designed, requiring 1616 bits of floating-point synaptic weight precision to reach the baseline SNN algorithmic performance on the MNIST dataset. The complete design with STT-RAM crossbar array achieves nearly 20X higher throughput per unit Watt per unit mm^2 than an equivalent design with SRAM memory. In summary, this work demonstrates the potential of spike-based neuromorphic computing algorithms and its efficient realization in hardware based on conventional CMOS as well as emerging technologies. The schemes presented here can be further extended to design spike-based systems that can be ubiquitously deployed for energy and memory constrained edge computing applications

    RRAM Crossbar Arrays for Storage Class Memory Applications : Throughput and Density Considerations

    Get PDF
    As more and more high density memories are required to satisfy the Internet of Things ecosystem, academics and industrials are looking for an intermediate solution to fill the gap between DRAM and Flash NAND in the memory hierarchy. The emergence of Resistive Switching Technologies (RRAM) proposes a potential solution to this demand for fast, low cost, high density and non-volatile memory. However, nowadays transistor-less RRAM-based architectures, such as Crosspoint, suffers of several issues such as sneakpath, IRdrop and periphery overhead. In this work, we propose to explore the positioning of RRAM crosspoint memories regarding DRAM and NAND in terms of density and write throughput. We present several design guidelines then show that for the optimal RRAM crosspoint architecture (2-layers with common bitline), massively multiple bank write is the solution to optimize density and write throughput to around 20-100Gbit/cm2 and 200-500MB/s respectively for 32 to 64 parallel access

    Reliability Analysis of Memristor Crossbar Routers: Collisions and On/off Ratio Requirement

    Full text link
    Memristors are commonly used in crossbar arrays as “in-memory computing” elements to solve the von-Neumann bottleneck problem. However, they can also be used as “in-memory routing” elements to configure on-chip interconnection schemes and route signals among computing elements in configurable multi-core neuromorphic processors. While there has been a significant focus on the use of memristive devices as in-memory computing elements, to date, studies on the fundamental reliability properties of memristors as routing elements are still missing. In this paper, we analyze the reliability issues of using these devices in routing crossbar arrays, caused by sharing routing resources (collisions), and undesired pulses due to the leakage paths (on/off ratio requirement). We show that there is a trade-off between routing collision probability and the degree of connectivity (i.e., fan-in) of the receivers sharing routing channels. We provide specifications and guidelines based on a theoretical analysis for engineering the properties of memristive devices, and for designing routing systems based on memristor crossbars

    On-Chip Learning and Inference Acceleration of Sparse Representations

    Get PDF
    abstract: The past decade has seen a tremendous surge in running machine learning (ML) functions on mobile devices, from mere novelty applications to now indispensable features for the next generation of devices. While the mobile platform capabilities range widely, long battery life and reliability are common design concerns that are crucial to remain competitive. Consequently, state-of-the-art mobile platforms have become highly heterogeneous by combining a powerful CPUs with GPUs to accelerate the computation of deep neural networks (DNNs), which are the most common structures to perform ML operations. But traditional von Neumann architectures are not optimized for the high memory bandwidth and massively parallel computation demands required by DNNs. Hence, propelling research into non-von Neumann architectures to support the demands of DNNs. The re-imagining of computer architectures to perform efficient DNN computations requires focusing on the prohibitive demands presented by DNNs and alleviating them. The two central challenges for efficient computation are (1) large memory storage and movement due to weights of the DNN and (2) massively parallel multiplications to compute the DNN output. Introducing sparsity into the DNNs, where certain percentage of either the weights or the outputs of the DNN are zero, greatly helps with both challenges. This along with algorithm-hardware co-design to compress the DNNs is demonstrated to provide efficient solutions to greatly reduce the power consumption of hardware that compute DNNs. Additionally, exploring emerging technologies such as non-volatile memories and 3-D stacking of silicon in conjunction with algorithm-hardware co-design architectures will pave the way for the next generation of mobile devices. Towards the objectives stated above, our specific contributions include (a) an architecture based on resistive crosspoint array that can update all values stored and compute matrix vector multiplication in parallel within a single cycle, (b) a framework of training DNNs with a block-wise sparsity to drastically reduce memory storage and total number of computations required to compute the output of DNNs, (c) the exploration of hardware implementations of sparse DNNs and architectural guidelines to reduce power consumption for the implementations in monolithic 3D integrated circuits, and (d) a prototype chip in 65nm CMOS accelerator for long-short term memory networks trained with the proposed block-wise sparsity scheme.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

    Improving Performance and Endurance for Crossbar Resistive Memory

    Get PDF
    Resistive Memory (ReRAM) has emerged as a promising non-volatile memory technology that may replace a significant portion of DRAM in future computer systems. When adopting crossbar architecture, ReRAM cell can achieve the smallest theoretical size in fabrication, ideally for constructing dense memory with large capacity. However, crossbar cell structure suffers from severe performance and endurance degradations, which come from large voltage drops on long wires. In this dissertation, I first study the correlation between the ReRAM cell switching latency and the number of cells in low resistant state (LRS) along bitlines, and propose to dynamically speed up write operations based on bitline data patterns. By leveraging the intrinsic in-memory processing capability of ReRAM crossbars, a low overhead runtime profiler that effectively tracks the data patterns in different bitlines is proposed. To achieve further write latency reduction, data compression and row address dependent memory data layout are employed to reduce the numbers of LRS cells on bitlines. Moreover, two optimization techniques are presented to mitigate energy overhead brought by bitline data patterns tracking. Second, I propose XWL, a novel table-based wear leveling scheme for ReRAM crossbars and study the correlation between write endurance and voltage stress in ReRAM crossbars. By estimating and tracking the effective write stress to different rows at runtime, XWL chooses the ones that are stressed the most to mitigate. Additionally, two extended scenarios are further examined for the performance and endurance issues in neural network accelerators as well as 3D vertical ReRAM (3D-VRAM) arrays. For the ReRAM crossbar-based accelerators, by exploiting the wearing out mechanism of ReRAM cell, a novel comprehensive framework, ReNEW, is proposed to enhance the lifetime of the ReRAM crossbar-based accelerators, particularly for neural network training. To reduce the write latency in 3D-VRAM arrays, a collection of techniques, including an in-memory data encoding scheme, a data pattern estimator for assessing cell resistance distributions, and a write time reduction scheme that opportunistically reduces RESET latency with runtime data patterns, are devised

    High-Density Solid-State Memory Devices and Technologies

    Get PDF
    This Special Issue aims to examine high-density solid-state memory devices and technologies from various standpoints in an attempt to foster their continuous success in the future. Considering that broadening of the range of applications will likely offer different types of solid-state memories their chance in the spotlight, the Special Issue is not focused on a specific storage solution but rather embraces all the most relevant solid-state memory devices and technologies currently on stage. Even the subjects dealt with in this Special Issue are widespread, ranging from process and design issues/innovations to the experimental and theoretical analysis of the operation and from the performance and reliability of memory devices and arrays to the exploitation of solid-state memories to pursue new computing paradigms

    Vertical Heterostructure III-V MOSFETs for CMOS, RF and Memory Applications

    Get PDF
    This thesis focuses mainly on the co-integration of vertical nanowiren-type InAs and p-type GaSb MOSFETs on Si (Paper I & II), whereMOVPE grown vertical InAs-GaSb heterostructure nanowires areused for realizing monolithically integrated and co-processed all-III-V CMOS.Utilizing a bottom-up approach based on MOVPE grown nanowires enablesdesign flexibilities, such as in-situ doping and heterostructure formation,which serves to reduce the amount of mask steps during fabrication. By refiningthe fabrication techniques, using a self-aligned gate-last process, scaled10-20 nm diameters are achieved for balanced drive currents at Ion ∼ 100μA/μm, considering Ioff at 100 nA/μm (VDD = 0.5 V). This is enabledby greatly improved p-type MOSFET performance reaching a maximumtransconductance of 260 μA/μm at VDS = 0.5 V. Lowered power dissipationfor CMOS circuits requires good threshold voltage VT matching of the n- andp-type device, which is also demonstrated for basic inverter circuits. Thevarious effects contributing to VT-shifts are also studied in detail focusing onthe InAs channel devices (with highest transconductance of 2.6 mA/μm), byusing Electron Holography and a novel gate position variation method (PaperV).The advancements in all-III-V CMOS integration spawned individual studiesinto the strengths of the n- and p-type III-V devices, respectively. Traditionallymaterials such as InAs and InGaAs provide excellent electrontransport properties, therefore they are frequently used in devices for highfrequency RF applications. In contrast, the III-V p-type alternatives have beenlacking performance mostly due to the difficult oxidation properties of Sb-based materials. Therefore, a study of the GaSb properties, in a MOSFETchannel, was designed and enabled by new manufacturing techniques, whichallowed gate-length scaling from 40 to 140 nm for p-type Sb-based MOSFETs(Paper III). The new fabrication method allowed for integration of deviceswith symmetrical contacts as compared to previous work which relied on atunnel-contact at the source-side. By modelling based on measured data fieldeffecthole mobility of 70 cm2/Vs was calculated, well in line with previouslyreported studies on GaSb nanowires. The oxidation properties of the GaSbgate-stack was further characterized by XPS, where high intensities of xraysare achieved using a synchrotron source allowed for characterization ofnanowires (Paper VI). Here, in-situ H2-plasma treatment, in parallel with XPSmeasurements, enabled a study of the time-dependence during full removalof GaSb native oxides.The last focus of the thesis was building on the existing strengths of verticalheterostructure III-V n-type (InAs-InGaAs graded channel) devices. Typically,these devices demonstrate high-current densities (gm >3 mS/μm) and excellentmodulation properties (off-state current down to 1 nA/μm). However,minimizing the parasitic capacitances, due to various overlaps originatingfrom a low access-resistance design, has proven difficult. Therefore, newmethods for spacers in both the vertical and planar directions was developedand studied in detail. The new fabrication methods including sidewall spacersachieved gate-drain capacitance CGD levels close to 0.2 fF/μm, which isthe established limit by optimized high-speed devices. The vertical spacertechnology, using SiO2 on the nanowire sidewalls, is further improved inthis thesis which enables new co-integration schemes for memory arrays.Namely, the refined sidewall spacer method is used to realize selective recessetching of the channel and reduced capacitance for large array memoryselector devices (InAs channel) vertically integrated with Resistive RandomAccess Memory (RRAM) memristors. (Paper IV) The fabricated 1-transistor-1-memristor (1T1R) demonstrator cell shows excellent endurance and retentionfor the RRAM by maintaining constant ratio of the high and low resistive state(HRS/LRS) after 106 switching cycles
    corecore