18 research outputs found

    Doctor of Philosophy

    Deep Neural Networks (DNNs) are the state-of-the-art solution in a growing number of tasks including computer vision, speech recognition, and genomics. However, DNNs are computationally expensive, as they are carefully trained to extract and abstract features from raw data using multiple layers of neurons with millions of parameters. In this dissertation, we primarily focus on inference, e.g., using a DNN to classify an input image. This is an operation that will be repeatedly performed on billions of devices in the datacenter, in self-driving cars, in drones, etc. We observe that DNNs spend the vast majority of their runtime performing matrix-by-vector multiplications (MVMs). MVMs have two major bottlenecks: fetching the matrix and performing sum-of-product operations. To address these bottlenecks, we use in-situ computing, where the matrix is stored in programmable resistor arrays, called crossbars, and sum-of-product operations are performed using analog computing. In this dissertation, we propose two hardware units, ISAAC and Newton. In ISAAC, we show that in-situ computing designs can outperform digital DNN accelerators if they leverage pipelining and smart encodings, and can distribute a computation in time and space, within and across crossbars. In the ISAAC design, roughly half the chip area/power is attributable to analog-to-digital conversion (ADC), which therefore remains the key design challenge in mixed-signal accelerators for deep networks. In spite of the ADC bottleneck, ISAAC outperforms the computational efficiency of the state-of-the-art design (DaDianNao) by 8x. In Newton, we apply a number of techniques to address ADC inefficiency; these techniques exploit matrix transformations, heterogeneity, and smart mapping of computation to the analog substrate. We show that Newton increases the efficiency of in-situ computing by an additional 2x. Finally, we show that in-situ computing, unfortunately, cannot be easily adapted to handle training of deep networks, i.e., it is only suitable for inference of already-trained networks. By improving the efficiency of DNN inference with ISAAC and Newton, we move closer to low-cost deep learning that in turn will have societal impact through self-driving cars, assistive systems for the disabled, and precision medicine.
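
    To make the in-situ MVM idea concrete, the following is a minimal numerical sketch (not the ISAAC or Newton design itself): the matrix is held as crossbar conductances, the input vector is applied as voltages, column currents realize the sum-of-products, and an idealized ADC quantizes the result. The crossbar dimensions and ADC resolution are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        rows, cols = 128, 128                        # hypothetical crossbar dimensions
        weights = rng.standard_normal((rows, cols))

        # Map signed weights onto non-negative conductances using a positive and a
        # negative column per weight, a common differential encoding in analog accelerators.
        g_pos = np.clip(weights, 0, None)
        g_neg = np.clip(-weights, 0, None)

        def crossbar_mvm(v, adc_bits=8):
            """Analog matrix-by-vector product followed by an idealized uniform ADC."""
            analog = v @ g_pos - v @ g_neg           # Kirchhoff current summation per column
            lo, hi = analog.min(), analog.max()
            step = (hi - lo) / (2 ** adc_bits - 1)
            return np.round((analog - lo) / step) * step + lo

        v = rng.standard_normal(rows)
        err = np.max(np.abs(crossbar_mvm(v, adc_bits=12) - v @ weights))
        print(f"max quantization error with a 12-bit ADC: {err:.4f}")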

    Pruning random resistive memory for optimizing analogue AI

    The rapid advancement of artificial intelligence (AI) has been marked by large language models exhibiting human-like intelligence. However, these models also present unprecedented challenges for energy consumption and environmental sustainability. One promising solution is to revisit analogue computing, a technique that predates digital computing and exploits emerging analogue electronic devices, such as resistive memory, which features in-memory computing, high scalability, and nonvolatility. However, analogue computing still faces the same challenges as before: programming nonidealities and expensive programming due to the underlying device physics. Here, we report a universal solution: software-hardware co-design using structural-plasticity-inspired edge pruning to optimize the topology of a randomly weighted analogue resistive memory neural network. Software-wise, the topology of a randomly weighted neural network is optimized by pruning connections rather than precisely tuning resistive memory weights. Hardware-wise, we reveal the physical origin of the programming stochasticity using transmission electron microscopy, which is leveraged for large-scale and low-cost implementation of an overparameterized random neural network containing high-performance sub-networks. We implemented the co-design on a 40 nm 256K resistive memory macro, observing 17.3% and 19.9% accuracy improvements in image and audio classification on the FashionMNIST and Spoken Digits datasets, as well as a 9.8% (2%) improvement in PR (ROC) in image segmentation on the DRIVE dataset. This is accompanied by 82.1%, 51.2%, and 99.8% improvements in energy efficiency thanks to analogue in-memory computing. By embracing intrinsic stochasticity and in-memory computing, this work may resolve the biggest obstacle of analogue computing systems and thus unleash their immense potential for next-generation AI hardware.
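
    A minimal software sketch of the pruning idea follows, under toy assumptions (layer sizes, scores, and the keep ratio are all hypothetical): the random "resistive" weights are fixed and never reprogrammed, and only a binary mask over the connections is optimized.

        import numpy as np

        rng = np.random.default_rng(1)
        n_in, n_out = 64, 10
        fixed_weights = rng.standard_normal((n_in, n_out))   # random device-given weights, never tuned
        edge_scores = rng.standard_normal((n_in, n_out))      # trainable importance score per connection

        def pruned_forward(x, keep_ratio=0.5):
            """Forward pass that uses only the top-scoring fraction of edges."""
            k = int(keep_ratio * edge_scores.size)
            threshold = np.sort(edge_scores, axis=None)[-k]
            mask = (edge_scores >= threshold).astype(fixed_weights.dtype)
            return x @ (fixed_weights * mask)

        x = rng.standard_normal((4, n_in))
        print(pruned_forward(x).shape)             # (4, 10): outputs of the pruned random sub-network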

    Efficient and low overhead memristive activation circuit for deep learning neural networks

    An efficient memristor-based MIN-function activation circuit is presented for memristive neuromorphic systems, using only two memristors and a comparator. The ReLU activation function is approximated using this circuit. The ReLU activation function helps to significantly reduce the time and computational cost of training in neuromorphic systems due to its simplicity and effectiveness in deep neural networks. A multilayer neural network is simulated using this activation circuit in addition to traditional memristor crossbar arrays. The results illustrate that the proposed circuit is able to perform training effectively with significant savings in time and area in memristor-crossbar-based neural networks.
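
    As a behavioural illustration (not the circuit netlist), the sketch below assumes the comparator realizes a two-input MIN and recovers ReLU through the identity max(x, 0) = -min(-x, 0).

        import numpy as np

        def min_select(a, b):
            """Comparator-style MIN: the output follows whichever input is smaller."""
            return np.where(a < b, a, b)

        def relu_from_min(x):
            return -min_select(-x, np.zeros_like(x))

        x = np.linspace(-2.0, 2.0, 9)
        print(relu_from_min(x))                    # matches np.maximum(x, 0)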

    Programming Exploration of Memristor Crossbar

    The memristor crossbar is one of the most promising candidates for constructing neural networks because of its similarity to biological synapses, favorable programmability, simple structure, and high performance in terms of area efficiency and power consumption. However, the performance of the memristor crossbar is limited by non-ideal programming and sensing processes. In this thesis, the most preferred cell structure, known as “one-transistor-one-memristor” (1T1R), is investigated. Different factors that may impact programming, such as the structure, parameters, and conductance of a crossbar cell, are studied using both theoretical analysis and simulation. Building on this analysis, the programming process of the memristor crossbar is explored in depth. For programming, the primary objective is to determine the relationship between the programmability of the memristor crossbar and its characteristics, such as IR drop and crossbar size. The results are expected to serve as useful references for researchers designing memristor crossbars.
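
    The sketch below gives a deliberately simplified, first-order picture of the IR-drop effect discussed above (wire resistance, cell conductance, and crossbar size are assumed values): cells farther from the wordline driver see a lower effective programming voltage, and the degradation grows with crossbar size.

        import numpy as np

        def wordline_voltages(v_drive, g_cells, r_wire):
            """First-order IR-drop estimate along a single wordline driven from one end."""
            i_cells = v_drive * g_cells            # approximate cell currents at full drive voltage
            v = np.empty(len(g_cells))
            v_node = v_drive
            for j in range(len(g_cells)):
                v_node -= i_cells[j:].sum() * r_wire   # current still flowing past node j
                v[j] = v_node
            return v

        g = np.full(64, 50e-6)                     # 64 cells at an assumed 50 uS LRS conductance
        print(wordline_voltages(1.0, g, r_wire=2.5)[[0, 31, 63]])  # drop grows with distance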

    Memristive Computing

    Memristive computing refers to the utilization of the memristor, the fourth fundamental passive circuit element, in computational tasks. The existence of the memristor was theoretically predicted in 1971 by Leon O. Chua, but experimentally validated only in 2008 by HP Labs. A memristor is essentially a nonvolatile nanoscale programmable resistor — indeed, a memory resistor — whose resistance, or memristance to be precise, is changed by applying a voltage across, or a current through, the device. Memristive computing is a new area of research, and many of its fundamental questions still remain open. For example, it is yet unclear which applications would benefit the most from the inherent nonlinear dynamics of memristors. In any case, these dynamics should be exploited to allow memristors to perform computation in a natural way instead of attempting to emulate existing technologies such as CMOS logic. Examples of such methods of computation presented in this thesis are memristive stateful logic operations, memristive multiplication based on the translinear principle, and the exploitation of nonlinear dynamics to construct chaotic memristive circuits. This thesis considers memristive computing at various levels of abstraction. The first part of the thesis analyses the physical properties and the current-voltage behaviour of a single device. The middle part presents memristor programming methods and describes microcircuits for logic and analog operations. The final chapters discuss memristive computing in large-scale applications. In particular, cellular neural networks and associative memory architectures are proposed as applications that significantly benefit from memristive implementation. The work presents several new results on memristor modeling and programming, memristive logic, analog arithmetic operations on memristors, and applications of memristors. The main conclusion of this thesis is that memristive computing will be advantageous in large-scale, highly parallel mixed-mode processing architectures. This can be justified by the following two arguments. First, since processing can be performed directly within memristive memory architectures, the required circuitry, processing time, and possibly also power consumption can be reduced compared to a conventional CMOS implementation. Second, intrachip communication can be naturally implemented by a memristive crossbar structure.
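
    As one concrete example of computing "in a natural way" with memristors, the sketch below models the truth table of memristive stateful IMPLY logic, the primitive commonly used to build stateful gates; the NAND construction via a scratch memristor is a standard composition, shown here only behaviourally.

        def imply(p, q):
            """Stateful IMPLY: the result (NOT p) OR q is written back into memristor Q."""
            return (not p) or q

        def nand(p, q):
            """NAND from two IMPLY steps and a scratch memristor initialized to False."""
            not_p = imply(p, False)                # scratch <- NOT p
            return imply(q, not_p)                 # (NOT q) OR (NOT p) = NOT (p AND q)

        for p in (False, True):
            for q in (False, True):
                print(p, q, imply(p, q), nand(p, q))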

    Leveraging the Intrinsic Switching Behaviors of Spintronic Devices for Digital and Neuromorphic Circuits

    With semiconductor technology scaling approaching atomic limits, novel approaches utilizing new memory and computation elements are sought in order to realize increased density, enhanced functionality, and new computational paradigms. Spintronic devices offer intriguing avenues to improve digital circuits by leveraging non-volatility to reduce static power dissipation and vertical integration to increase density. Novel hybrid spintronic-CMOS digital circuits are developed herein that illustrate enhanced functionality at reduced static power consumption and area cost. The developed spin-CMOS D Flip-Flop offers improved power-gating strategies by achieving instant store/restore capabilities while using 10 fewer transistors than typical CMOS-only implementations. The spin-CMOS Muller C-Element developed herein improves asynchronous pipelines by reducing area overhead while adding enhanced functionality such as instant data store/restore and delay-element-free bundled-data asynchronous pipelines. Spintronic devices also provide improved scaling for neuromorphic circuits by enabling compact and low-power neuron and non-volatile synapse implementations, while enabling new neuromorphic paradigms that leverage the stochastic behavior of spintronic devices to realize stochastic spiking neurons, which are more akin to biological neurons and commensurate with theories from computational neuroscience and probabilistic learning rules. Spintronics-based Probabilistic Activation Function circuits are utilized herein to provide a compact and low-power neuron for Binarized Neural Networks. Two implementations of stochastic spiking neurons with alternative speed, power, and area benefits are realized. Finally, a comprehensive neuromorphic architecture comprising stochastic spiking neurons, low-precision synapses with Probabilistic Hebbian Plasticity, and a novel non-volatile homeostasis mechanism is realized for subthreshold ultra-low-power unsupervised learning with robustness to process variations. Along with several case studies, implications for future spintronic digital and neuromorphic circuits are presented.
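
    The following sketch illustrates the probabilistic-activation idea in the abstract sense only; the sigmoidal switching-probability model and its parameters are assumptions, standing in for the device-level stochastic switching behaviour.

        import numpy as np

        rng = np.random.default_rng(2)

        def probabilistic_activation(pre_activation, beta=1.0):
            """Stochastic binary (+1/-1) activation with a sigmoidal switching probability."""
            p_switch = 1.0 / (1.0 + np.exp(-beta * pre_activation))
            return np.where(rng.random(pre_activation.shape) < p_switch, 1.0, -1.0)

        z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
        samples = np.stack([probabilistic_activation(z) for _ in range(2000)])
        print(samples.mean(axis=0))                # empirical mean tracks 2*sigmoid(z) - 1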

    Searching for the physical nature of intelligence in Neuromorphic Nanowire Networks

    The brain’s unique information processing efficiency has inspired the development of neuromorphic, or brain-inspired, hardware in an effort to reduce the power consumption of conventional Artificial Intelligence (AI). One example of a neuromorphic system is nanowire networks (NWNs). NWNs have been shown to produce conductance pathways similar to neuro-synaptic pathways in the brain, demonstrating nonlinear dynamics as well as emergent behaviours such as memory and learning. Their synapse-like electro-chemical junctions are connected by a heterogeneous, neural-network-like structure. This makes NWNs a unique system for realising hardware-based machine intelligence that is potentially more brain-like than existing implementations of AI. Many of the brain’s emergent properties are thought to arise from a unique structure-function relationship. The first part of the thesis establishes structural network characterisation methods in NWNs. Borrowing techniques from neuroscience, a toolkit is introduced for characterising network topology in NWNs. NWNs are found to display a ‘small-world’ structure with highly modular connections, like simple biological systems. Next, the structure-function link in NWNs is investigated by implementing machine learning benchmark tasks on varying network structures. Highly modular networks exhibit an ability to multitask, while integrated networks suffer from crosstalk interference. Finally, the above findings are combined to develop and implement neuroscience-inspired learning methods and tasks in NWNs. Specifically, an adaptation of a cognitive task that tests working memory in humans is implemented. Working memory and memory consolidation are demonstrated and found to be attributable to a process similar to synaptic metaplasticity in the brain. The results of this thesis open new research directions that warrant further exploration to test the universality of the physical nature of intelligence in inorganic systems beyond NWNs.
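
    A minimal sketch of the kind of topological characterisation described above, using a surrogate graph in place of a measured nanowire network (the graph sizes and the Watts-Strogatz surrogate are assumptions): small-worldness is judged by comparing clustering and path length against an equivalent random graph.

        import networkx as nx

        def avg_path_length(g):
            """Average shortest path length on the largest connected component."""
            comp = g.subgraph(max(nx.connected_components(g), key=len))
            return nx.average_shortest_path_length(comp)

        def small_world_ratios(g, seed=0):
            rand = nx.gnm_random_graph(g.number_of_nodes(), g.number_of_edges(), seed=seed)
            c_ratio = nx.average_clustering(g) / nx.average_clustering(rand)
            l_ratio = avg_path_length(g) / avg_path_length(rand)
            return c_ratio, l_ratio                # small-world: c_ratio >> 1, l_ratio ~ 1

        surrogate = nx.watts_strogatz_graph(n=200, k=8, p=0.1, seed=0)
        print(small_world_ratios(surrogate))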