Doctor of Philosophy dissertation
Deep Neural Networks (DNNs) are the state-of-the-art solution in a growing number of tasks including computer vision, speech recognition, and genomics. However, DNNs are computationally expensive, as they are carefully trained to extract and abstract features from raw data using multiple layers of neurons with millions of parameters. In this dissertation, we primarily focus on inference, e.g., using a DNN to classify an input image. This is an operation that will be repeatedly performed on billions of devices in the datacenter, in self-driving cars, in drones, etc. We observe that DNNs spend the vast majority of their runtime performing matrix-by-vector multiplications (MVMs). MVMs have two major bottlenecks: fetching the matrix and performing sum-of-product operations. To address these bottlenecks, we use in-situ computing, where the matrix is stored in programmable resistor arrays, called crossbars, and sum-of-product operations are performed using analog computing. In this dissertation, we propose two hardware units, ISAAC and Newton. In ISAAC, we show that in-situ computing designs can outperform digital DNN accelerators if they leverage pipelining and smart encodings, and can distribute a computation in time and space, within crossbars and across crossbars. In the ISAAC design, roughly half the chip area/power can be attributed to analog-to-digital conversion (ADC), i.e., it remains the key design challenge in mixed-signal accelerators for deep networks. In spite of the ADC bottleneck, ISAAC is able to outperform the computational efficiency of the state-of-the-art design (DaDianNao) by 8x. In Newton, we take advantage of a number of techniques to address ADC inefficiency. These techniques exploit matrix transformations, heterogeneity, and smart mapping of computation to the analog substrate. We show that Newton can increase the efficiency of in-situ computing by an additional 2x.
Finally, we show that in-situ computing, unfortunately, cannot be easily adapted to handle training of deep networks, i.e., it is only suitable for inference with already-trained networks. By improving the efficiency of DNN inference with ISAAC and Newton, we move closer to low-cost deep learning that will, in turn, have societal impact through self-driving cars, assistive systems for the disabled, and precision medicine.
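The crossbar-based MVM described in this abstract can be sketched numerically. This is a behavioral illustration only, not the ISAAC/Newton implementation: the weight matrix is assumed to be stored as device conductances G, input voltages V are applied to the rows, and Kirchhoff's current law sums the per-device currents on each column, yielding the matrix-by-vector product in one analog step.

```python
import numpy as np

# Hypothetical sketch of analog in-situ MVM (all values illustrative):
# each crossbar cell (i, j) holds a conductance G[i, j]; driving row i with
# voltage V[i] makes the cell contribute current G[i, j] * V[i] to column j,
# so the column currents are I = G^T @ V, i.e., a matrix-by-vector multiply.

rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-4, size=(4, 3))  # conductances (siemens), 4 rows x 3 columns
V = rng.uniform(0.0, 0.2, size=4)         # read voltages on the rows (volts)

I = G.T @ V                               # column currents: the analog MVM result

# A digital reference loop computes the same sum-of-products per column.
reference = np.array([sum(G[i, j] * V[i] for i in range(4)) for j in range(3)])
assert np.allclose(I, reference)
```

In the real designs, the column currents would then pass through the ADCs discussed above, which is exactly where the area/power bottleneck arises.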
EMSN: An Energy-Efficient Memristive Sequencer Network for Human Emotion Classification in Mental Health Monitoring
Fundamental Research Funds for the Provincial University of Zhejiang (Grant Number: GK229909299001-06);
National Natural Science Foundation of China (Grant Number: 62001149);
Natural Science Foundation of Zhejiang Province (Grant Number: LQ21F010009)
Pruning random resistive memory for optimizing analogue AI
The rapid advancement of artificial intelligence (AI) has been marked by large language models exhibiting human-like intelligence. However, these models also present unprecedented challenges to energy consumption and environmental sustainability. One promising solution is to revisit analogue computing, a technique that predates digital computing and exploits emerging analogue electronic devices, such as resistive memory, which features in-memory computing, high scalability, and nonvolatility. However, analogue computing still faces the same challenges as before: programming nonidealities and expensive programming due to the underlying device physics. Here, we report a universal solution: software-hardware co-design using structural plasticity-inspired edge pruning to optimize the topology of a randomly weighted analogue resistive memory neural network. Software-wise, the topology of a randomly weighted neural network is optimized by pruning connections rather than precisely tuning resistive memory weights. Hardware-wise, we reveal the physical origin of the programming stochasticity using transmission electron microscopy, which is leveraged for large-scale and low-cost implementation of an overparameterized random neural network containing high-performance sub-networks. We implemented the co-design on a 40-nm 256K resistive memory macro, observing 17.3% and 19.9% accuracy improvements in image and audio classification on the FashionMNIST and Spoken Digits datasets, respectively, as well as a 9.8% (2%) improvement in PR (ROC) in image segmentation on the DRIVE dataset. This is accompanied by 82.1%, 51.2%, and 99.8% improvements in energy efficiency thanks to analogue in-memory computing. By embracing intrinsic stochasticity and in-memory computing, this work may remove the biggest obstacle facing analogue computing systems and thus unleash their immense potential for next-generation AI hardware.
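The core software idea here, pruning a randomly weighted network rather than tuning its weights, can be sketched in a few lines. All details below are assumptions for illustration (the random weights, the edge-score mechanism, and the greedy top-k selection are placeholders for the paper's actual training procedure); the key property shown is that the fixed weights W are never re-programmed, only masked.

```python
import numpy as np

# Minimal sketch of topology optimization over a fixed random network
# (illustrative, not the paper's algorithm): device conductances W are left
# as-programmed, and only a binary pruning mask over the edges is chosen.
# Here the "optimization" is a single greedy step keeping the top-k edges
# by a learnable score; in practice the scores would be trained.

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 4))           # fixed random weights: never re-programmed
scores = rng.normal(size=W.shape)     # per-edge scores (assumed learnable)

keep = 16                             # number of connections to keep
threshold = np.sort(scores, axis=None)[-keep]
mask = (scores >= threshold).astype(float)   # prune everything below threshold

def forward(x):
    # Only surviving connections contribute; W itself is untouched, so the
    # costly, stochastic re-programming of resistive devices is avoided.
    return np.maximum(0.0, x @ (W * mask))

y = forward(rng.normal(size=8))
```

This is why the approach tolerates programming stochasticity: the hardware's as-programmed random conductances are treated as a fixed feature, and only cheap on/off connectivity is controlled.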
Efficient and low overhead memristive activation circuit for deep learning neural networks
An efficient memristor MIN-function-based activation circuit is presented for memristive neuromorphic systems, using only two memristors and a comparator. The ReLU activation function is approximated using this circuit. The ReLU activation function helps to significantly reduce the time and computational cost of training in neuromorphic systems due to its simplicity and effectiveness in deep neural networks. A multilayer neural network is simulated using this activation circuit in addition to traditional memristor crossbar arrays. The results illustrate that the proposed circuit is able to perform training effectively, with significant savings in time and area in memristor-crossbar-based neural networks.
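The function the circuit approximates can be stated precisely. This is only the behavioral target, not the two-memristor comparator circuit itself: ReLU passes its input when the comparison against zero succeeds and outputs zero otherwise, which is what makes a comparator-based realization natural.

```python
# Behavioural target of the activation circuit (assumed abstraction, not the
# transistor/memristor-level design): a comparator against zero selects
# between the input and zero, i.e., ReLU(x) = max(x, 0).

def relu(x):
    # Comparator view: forward the input only if it exceeds the zero reference.
    return x if x > 0 else 0.0

layer_out = [relu(v) for v in [-2.0, -0.5, 0.0, 0.7, 3.1]]
# Negative pre-activations are clipped to zero; positive ones pass unchanged.
```

The simplicity of this piecewise-linear shape is what the abstract credits for the training-time and area savings relative to sigmoid-style activation circuits.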
Programming Exploration of Memristor Crossbar
The memristor crossbar is prevailing as one of the most promising candidates for constructing neural networks because of its similarity to biological synapses, favorable programmability, simple structure, and high performance in terms of area efficiency and power consumption. However, the performance of the memristor crossbar is limited by non-ideal programming and sensing processes.
In this thesis, the most preferred cell structure, known as "one-transistor-one-memristor", is investigated. Different factors that may have an impact on programming, such as the structure, the parameters, and the conductance of a crossbar cell, are studied using both theoretical analysis and simulation.
Building on this analysis, the programming process of the memristor crossbar is explored in depth. For programming, the primary objective is to find out the relationship between the programmability of the memristor crossbar and its characteristics, such as the IR-drop and the crossbar size. The results are expected to serve as useful references for researchers designing memristor crossbars.
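The IR-drop effect mentioned above can be sketched with a first-order model. Everything here is an illustrative assumption (uniform wire resistance per segment, a fixed per-cell current, drivers at row/column ends): the applied programming voltage is attenuated by the resistive wire segments between the driver and the cell, so cells far from the drivers see a weaker effective voltage, and the effect worsens with crossbar size.

```python
import numpy as np

# First-order IR-drop sketch (assumed model, not the thesis's simulation):
# the cell at row i, column j sits behind roughly (i + j) wire segments,
# each dropping r_wire * i_cell volts, so the voltage actually reaching it is
# v_applied minus a drop that grows with its distance from the drivers.

def effective_voltage(v_applied, n, r_wire, i_cell):
    segments = np.add.outer(np.arange(n), np.arange(n))  # (i + j) per cell
    return v_applied - segments * r_wire * i_cell

V = effective_voltage(v_applied=2.0, n=64, r_wire=1.0, i_cell=1e-3)

near = V[0, 0]     # cell next to both drivers: full programming voltage
far = V[-1, -1]    # far-corner cell: the worst-case IR-drop
```

Under these numbers the far corner of a 64x64 array loses 126 mV of programming voltage, which is the kind of size-dependent programmability degradation the thesis sets out to characterize.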
Memristive Computing
Memristive computing refers to the utilization of the memristor, the fourth
fundamental passive circuit element, in computational tasks.
The existence of the memristor was theoretically predicted in 1971 by
Leon O. Chua, but experimentally validated only in 2008 by HP Labs. A
memristor is essentially a nonvolatile nanoscale programmable resistor
(indeed, a memory resistor) whose resistance, or memristance to be precise,
is changed by applying a voltage across, or current through, the device.
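The voltage-controlled memristance described here is commonly captured by the HP Labs linear dopant-drift model. The sketch below uses that standard textbook model with illustrative parameter values; it is not taken from this thesis. The memristance interpolates between R_on and R_off as the normalized doped-region width w in [0, 1] drifts with the current flowing through the device.

```python
import numpy as np

# HP linear dopant-drift memristor model (standard form, illustrative values):
#   M(w)  = R_ON * w + R_OFF * (1 - w)          (memristance)
#   dw/dt = (MU_V * R_ON / D**2) * i(t)         (state drift with current)

R_ON, R_OFF = 100.0, 16e3   # fully doped / undoped resistance (ohms)
MU_V, D = 1e-14, 1e-8       # dopant mobility (m^2 / (V*s)), film thickness (m)

def simulate(voltage, t, w0=0.1):
    """Euler-integrate the device state under a driving voltage waveform."""
    dt = t[1] - t[0]
    w, ws = w0, []
    for v in voltage:
        m = R_ON * w + R_OFF * (1.0 - w)        # current memristance
        i = v / m                                # Ohm's law
        w += MU_V * R_ON / D**2 * i * dt         # linear dopant drift
        w = min(max(w, 0.0), 1.0)                # state bounded in [0, 1]
        ws.append(w)
    return np.array(ws)

t = np.linspace(0.0, 1.0, 1000)
w = simulate(np.sin(2 * np.pi * t), t)           # sinusoidal drive
```

Driving the model with a sinusoid moves the state up during the positive half-cycle and back down during the negative one, which is the nonvolatile, history-dependent resistance ("memory resistor") behavior the paragraph describes.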
Memristive computing is a new area of research, and many of its fundamental
questions still remain open. For example, it is yet unclear which
applications would benefit the most from the inherent nonlinear dynamics
of memristors. In any case, these dynamics should be exploited to allow
memristors to perform computation in a natural way instead of attempting
to emulate existing technologies such as CMOS logic. Examples of such
methods of computation presented in this thesis are memristive stateful logic
operations, memristive multiplication based on the translinear principle, and
the exploitation of nonlinear dynamics to construct chaotic memristive circuits.
This thesis considers memristive computing at various levels of abstraction.
The first part of the thesis analyses the physical properties and the
current-voltage behaviour of a single device. The middle part presents memristor
programming methods, and describes microcircuits for logic and analog
operations. The final chapters discuss memristive computing in largescale
applications. In particular, cellular neural networks and associative
memory architectures are proposed as applications that significantly benefit
from memristive implementation. The work presents several new results on
memristor modeling and programming, memristive logic, analog arithmetic
operations on memristors, and applications of memristors.
The main conclusion of this thesis is that memristive computing will
be advantageous in large-scale, highly parallel mixed-mode processing architectures.
This can be justified by the following two arguments. First,
since processing can be performed directly within memristive memory architectures,
the required circuitry, processing time, and possibly also power
consumption can be reduced compared to a conventional CMOS implementation.
Second, intrachip communication can be naturally implemented by
a memristive crossbar structure.
Leveraging the Intrinsic Switching Behaviors of Spintronic Devices for Digital and Neuromorphic Circuits
With semiconductor technology scaling approaching atomic limits, novel approaches utilizing new memory and computation elements are sought in order to realize increased density, enhanced functionality, and new computational paradigms. Spintronic devices offer intriguing avenues to improve digital circuits by leveraging non-volatility to reduce static power dissipation and vertical integration for increased density. Novel hybrid spintronic-CMOS digital circuits are developed herein that illustrate enhanced functionality at reduced static power consumption and area cost. The developed spin-CMOS D Flip-Flop offers improved power-gating strategies by achieving instant store/restore capabilities while using 10 fewer transistors than typical CMOS-only implementations. The spin-CMOS Muller C-Element developed herein improves asynchronous pipelines by reducing the area overhead while adding enhanced functionality such as instant data store/restore and delay-element-free bundled data asynchronous pipelines. Spintronic devices also provide improved scaling for neuromorphic circuits by enabling compact and low power neuron and non-volatile synapse implementations while enabling new neuromorphic paradigms leveraging the stochastic behavior of spintronic devices to realize stochastic spiking neurons, which are more akin to biological neurons and commensurate with theories from computational neuroscience and probabilistic learning rules. Spintronic-based Probabilistic Activation Function circuits are utilized herein to provide a compact and low-power neuron for Binarized Neural Networks. Two implementations of stochastic spiking neurons with alternative speed, power, and area benefits are realized. 
Finally, a comprehensive neuromorphic architecture comprising stochastic spiking neurons, low-precision synapses with Probabilistic Hebbian Plasticity, and a novel non-volatile homeostasis mechanism is realized for subthreshold ultra-low-power unsupervised learning with robustness to process variations. Along with several case studies, implications for future spintronic digital and neuromorphic circuits are presented.
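The stochastic spiking neuron exploited above can be abstracted behaviorally. This sketch is an assumption-level model, not the spintronic circuit: the device's intrinsic switching randomness is treated as a coin flip whose bias is set by the input drive, here via a sigmoid, which is the role a Probabilistic Activation Function plays for a binarized network.

```python
import numpy as np

# Behavioural sketch of a stochastic spiking neuron (assumed abstraction of
# the spintronic device): each evaluation is one probabilistic switching
# event; the input drive sets the switching probability, so the neuron's
# firing rate over many trials approaches that probability.

rng = np.random.default_rng(2)

def stochastic_neuron(drive, n_trials=10_000):
    p = 1.0 / (1.0 + np.exp(-drive))     # switching probability vs. input drive
    spikes = rng.random(n_trials) < p    # each trial: one stochastic switch
    return spikes.mean(), p

rate, p = stochastic_neuron(1.5)
# The empirical firing rate converges to the set probability p.
```

This rate-codes the activation in the device's own randomness rather than in a digital RNG, which is where the power and area savings claimed for the Probabilistic Activation Function circuits come from.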
Searching for the physical nature of intelligence in Neuromorphic Nanowire Networks
The brain's unique information-processing efficiency has inspired the development of neuromorphic, or brain-inspired, hardware in an effort to reduce the power consumption of conventional Artificial Intelligence (AI). One example of a neuromorphic system is nanowire networks (NWNs). NWNs have been shown to produce conductance pathways similar to neuro-synaptic pathways in the brain, demonstrating nonlinear dynamics as well as emergent behaviours such as memory and learning. Their synapse-like electro-chemical junctions are connected by a heterogeneous, neural network-like structure. This makes NWNs a unique system for realising hardware-based machine intelligence that is potentially more brain-like than existing implementations of AI.
Many of the brain's emergent properties are thought to arise from a unique structure-function relationship. The first part of the thesis establishes structural network characterisation methods in NWNs. Borrowing techniques from neuroscience, a toolkit is introduced for characterising network topology in NWNs. NWNs are found to display a "small-world" structure with highly modular connections, like simple biological systems.
Next, the structure-function link in NWNs is investigated by implementing machine-learning benchmark tasks on varying network structures. Highly modular networks exhibit an ability to multitask, while integrated networks suffer from crosstalk interference.
Finally, the above findings are combined to develop and implement neuroscience-inspired learning methods and tasks in NWNs. Specifically, an adaptation of a cognitive task that tests working memory in humans is implemented. Working memory and memory consolidation are demonstrated and found to be attributable to a process similar to synaptic metaplasticity in the brain.
The results of this thesis open new research directions that warrant further exploration to test the universality of the physical nature of intelligence in inorganic systems beyond NWNs.