363 research outputs found

    Doctor of Philosophy

    Deep Neural Networks (DNNs) are the state-of-the-art solution in a growing number of tasks including computer vision, speech recognition, and genomics. However, DNNs are computationally expensive as they are carefully trained to extract and abstract features from raw data using multiple layers of neurons with millions of parameters. In this dissertation, we primarily focus on inference, e.g., using a DNN to classify an input image. This is an operation that will be repeatedly performed on billions of devices in the datacenter, in self-driving cars, in drones, etc. We observe that DNNs spend the vast majority of their runtime performing matrix-by-vector multiplications (MVMs). MVMs have two major bottlenecks: fetching the matrix and performing sum-of-product operations. To address these bottlenecks, we use in-situ computing, where the matrix is stored in programmable resistor arrays, called crossbars, and sum-of-product operations are performed using analog computing. In this dissertation, we propose two hardware units, ISAAC and Newton.

In ISAAC, we show that in-situ computing designs can outperform digital DNN accelerators if they leverage pipelining and smart encodings, and can distribute a computation in time and space, within crossbars and across crossbars. In the ISAAC design, roughly half the chip area/power can be attributed to analog-to-digital conversion (ADC), i.e., it remains the key design challenge in mixed-signal accelerators for deep networks. In spite of the ADC bottleneck, ISAAC is able to outperform the computational efficiency of the state-of-the-art design (DaDianNao) by 8x. In Newton, we take advantage of a number of techniques to address ADC inefficiency. These techniques exploit matrix transformations, heterogeneity, and smart mapping of computation to the analog substrate. We show that Newton can increase the efficiency of in-situ computing by an additional 2x. Finally, we show that in-situ computing, unfortunately, cannot be easily adapted to handle training of deep networks, i.e., it is only suitable for inference of already-trained networks. By improving the efficiency of DNN inference with ISAAC and Newton, we move closer to low-cost deep learning that in turn will have societal impact through self-driving cars, assistive systems for the disabled, and precision medicine.
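Below is a minimal, idealized sketch of the in-situ MVM idea described above: the matrix lives in a crossbar as conductances, the analog array produces sum-of-products currents, and an ADC digitizes the result. The function name and the simple full-scale quantization model are illustrative assumptions, not the ISAAC or Newton designs.

```python
import numpy as np

def crossbar_mvm(weights, x, adc_bits=8):
    """Idealized analog crossbar computing y = W @ x.

    Weights are stored as (non-negative) conductances; applying the
    input vector as voltages yields per-column currents equal to the
    sum-of-products, which an ADC then digitizes.
    """
    currents = weights @ x                 # analog sum-of-products
    levels = 2 ** adc_bits - 1
    full_scale = float(np.abs(currents).max()) or 1.0
    # ADC: quantize the analog result to adc_bits of precision
    return np.round(currents / full_scale * levels) / levels * full_scale

rng = np.random.default_rng(0)
W = rng.uniform(0, 1, (128, 128))
x = rng.uniform(0, 1, 128)
print(np.max(np.abs(crossbar_mvm(W, x) - W @ x)))  # ADC quantization error
```

The sketch also makes the ADC's role concrete: every analog column output must pass through the converter, which is why ADC area/power dominates the designs above.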

    Analysis of large scale linear programming problems with embedded network structures: Detection and solution algorithms

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Linear programming (LP) models that contain a (substantial) network structure frequently arise in many real-life applications. In this thesis, we investigate two main questions: (i) how an embedded network structure can be detected, and (ii) how the network structure can be exploited to create improved sparse simplex solution algorithms.

In order to extract an embedded pure network structure from a general LP problem, we develop two new heuristics. The first heuristic is an alternative multi-stage generalised upper bounds (GUB) based approach which finds as many GUB subsets as possible. In order to identify a GUB subset, two different approaches are introduced: the first is based on the notion of the Markowitz merit count and the second exploits an independent set in the corresponding graph. The second heuristic is based on the generalised signed graph of the coefficient matrix. This heuristic determines whether the given LP problem is entirely a pure network; this is in contrast to all previously known heuristics. Using generalised signed graphs, we prove that the problem of detecting the maximum-size embedded network structure within an LP problem is NP-hard. The two detection algorithms perform very well computationally and make positive contributions to the known body of results for embedded network detection.

For the computational solution, a decomposition-based approach is presented which solves a network problem with side constraints. In this approach, the original coefficient matrix is partitioned into the network and the non-network parts. For the partitioned problem, we investigate two alternative decomposition techniques, namely Lagrangean relaxation and Benders decomposition. Active variables identified by these procedures are then used to create an advanced basis for the original problem. The computational results of applying these techniques to a selection of Netlib models are encouraging. The development and computational investigation of this solution algorithm constitute a further contribution made by the research reported in this thesis. This study is funded by the Turkish Educational Council and Mugla University.
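As context for the detection heuristics above: a pure network (node-arc incidence) matrix has columns with at most one +1 and one -1, and detection seeks a large submatrix with this property. Below is a minimal sketch of the basic membership test only; `is_pure_network` is a hypothetical helper, not the thesis's GUB-based or signed-graph-based algorithms.

```python
import numpy as np

def is_pure_network(A):
    """Test whether A is a pure network (node-arc incidence) matrix:
    each column has at most two nonzeros drawn from {+1, -1}, and a
    column with two nonzeros has one of each sign."""
    for col in A.T:
        nz = col[col != 0]
        if len(nz) > 2 or not set(nz).issubset({1, -1}) or abs(nz.sum()) == 2:
            return False
    return True

# Arc (i, j) contributes +1 in row i and -1 in row j of its column.
A = np.array([[ 1,  1,  0],
              [-1,  0,  1],
              [ 0, -1, -1]])
print(is_pure_network(A))  # True
```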

    Secure Hardware Implementation of Post Quantum Cryptosystems

    Solving a hard mathematical problem is the security basis of all current cryptographic systems. With the realization of a large-scale quantum computer, hard mathematical problems such as integer factorization and discrete logarithm problems will be easily solved with special algorithms implemented on such a computer. Indeed, only post-quantum cryptosystems which defy quantum attacks will survive in the post-quantum era. Each newly proposed post-quantum cryptosystem has to be scrutinized against all different types of attacks. Attacks can be classified into mathematical cryptanalysis and side channel attacks. In this thesis, we propose hardware implementations, secure against side channel attacks, for two of the most promising post-quantum algorithms: the lattice-based public key cryptosystem NTRU, protected against power analysis attacks, and the multivariate public key cryptosystem Rainbow, protected against fault analysis attacks.

NTRUEncrypt is a family of public key cryptosystems that uses lattice-based cryptography. It has been accepted as an IEEE P1363 standard and as an X9.98 standard. In addition to its small footprint compared to other number-theory-based public key systems, its resistance to quantum attacks makes it a very attractive candidate for post-quantum cryptosystems. On the other hand, similar to other cryptographic schemes, unprotected hardware implementations of NTRUEncrypt are susceptible to side channel attacks such as timing and power analysis. In this thesis, we present an FPGA implementation of NTRUEncrypt which is resistant to first-order differential power analysis (DPA) attacks. Our countermeasures are implemented at the architecture level. In particular, we split the ciphertext into two randomly generated shares. This guarantees that during the first step of the decryption process, the inputs to the convolution modules, which are convolved with the secret key polynomial, are uniformly chosen random polynomials which are freshly generated for each convolution operation and are not under the control of the attacker. The two shares are then processed in parallel without explicitly combining them until the final stage of the decryption. Furthermore, during the final stage of the decryption, we also split the secret key polynomial into two randomly generated shares, which provides theoretical resistance against the considered class of power analysis attacks. The proposed architecture is implemented on an Altera Cyclone IV FPGA and simulated in Quartus II in order to compare the non-masked architecture with the masked one. For the considered set of parameters, the area overhead of the protected implementation is about 60% while the latency overhead is between 1.4% and 6.9%.

Multivariate Public Key Cryptosystems (MPKCs) are cryptographic schemes based on the difficulty of solving a system of multivariate nonlinear equations over a finite field. MPKCs are considered to be secure against quantum attacks. Rainbow, an MPKC signature scheme, is among the leading MPKC candidates for post-quantum cryptography. In this thesis, we propose and compare two fault analysis-resistant implementations for the Rainbow signature scheme. The hardware platform for our implementations is the Xilinx Virtex 7 FPGA family. Our implementation of the Rainbow signature completes in 191 cycles using a 20ns clock period, which is an improvement over previously reported implementations; verification completes in 141 cycles using the same clock period. The two proposed fault analysis-resistant schemes offer different levels of protection and increase the area overhead by 33% and 9%, respectively. The first protection scheme incurs a time overhead of about 72%, but the second one has no time overhead.
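The share-splitting countermeasure described above relies on the linearity of the NTRU convolution: convolving the key with two random shares of the ciphertext and adding the results equals convolving with the ciphertext itself. Below is a toy sketch of that idea under assumed toy parameters (N=11, q=32; real NTRU parameters are much larger, and the key-share splitting of the final stage is omitted).

```python
import numpy as np

N, q = 11, 32  # toy parameters for illustration only

def convolve(a, b):
    """Cyclic convolution in Z_q[x]/(x^N - 1), the NTRU ring product."""
    out = np.zeros(N, dtype=int)
    for i in range(N):
        for j in range(N):
            out[(i + j) % N] = (out[(i + j) % N] + a[i] * b[j]) % q
    return out

rng = np.random.default_rng(1)
f = rng.integers(0, q, N)  # stand-in for the secret key polynomial
c = rng.integers(0, q, N)  # ciphertext

# First-order masking: split c into two uniformly random shares so each
# convolution with the key processes a fresh random polynomial.
c1 = rng.integers(0, q, N)
c2 = (c - c1) % q
masked = (convolve(f, c1) + convolve(f, c2)) % q
assert np.array_equal(masked, convolve(f, c))  # shares recombine exactly
```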

    Lattice-based digital signature and discrete gaussian sampling

    Lattice-based cryptography has generated considerable interest in the last two decades due to attractive features, including conjectured security against quantum attacks, strong security guarantees from worst-case hardness assumptions, and constructions of fully homomorphic encryption schemes. On the other hand, even though it is a crucial part of many lattice-based schemes, Gaussian sampling is still lagging and continues to limit the effectiveness of this new cryptography. The first goal of this thesis is to improve the efficiency of Gaussian sampling for lattice-based hash-and-sign signature schemes. We propose a non-centered algorithm, with a flexible time-memory tradeoff, as fast as its centered variant for practical sizes of precomputed tables. We also use the Rényi divergence to bound the precision requirement to the standard double precision. Our second objective is to construct Falcon, a new hash-and-sign signature scheme, based on the theoretical framework of Gentry, Peikert and Vaikuntanathan for lattice-based signatures. We instantiate that framework over NTRU lattices with a new trapdoor sampler.
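For background on the sampling step: a standard way to draw from a centered discrete Gaussian is inversion sampling against a precomputed cumulative distribution table (CDT), where a larger table buys speed. This is a minimal sketch of that classical centered baseline only; the thesis's non-centered algorithm, its time-memory tradeoff, and the Rényi-divergence precision analysis go beyond it.

```python
import math, random

def build_cdt(sigma, tail=12):
    """Cumulative distribution table of a centered discrete Gaussian,
    truncated at +/- tail*sigma."""
    bound = int(math.ceil(tail * sigma))
    weights = [math.exp(-(z * z) / (2 * sigma * sigma))
               for z in range(-bound, bound + 1)]
    total = sum(weights)
    cdt, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdt.append(acc)
    return cdt, bound

def sample(cdt, bound):
    """Inversion sampling: binary-search the table with a uniform draw."""
    u = random.random()
    lo, hi = 0, len(cdt) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if cdt[mid] < u:
            lo = mid + 1
        else:
            hi = mid
    return lo - bound  # shift index back to the support [-bound, bound]

cdt, bound = build_cdt(sigma=4.0)
print([sample(cdt, bound) for _ in range(10)])
```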

    Design and implementation of high-radix arithmetic systems based on the SDNR/RNS data representation

    This project involved the design and implementation of high-radix arithmetic systems based on the hybrid SDNR/RNS (signed-digit number representation / residue number system) data representation. Some real-time applications require a real-time arithmetic system, and an SDNR/RNS arithmetic system provides parallel, real-time processing. The advantages and disadvantages of high-radix SDNR/RNS arithmetic, and the feasibility of implementing SDNR/RNS arithmetic systems in CMOS VLSI technology, were investigated in this project. A common methodological model, which included the stages of analysis, design, implementation, testing, and simulation, was followed. The combination of the SDNR and RNS transforms potentially complex logic networks into simpler logic blocks. It was found that when constructing an SDNR/RNS adder, factors such as the radix, digit set, and moduli must be taken into account. There are many avenues still to explore. For example, implementing other arithmetic systems in the same CMOS VLSI technology used in this project and comparing them to equivalent SDNR/RNS systems would provide a set of benchmarks, which would be useful in addressing issues of relative performance.
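To make the RNS half of the representation concrete, the sketch below shows carry-free addition across independent residue channels and Chinese Remainder Theorem reconstruction. The moduli are an illustrative choice, and the SDNR signed-digit encoding that the project layers on top of each channel is omitted.

```python
from math import prod

MODULI = (7, 11, 13)  # pairwise coprime; dynamic range M = 1001

def to_rns(x):
    """Represent x by its residues modulo each channel."""
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    # Each residue channel adds independently: no carries between channels
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(r):
    """Chinese Remainder Theorem reconstruction of the integer value."""
    M = prod(MODULI)
    x = 0
    for ri, mi in zip(r, MODULI):
        Mi = M // mi
        x = (x + ri * Mi * pow(Mi, -1, mi)) % M
    return x

a, b = 123, 456
assert from_rns(rns_add(to_rns(a), to_rns(b))) == (a + b) % prod(MODULI)
```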

    Theoretical and practical efficiency aspects in cryptography


    Inference And Learning In Spiking Neural Networks For Neuromorphic Systems

    Neuromorphic computing is a computing field that takes inspiration from the biological and physical characteristics of the neocortex to motivate a new paradigm of highly parallel and distributed computing, aimed at the ever-increasing scale and computational complexity of machine intelligence, especially in energy-limited systems such as edge devices, the Internet of Things (IoT), and cyber-physical systems (CPS). The spiking neural network (SNN) is often studied together with neuromorphic computing as the underlying computational model. Similar to biological neural systems, an SNN is an inherently dynamic and stateful network: its state and output depend not only on the current input but also on past history. Another distinct property of SNNs is that information is represented, transmitted, and processed as discrete spike events, also referred to as action potentials. All the processing happens in the neurons, so the computation itself is massively distributed and parallel, which enables low-power information transmission and processing. However, it is inefficient to implement SNNs on the traditional von Neumann architecture due to the performance gap between memory and processor. This has led to the advent of energy-efficient large-scale neuromorphic hardware such as IBM's TrueNorth and Intel's Loihi, which enables low-power implementation of large-scale neural networks for real-time applications. Although spiking networks have theoretically been shown to have Turing-equivalent computing power, it remains a challenge to train deep SNNs: the threshold functions that generate spikes are discontinuous, so they have no derivatives and cannot directly use gradient-based optimization algorithms for training. The biologically plausible learning mechanism spike-timing-dependent plasticity (STDP) and its variants are local in synapse and time, but they are unstable during training and make it difficult to train multi-layer SNNs. To better exploit the energy-saving features of SNNs in neuromorphic hardware, such as spike-domain representation and stochastic computing, and to address hardware limitations such as limited data precision and neuron fan-in/fan-out constraints, it is necessary to redesign the neural network, including its structure and computation. Our work focuses on low-level (activations, weights) and high-level (alternative learning algorithms) redesign techniques to enable inference and learning with SNNs in neuromorphic hardware.

First, we focused on transforming a trained artificial neural network (ANN) into a form suitable for neuromorphic hardware implementation. Here, we tackle transforming the Long Short-Term Memory (LSTM), a version of recurrent neural network (RNN) which includes recurrent connectivity to enable learning long temporal patterns. This is a specifically difficult challenge due to the inherent nature of RNNs and SNNs: the recurrent connectivity in RNNs induces temporal dynamics which require synchronicity, especially with the added complexity of LSTMs, while SNNs are asynchronous in nature. In addition, the constraints of the neuromorphic hardware posed a massive challenge for this realization. Thus, in this work, we invented a store-and-release circuit using integrate-and-fire neurons which allows the synchronization, and then developed modules using that circuit to replicate various parts of the LSTM. These modules enabled implementation of LSTMs with spiking neurons on IBM's TrueNorth Neurosynaptic processor. This is the first work to realize such LSTM networks with spiking neurons and implement them on neuromorphic hardware, and it opens avenues for the use of neuromorphic hardware in applications involving temporal patterns.

Moving beyond mapping a pretrained ANN, we work on training networks on the neuromorphic hardware itself. Here, we first looked at STDP, a biologically plausible Hebbian rule for learning without supervision. Simplified computational interpretations of STDP are either unstable or complex, making them costly to implement in hardware. Thus, in this work, we proposed a stable version of STDP and applied intentional approximations for low-cost hardware implementation, called the Quantized 2-Power Shift (Q2PS) rule. With this version, we performed both unsupervised learning for feature extraction and supervised learning for classification in a multilayer SNN, achieving comparable or better accuracy on the MNIST dataset compared to manually labelled two-layer networks.

Next, we approached training multilayer SNNs on neuromorphic hardware with backpropagation, the gradient-based optimization algorithm that forms the backbone of deep neural networks (DNNs). Although STDP is biologically plausible, it is not as robust for learning deep networks as backpropagation is for DNNs. However, backpropagation is not biologically plausible, is not suitable to be directly applied to SNNs, and cannot be implemented on neuromorphic hardware. Thus, in the first part of this work, we devise a set of approximations to transform backpropagation to the spike domain so that it is suitable for SNNs. After these approximations, we adapted the connectivity and weight-update rule in backpropagation to enable learning based solely on locally available information, so that it resembles a rate-based STDP algorithm. We call this Error-Modulated STDP (EMSTDP). In the next part of this work, we implemented EMSTDP on Intel's Loihi neuromorphic chip to realize online in-hardware supervised learning of deep SNNs. This is the first realization of a fully spike-based approximation of the backpropagation algorithm implemented on a neuromorphic processor, and a first step towards building an autonomous machine that learns continuously from its environment and experiences.
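For readers unfamiliar with the two ingredients named above, here is a minimal NumPy sketch of a leaky integrate-and-fire neuron layer driven by spike inputs, with a pair-based STDP weight update using exponential traces. All constants and the trace model are illustrative assumptions; this is neither TrueNorth/Loihi behavior nor the Q2PS or EMSTDP rules.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post, steps = 20, 5, 100
w = rng.uniform(0.0, 0.5, (n_post, n_pre))  # synaptic weights
v = np.zeros(n_post)                        # membrane potentials
pre_trace = np.zeros(n_pre)                 # decaying memory of pre spikes
post_trace = np.zeros(n_post)               # decaying memory of post spikes

for _ in range(steps):
    pre_spikes = (rng.random(n_pre) < 0.1).astype(float)  # random input spikes
    # Leaky integrate-and-fire: leak, integrate weighted spikes, fire, reset
    v = 0.9 * v + w @ pre_spikes
    post_spikes = (v >= 1.0).astype(float)
    v = np.where(post_spikes > 0, 0.0, v)
    # Pair-based STDP: post-after-pre potentiates, pre-after-post depresses
    w += 0.01 * np.outer(post_spikes, pre_trace)
    w -= 0.012 * np.outer(post_trace, pre_spikes)
    np.clip(w, 0.0, 1.0, out=w)
    # Update exponential spike traces
    pre_trace = 0.8 * pre_trace + pre_spikes
    post_trace = 0.8 * post_trace + post_spikes
```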
