226 research outputs found

    Digital Signal Processing and Machine Learning System Design using Stochastic Logic

    Get PDF
    University of Minnesota Ph.D. dissertation. July 2017. Major: Electrical/Computer Engineering. Advisor: Keshab Parhi. 1 computer file (PDF); xxii, 172 pages.Digital signal processing (DSP) and machine learning systems play a crucial role in the fields of big data and artificial intelligence. The hardware design of these systems is extremely critical to meet stringent application requirements such as extremely small size, low power consumption, and high reliability. Following the path of Moore's Law, the density and performance of hardware systems are dramatically improved at an exponential pace. The increase in the number of transistors on a chip, which plays the main role in improvement in the density of circuit design, causes rapid increase in circuit complexity. Therefore, low area consumption is one of the key challenges for IC design, especially for portable devices. Another important challenge for hardware design is reliability. A chip fabricated using nanoscale complementary metal-oxide-semiconductor (CMOS) technologies will be prone to errors caused by fluctuations in threshold voltage, supply voltage, doping levels, aging, timing errors and soft errors. Design of nanoscale failure-resistant systems is currently of significant interest, especially as the technology scales below 10 nm. Stochastic Computing (SC) is a novel approach to address these challenges in system and circuit design. This dissertation considers the design of digital signal processing and machine learning systems in stochastic logic. The stochastic implementations of finite impulse response (FIR) and infinite impulse response (IIR) filters based on various lattice structures are presented. The implementations of complex functions such as trigonometric, exponential, and sigmoid, are derived based on truncated versions of their Maclaurin series expansions. We also present stochastic computation of polynomials using stochastic subtractors and factorization. The machine learning systems including artificial neural network (ANN) and support vector machine (SVM) in stochastic logic are also presented. First, we propose novel implementations for linear-phase FIR filters in stochastic logic. The proposed design is based on lattice structures. Compared to direct-form linear-phase FIR filters, linear-phase lattice filters require twice the number of multipliers but the same number of adders. The hardware complexities of stochastic implementations of linear-phase FIR filters for direct-form and lattice structures are comparable. We propose stochastic implementation of IIR filters using lattice structures where the states are orthogonal and uncorrelated. We present stochastic IIR filters using basic, normalized and modified lattice structures. Simulation results demonstrate high signal-to-error ratio and fault tolerance in these structures. Furthermore, hardware synthesis results show that these filter structures require lower hardware area and power compared to two's complement realizations. Second, We present stochastic logic implementations of complex arithmetic functions based on truncated versions of their Maclaurin series expansions. It is shown that a polynomial can be implemented using multiple levels of NAND gates based on Horner's rule, if the coefficients are alternately positive and negative and their magnitudes are monotonically decreasing. Truncated Maclaurin series expansions of arithmetic functions are used to generate polynomials which satisfy these constraints. The input and output in these functions are represented by unipolar representation. For a polynomial that does not satisfy these constraints, it still can be implemented based on Horner's rule if each factor of the polynomial satisfies these constraints. format conversion is proposed for arithmetic functions with input and output represented in different formats, such as cosโ€‰ฯ€x\text{cos}\,\pi x given xโˆˆ[0,1]x\in[0,1] and sigmoid(x)\text{sigmoid(x)} given xโˆˆ[โˆ’1,1]x\in[-1,1]. Polynomials are transformed to equivalent forms that naturally exploit format conversions. The proposed stochastic logic circuits outperform the well-known Bernstein polynomial based and finite-state-machine (FSM) based implementations. Furthermore, the hardware complexity and the critical path of the proposed implementations are less than the Bernstein polynomial based and FSM based implementations for most cases. Third, we address subtraction and polynomial computations using unipolar stochastic logic. It is shown that stochastic computation of polynomials can be implemented by using a stochastic subtractor and factorization. Two approaches are proposed to compute subtraction in stochastic unipolar representation. In the first approach, the subtraction operation is approximated by cascading multi-levels of OR and AND gates. The accuracy of the approximation is improved with the increase in the number of stages. In the second approach, the stochastic subtraction is implemented using a multiplexer and a stochastic divider. We propose stochastic computation of polynomials using factorization. Stochastic implementations of first-order and second-order factors are presented for different locations of polynomial roots. From experimental results, it is shown that the proposed stochastic logic circuits require less hardware complexity than the previous stochastic polynomial implementation using Bernstein polynomials. Finally, this thesis presents novel architectures for machine learning based classifiers using stochastic logic. Three types of classifiers are considered. These include: linear support vector machine (SVM), artificial neural network (ANN) and radial basis function (RBF) SVM. These architectures are validated using seizure prediction from electroencephalogram (EEG) as an application example. To improve the accuracy of proposed stochastic classifiers, an approach of data-oriented linear transform for input data is proposed for EEG signal classification using linear SVM classifiers. Simulation results in terms of the classification accuracy are presented for the proposed stochastic computing and the traditional binary implementations based datasets from two patients. It is shown that accuracies of the proposed stochastic linear SVM are improved by 3.88\% and 85.49\% for datasets from patient-1 and patient-2, respectively, by using the proposed linear-transform for input data. Compared to conventional binary implementation, the accuracy of the proposed stochastic ANN is improved by 5.89\% for the datasets from patient-1. For patient-2, the accuracy of the proposed stochastic ANN is improved by 7.49\% by using the proposed linear-transform for input data. Additionally, compared to the traditional binary linear SVM and ANN, the hardware complexity, power consumption and critical path of the proposed stochastic implementations are reduced significantly

    The Logic of Random Pulses: Stochastic Computing.

    Full text link
    Recent developments in the field of electronics have produced nano-scale devices whose operation can only be described in probabilistic terms. In contrast with the conventional deterministic computing that has dominated the digital world for decades, we investigate a fundamentally different technique that is probabilistic by nature, namely, stochastic computing (SC). In SC, numbers are represented by bit-streams of 0's and 1's, in which the probability of seeing a 1 denotes the value of the number. The main benefit of SC is that complicated arithmetic computation can be performed by simple logic circuits. For example, a single (logic) AND gate performs multiplication. The dissertation begins with a comprehensive survey of SC and its applications. We highlight its main challenges, which include long computation time and low accuracy, as well as the lack of general design methods. We then address some of the more important challenges. We introduce a new SC design method, called STRAUSS, that generates efficient SC circuits for arbitrary target functions. We then address the problems arising from correlation among stochastic numbers (SNs). In particular, we show that, contrary to general belief, correlation can sometimes serve as a resource in SC design. We also show that unlike conventional circuits, SC circuits can tolerate high error rates and are hence useful in some new applications that involve nondeterministic behavior in the underlying circuitry. Finally, we show how SC's properties can be exploited in the design of an efficient vision chip that is suitable for retinal implants. In particular, we show that SC circuits can directly operate on signals with neural encoding, which eliminates the need for data conversion.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113561/1/alaghi_1.pd

    A Framework for Computing Discrete-Time Systems and Functions using DNA

    Get PDF
    University of Minnesota Ph.D. dissertation. July 2017. Major: Electrical/Computer Engineering. Advisors: Keshab Parhi, Marc Riedel. 1 computer file (PDF); xvii, 216 pages.Due to the recent advances in the field of synthetic biology, molecular computing has emerged as a non-conventional computing technology. A broad range of computational processes has been considered for molecular implementation. In this dissertation, we investigate the development of molecular systems for performing the following computations: signal processing, Markov chains, polynomials, and mathematical functions. First, we present a \textit{fully asynchronous} framework to design molecular signal processing algorithms. The framework maps each delay unit to two molecular types, i.e., first-type and second-type, and provides a 4-phase scheme to synchronize data flow for any multi-input/multi-output signal processing system. In the first phase, the input signal and values stored in all delay elements are consumed for computations. Results of computations are stored in the first-type molecules corresponding to the delay units and output variables. During the second phase, the values of the first-type molecules are transferred to the second-type molecules for the output variable. In the third phase, the concentrations of the first-type molecules are transferred to the second-type molecules associated with each delay element. Finally, in the fourth phase, the output molecules are collected. The method is illustrated by synthesizing a simple finite-impulse response (FIR) filter, an infinite-impulse response (IIR) filter, and an 8-point real-valued fast Fourier transform (FFT). The simulation results show that the proposed framework provides faster molecular signal processing systems compared to prior frameworks. We then present an overview of how continuous-time, discrete-time and digital signal processing systems can be implemented using molecular reactions. We also present molecular sensing systems where molecular reactions are used to implement analog-to-digital converters (ADCs) and digital-to-analog converters (DACs). These converters can be used to design mixed-signal processing molecular systems. A complete example of the addition of two molecules using digital implementation is described where the concentrations of two molecules are converted to digital by two 3-bit ADCs, and the 4-bit output of the digital adder is converted to analog by a 4-bit DAC. Furthermore, we describe implementation of other forms of molecular computation. We propose an approach to implement any first-order Markov chain using molecular reactions in general and DNA in particular. The Markov chain consists of two parts: a set of states and state transition probabilities. Each state is modeled by a unique molecular type, referred to as a data molecule. Each state transition is modeled by a unique molecular type, referred to as a control molecule, and a unique molecular reaction. Each reaction consumes data molecules of one state and produces data molecules of another state. The concentrations of control molecules are initialized according to the probabilities of corresponding state transitions in the chain. The steady-state probability of the Markov chain is computed by the equilibrium concentration of data molecules. We demonstrate our method for the Gamblerโ€™s Ruin problem as an instance of the Markov chain process. We analyze the method according to both the stochastic chemical kinetics and the mass-action kinetics model. Additionally, we propose a novel {\em unipolar molecular encoding} approach to compute a certain class of polynomials. In this molecular encoding, each variable is represented using two molecular types: a \mbox{type-0} and a \mbox{type-1}. The value is the ratio of the concentration of type-1 molecules to the sum of the concentrations of \mbox{type-0} and \mbox{type-1} molecules. With the new encoding, CRNs can compute any set of polynomial functions subject only to the limitation that these polynomials can be expressed as linear combinations of Bernstein basis polynomials with positive coefficients less than or equal to 1. The proposed encoding naturally exploits the expansion of a power-form polynomial into a Bernstein polynomial. We present molecular encoders for converting any input in a standard representation to the fractional representation, as well as decoders for converting the computed output from the fractional to a standard representation. Lastly, we expand the unipolar molecular encoding for bipolar molecular encoding and propose simple molecular circuits that can compute multiplication and scaled addition. Using these circuits, we design molecular circuits to compute more complex mathematical functions such as eโˆ’xe^{-x}, sinโก(x)\sin (x), and sigmoid(x)(x). According to this approach, we implement a molecular perceptron as a simple artificial neural network

    Efficient architectures and implementation of arithmetic functions approximation based stochastic computing

    Get PDF
    Stochastic computing (SC) has emerged as a potential alternative to binary computing for a number of low-power embedded systems, DSP, neural networks and communications applications. In this paper, a new method, associated architectures and implementations of complex arithmetic functions, such as exponential, sigmoid and hyperbolic tangent functions are presented. Our approach is based on a combination of piecewise linear (PWL) approximation as well as a polynomial interpolation based (Lagrange interpolation) methods. The proposed method aims at reducing the number of binary to stochastic converters. This is the most power sensitive module in an SC system. The hardware implementation for each complex arithmetic function is then derived using the 65nm CMOS technology node. In terms of accuracy, the proposed approach outperforms other well-known methods by 2 times on average. The power consumption of the implementations based on our method is decreased on average by 40 % comparing to other previous solutions. Additionally, the hardware complexity of our proposed method is also improved (40 % on average) while the critical path of the proposed method is slightly increased by 2.5% on average when comparing to other methods

    ASCEND: Accurate yet Efficient End-to-End Stochastic Computing Acceleration of Vision Transformer

    Full text link
    Stochastic computing (SC) has emerged as a promising computing paradigm for neural acceleration. However, how to accelerate the state-of-the-art Vision Transformer (ViT) with SC remains unclear. Unlike convolutional neural networks, ViTs introduce notable compatibility and efficiency challenges because of their nonlinear functions, e.g., softmax and Gaussian Error Linear Units (GELU). In this paper, for the first time, a ViT accelerator based on end-to-end SC, dubbed ASCEND, is proposed. ASCEND co-designs the SC circuits and ViT networks to enable accurate yet efficient acceleration. To overcome the compatibility challenges, ASCEND proposes a novel deterministic SC block for GELU and leverages an SC-friendly iterative approximate algorithm to design an accurate and efficient softmax circuit. To improve inference efficiency, ASCEND develops a two-stage training pipeline to produce accurate low-precision ViTs. With extensive experiments, we show the proposed GELU and softmax blocks achieve 56.3% and 22.6% error reduction compared to existing SC designs, respectively and reduce the area-delay product (ADP) by 5.29x and 12.6x, respectively. Moreover, compared to the baseline low-precision ViTs, ASCEND also achieves significant accuracy improvements on CIFAR10 and CIFAR100.Comment: Accepted in DATE 202

    ๊ธฐ๊ณ„ํ•™์Šต ์‹œ์Šคํ…œ ์„ค๊ณ„๋ฅผ ์œ„ํ•œ ๋ฐฉ๋ฒ•

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2016. 2. ์ตœ๊ธฐ์˜.Machine learning has been paid attention because intelligence such as recognition, decision making, and recommendation is a helpful utility in industrial, medical, transportation, entertainment systems, and others that human need to interact with. As machine learning techniques are extensively applied to various areas, the needs for more robust algorithms and more efficient hardware have been increased. In order to develop an efficient machine learning system, we have researched from high-level algorithm down to low-level hardware logicthe main focus of our work is on ensemble machine learning and stochastic computing (SC). The first work is to combine multiple components, i.e., multiple feature extractors (FE) and multiple classifiers in the aspect of pattern recognition. Ensemble of multiple components is one of challenging approaches for constructing a more accurate classifier. It can handle difficult problems where a single classifier easily makes a wrong decision due to lack of training or parameter optimization. Combining the decisions of participating classifiers statistically reduces the risk of wrong decision. We suggest a hierarchical ensemble framework of multiple feature extractors and multiple classifiers (MFMC). The second work is to construct efficient hardware building blocks for machine learning in order to reduce system complexity and generate high area- and energy-efficient logic, where we exploit the property of machine learning systems that does not require accurate computations. We select stochastic computing (SC), which is an alternative paradigm to conventional binary arithmetic computing. SC can boost efficiency in terms of area, power, and error tolerance, while relaxing the accuracy of computation. The third work is to combine both machine learning and stochastic computing, where we select deep learning. This work presents an efficient DNN design with stochastic computing. Observing that directly adopting stochastic computing to DNN has some challenges including random error fluctuation, range limitation, and overhead in accumulation, we address these problems by removing near-zero weights, applying weight-scaling, and integrating the activation function with the accumulator. The approach allows an easy implementation of early decision termination with a fixed hardware design by exploiting the progressive precision characteristics of stochastic computing, which was not easy with existing approaches. Experimental results show that our approach outperforms the conventional binary logic in terms of gate area, latency, and power consumption.1. Introduction 1 1.1 Hierarchical Ensemble Learning Framework 1 1.2 Hardware Building Block for Machine Learning By Using Stochastic Computing 1 1.2.1 Dynamic energy-accuracy trade-off using stochastic computing in deep neural networks 5 2. A Design Framework for Hierarchical Ensemble of Multiple Feature Extractors and Multiple Classifiers 7 2.1 Introduction 7 2.2 Related work 9 2.3 Proposed hierarchical ensemble system 12 2.3.1 Local Mapping Block and Global Mapping Block 12 2.3.2 Complexity comparison according to composition of LMB 15 2.3.3 Motivation for differentiating local and global mappings17 2.3.4 Reinforcement learning for LMB 19 2.3.5 Construction of Bayesian network from GMB 24 2.4 Experimental results 32 2.4.1 Measure of effectiveness for WMV and RL 33 2.4.2 Pedestrian detection dataset 35 2.4.3 Comparison between GMB and AdaBoost 41 2.4.4 UCI Multiple Features dataset 42 2.4.5 LMB selection 44 2.4.6 Discussion 45 2.5 Conclusion 46 3. Synthesis of Efficient Stochastic Logic for Many-Variable Expressions 49 3.1 Introduction 49 3.2 Related Work 52 3.3 SC Logic Synthesis for Multivariate Expressions 54 3.3.1 Probabilistic Logic 55 3.3.2 Definitions 58 3.3.3 Overview of the Proposed Method 60 3.3.4 Direct Synthesis VS. Kernel-based Synthesis 60 3.3.5 SC Kernel 63 3.3.6 Prime SC Kernel 65 3.3.7 iSC Kernel 68 3.3.8 Relationship Between iSC Kernels 70 3.3.9 Hybrid Scheme 75 3.3.10 Cost Function 76 3.3.11 SC Synthesis Algorithm 78 3.4 Experimental Results 82 3.4.1 Performance of SC Logic Synthesis Algorithm 83 3.4.2 Quality of Synthesis Results 84 3.4.3 Comparison of Accuracy 89 3.5 Conclusion 90 4. An Energy-Efficient Random Number Generator for Stochastic Circuits 91 4.1 Introduction 91 4.2 II. Background 92 4.2.1 Preliminaries 92 4.2.2 Shortcomings of Conventional Approaches 93 4.3 III. Proposed Stochastic Number Generator 96 4.3.1 Overview of the Proposed SNG 96 4.3.2 Even-distribution Encoding 96 4.3.3 Inter-group Randomization 98 4.3.4 Proposed Building Block for Bit Shuffling 100 4.3.5 Intra-group Randomization 102 4.4 Experimental Results 103 4.4.1 Accuracy of Generated Stochastic Bit Stream 104 4.4.2 Area, Delay, Power, Energy and SCC Average 104 4.4.3 Energy Efficiency When Operated under Maximal Precision 105 4.5 Conclusion 106 5. Approximate De-randomizer for Stochastic Circuits 107 5.1 Introduction 107 5.2 Proposed Approximate Parallel Counter 108 5.2.1 Analysis for Gate Count in 1-layer Approximate PC 109 5.2.2 Analysis for Error in 1-layer Approximate PC 110 5.3 Experimental Results 111 5.4 Conclusion 112 6. Dynamic Energy-Accuracy Trade-off Using Stochastic Computing in Deep Neural Networks 113 6.1 Introduction 113 6.2 Background 115 6.4 DNN Using Stochastic Circuit 117 6.4.1 Overview of the Proposed DNN using SC 117 6.4.2 Removing Near-Zero Weights 119 6.4.3 Applying Weight Scaling 120 6.4.4 Activation Function with Accumulation 121 6.5 Early Decision Termination 125 6.5.1 Moving Average Tracking Output Trends 126 6.6 Experimental Results 127 6.6.1 Accuracy of DNN Using SC 128 6.6.2 Effectiveness of Early Decision Termination 129 6.6.3 Comparison of Synthesis Results 130 6.7 Conclusion 132 7. Conclusion 134 Bibliography 136 ์š”์•ฝ(๊ตญ๋ฌธ์ดˆ๋ก) 144Docto
    • โ€ฆ
    corecore