
    Stochastic Computing Correlation Utilization in Convolutional Neural Network Basic Functions

    In recent years, many applications have been implemented on embedded systems and mobile Internet of Things (IoT) devices, which typically have constrained resources and smaller power budgets yet are expected to exhibit "smartness" or intelligence. To implement computation-intensive and resource-hungry Convolutional Neural Networks (CNNs) on this class of devices, many research groups have developed specialized parallel accelerators using Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), or Application-Specific Integrated Circuits (ASICs). An alternative computing paradigm called Stochastic Computing (SC) can implement CNNs with a low hardware footprint and low power consumption. To enable building more efficient SC CNNs, this work implements the CNN basic functions in SC circuits that exploit correlation, share Random Number Generators (RNGs), and are more robust to rounding error. Experimental results show that our proposed solution provides significant savings in hardware footprint and increased accuracy for the SC CNN basic-function circuits compared to previous work.
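
    The correlation effect described above can be illustrated with a minimal Python sketch (generic SC behavior, not the paper's actual circuits): an AND gate applied to two independent bitstreams approximates the product of the encoded values, while the same gate applied to streams generated from a shared RNG approximates their minimum, a primitive useful for CNN functions such as pooling. Sharing the RNG also eliminates one stream generator, one source of the hardware savings the abstract mentions.

```python
import random

def bitstream(p, n, rng):
    """Encode probability p as an n-bit stochastic stream (independent draws)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def correlated_streams(p1, p2, n, rng):
    """Share one sequence of random draws (i.e., one RNG) between two
    streams; the resulting streams are maximally correlated."""
    r = [rng.random() for _ in range(n)]
    return ([1 if x < p1 else 0 for x in r],
            [1 if x < p2 else 0 for x in r])

def value(stream):
    """Decode a stream back to its probability value."""
    return sum(stream) / len(stream)

rng = random.Random(0)
n, p1, p2 = 100_000, 0.8, 0.5

# Independent streams: AND approximates the product p1 * p2 (~0.40).
a, b = bitstream(p1, n, rng), bitstream(p2, n, rng)
print(value([x & y for x, y in zip(a, b)]))

# Shared-RNG streams: AND approximates min(p1, p2) (~0.50), the
# correlated behavior that SC min/max/pooling circuits exploit.
a, b = correlated_streams(p1, p2, n, rng)
print(value([x & y for x, y in zip(a, b)]))
```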

    Low-Cost Deep Convolutional Neural Network Acceleration with Stochastic Computing and Quantization

    Department of Computer Science and Engineering

    For about a decade, image classification performance, led by deep convolutional neural networks (DCNNs), has advanced dramatically. However, the excessive computational complexity of DCNNs demands substantial hardware cost and energy. Accelerators built around many-core neural processing units are emerging to compute DCNNs more energy-efficiently than conventional processors (e.g., CPUs and GPUs). Even so, the huge volume of general-purpose, full-precision computation remains too demanding for mobile and edge devices. Much research has therefore sought to simplify DCNN computations, especially the multiply-accumulate (MAC) operations that account for most of the processing time. As a promising alternative to conventional binary computing, stochastic computing (SC) has been studied steadily for low-cost arithmetic operations. However, previous SC-DCNN approaches have critical limitations, such as a lack of scalability and loss of accuracy. This dissertation first offers solutions to overcome those problems. Furthermore, SC has additional advantages over binary computing, such as error tolerance; those strengths are also exploited and assessed in the dissertation. Meanwhile, quantization, which replaces high-precision dataflows with low-bit representations and arithmetic operations, has become popular for reducing DCNN model size and computation cost; low-bit fixed-point representation is currently the most common choice. The dissertation argues that SC and quantization are mutually beneficial: the efficiency of SC-DCNNs improves with ordinary quantization, just as in conventional binary computing, and SC's flexible representation can exploit quantization more effectively than binary computing can. As more advanced quantization methods emerge, novel SC-MAC structures are devised to capture their benefits. For each contribution, RTL-implemented SC accelerators are evaluated and compared with conventional binary implementations, and a small FPGA prototype demonstrates the viability of SC-DCNNs. In a rapidly evolving deep learning field dominated by conventional binary computing, suitably enhanced SC, though not as popular as binary, remains a competitive implementation with benefits of its own.
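
    As a rough illustration of why SC and quantization compose well, the sketch below (a hypothetical example, not the dissertation's actual SC-MAC structures) quantizes operands to 4-bit fixed point and estimates a dot product using AND-gate stochastic multiplies with a binary counter as the accumulator. A k-bit quantized value needs only a 2^k-bit stream for exact representation, so lowering the precision directly shortens SC latency.

```python
import random

def quantize(x, bits):
    """Uniformly quantize x in [0, 1] to a 'bits'-bit fixed-point level."""
    levels = (1 << bits) - 1
    return round(x * levels) / levels

def sc_dot(weights, activations, n, rng):
    """Dot-product estimate: AND-gate stochastic multiplies, accumulated
    by a plain binary counter (a common hybrid style; not necessarily
    the dissertation's SC-MAC design)."""
    count = 0
    for w, a in zip(weights, activations):
        for _ in range(n):
            # One stochastic bit per operand; AND = unipolar multiply.
            count += (rng.random() < w) & (rng.random() < a)
    return count / n

rng = random.Random(1)
bits = 4
w = [quantize(x, bits) for x in (0.31, 0.77, 0.12)]
a = [quantize(x, bits) for x in (0.55, 0.20, 0.90)]

exact = sum(wi * ai for wi, ai in zip(w, a))
print(exact, sc_dot(w, a, n=4096, rng=rng))  # estimate -> exact as n grows
```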

    Scalable stochastic-computing accelerator for convolutional neural networks

    Stochastic Computing (SC) is an alternative design paradigm that is particularly useful for applications where cost is critical. SC has been applied to neural networks, which are known for their high computational complexity. However, previous work in this area has critical limitations, such as the assumption of a fully-parallel architecture, which prevents its application to recent networks such as convolutional neural networks, or ConvNets. This paper presents the first SC architecture for ConvNets and shows its feasibility, with detailed analyses of implementation overheads. Our SC-ConvNet is a hybrid between SC and conventional binary design, a marked difference from earlier SC-based neural networks. Though this might seem like a compromise, it is a novel feature driven by the need to support modern ConvNets at scale, which commonly have many large layers. Our proposed architecture also features hybrid layer composition, which helps achieve very high recognition accuracy. Our detailed evaluation results, involving functional simulation and RTL synthesis, suggest that SC-ConvNets are indeed competitive with conventional binary designs, even without considering the inherent error resilience of SC.
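
    The case for a hybrid design can be seen in how pure-SC addition behaves. A multiplexer-based stochastic adder outputs a randomly selected input bit each cycle, so it computes the average of its inputs: the true sum scaled down by the number of inputs, which becomes a serious precision problem for the wide accumulations in large ConvNet layers. Accumulating the stream counts in binary avoids the scaling. A minimal sketch (illustrative only, not the paper's implementation):

```python
import random

def mux_add(streams, rng):
    """Pure-SC scaled addition: each cycle a multiplexer passes one
    randomly selected input bit, so the output encodes the *average*
    of the inputs, i.e., the true sum scaled down by len(streams)."""
    n = len(streams[0])
    return [streams[rng.randrange(len(streams))][i] for i in range(n)]

def counter_add(streams):
    """Hybrid alternative: count the ones in binary, preserving the
    full-magnitude sum instead of scaling it down."""
    n = len(streams[0])
    return sum(sum(s) for s in streams) / n

rng = random.Random(2)
n, ps = 100_000, [0.9, 0.8, 0.7, 0.6]
streams = [[1 if rng.random() < p else 0 for _ in range(n)] for p in ps]

print(sum(mux_add(streams, rng)) / n)  # ~0.75 = sum(ps) / 4 (scaled down)
print(counter_add(streams))            # ~3.0  = sum(ps)     (full sum)
```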

    New Views for Stochastic Computing: From Time-Encoding to Deterministic Processing

    University of Minnesota Ph.D. dissertation, July 2018. Major: Electrical/Computer Engineering. Advisor: David Lilja. 1 computer file (PDF); xi, 149 pages.

    Stochastic computing (SC), a paradigm first introduced in the 1960s, has received considerable attention in recent years as a candidate for emerging technologies and "post-CMOS" computing. Logical computation is performed on random bitstreams in which the signal value is encoded by the probability of observing a one versus a zero. This unconventional representation of data offers some intriguing advantages over conventional weighted binary: implementing complex functions with simple hardware (e.g., multiplication using a single AND gate), tolerating soft errors (i.e., bit flips), and progressive precision. The obvious disadvantage, however, is latency: a stochastic representation is exponentially longer than conventional binary radix, and long latencies translate into high energy consumption, often higher than that of a binary counterpart. Generating bitstreams is also costly; factoring in the cost of the bitstream generators, the overall hardware cost of an SC implementation is often comparable to a conventional binary implementation. This dissertation begins by proposing a highly unorthodox idea: performing computation with digital constructs on time-encoded analog signals. We introduce a new, energy-efficient, high-performance, and much less costly approach to SC using time-encoded pulse signals. We explore the design and implementation of arithmetic operations on time-encoded data and discuss the advantages, challenges, and potential applications. Experimental results on image processing applications show up to 99% performance speedup, 98% savings in energy dissipation, and 40% area reduction compared to prior stochastic implementations. We further introduce a low-cost approach for synthesizing sorting network circuits based on deterministic unary bitstreams; synthesis results show more than 90% area and power savings compared to the costs of a conventional binary implementation. Time-based encoding of data is then exploited for fast and energy-efficient processing with the developed sorting circuits. Poor progressive precision is the main challenge with the recently developed deterministic methods of SC. We propose a high-quality down-sampling method that significantly improves the processing time and energy consumption of these deterministic methods by pseudo-randomizing bitstreams. We also propose two novel deterministic methods of processing bitstreams using low-discrepancy sequences. We further introduce a new advantage of the SC paradigm: the skew tolerance of SC circuits. We exploit this advantage in developing polysynchronous clocking, a design strategy for optimizing the clock distribution network of SC systems. Finally, in the first study of its kind to the best of our knowledge, we rethink memory system design for SC. We propose a seamless stochastic system, StochMem, which features analog memory to trade the energy and area overhead of data conversion for computation accuracy.
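
    One of the deterministic SC methods discussed above can be made concrete with a short sketch (an assumed textbook formulation, not the dissertation's exact circuits). With two unary streams of relatively prime lengths n1 and n2, over n1*n2 clock cycles each bit position of one stream meets each position of the other exactly once, so a single AND gate computes the product exactly rather than statistically:

```python
from math import gcd

def unary(k, n):
    """Deterministic unary stream encoding k/n: k ones then n - k zeros."""
    return [1] * k + [0] * (n - k)

def deterministic_product(k1, n1, k2, n2):
    """Deterministic SC multiply via relatively prime stream lengths:
    over n1*n2 cycles each position of stream A meets each position of
    stream B exactly once, so AND yields exactly (k1/n1) * (k2/n2)."""
    assert gcd(n1, n2) == 1, "stream lengths must be relatively prime"
    a, b = unary(k1, n1), unary(k2, n2)
    ones = sum(a[t % n1] & b[t % n2] for t in range(n1 * n2))
    return ones / (n1 * n2)

print(deterministic_product(3, 5, 2, 7))  # exactly 6/35 ~= 0.1714
print((3 / 5) * (2 / 7))                  # same value, no random error
```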