5 research outputs found

    Efficient architectures and implementation of arithmetic functions approximation based stochastic computing

    Stochastic computing (SC) has emerged as a potential alternative to binary computing for a number of low-power embedded systems, DSP, neural network and communications applications. In this paper, a new method, with associated architectures and implementations, is presented for complex arithmetic functions such as the exponential, sigmoid and hyperbolic tangent functions. Our approach combines piecewise linear (PWL) approximation with polynomial (Lagrange) interpolation. The proposed method aims to reduce the number of binary-to-stochastic converters, which are the most power-sensitive modules in an SC system. The hardware implementation of each complex arithmetic function is then derived using the 65 nm CMOS technology node. In terms of accuracy, the proposed approach outperforms other well-known methods by a factor of 2 on average. The power consumption of implementations based on our method is on average 40% lower than that of previous solutions. Additionally, the hardware complexity of the proposed method is improved by 40% on average, while its critical path is about 2.5% longer on average than that of other methods.
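The piecewise linear idea in the abstract above can be illustrated in ordinary floating point. The following is only a minimal sketch, not the paper's architecture: the interval [-4, 4], the 16 uniform segments, and the helper names `pwl_table` and `pwl_eval` are assumptions made for illustration. Each segment is the degree-1 Lagrange interpolant through its two endpoints.

```python
import math

def pwl_table(f, lo, hi, segments):
    """Sample f at uniformly spaced breakpoints (assumed spacing, not from the paper)."""
    xs = [lo + (hi - lo) * i / segments for i in range(segments + 1)]
    return xs, [f(x) for x in xs]

def pwl_eval(xs, ys, x):
    """Evaluate the piecewise linear approximation at x, clamping to the table range."""
    x = min(max(x, xs[0]), xs[-1])
    step = xs[1] - xs[0]
    i = min(int((x - xs[0]) / step), len(xs) - 2)  # index of the segment containing x
    t = (x - xs[i]) / step                         # position inside the segment, 0..1
    return ys[i] + t * (ys[i + 1] - ys[i])

xs, ys = pwl_table(math.tanh, -4.0, 4.0, 16)

# Largest absolute error on a fine grid over the approximation interval.
max_err = max(
    abs(pwl_eval(xs, ys, k / 100) - math.tanh(k / 100)) for k in range(-400, 401)
)
print(f"max |error| with 16 segments: {max_err:.4f}")
```

Using more segments, or a higher-degree interpolant per segment, lowers the error at the cost of a larger table or more arithmetic.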

    Low complexity MIMO detection algorithms and implementations

    University of Minnesota Ph.D. dissertation. December 2014. Major: Electrical Engineering. Advisor: Gerald E. Sobelman. 1 computer file (PDF); ix, 111 pages. MIMO techniques use multiple antennas at both the transmitter and receiver sides to achieve diversity gain, multiplexing gain, or both. One of the key challenges in exploiting the potential of MIMO systems is to design high-throughput, low-complexity detection algorithms while achieving near-optimal performance. In this thesis, we design and optimize algorithms for MIMO detection and investigate the associated performance and FPGA implementation aspects. First, we study and optimize a detection algorithm developed by Shabany and Gulak for K-Best based, high-throughput, low-energy, hard-output MIMO detection and extend it to the complex domain. The new method uses simple lookup tables, and it is fully scalable for a wide range of K-values and constellation sizes. This technique reduces the computational complexity without sacrificing performance, and the complexity scales only sub-linearly with the constellation size. Second, we apply the bidirectional technique to trellis search and propose a high-performance soft-output bidirectional path preserving trellis search (PPTS) detector for MIMO systems. A comparative error analysis of single-direction and bidirectional PPTS detectors is given. We demonstrate that the bidirectional PPTS detector can minimize the detection error. Next, we design a novel bidirectional processing algorithm for soft-output MIMO systems. It combines features from several types of fixed-complexity tree search procedures. The proposed approach achieves higher performance than previously proposed algorithms at a comparable computational cost. Moreover, its parallel nature and fixed throughput characteristics make it attractive for very large scale integration (VLSI) implementation. Following that, we present a novel low-complexity hard-output MIMO detection algorithm for LTE and WiFi applications. We provide a well-defined tradeoff between computational complexity and performance. The proposed algorithm uses a much smaller number of Euclidean distance (ED) calculations while incurring only a 0.5 dB loss compared to maximum likelihood detection (MLD). A 3x3 MIMO system with a 16QAM detector architecture is designed, and the latency and hardware costs are estimated. Finally, we present a stochastic computing implementation of trigonometric and hyperbolic functions which can be used for QR decomposition and other wireless communications and signal processing applications.

    The Logic of Random Pulses: Stochastic Computing.

    Recent developments in the field of electronics have produced nano-scale devices whose operation can only be described in probabilistic terms. In contrast with the conventional deterministic computing that has dominated the digital world for decades, we investigate a fundamentally different technique that is probabilistic by nature, namely, stochastic computing (SC). In SC, numbers are represented by bit-streams of 0's and 1's, in which the probability of seeing a 1 denotes the value of the number. The main benefit of SC is that complicated arithmetic computation can be performed by simple logic circuits. For example, a single (logic) AND gate performs multiplication. The dissertation begins with a comprehensive survey of SC and its applications. We highlight its main challenges, which include long computation time and low accuracy, as well as the lack of general design methods. We then address some of the more important challenges. We introduce a new SC design method, called STRAUSS, that generates efficient SC circuits for arbitrary target functions. We then address the problems arising from correlation among stochastic numbers (SNs). In particular, we show that, contrary to general belief, correlation can sometimes serve as a resource in SC design. We also show that unlike conventional circuits, SC circuits can tolerate high error rates and are hence useful in some new applications that involve nondeterministic behavior in the underlying circuitry. Finally, we show how SC's properties can be exploited in the design of an efficient vision chip that is suitable for retinal implants. In particular, we show that SC circuits can directly operate on signals with neural encoding, which eliminates the need for data conversion.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113561/1/alaghi_1.pd
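The claim that a single AND gate multiplies can be checked with a small software simulation. This is a sketch under stated assumptions rather than anything taken from the dissertation: the streams are unipolar (value equals the fraction of 1s), they are generated independently with a software pseudo-random generator, and the helper names `to_stream`, `value` and `sc_multiply` are hypothetical.

```python
import random

def to_stream(p, length, rng):
    """Encode a probability p in [0, 1] as a bit-stream of independent bits."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def value(stream):
    """Decode a stream: the fraction of 1s estimates the encoded number."""
    return sum(stream) / len(stream)

def sc_multiply(a, b):
    """Bitwise AND of two streams; for independent inputs P(a&b) = P(a) * P(b)."""
    return [x & y for x, y in zip(a, b)]

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 4096                 # stream length (an arbitrary choice for this sketch)
a = to_stream(0.5, n, rng)
b = to_stream(0.25, n, rng)
product = value(sc_multiply(a, b))
print(f"estimated 0.5 * 0.25 = {product:.4f} (exact value 0.125)")
```

The result is only an estimate; its random error shrinks as the streams get longer, which is the long-computation-time and low-accuracy tradeoff the abstract mentions.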

    Designing Accurate and Low-Cost Stochastic Circuits.

    Stochastic computing (SC) is an unconventional computing approach that processes data represented by pseudo-random bit-streams called stochastic numbers (SNs). It enables arithmetic functions to be implemented by tiny, low-power logic circuits, and is highly error-tolerant. These properties make SC practical for applications that need massive parallelism or operate in noisy environments where conventional binary designs are too costly or too unreliable. SC has recently come to be seen as an attractive choice for tasks such as biomedical image processing and decoding complex error-correcting codes. Despite its desirable properties, SC has features that limit its usefulness, including insufficient accuracy and an inadequate design theory. Accuracy is especially vulnerable to correlation among interacting SNs and to the random fluctuations inherent in SC's data representation. This dissertation examines the major factors affecting accuracy using analytical and experimental approaches based on probability theory and circuit simulation, respectively. We devise methods to quantify the error effects in stochastic circuits by means of probabilistic transfer matrices and Bernoulli processes. These methods make it possible to compare the impact of errors on conventional and stochastic circuits under various conditions. We then analyze correlation in detail and show that correlation-induced errors can be reduced by the careful insertion of delay elements, a de-correlation technique called isolation. Noting that different logic functions can have the same stochastic behavior when constant SNs are applied to their inputs, we show how to partition logic functions into stochastic equivalence classes (SECs). We derive a procedure for identifying SECs, and apply SEC concepts to the synthesis and optimization of stochastic circuits. While addition, subtraction and multiplication have well-known and simple SC implementations, this is not true for division. We study stochastic division methods and propose a new type of stochastic divider that combines low cost with high accuracy. Finally, we turn to the design of general stochastic circuits and investigate a desirable property of SNs called monotonic progressive precision (MPP) whereby accuracy increases steadily with bit-stream length. We develop an SC design technique which produces results that are accurate and have good MPP. The dissertation concludes with some ideas for future research.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133255/1/tehsuan_1.pd
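The effect of correlation, and of isolation by delay elements, can be seen in a toy simulation of squaring a number with an AND gate. This is a sketch of the general idea only, assuming a stream of independent bits and a one-cycle delay modelled as a list shift; the helper names are hypothetical and the dissertation's actual circuits are not reproduced here.

```python
import random

def to_stream(p, length, rng):
    """Encode p as a bit-stream whose bits are independent of each other."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def value(stream):
    """Fraction of 1s in the stream."""
    return sum(stream) / len(stream)

def delay(stream, k=1):
    """Model k delay elements: shift the stream by k cycles, padding with 0s."""
    return [0] * k + stream[:len(stream) - k]

rng = random.Random(1)   # fixed seed so the run is repeatable
n = 8192
x = to_stream(0.5, n, rng)

# Feeding the same stream to both AND inputs: x & x == x, so the gate returns p, not p*p.
correlated = value([a & b for a, b in zip(x, x)])

# Delaying one input pairs each bit with its predecessor, which is independent of it.
isolated = value([a & b for a, b in zip(x, delay(x))])

print(f"correlated inputs: {correlated:.4f}  (target 0.25)")
print(f"isolated inputs:   {isolated:.4f}  (target 0.25)")
```

The shift only de-correlates the inputs because successive bits of the source stream are assumed independent; a stream with structure across time would need a different treatment.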

    2.5D Chiplet Architecture for Embedded Processing of High Velocity Streaming Data

    This dissertation presents an energy efficient 2.5D chiplet-based architecture for real-time probabilistic processing of high-velocity sensor data, from an autonomous real-time ubiquitous surveillance imaging system. This work addresses problems at all levels of description. At the lowest physical level, new standard cell libraries have been developed for ultra-low voltage CMOS synthesis, as well as custom SRAM memory blocks, and mixed-signal physical true random number generators based on the perturbation of Sigma-Delta structures using random telegraph noise (RTN) in single transistor devices. At the chip level architecture, an innovative compact buffer-less switched circuit mesh network on chip (NoC) capable of reaching very high throughput (1.6Tbps), finite packet delay delivery, free from packet dropping, and free from dead-locks and live-locks, was designed for this chiplet-based solution. Additionally, a second NoC connecting processors in the network, was implemented based on token-rings, allowing access to external DDR memory. Furthermore, a new clock tree distribution network, and a wide bandwidth DRAM physical interface have been designed to address the data flow requirements within and across chiplets. At the algorithm and representation levels, the Online Change Point Detection (CPD) algorithm has been implemented for on-line learning of background-foreground segmentation. Instead of using traditional binary representation of numbers, this architecture relies on unconventional processing of signals using a bio-inspired (spike-based) unary representation of numbers, where these numbers are represented in a stochastic stream of Bernoulli random variables. By using this representation, probabilistic algorithms can be executed in a native architecture with precision on demand, where if more accuracy is required, more computational time and power can be allocated. 
The SoC chiplet architecture has been extensively simulated and validated using state of the art CAD methodology, and has been submitted to fabrication in a dedicated 55nm GF CMOS technology wafer run. Experimental results from fabricated test chips in the same technology are also presented.
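The "precision on demand" property of a Bernoulli-stream representation follows from basic statistics: the standard error of the estimate from N independent bits is sqrt(p(1-p)/N), so quadrupling the stream length halves the expected error. The sketch below illustrates that relationship in software; it says nothing about the chiplet hardware, and the function names and stream lengths are assumptions chosen for illustration.

```python
import math
import random

def estimate(p, length, rng):
    """Estimate p by counting 1s in a stream of `length` independent Bernoulli(p) bits."""
    ones = sum(1 for _ in range(length) if rng.random() < p)
    return ones / length

def standard_error(p, length):
    """Theoretical standard deviation of the estimate for a stream of this length."""
    return math.sqrt(p * (1.0 - p) / length)

rng = random.Random(2)   # fixed seed so the run is repeatable
p = 0.3
results = [(n, estimate(p, n, rng), standard_error(p, n)) for n in (64, 1024, 16384)]

for n, est, se in results:
    print(f"N={n:6d}  estimate={est:.4f}  expected std error={se:.4f}")
```

Each extra bit of accuracy therefore costs roughly four times as many stream bits, which is the time and power tradeoff the abstract describes.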