9 research outputs found
Skyrmion Gas Manipulation for Probabilistic Computing
The topologically protected magnetic spin configurations known as skyrmions
offer promising applications due to their stability, mobility and localization.
In this work, we emphasize how to leverage the thermally driven dynamics of an
ensemble of such particles to perform computing tasks. We propose a device
employing a skyrmion gas to reshuffle a random signal into an uncorrelated copy
of itself. This is demonstrated by modelling the ensemble dynamics in a
collective coordinate approach where skyrmion-skyrmion and skyrmion-boundary
interactions are accounted for phenomenologically. Our numerical results are
used to develop a proof-of-concept for an energy efficient
() device with a low area imprint ().
Whereas its immediate application to stochastic computing circuit designs will
be made apparent, we argue that its basic functionality, reminiscent of an
integrate-and-fire neuron, qualifies it as a novel bio-inspired building block.Comment: 41 pages, 20 figure
Energy-Performance Scalability Analysis of a Novel Quasi-Stochastic Computing Approach
Stochastic computing (SC) is an emerging low-cost computation paradigm for efficient approximation. It processes data in forms of probabilities and offers excellent progressive accuracy. Since SC\u27s accuracy heavily depends on the stochastic bitstream length, generating acceptable approximate results while minimizing the bitstream length is one of the major challenges in SC, as energy consumption tends to linearly increase with bitstream length. To address this issue, a novel energy-performance scalable approach based on quasi-stochastic number generators is proposed and validated in this work. Compared to conventional approaches, the proposed methodology utilizes a novel algorithm to estimate the computation time based on the accuracy. The proposed methodology is tested and verified on a stochastic edge detection circuit to showcase its viability. Results prove that the proposed approach offers a 12—60% reduction in execution time and a 12—78% decrease in the energy consumption relative to the conventional counterpart. This excellent scalability between energy and performance could be potentially beneficial to certain application domains such as image processing and machine learning, where power and time-efficient approximation is desired
Energy-Efficient FPGA-Based Parallel Quasi-Stochastic Computing
The high performance of FPGA (Field Programmable Gate Array) in image processing applications is justified by its flexible reconfigurability, its inherent parallel nature and the availability of a large amount of internal memories. Lately, the Stochastic Computing (SC) paradigm has been found to be significantly advantageous in certain application domains including image processing because of its lower hardware complexity and power consumption. However, its viability is deemed to be limited due to its serial bitstream processing and excessive run-time requirement for convergence. To address these issues, a novel approach is proposed in this work where an energy-efficient implementation of SC is accomplished by introducing fast-converging Quasi-Stochastic Number Generators (QSNGs) and parallel stochastic bitstream processing, which are well suited to leverage FPGA\u27s reconfigurability and abundant internal memory resources. The proposed approach has been tested on the Virtex-4 FPGA, and results have been compared with the serial and parallel implementations of conventional stochastic computation using the well-known SC edge detection and multiplication circuits. Results prove that by using this approach, execution time, as well as the power consumption are decreased by a factor of 3.5 and 4.5 for the edge detection circuit and multiplication circuit, respectively
Novel approaches for efficient stochastic computing
This thesis is comprised of two papers, where the first paper presents a novel approach for parallel implementation of SC using FPGA (Field Programmable Gate Array). This paper makes use of the distributed memory elements of FPGAs (i.e., look-up-tables -LUTs) to achieve this. An attempt has been made to build the stochastic number generators (SNGs) by using the proposed LUT approach. The construction of these SNGs has been influenced by the Quasi-random number sequences, which provide the advantage of reducing the random fluctuations present in the pseudo-random number generators such as LFSR (Linear Feedback Shift Register) as well as the execution time by faster convergence. The results prove that the throughput of the system increases and the execution time is reduced by adopting the proposed technique.
The second paper of the thesis proposes a novel technique referred to as the approximate stochastic computing (ASC) approach focusing on image processing applications to reduce the lengthy computation time of SC with a trade-off in accuracy. The proposed technique is to truncate low-order bits of the image pixel values for SC for faster operation, which also causes an error in the binary to stochastic converted value. Attempts have been made to reduce this error by linearly increasing the clock cycles rather than exponentially. Experimental results from the well-known SC edge detection circuit indicate that the proposed technique is a promising approach for efficient approximate stochastic image processing --Abstract, page iv
Recommended from our members
Hybrid Analog-Digital Co-Processing for Scientific Computation
In the past 10 years computer architecture research has moved to more heterogeneity and less adherence to conventional abstractions. Scientists and engineers hold an unshakable belief that computing holds keys to unlocking humanity's Grand Challenges. Acting on that belief they have looked deeper into computer architecture to find specialized support for their applications. Likewise, computer architects have looked deeper into circuits and devices in search of untapped performance and efficiency. The lines between computer architecture layers---applications, algorithms, architectures, microarchitectures, circuits and devices---have blurred. Against this backdrop, a menagerie of computer architectures are on the horizon, ones that forgo basic assumptions about computer hardware, and require new thinking of how such hardware supports problems and algorithms.
This thesis is about revisiting hybrid analog-digital computing in support of diverse modern workloads. Hybrid computing had extensive applications in early computing history, and has been revisited for small-scale applications in embedded systems. But architectural support for using hybrid computing in modern workloads, at scale and with high accuracy solutions, has been lacking.
I demonstrate solving a variety of scientific computing problems, including stochastic ODEs, partial differential equations, linear algebra, and nonlinear systems of equations, as case studies in hybrid computing. I solve these problems on a system of multiple prototype analog accelerator chips built by a team at Columbia University. On that team I made contributions toward programming the chips, building the digital interface, and validating the chips' functionality. The analog accelerator chip is intended for use in conjunction with a conventional digital host computer.
The appeal and motivation for using an analog accelerator is efficiency and performance, but it comes with limitations in accuracy and problem sizes that we have to work around.
The first problem is how to do problems in this unconventional computation model. Scientific computing phrases problems as differential equations and algebraic equations. Differential equations are a continuous view of the world, while algebraic equations are a discrete one. Prior work in analog computing mostly focused on differential equations; algebraic equations played a minor role in prior work in analog computing. The secret to using the analog accelerator to support modern workloads on conventional computers is that these two viewpoints are interchangeable. The algebraic equations that underlie most workloads can be solved as differential equations,
and differential equations are naturally solvable in the analog accelerator chip. A hybrid analog-digital computer architecture can focus on solving linear and nonlinear algebra problems to support many workloads.
The second problem is how to get accurate solutions using hybrid analog-digital computing. The reason that the analog computation model gives less accurate solutions is it gives up representing numbers as digital binary numbers, and instead uses the full range of analog voltage and current to represent real numbers. Prior work has established that encoding data in analog signals gives an energy efficiency advantage as long as the analog data precision is limited. While the analog accelerator alone may be useful for energy-constrained applications where inputs and outputs are imprecise, we are more interested in using analog in conjunction with digital for precise solutions. This thesis gives novel insight that the trick to do so is to solve nonlinear problems where low-precision guesses are useful for conventional digital algorithms.
The third problem is how to solve large problems using hybrid analog-digital computing. The reason the analog computation model can't handle large problems is it gives up step-by-step discrete-time operation, instead allowing variables to evolve smoothly in continuous time. To make that happen the analog accelerator works by chaining hardware for mathematical operations end-to-end. During computation analog data flows through the hardware with no overheads in control logic and memory accesses. The downside is then the needed hardware size grows alongside problem sizes. While scientific computing researchers have for a long time split large problems into smaller subproblems to fit in digital computer constraints, this thesis is a first attempt to consider these divide-and-conquer algorithms as an essential tool in using the analog model of computation.
As we enter the post-Moore’s law era of computing, unconventional architectures will offer specialized models of computation that uniquely support specific problem types. Two prominent examples are deep neural networks and quantum computers. Recent trends in computer science research show these unconventional architectures will soon have broad adoption. In this thesis I show another specialized, unconventional architecture is to use analog accelerators to solve problems in scientific computing. Computer architecture researchers will discover other important models of computation in the future. This thesis is an example of the discovery process, implementation, and evaluation of how an unconventional architecture supports specialized workloads
The Logic of Random Pulses: Stochastic Computing.
Recent developments in the field of electronics have produced nano-scale devices whose operation can only be described in probabilistic terms. In contrast with the conventional deterministic computing that has dominated the digital world for decades, we investigate a fundamentally different technique that is probabilistic by nature, namely, stochastic computing (SC). In SC, numbers are represented by bit-streams of 0's and 1's, in which the probability of seeing a 1 denotes the value of the number. The main benefit of SC is that complicated arithmetic computation can be performed by simple logic circuits. For example, a single (logic) AND gate performs multiplication. The dissertation begins with a comprehensive survey of SC and its applications. We highlight its main challenges, which include long computation time and low accuracy, as well as the lack of general design methods. We then address some of the more important challenges. We introduce a new SC design method, called STRAUSS, that generates efficient SC circuits for arbitrary target functions. We then address the problems arising from correlation among stochastic numbers (SNs). In particular, we show that, contrary to general belief, correlation can sometimes serve as a resource in SC design. We also show that unlike conventional circuits, SC circuits can tolerate high error rates and are hence useful in some new applications that involve nondeterministic behavior in the underlying circuitry. Finally, we show how SC's properties can be exploited in the design of an efficient vision chip that is suitable for retinal implants. In particular, we show that SC circuits can directly operate on signals with neural encoding, which eliminates the need for data conversion.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113561/1/alaghi_1.pd
Heterogeneous Chip Multiprocessor: Data Representation, Mixed-Signal Processing Tiles, and System Design
With the emergence of big data, the need for more computationally intensive processors that can handle the increased processing demand has risen. Conventional computing paradigms based on the Von Neumann model that separates computational and memory structures have become outdated and less efficient for this increased demand. As the speed and memory density of processors have increased significantly over the years, these models of computing, which rely on a constant stream of data between the processor and memory, see less gains due to finite bandwidth and latency. Moreover, in the presence of extreme scaling, these conventional systems, implemented in submicron integrated circuits, have become even more susceptible to process variability, static leakage current, and more. In this work, alternative paradigms, predicated on distributive processing with robust data representation and mixed-signal processing tiles, are explored for constructing more efficient and scalable computing systems in application specific integrated circuits (ASICs).
The focus of this dissertation work has been on heterogeneous chip multi-processor (CMP) design and optimization across different levels of abstraction. On the level of data representation, a different modality of representation based on random pulse density modulation (RPDM) coding is explored for more efficient processing using stochastic computation. On the level of circuit description, mixed-signal integrated circuits that exploit charge-based computing for energy efficient fixed point arithmetic are designed. Consequently, 8 different chips that test and showcase these circuits were fabricated in submicron CMOS processes. Finally, on the architectural level of description, a compact instruction-set processor and controller that facilitates distributive computing on System-On-Chips (SoCs) is designed. In addition to this, a robust bufferless network architecture is designed with a network simulator, and I/O cells are designed for SoCs.
The culmination of this thesis work has led to the design and fabrication of a heterogeneous chip multi- processor prototype comprised of over 12,000 VVM cores, warp/dewarp processors, cache, and additional processors, which can be applied towards energy efficient large-scale data processing
Design and optimization of approximate multipliers and dividers for integer and floating-point arithmetic
The dawn of the twenty-first century has witnessed an explosion in the number of digital devices and data. While the emerging deep learning algorithms to extract information from this vast sea of data are becoming increasingly compute-intensive, traditional means of improving computing power are no longer yielding gains at the same rate due to the diminishing returns from traditional technology scaling. To minimize the increasing gap between computational demands and the available resources, the paradigm of approximate computing is emerging as one of the potential solutions. Specifically, the resource-efficient approximate arithmetic units promise overall system efficiency, since most of the compute-intensive applications are dominated by arithmetic operations.
This thesis primarily presents design techniques for approximate hardware multipliers and dividers. The thesis presents the design of two approximate integer multipliers and an approximate integer divider. These are: an error-configurable minimally-biased approximate integer multiplier (MBM), an error-configurable reduced-error approximate log based multiplier (REALM), and error-configurable integer divider INZeD. The two multiplier designs and the divider designs are based on the coupling of novel mathematically formulated error-reduction mechanisms in the classical approximate log based multiplier and dividers, respectively. They exhibit very low error bias and offer Pareto-optimal error vs. resource-efficiency trade-offs when compared with the state-of-the-art approximate integer multipliers/dividers. Further, the thesis also presents design of approximate floating-point multipliers and dividers. These designs utilize the optimized versions of the proposed MBM and REALM multipliers for mantissa multiplications and the proposed INZeD divider for mantissa division, and offer better design trade-offs than traditional precision scaling.
The existing approximate integer dividers as well as the proposed INZeD suffer from unreasonably high worst-case error. This thesis presents WEID, which is a novel light-weight method for reducing worst-case error in approximate dividers. Finally, the thesis presents a methodology for selection of approximate arithmetic units for a given application. The methodology is based on a novel selection algorithm and utilizes the subrange error characterization of approximate arithmetic units, which performs error characterization independently in different segments of the input range