267 research outputs found
Recommended from our members
Versatile stochastic dot product circuits based on nonvolatile memories for high performance neurocomputing and neurooptimization.
The key operation in stochastic neural networks, which have become the state-of-the-art approach for solving problems in machine learning, information theory, and statistics, is a stochastic dot-product. While there have been many demonstrations of dot-product circuits and, separately, of stochastic neurons, the efficient hardware implementation combining both functionalities is still missing. Here we report compact, fast, energy-efficient, and scalable stochastic dot-product circuits based on either passively integrated metal-oxide memristors or embedded floating-gate memories. The circuit's high performance is due to mixed-signal implementation, while the efficient stochastic operation is achieved by utilizing circuit's noise, intrinsic and/or extrinsic to the memory cell array. The dynamic scaling of weights, enabled by analog memory devices, allows for efficient realization of different annealing approaches to improve functionality. The proposed approach is experimentally verified for two representative applications, namely by implementing neural network for solving a four-node graph-partitioning problem, and a Boltzmann machine with 10-input and 8-hidden neurons
An Analog VLSI Deep Machine Learning Implementation
Machine learning systems provide automated data processing and see a wide range of applications. Direct processing of raw high-dimensional data such as images and video by machine learning systems is impractical both due to prohibitive power consumption and the âcurse of dimensionality,â which makes learning tasks exponentially more difficult as dimension increases. Deep machine learning (DML) mimics the hierarchical presentation of information in the human brain to achieve robust automated feature extraction, reducing the dimension of such data. However, the computational complexity of DML systems limits large-scale implementations in standard digital computers. Custom analog signal processing (ASP) can yield much higher energy efficiency than digital signal processing (DSP), presenting means of overcoming these limitations.
The purpose of this work is to develop an analog implementation of DML system.
First, an analog memory is proposed as an essential component of the learning systems. It uses the charge trapped on the floating gate to store analog value in a non-volatile way. The memory is compatible with standard digital CMOS process and allows random-accessible bi-directional updates without the need for on-chip charge pump or high voltage switch.
Second, architecture and circuits are developed to realize an online k-means clustering algorithm in analog signal processing. It achieves automatic recognition of underlying data pattern and online extraction of data statistical parameters. This unsupervised learning system constitutes the computation node in the deep machine learning hierarchy.
Third, a 3-layer, 7-node analog deep machine learning engine is designed featuring online unsupervised trainability and non-volatile floating-gate analog storage. It utilizes massively parallel reconfigurable current-mode analog architecture to realize efficient computation. And algorithm-level feedback is leveraged to provide robustness to circuit imperfections in analog signal processing. At a processing speed of 8300 input vectors per second, it achieves 1Ă1012 operation per second per Watt of peak energy efficiency.
In addition, an ultra-low-power tunable bump circuit is presented to provide similarity measures in analog signal processing. It incorporates a novel wide-input-range tunable pseudo-differential transconductor. The circuit demonstrates tunability of bump center, width and height with a power consumption significantly lower than previous works
Redesigning Commercial Floating-Gate Memory for Analog Computing Applications
We have modified a commercial NOR flash memory array to enable high-precision
tuning of individual floating-gate cells for analog computing applications. The
modified array area per cell in a 180 nm process is about 1.5 um^2. While this
area is approximately twice the original cell size, it is still at least an
order of magnitude smaller than in the state-of-the-art analog circuit
implementations. The new memory cell arrays have been successfully tested, in
particular confirming that each cell may be automatically tuned, with ~1%
precision, to any desired subthreshold readout current value within an almost
three-orders-of-magnitude dynamic range, even using an unoptimized tuning
algorithm. Preliminary results for a four-quadrant vector-by-matrix multiplier,
implemented with the modified memory array gate-coupled with additional
peripheral floating-gate transistors, show highly linear transfer
characteristics over a broad range of input currents.Comment: 4 pages, 6 figure
Design of Building Blocks for Trit Algorithm
This thesis attempts to design the building blocks for TRIT algorithm. PSPICE was used for simulation. The building blocks were laidout in Magic.Electrical Engineerin
- âŠ