
    An Analog VLSI Deep Machine Learning Implementation

    Machine learning systems provide automated data processing and see a wide range of applications. Direct processing of raw high-dimensional data such as images and video by machine learning systems is impractical, both because of prohibitive power consumption and because of the “curse of dimensionality,” which makes learning tasks exponentially more difficult as dimension increases. Deep machine learning (DML) mimics the hierarchical representation of information in the human brain to achieve robust automated feature extraction, reducing the dimension of such data. However, the computational complexity of DML systems limits large-scale implementations on standard digital computers. Custom analog signal processing (ASP) can yield much higher energy efficiency than digital signal processing (DSP), presenting a means of overcoming these limitations. The purpose of this work is to develop an analog implementation of a DML system. First, an analog memory is proposed as an essential component of the learning system. It uses charge trapped on a floating gate to store analog values in a non-volatile way. The memory is compatible with a standard digital CMOS process and allows random-accessible, bi-directional updates without the need for an on-chip charge pump or high-voltage switch. Second, an architecture and circuits are developed to realize an online k-means clustering algorithm in analog signal processing. The system automatically recognizes underlying data patterns and extracts statistical parameters of the data online. This unsupervised learning system constitutes the computation node in the deep machine learning hierarchy. Third, a 3-layer, 7-node analog deep machine learning engine is designed, featuring online unsupervised trainability and non-volatile floating-gate analog storage. It uses a massively parallel, reconfigurable, current-mode analog architecture to realize efficient computation, and algorithm-level feedback is leveraged to provide robustness to circuit imperfections in analog signal processing. At a processing speed of 8,300 input vectors per second, it achieves a peak energy efficiency of 1×10¹² operations per second per watt. In addition, an ultra-low-power tunable bump circuit is presented to provide similarity measures in analog signal processing. It incorporates a novel wide-input-range tunable pseudo-differential transconductor, and demonstrates tunability of bump center, width, and height with a power consumption significantly lower than previous works.
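    As a point of reference, the following is a minimal software sketch of the online (sequential) k-means update that such a clustering node performs: each input vector moves only its nearest centroid, so cluster statistics are extracted on the fly without storing the data set. The random initialization, learning-rate schedule, and function names are illustrative assumptions, not details taken from the chip.

        import numpy as np

        def online_kmeans(stream, k, dim, lr0=0.5, seed=0):
            """Online k-means: each input vector updates only its nearest
            centroid, extracting cluster statistics without storing data."""
            rng = np.random.default_rng(seed)
            centroids = rng.normal(size=(k, dim))  # illustrative random init
            counts = np.zeros(k)                   # per-centroid update counts
            for x in stream:
                # winner-take-all: find the nearest centroid
                j = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
                counts[j] += 1
                # decaying learning rate pulls the winner toward the input
                centroids[j] += (lr0 / counts[j]) * (x - centroids[j])
            return centroids

    Feeding the function a stream of vectors drawn from a few Gaussian blobs drives the centroids toward the blob means, which is the "online extraction of data statistical parameters" described above.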
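    The bump circuit's role is easiest to see as a transfer function. The behavioral model below is a hedged idealization: subthreshold bump circuits are commonly modeled with a sech²-shaped (approximately Gaussian) response, and the center, width, and height parameters correspond to the tunable quantities reported above; the specific functional form is an assumption, not a measurement from this work.

        import numpy as np

        def bump(x, center=0.0, width=1.0, height=1.0):
            """Idealized tunable bump: a sech^2-shaped similarity measure
            that peaks at `center`, with adjustable spread and peak value."""
            return height / np.cosh((x - center) / width) ** 2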

    A Circuit-Based Neural Network with Hybrid Learning of Backpropagation and Random Weight Change Algorithms.

    A hybrid learning method combining software-based backpropagation (BP) learning and hardware-based random weight change (RWC) learning is proposed for the development of circuit-based neural networks. Backpropagation is one of the most efficient learning algorithms, but its hardware implementation is extremely difficult. The RWC algorithm, by contrast, is very easy to implement in hardware but requires many iterations to learn. The proposed algorithm is a hybrid of the two. First, the main learning is performed with a software version of the BP algorithm; the learned weights are then transplanted onto a hardware version of the neural circuit. At the time of the weight transplantation, a significant amount of output error can occur due to characteristic differences between the software and the hardware. In the proposed method, this error is reduced via complementary learning with the RWC algorithm, which is implemented in simple hardware. The usefulness of the proposed hybrid learning system is verified via simulations on several classical learning problems.
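    A minimal sketch of the hardware-side refinement step is given below, assuming the classical RWC rule: perturb all weights by a small random ±step simultaneously, keep the perturbation direction while the output error decreases, and redraw it otherwise. The `loss` callable, step size, and iteration count are illustrative placeholders, not values from the paper.

        import numpy as np

        def rwc_refine(weights, loss, step=0.01, iters=1000, seed=0):
            """Random Weight Change refinement: gradient-free and thus
            hardware-friendly, needing only a global error comparison."""
            rng = np.random.default_rng(seed)
            w = weights.copy()
            best = loss(w)
            # random +/-step perturbation applied to every weight at once
            delta = step * rng.choice([-1.0, 1.0], size=w.shape)
            for _ in range(iters):
                trial = w + delta
                err = loss(trial)
                if err < best:
                    w, best = trial, err  # improved: keep the same direction
                else:
                    # worsened: redraw a fresh random perturbation direction
                    delta = step * rng.choice([-1.0, 1.0], size=w.shape)
            return w

    In the hybrid flow, `weights` would be the BP-trained weights as measured after transplantation onto the circuit, and `loss` the observed output error of the hardware, so RWC compensates for the software-to-hardware characteristic mismatch.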