114 research outputs found

    Using the IBM Analog In-Memory Hardware Acceleration Kit for Neural Network Training and Inference

    Analog In-Memory Computing (AIMC) is a promising approach to reducing the latency and energy consumption of Deep Neural Network (DNN) inference and training. However, the noisy and non-linear device characteristics and the non-ideal peripheral circuitry in AIMC chips require DNNs to be adapted for deployment on such hardware in order to achieve accuracy equivalent to digital computing. In this tutorial, we provide a deep dive into how such adaptations can be achieved and evaluated using the recently released IBM Analog Hardware Acceleration Kit (AIHWKit), freely available at https://github.com/IBM/aihwkit. The AIHWKit is a Python library that simulates inference and training of DNNs using AIMC. We present an in-depth description of the AIHWKit design and functionality, along with best practices for properly performing inference and training. We also present an overview of the Analog AI Cloud Composer, which provides the benefits of the AIHWKit simulation platform in a fully managed cloud setting. Finally, we show examples of how users can expand and customize AIHWKit for their own needs. This tutorial is accompanied by comprehensive Jupyter Notebook code examples that can be run using AIHWKit and downloaded from https://github.com/IBM/aihwkit/tree/master/notebooks/tutorial.
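    For a flavour of the workflow described above, the minimal sketch below trains a single analog layer on random data. It follows the library's basic usage, but the device model, layer sizes, and hyperparameters are illustrative choices, and import paths may differ slightly between AIHWKit releases.

```python
import torch
from torch.nn.functional import mse_loss

from aihwkit.nn import AnalogLinear
from aihwkit.optim import AnalogSGD
from aihwkit.simulator.configs import SingleRPUConfig
from aihwkit.simulator.configs.devices import ConstantStepDevice

# A single fully connected layer whose weights live on a simulated analog crossbar.
model = AnalogLinear(4, 2, rpu_config=SingleRPUConfig(device=ConstantStepDevice()))

# Illustrative random training data.
x = torch.rand(8, 4)
y = torch.rand(8, 2)

# AnalogSGD applies the noisy, device-level update rules during training.
opt = AnalogSGD(model.parameters(), lr=0.1)
opt.regroup_param_groups(model)

for _ in range(20):
    opt.zero_grad()
    loss = mse_loss(model(x), y)
    loss.backward()
    opt.step()
```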

    Biologically Plausible Learning on Neuromorphic Hardware Architectures

    With an ever-growing number of parameters defining increasingly complex networks, Deep Learning has led to several breakthroughs surpassing human performance. As a result, moving these millions of model parameters between memory and compute causes a growing imbalance known as the memory wall. Neuromorphic computing is an emerging paradigm that confronts this imbalance by performing computations directly in analog memories. On the software side, the sequential Backpropagation algorithm prevents efficient parallelization and thus fast convergence. Direct Feedback Alignment, a novel method, resolves inherent layer dependencies by passing the error directly from the output to each layer. At the intersection of hardware/software co-design, there is a demand for algorithms that are tolerant of hardware nonidealities. This work therefore explores implementing bio-plausible learning in situ on neuromorphic hardware, with an emphasis on energy, area, and latency constraints. Using the benchmarking framework DNN+NeuroSim, we investigate the impact of hardware nonidealities and quantization on algorithm performance, as well as how network topologies and algorithm-level design choices scale the latency, energy, and area consumption of a chip. To the best of our knowledge, this work is the first to compare the impact of different learning algorithms on Compute-In-Memory-based hardware and vice versa. The best accuracy results remain Backpropagation-based, notably in the face of hardware imperfections. Direct Feedback Alignment, on the other hand, allows for significant speedup through parallelization, reducing training time by a factor approaching N for N-layered networks.
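    To make the mechanism concrete, here is a hedged NumPy sketch of Direct Feedback Alignment for a small multilayer perceptron. The layer sizes, tanh nonlinearity, and learning rate are illustrative, not taken from the paper; the point is that each hidden layer's update uses a fixed random projection of the output error, so no sequential backward sweep through the layers is required.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 16, 32, 4

# Forward weights of a 2-hidden-layer MLP.
W1 = rng.normal(0.0, 0.1, (n_hid, n_in))
W2 = rng.normal(0.0, 0.1, (n_hid, n_hid))
W3 = rng.normal(0.0, 0.1, (n_out, n_hid))
# Fixed random feedback matrices replace the transposed forward weights of backprop.
B1 = rng.normal(0.0, 0.1, (n_hid, n_out))
B2 = rng.normal(0.0, 0.1, (n_hid, n_out))

def dfa_step(x, target, lr=0.01):
    global W1, W2, W3
    # Forward pass.
    a1 = np.tanh(W1 @ x)
    a2 = np.tanh(W2 @ a1)
    y = W3 @ a2
    e = y - target  # output error
    # DFA: every hidden layer receives the output error through its own fixed
    # random projection, so all layer updates can be computed in parallel.
    d2 = (B2 @ e) * (1.0 - a2 ** 2)  # tanh derivative
    d1 = (B1 @ e) * (1.0 - a1 ** 2)
    W3 = W3 - lr * np.outer(e, a2)
    W2 = W2 - lr * np.outer(d2, a1)
    W1 = W1 - lr * np.outer(d1, x)

dfa_step(rng.normal(size=n_in), rng.normal(size=n_out))
```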

    Design of Novel Analog Compute Paradigms with Ark

    Previous efforts on reconfigurable analog circuits have mostly focused either on specialized analog circuits, produced through careful co-design, or on highly reconfigurable but relatively resource-inefficient accelerators that implement analog compute paradigms. This work deals with an intermediate point in the design space: specialized reconfigurable circuits for analog compute paradigms. This class of circuits requires new co-design methodologies, as prior techniques are typically highly specialized to conventional circuit classes (e.g., filters, ADCs). In this context, we present Ark, a programming language for describing analog compute paradigms. Ark enables progressive incorporation of analog behaviors into computations, and deploys a validator and a dynamical system compiler for verifying and simulating computations. We use Ark to codify the design space for three exemplary circuit design problems, and demonstrate that Ark helps explore design trade-offs and evaluate the impact of nonidealities on the computation.
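    Ark's own syntax is not reproduced here. As a generic stand-in, the Python sketch below illustrates the dynamical-system view that such a compiler targets: an analog computation is modelled as an ODE, and the circuit's steady state is the answer (here, the solution of a small linear system).

```python
import numpy as np

def simulate(f, x0, dt=1e-3, steps=20000):
    """Forward-Euler integration of dx/dt = f(x), as a dynamical-system simulator would do."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * f(x)
    return x

# A toy analog "circuit": dx/dt = -x + W x + u settles to x* = (I - W)^(-1) u,
# i.e. the circuit's equilibrium solves a linear system.
W = np.array([[0.0, 0.5],
              [0.5, 0.0]])
u = np.array([1.0, -2.0])
x_star = simulate(lambda x: -x + W @ x + u, np.zeros(2))
print(x_star, "vs", np.linalg.solve(np.eye(2) - W, u))
```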

    Analogue neuromorphic systems.

    This thesis addresses a new area of science and technology, that of neuromorphic systems, namely the problems and prospects of analogue neuromorphic systems. The subject is divided into three chapters. Chapter 1 is an introduction. It formulates the oncoming problem of creating highly computationally costly systems for nonlinear information processing (such as artificial neural networks and artificial intelligence systems), and shows that analogue technology could make a vital contribution to the creation of such systems. The basic principles for creating analogue neuromorphic systems are formulated, and the importance of the principle of orthogonality for future highly efficient complex information processing systems is emphasised. Chapter 2 reviews the basics of neural and neuromorphic systems and surveys the present state of research in this field, including both experimental and theoretical knowledge gained to date. The chapter provides the necessary background for correct interpretation of the results reported in Chapter 3 and for a realistic decision on the direction of future work. Chapter 3 describes my own experimental and computational results within the framework of the subject, obtained at De Montfort University. These include the building of: (i) an analogue polynomial approximator/interpolator/extrapolator; (ii) a synthesiser of orthogonal functions; (iii) an analogue real-time video filter (performing homomorphic filtering); (iv) an adaptive polynomial compensator of the geometrical distortions of CRT monitors; and (v) an analogue parallel-learning neural network (backpropagation algorithm). The thesis thus makes a dual contribution to the chosen field: it summarises present knowledge on the possibility of utilising analogue technology in current and future computational systems, and it reports new results within the framework of the subject. The main conclusion is that, due to their promising power characteristics, small size, and high tolerance to degradation, analogue neuromorphic systems will play an increasingly important role in future computational systems (in particular, in systems of artificial intelligence).
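    To make the orthogonality principle concrete, here is a purely digital sketch (not the thesis's analogue hardware) of approximating a function with orthogonal Chebyshev polynomials: sampling at Chebyshev points lets each expansion coefficient be computed as an independent inner product, the property that makes parallel analogue synthesis of basis functions attractive.

```python
import numpy as np
from numpy.polynomial import chebyshev

# Expand exp(x) in orthogonal Chebyshev polynomials by sampling at Chebyshev
# points; discrete orthogonality there means every coefficient is obtained
# independently, with no coupled system of equations to solve.
coeffs = chebyshev.chebinterpolate(np.exp, deg=8)

# Evaluate the expansion and check the approximation error on [-1, 1].
x = np.linspace(-1.0, 1.0, 201)
err = np.abs(chebyshev.chebval(x, coeffs) - np.exp(x)).max()
print("degree-8 max error:", err)
```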

    Memristive crossbars as hardware accelerators: modelling, design and new uses

    Digital electronics has given rise to reliable, affordable, and scalable computing devices. However, new computing paradigms present challenges. For example, machine learning requires repeatedly processing large amounts of data; this creates a bottleneck in conventional computers, where computing and memory are separated. To add to that, Moore's "law" is plateauing and is thus unlikely to address the increasing demand for computational power. In-memory computing, and specifically hardware accelerators for linear algebra, may address both of these issues. Memristive crossbar arrays are a promising candidate for such hardware accelerators. Memristive devices are fast, energy-efficient, and—when arranged in a crossbar structure—can compute vector-matrix products. Unfortunately, they come with their own set of limitations. The analogue nature of these devices makes them stochastic and thus less reliable than digital devices. It does not, however, necessarily make them unsuitable for computing. Nevertheless, successful deployment of analogue hardware accelerators requires a proper understanding of their drawbacks, ways of mitigating the effects of undesired physical behaviour, and applications where some degree of stochasticity is tolerable. In this thesis, I investigate the effects of nonidealities in memristive crossbar arrays, introduce techniques for minimising those negative effects, and present novel crossbar circuit designs for new applications. I focus mostly on physical implementations of neural networks and investigate the influence of device nonidealities on classification accuracy. To make memristive neural networks more reliable, I explore committee machines, rearrangement of crossbar lines, nonideality-aware training, and other techniques. I find that they all may contribute to higher accuracy in physically implemented neural networks, often comparable to the accuracy of their digital counterparts. Finally, I introduce circuits that extend dot product computations to higher-rank arrays, different linear algebra operations, and quaternion vectors and matrices. These present opportunities for using crossbar arrays in new ways, including the processing of coloured images.
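    As an illustration of the crossbar's core operation (not of the thesis's specific designs), the sketch below models an analogue vector-matrix product: signed weights are mapped onto differential pairs of conductances, Ohm's law performs the multiplications, and Kirchhoff's current law performs the summations. The conductance range and programming-noise level are assumed values for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def crossbar_vmm(v_in, weights, g_min=1e-6, g_max=1e-4, noise_sd=0.02):
    """Analogue vector-matrix product I = G @ V on a simulated memristive crossbar."""
    w_max = np.abs(weights).max()
    # Differential pair of devices per weight to represent signed values.
    g_pos = g_min + (g_max - g_min) * np.clip(weights, 0.0, None) / w_max
    g_neg = g_min + (g_max - g_min) * np.clip(-weights, 0.0, None) / w_max
    # Device stochasticity: multiplicative programming noise on each conductance.
    g_pos = g_pos * (1.0 + noise_sd * rng.standard_normal(g_pos.shape))
    g_neg = g_neg * (1.0 + noise_sd * rng.standard_normal(g_neg.shape))
    # Ohm's law multiplies, Kirchhoff's current law sums along each column.
    i_out = (g_pos - g_neg) @ v_in
    # Rescale output currents back to the weight domain.
    return i_out * w_max / (g_max - g_min)

W = rng.normal(size=(3, 5))
x = rng.normal(size=5)
print(crossbar_vmm(x, W), "vs ideal", W @ x)
```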

    Device and Circuit Architectures for In‐Memory Computing

    With the rise of artificial intelligence (AI), computing systems are facing new challenges related to the large amount of data and the increasing burden of communication between memory and the processing unit. In-memory computing (IMC) appears to be a promising approach to suppressing the memory bottleneck and enabling higher parallelism of data processing, thanks to the memory array architecture. As a result, IMC offers better throughput and lower energy consumption than the conventional digital approach, not only for typical AI tasks but also for general-purpose problems such as constraint satisfaction problems (CSPs) and linear algebra. Herein, an overview of IMC is provided in terms of memory devices and circuit architectures. First, the memory device technologies adopted for IMC are summarized, focusing on both charge-based memories and emerging devices relying on electrically induced material modification at the chemical or physical level. Then, computational memory programming and the corresponding device nonidealities are described with reference to offline and online training of IMC circuits. Finally, array architectures for computing are reviewed, including typical architectures for neural network accelerators, content addressable memory (CAM), and novel circuit topologies for general-purpose computing with low complexity.
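    Crossbar multiply-accumulate is sketched in the entries above; content addressable memory is the other array pattern mentioned here, so the toy model below shows the lookup it accelerates: a ternary CAM compares a query against every stored word at once and returns the matching rows. The stored entries and the -1 "don't care" encoding are illustrative.

```python
import numpy as np

# Stored ternary words: bits are 0 or 1, and -1 marks a "don't care" position.
table = np.array([
    [1, 0, 1, 1],
    [0, -1, 1, 0],
    [1, 1, -1, -1],
])

def tcam_search(query):
    """Return the indices of all stored words matching the query.

    In a CAM array every row is compared against the query simultaneously
    inside the memory, so the search takes one step regardless of table size.
    """
    care = table != -1
    matches = (~care | (table == query)).all(axis=1)
    return np.flatnonzero(matches)

print(tcam_search(np.array([0, 1, 1, 0])))  # -> [1]
```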