    MorphIC: A 65-nm 738k-Synapse/mm² Quad-Core Binary-Weight Digital Neuromorphic Processor with Stochastic Spike-Driven Online Learning

    Full text link
    Recent trends in the field of neural network accelerators investigate weight quantization as a means to increase the resource- and power-efficiency of hardware devices. As full on-chip weight storage is necessary to avoid the high energy cost of off-chip memory accesses, memory reduction requirements for weight storage pushed toward the use of binary weights, which were demonstrated to have a limited accuracy reduction on many applications when quantization-aware training techniques are used. In parallel, spiking neural network (SNN) architectures are explored to further reduce power when processing sparse event-based data streams, while on-chip spike-based online learning appears as a key feature for applications constrained in power and resources during the training phase. However, designing power- and area-efficient spiking neural networks still requires the development of specific techniques in order to leverage on-chip online learning on binary weights without compromising the synapse density. In this work, we demonstrate MorphIC, a quad-core binary-weight digital neuromorphic processor embedding a stochastic version of the spike-driven synaptic plasticity (S-SDSP) learning rule and a hierarchical routing fabric for large-scale chip interconnection. The MorphIC SNN processor embeds a total of 2k leaky integrate-and-fire (LIF) neurons and more than two million plastic synapses for an active silicon area of 2.86 mm² in 65-nm CMOS, achieving a high density of 738k synapses/mm². MorphIC demonstrates an order-of-magnitude improvement in the area-accuracy tradeoff on the MNIST classification task compared to previously-proposed SNNs, while having no penalty in the energy-accuracy tradeoff.
    Comment: This document is the paper as accepted for publication in the IEEE Transactions on Biomedical Circuits and Systems journal (2019); the fully-edited paper is available at https://ieeexplore.ieee.org/document/876400
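
    The core mechanism named in the abstract, stochastic spike-driven synaptic plasticity (S-SDSP) on binary weights attached to leaky integrate-and-fire (LIF) neurons, can be pictured with a minimal software sketch. The fragment below is an illustrative assumption of how such a rule operates, not MorphIC's actual circuit or parameters: on each presynaptic spike, a binary weight is flipped toward potentiation or depression with a small probability, with the direction set by the postsynaptic membrane potential and the update gated by a calcium-like activity trace.

```python
import random

# Illustrative constants -- assumptions for the sketch, not MorphIC's values.
V_THRESH = 1.0               # LIF firing threshold
V_SDSP = 0.5                 # membrane level deciding potentiation vs. depression
CA_LOW, CA_HIGH = 0.2, 0.8   # calcium "stop-learning" window
P_UPDATE = 0.1               # stochastic update probability
LEAK = 0.95                  # per-step leak factor

class BinaryLIFNeuron:
    """Leaky integrate-and-fire neuron with binary, stochastically plastic synapses."""

    def __init__(self, n_syn):
        self.v = 0.0                                            # membrane potential
        self.ca = 0.0                                           # calcium-like trace
        self.w = [random.randint(0, 1) for _ in range(n_syn)]   # binary weights

    def step(self, pre_spikes):
        # Integrate binary-weighted presynaptic spikes, then apply the leak.
        self.v = LEAK * self.v + sum(w * s for w, s in zip(self.w, pre_spikes))
        v_now = self.v
        fired = v_now >= V_THRESH
        self.ca = LEAK * self.ca + (1.0 if fired else 0.0)

        # Stochastic SDSP-style update: on each presynaptic spike, flip the weight
        # with low probability, in a direction set by the membrane potential,
        # only while the calcium trace lies inside the learning window.
        if CA_LOW < self.ca < CA_HIGH:
            for i, s in enumerate(pre_spikes):
                if s and random.random() < P_UPDATE:
                    self.w[i] = 1 if v_now > V_SDSP else 0

        if fired:
            self.v = 0.0
        return fired
```

    The intuition, consistent with the abstract's framing, is that stochasticity stands in for weight resolution: averaged over many events, a low flip probability approximates a graded update while each synapse stores only a single bit.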

    Design of multimedia processor based on metric computation

    Get PDF
    Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of these media applications place taxing demands on today's processors, which must also meet low-cost, low-power, and short design-time requirements. To meet these challenges, a fast and efficient strategy is to upgrade a low-cost general-purpose processor core. This approach is based on the personalization of a general RISC processor core according to the requirements of the target multimedia application. Thus, if the extra cost is justified, the general-purpose processor (GPP) core can be reinforced with instruction-level coprocessors, coarse-grain dedicated hardware, ad hoc memories, or additional GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach consists of three main steps: the first is the analysis of the target application using efficient metrics; the second is the selection of the appropriate architecture template according to the results and recommendations of the first step; the third is the generation of the architecture. The approach is demonstrated on various image and video algorithms, showing its feasibility.
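
    As a rough picture of how the three steps above could fit together, the sketch below encodes a metric-driven template selection. The metric names, thresholds, and template labels are hypothetical assumptions introduced only for illustration; they are not the metrics or templates defined in the paper.

```python
# Step 1 (assumed): application metrics obtained by profiling the target workload.
metrics = {
    "instruction_level_parallelism": 3.2,   # average independent operations per cycle
    "memory_bandwidth_mb_s": 450.0,         # sustained data traffic
    "compute_to_memory_ratio": 8.0,         # arithmetic operations per byte accessed
}

def select_template(m):
    """Step 2 (assumed): map the metric profile to an architecture template.
    Thresholds and template names are illustrative, not the paper's."""
    if m["compute_to_memory_ratio"] > 5.0 and m["instruction_level_parallelism"] > 2.0:
        return "GPP core + coarse-grain dedicated hardware"
    if m["memory_bandwidth_mb_s"] > 400.0:
        return "GPP core + ad hoc memories"
    if m["instruction_level_parallelism"] > 2.0:
        return "GPP core + instruction-level coprocessor"
    return "single GPP core"

# Step 3 (architecture generation) would then instantiate the chosen template.
print(select_template(metrics))
```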

    Cryptarray: A Scalable and Reconfigurable Architecture for Cryptographic Applications

    Get PDF
    Cryptography is increasingly viewed as a critical technology to fulfill the requirements of security and authentication for information exchange between Internet applications. However, software implementations of cryptographic applications are unable to support the bandwidth-related quality of service required by most Internet applications. As a result, various hardware implementations, from Application-Specific Integrated Circuits (ASICs) and Field-Programmable Gate Arrays (FPGAs) to programmable processors, were proposed to improve this inadequate quality of service. Although these implementations provide performance that is considered better than that of software implementations, they still fall short of addressing the bandwidth requirements of most cryptographic applications in the context of the Internet, for two major reasons: (i) the majority of these architectures sacrifice flexibility for performance in order to reach the performance level needed for cryptographic applications, which can be detrimental considering that cryptographic standards and algorithms are still evolving; (ii) these architectures do not consider the consequences of technology scaling in general, and interconnect-related problems in particular. As a result, this thesis proposes an architecture that attempts to address the requirements of cryptographic applications by overcoming the obstacles described in (i) and (ii). To this end, we propose a new reconfigurable, two-dimensional, scalable architecture, called CRYPTARRAY, in which bus-based communication is replaced by distributed shared-memory communication. At the physical level, the length of the wires is kept to a minimum. CRYPTARRAY is organized as a chessboard in which the dark and light squares represent Processing Elements (PEs) and memory blocks, respectively. The granularity and resource composition of the PEs is specifically designed to support the computing operations encountered in cryptographic algorithms in general, and symmetric algorithms in particular. Communication can occur only between neighboring PEs through locally shared memory blocks. Because of the chessboard layout, the architecture can be reconfigured to allow computation to proceed as a pipelined wave in any direction. This organization offers a high computational density in terms of datapath resources and a large number of distributed storage resources that easily support a high degree of parallelism and pipelining. Experimental prototyping of a small array on FPGA chips shows that this architecture can run at 80.9 MHz, producing 26,968,716 outputs every second in static reconfiguration mode and 20,226,537 outputs every second in dynamic reconfiguration mode.
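
    The quoted throughput figures are consistent with a fixed number of clock cycles per output at the reported clock rate. The check below is only arithmetic on the numbers in the abstract; the assumption that one output is issued every fixed number of cycles is ours, not a claim from the thesis.

```python
# Back-of-the-envelope check of the reported CRYPTARRAY throughput figures.
# Assumption (not stated in the abstract): one output is produced every N clock cycles.
clock_hz = 80.9e6

for mode, outputs_per_s in [("static", 26_968_716), ("dynamic", 20_226_537)]:
    cycles_per_output = clock_hz / outputs_per_s
    print(f"{mode} reconfiguration: ~{cycles_per_output:.1f} clock cycles per output")

# Prints roughly 3 cycles per output in static mode and 4 in dynamic mode,
# i.e. about one extra cycle of reconfiguration overhead per output.
```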

    ACE16K: The Third Generation of Mixed-Signal SIMD-CNN ACE Chips Toward VSoCs

    Get PDF
    Today, with 0.18-μm technologies mature and stable enough for mixed-signal design, with a large variety of CMOS-compatible optical sensors available, and with 0.09-μm technologies knocking at the door of designers, we can face the design of integrated systems, instead of just integrated circuits. In fact, significant progress has been made in the last few years toward the realization of vision systems on chips (VSoCs). Such VSoCs are eventually targeted to integrate within a semiconductor substrate the functions of optical sensing, image processing in space and time, high-level processing, and the control of actuators. The consecutive generations of ACE chips define a roadmap toward flexible VSoCs. These chips consist of arrays of mixed-signal processing elements (PEs) which operate in accordance with single-instruction multiple-data (SIMD) computing architectures and exhibit the functional features of CNN Universal Machines. They have been conceived to cover the early stages of the visual processing path in a fully parallel manner, and hence more efficiently than DSP-based systems. Across the different generations, improvements and modifications have been made to converge with the newest findings of neurobiologists regarding the behavior of natural retinas. This paper presents considerations pertaining to the design of a member of the third generation of ACE chips, namely the so-called ACE16K chip. This chip, designed in a 0.35-μm standard CMOS technology, contains about 3.75 million transistors and exhibits peak computing figures of 330 GOPS, 3.6 GOPS/mm², and 82.5 GOPS/W. Each PE in the array contains a reconfigurable computing kernel capable of calculating linear convolutions on 3×3 neighborhoods in less than 1.5 μs, imagewise Boolean combinations in less than 200 ns, imagewise arithmetic operations in about 5 μs, and CNN-like temporal evolutions with a time constant of about 0.5 μs. Unfortunately, the many ideas underlying the design of this chip cannot be covered in a single paper; hence, this paper focuses, first, on placing ACE16K in the ACE chip roadmap and, then, on discussing the most significant modifications of ACE16K versus its predecessors in the family.
    Funding: LOCUST IST2001-38097; VISTA TIC2003-09817-C02-01; Office of Naval Research N000 140 210 88
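
    Two quantities follow immediately from the peak figures quoted above: dividing 330 GOPS by 3.6 GOPS/mm² gives the computing area those figures imply, and dividing it by 82.5 GOPS/W gives the corresponding peak power. The snippet below only reproduces that arithmetic; it adds no information beyond the abstract's own numbers.

```python
# Quick consistency check of the ACE16K peak computing figures quoted above.
peak_gops = 330.0       # peak throughput
gops_per_mm2 = 3.6      # areal efficiency
gops_per_w = 82.5       # energy efficiency

implied_area_mm2 = peak_gops / gops_per_mm2   # ~91.7 mm^2 of computing area
implied_power_w = peak_gops / gops_per_w      # 4.0 W at peak throughput

print(f"implied computing area: {implied_area_mm2:.1f} mm^2")
print(f"implied peak power:     {implied_power_w:.1f} W")
```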

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has significantly increased the number of transistors for a given die area and raised switching speeds from a few MHz to the GHz range. This simultaneous shrinking of devices and boost in performance demands lower supply voltages and effective management of power dissipation in chips with millions of transistors, and has triggered a substantial amount of research into power-reduction techniques for almost every aspect of the chip, particularly the processor cores it contains. This paper presents an overview of techniques for achieving power efficiency, mainly at the processor-core level, but also visits related domains such as buses and memories. Various processor parameters and features, such as supply voltage, clock frequency, caches, and pipelining, can be optimized to reduce the power consumption of the processor, and this paper discusses ways in which each can be optimized. Emerging power-efficient processor architectures are also surveyed and ongoing research activities are discussed, which should help the reader identify how these factors contribute to a processor's power consumption. Some of these concepts are already established, whereas others remain active research areas.
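
    The supply-voltage and clock-frequency knobs mentioned above act through the standard first-order CMOS dynamic-power relation P = αCV²f, where α is the switching activity, C the switched capacitance, V the supply voltage, and f the clock frequency. The sketch below illustrates that relation with made-up values; the formula is textbook material, and none of the numbers come from the paper.

```python
# First-order CMOS dynamic (switching) power: P = alpha * C * V^2 * f.
# All numeric values below are illustrative assumptions.
def dynamic_power(alpha, c_farads, v_volts, f_hz):
    """Dynamic power dissipated by switching a capacitance C at frequency f."""
    return alpha * c_farads * v_volts**2 * f_hz

baseline = dynamic_power(alpha=0.2, c_farads=1e-9, v_volts=1.2, f_hz=1.0e9)  # nominal point
scaled   = dynamic_power(alpha=0.2, c_farads=1e-9, v_volts=0.9, f_hz=0.7e9)  # DVFS point

print(f"baseline: {baseline * 1e3:.0f} mW, scaled: {scaled * 1e3:.0f} mW "
      f"({100 * (1 - scaled / baseline):.0f}% dynamic-power reduction)")
```

    Because the voltage enters quadratically, lowering V together with f saves much more than a proportional amount of power, which is the premise behind dynamic voltage and frequency scaling.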

    AI/ML Algorithms and Applications in VLSI Design and Technology

    Full text link
    An evident challenge ahead for the integrated circuit (IC) industry in the nanometer regime is the investigation and development of methods that can reduce the design complexity ensuing from growing process variations and curtail the turnaround time of chip manufacturing. Conventional methodologies employed for such tasks are largely manual and thus time-consuming and resource-intensive. In contrast, the unique learning strategies of artificial intelligence (AI) provide numerous exciting automated approaches for handling complex and data-intensive tasks in very-large-scale integration (VLSI) design and testing. Employing AI and machine learning (ML) algorithms in VLSI design and manufacturing reduces the time and effort required to understand and process data within and across different abstraction levels via automated learning algorithms. This, in turn, improves IC yield and reduces manufacturing turnaround time. This paper thoroughly reviews the AI/ML-based automated approaches that have been introduced for VLSI design and manufacturing. Moreover, we discuss the scope of future AI/ML applications at various abstraction levels to revolutionize the field of VLSI design, aiming for high-speed, highly intelligent, and efficient implementations.

    Memory and information processing in neuromorphic systems

    Full text link
    A striking difference between brain-inspired neuromorphic processors and current von Neumann processor architectures is the way in which memory and processing are organized. As Information and Communication Technologies continue to address the need for increased computational power through increasing the number of cores within a digital processor, neuromorphic engineers and scientists can complement this trend by building processor architectures in which memory is distributed with the processing. In this paper we present a survey of brain-inspired processor architectures that support models of cortical networks and deep neural networks. These architectures range from serial clocked implementations of multi-neuron systems to massively parallel asynchronous ones, and from purely digital systems to mixed analog/digital systems that implement more biologically realistic models of neurons and synapses together with a suite of adaptation and learning mechanisms analogous to those found in biological nervous systems. We describe the advantages of the different approaches being pursued and present the challenges that need to be addressed for building artificial neural processing systems that can display the richness of behaviors seen in biological systems.
    Comment: Submitted to the Proceedings of the IEEE; a review of recently proposed neuromorphic computing platforms and systems.
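
    A concrete way to picture "memory distributed with the processing" is a neuromorphic core that keeps its synaptic weights next to its neuron state and exchanges only spike events (addresses) with other cores, in the spirit of address-event representation. The sketch below is a minimal illustration under that assumption; the core structure, sizes, and parameters are ours and do not describe any particular chip covered by the survey.

```python
import random

class NeuroCore:
    """A core that keeps its synaptic memory next to its neuron state,
    instead of fetching weights from a shared off-chip memory."""

    def __init__(self, n_neurons, n_axons, threshold=4.0, leak=0.9):
        self.threshold, self.leak = threshold, leak
        self.v = [0.0] * n_neurons                      # local neuron state
        self.w = [[random.uniform(0, 1) for _ in range(n_neurons)]
                  for _ in range(n_axons)]              # local synaptic memory

    def receive(self, axon):
        """Process one incoming spike event addressed to `axon`."""
        for j, wj in enumerate(self.w[axon]):
            self.v[j] += wj

    def tick(self):
        """Leak, threshold, and emit outgoing spike events (neuron indices)."""
        out = []
        for j in range(len(self.v)):
            self.v[j] *= self.leak
            if self.v[j] >= self.threshold:
                self.v[j] = 0.0
                out.append(j)
        return out

core = NeuroCore(n_neurons=8, n_axons=4)
for event in [0, 1, 0, 3, 2, 0]:        # stream of input spike addresses
    core.receive(event)
print("spiking neurons this tick:", core.tick())
```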