116 research outputs found

    FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture

    Full text link
    Neural Network (NN) accelerators with emerging ReRAM (resistive random access memory) technologies have been investigated as one of the promising solutions to address the \textit{memory wall} challenge, due to the unique capability of \textit{processing-in-memory} within ReRAM-crossbar-based processing elements (PEs). However, the high efficiency and high density advantages of ReRAM have not been fully utilized due to the huge communication demands among PEs and the overhead of peripheral circuits. In this paper, we propose a full system stack solution, composed of a reconfigurable architecture design, Field Programmable Synapse Array (FPSA) and its software system including neural synthesizer, temporal-to-spatial mapper, and placement & routing. We highly leverage the software system to make the hardware design compact and efficient. To satisfy the high-performance communication demand, we optimize it with a reconfigurable routing architecture and the placement & routing tool. To improve the computational density, we greatly simplify the PE circuit with the spiking schema and then adopt neural synthesizer to enable the high density computation-resources to support different kinds of NN operations. In addition, we provide spiking memory blocks (SMBs) and configurable logic blocks (CLBs) in hardware and leverage the temporal-to-spatial mapper to utilize them to balance the storage and computation requirements of NN. Owing to the end-to-end software system, we can efficiently deploy existing deep neural networks to FPSA. Evaluations show that, compared to one of state-of-the-art ReRAM-based NN accelerators, PRIME, the computational density of FPSA improves by 31x; for representative NNs, its inference performance can achieve up to 1000x speedup.Comment: Accepted by ASPLOS 201

    Gestión de jerarquías de memoria híbridas a nivel de sistema

    Get PDF
    Tesis inédita de la Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadoras y Automática y de Ku Leuven, Arenberg Doctoral School, Faculty of Engineering Science, leída el 11/05/2017.In electronics and computer science, the term ‘memory’ generally refers to devices that are used to store information that we use in various appliances ranging from our PCs to all hand-held devices, smart appliances etc. Primary/main memory is used for storage systems that function at a high speed (i.e. RAM). The primary memory is often associated with addressable semiconductor memory, i.e. integrated circuits consisting of silicon-based transistors, used for example as primary memory but also other purposes in computers and other digital electronic devices. The secondary/auxiliary memory, in comparison provides program and data storage that is slower to access but offers larger capacity. Examples include external hard drives, portable flash drives, CDs, and DVDs. These devices and media must be either plugged in or inserted into a computer in order to be accessed by the system. Since secondary storage technology is not always connected to the computer, it is commonly used for backing up data. The term storage is often used to describe secondary memory. Secondary memory stores a large amount of data at lesser cost per byte than primary memory; this makes secondary storage about two orders of magnitude less expensive than primary storage. There are two main types of semiconductor memory: volatile and nonvolatile. Examples of non-volatile memory are ‘Flash’ memory (sometimes used as secondary, sometimes primary computer memory) and ROM/PROM/EPROM/EEPROM memory (used for firmware such as boot programs). Examples of volatile memory are primary memory (typically dynamic RAM, DRAM), and fast CPU cache memory (typically static RAM, SRAM, which is fast but energy-consuming and offer lower memory capacity per are a unit than DRAM). Non-volatile memory technologies in Si-based electronics date back to the 1990s. Flash memory is widely used in consumer electronic products such as cellphones and music players and NAND Flash-based solid-state disks (SSDs) are increasingly displacing hard disk drives as the primary storage device in laptops, desktops, and even data centers. The integration limit of Flash memories is approaching, and many new types of memory to replace conventional Flash memories have been proposed. The rapid increase of leakage currents in Silicon CMOS transistors with scaling poses a big challenge for the integration of SRAM memories. There is also the case of susceptibility to read/write failure with low power schemes. As a result of this, over the past decade, there has been an extensive pooling of time, resources and effort towards developing emerging memory technologies like Resistive RAM (ReRAM/RRAM), STT-MRAM, Domain Wall Memory and Phase Change Memory(PRAM). Emerging non-volatile memory technologies promise new memories to store more data at less cost than the expensive-to build silicon chips used by popular consumer gadgets including digital cameras, cell phones and portable music players. These new memory technologies combine the speed of static random-access memory (SRAM), the density of dynamic random-access memory (DRAM), and the non-volatility of Flash memory and so become very attractive as another possibility for future memory hierarchies. The research and information on these Non-Volatile Memory (NVM) technologies has matured over the last decade. These NVMs are now being explored thoroughly nowadays as viable replacements for conventional SRAM based memories even for the higher levels of the memory hierarchy. Many other new classes of emerging memory technologies such as transparent and plastic, three-dimensional(3-D), and quantum dot memory technologies have also gained tremendous popularity in recent years...En el campo de la informática, el término ‘memoria’ se refiere generalmente a dispositivos que son usados para almacenar información que posteriormente será usada en diversos dispositivos, desde computadoras personales (PC), móviles, dispositivos inteligentes, etc. La memoria principal del sistema se utiliza para almacenar los datos e instrucciones de los procesos que se encuentre en ejecución, por lo que se requiere que funcionen a alta velocidad (por ejemplo, DRAM). La memoria principal está implementada habitualmente mediante memorias semiconductoras direccionables, siendo DRAM y SRAM los principales exponentes. Por otro lado, la memoria auxiliar o secundaria proporciona almacenaje(para ficheros, por ejemplo); es más lenta pero ofrece una mayor capacidad. Ejemplos típicos de memoria secundaria son discos duros, memorias flash portables, CDs y DVDs. Debido a que estos dispositivos no necesitan estar conectados a la computadora de forma permanente, son muy utilizados para almacenar copias de seguridad. La memoria secundaria almacena una gran cantidad de datos aun coste menor por bit que la memoria principal, siendo habitualmente dos órdenes de magnitud más barata que la memoria primaria. Existen dos tipos de memorias de tipo semiconductor: volátiles y no volátiles. Ejemplos de memorias no volátiles son las memorias Flash (algunas veces usadas como memoria secundaria y otras veces como memoria principal) y memorias ROM/PROM/EPROM/EEPROM (usadas para firmware como programas de arranque). Ejemplos de memoria volátil son las memorias DRAM (RAM dinámica), actualmente la opción predominante a la hora de implementar la memoria principal, y las memorias SRAM (RAM estática) más rápida y costosa, utilizada para los diferentes niveles de cache. Las tecnologías de memorias no volátiles basadas en electrónica de silicio se remontan a la década de1990. Una variante de memoria de almacenaje por carga denominada como memoria Flash es mundialmente usada en productos electrónicos de consumo como telefonía móvil y reproductores de música mientras NAND Flash solid state disks(SSDs) están progresivamente desplazando a los dispositivos de disco duro como principal unidad de almacenamiento en computadoras portátiles, de escritorio e incluso en centros de datos. En la actualidad, hay varios factores que amenazan la actual predominancia de memorias semiconductoras basadas en cargas (capacitivas). Por un lado, se está alcanzando el límite de integración de las memorias Flash, lo que compromete su escalado en el medio plazo. Por otra parte, el fuerte incremento de las corrientes de fuga de los transistores de silicio CMOS actuales, supone un enorme desafío para la integración de memorias SRAM. Asimismo, estas memorias son cada vez más susceptibles a fallos de lectura/escritura en diseños de bajo consumo. Como resultado de estos problemas, que se agravan con cada nueva generación tecnológica, en los últimos años se han intensificado los esfuerzos para desarrollar nuevas tecnologías que reemplacen o al menos complementen a las actuales. Los transistores de efecto campo eléctrico ferroso (FeFET en sus siglas en inglés) se consideran una de las alternativas más prometedores para sustituir tanto a Flash (por su mayor densidad) como a DRAM (por su mayor velocidad), pero aún está en una fase muy inicial de su desarrollo. Hay otras tecnologías algo más maduras, en el ámbito de las memorias RAM resistivas, entre las que cabe destacar ReRAM (o RRAM), STT-RAM, Domain Wall Memory y Phase Change Memory (PRAM)...Depto. de Arquitectura de Computadores y AutomáticaFac. de InformáticaTRUEunpu

    Emerging Run-Time Reconfigurable FPGA and CAD Tools

    Get PDF
    Field-programmable gate array (FPGA) is a post fabrication reconfigurable device to accelerate domain specific computing systems. It offers offer high operation speed and low power consumption. However, the design flexibility and performance of FPGAs are severely constrained by the costly on-chip memories, e.g. static random access memory (SRAM) and FLASH memory. The objective of my dissertation is to explore the opportunity and enable the use of the emerging resistance random access memory (ReRAM) in FPGA design. The emerging ReRAM technology features high storage density, low access power consumption, and CMOS compatibility, making it a promising candidate for FPGA implementation. In particular, ReRAM has advantages of the fast access and nonvolatility, enabling the on-chip storage and access of configuration data. In this dissertation, I first propose a novel three-dimensional stacking scheme, namely, high-density interleaved memory (HIM). The structure improves the density of ReRAM meanwhile effectively reducing the signal interference induced by sneak paths in crossbar arrays. To further enhance the access speed and design reliability, a fast sensing circuit is also presented which includes a new sense amplifier scheme and reference cell configuration. The proposed ReRAM FPGA leverages a similar architecture as conventional SRAM based FPGAs but utilizes ReRAM technology in all component designs. First, HIM is used to implement look-up table (LUT) and block random access memories (BRAMs) for func- tionality process. Second, a 2R1T, two ReRAM cells and one transistor, nonvolatile switch design is applied to construct connection blocks (CBs) and switch blocks (SBs) for signal transition. Furthermore, unified BRAM (uBRAM) based on the current BRAM architecture iv is introduced, offering both configuration and temporary data storage. The uBRAMs provides extremely high density effectively and enlarges the FPGA capacity, potentially saving multiple contexts of configuration. The fast configuration scheme from uBRAM to logic and routing components also makes fast run-time partial reconfiguration (PR) much easier, improving the flexibility and performance of the entire FPGA system. Finally, modern place and route tools are designed for homogeneous fabric of FPGA. The PR feature, however, requires the support of heterogeneous logic modules in order to differentiate PR modules from static ones and therefore maintain the signal integration. The existing approaches still reply on designers’ manual effort, which significantly prolongs design time and lowers design efficiency. In this dissertation, I integrate PR support into VPR – an academic place and route tool by introducing a B*-tree modular placer (BMP) and PR-aware router. As such, users are able to explore new architectures or map PR applications to a variety of FPGAs. More importantly, this enhanced feature can also support fast design automation, e.g. mapping IP core, loading pre-synthesizing logic modules, etc

    Nouvelles Architectures Hybrides (Logique / Mémoires Non-Volatiles et technologies associées.)

    Get PDF
    Les nouvelles approches de technologies mémoires permettront une intégration dite back-end, où les cellules élémentaires de stockage seront fabriquées lors des dernières étapes de réalisation à grande échelle du circuit. Ces approches innovantes sont souvent basées sur l'utilisation de matériaux actifs présentant deux états de résistance distincts. Le passage d'un état à l'autre est contrôlé en courant ou en tension donnant lieu à une caractéristique I-V hystérétique. Nos mémoires résistives sont composées d'argent en métal électrochimiquement actif et de sulfure amorphe agissant comme électrolyte. Leur fonctionnement repose sur la formation réversible et la dissolution d'un filament conducteur. Le potentiel d'application de ces nouveaux dispositifs n'est pas limité aux mémoires ultra-haute densité mais aussi aux circuits embarqués. En empilant ces mémoires dans la troisième dimension au niveau des interconnections des circuits logiques CMOS, de nouvelles architectures hybrides et innovantes deviennent possibles. Il serait alors envisageable d'exploiter un fonctionnement à basse énergie, à haute vitesse d'écriture/lecture et de haute performance telles que l'endurance et la rétention. Dans cette thèse, en se concentrant sur les aspects de la technologie de mémoire en vue de développer de nouvelles architectures, l'introduction d'une fonctionnalité non-volatile au niveau logique est démontrée par trois circuits hybrides: commutateurs de routage non volatiles dans un Field Programmable Gate Arrays, un 6T-SRAM non volatile, et les neurones stochastiques pour un réseau neuronal. Pour améliorer les solutions existantes, les limitations de la performances des dispositifs mémoires sont identifiés et résolus avec des nouveaux empilements ou en fournissant des défauts de circuits tolérants.Novel approaches in the field of memory technology should enable backend integration, where individual storage nodes will be fabricated during the last fabrication steps of the VLSI circuit. In this case, memory operation is often based upon the use of active materials with resistive switching properties. A topology of resistive memory consists of silver as electrochemically active metal and amorphous sulfide acting as electrolyte and relies on the reversible formation and dissolution of a conductive filament. The application potential of these new memories is not limited to stand-alone (ultra-high density), but is also suitable for embedded applications. By stacking these memories in the third dimension at the interconnection level of CMOS logic, new ultra-scalable hybrid architectures becomes possible which exploit low energy operation, fast write/read access and high performance with respect to endurance and retention. In this thesis, focusing on memory technology aspects in view of developing new architectures, the introduction of non-volatile functionality at the logic level is demonstrated through three hybrid (CMOS logic ReRAM devices) circuits: nonvolatile routing switches in a Field Programmable Gate Array, nonvolatile 6T-SRAMs, and stochastic neurons of an hardware neural network. To be competitive or even improve existing solutions, limitations on the memory devices performances are identified and solved by stack engineering of CBRAM devices or providing faults tolerant circuits.SAVOIE-SCD - Bib.électronique (730659901) / SudocGRENOBLE1/INP-Bib.électronique (384210012) / SudocGRENOBLE2/3-Bib.électronique (384219901) / SudocSudocFranceF

    GMS: Generic Memristive Structure for Non-Volatile FPGAs

    Get PDF
    The invention of the memristor enables new possibilities for computation and non-volatile memory storage. In this paper we propose a Generic Memristive Structure (GMS) for 3-D FPGA applications. The GMS cell is demonstrated to be utilized for steering logic useful for multiplexing signals, thus replacing the traditional pass-gates in FPGAs. Moreover, the same GMS cell can be utilized for programmable memories as a replacement for the SRAMs employed in the look-up tables of FPGAs. A fabricated GMS cell is presented and its use in FPGA architecture is demonstrated by the area and delay improvement for several architectural benchmarks

    In-memory computing with emerging memory devices: Status and outlook

    Get PDF
    Supporting data for "In-memory computing with emerging memory devices: status and outlook", submitted to APL Machine Learning
    • …
    corecore