16 research outputs found

    DOTA: A Dynamically-Operated Photonic Tensor Core for Energy-Efficient Transformer Accelerator

    Full text link
    The wide adoption and significant computing resource consumption of attention-based Transformers, e.g., Vision Transformer and large language models, have driven the demands for efficient hardware accelerators. While electronic accelerators have been commonly used, there is a growing interest in exploring photonics as an alternative technology due to its high energy efficiency and ultra-fast processing speed. Optical neural networks (ONNs) have demonstrated promising results for convolutional neural network (CNN) workloads that only require weight-static linear operations. However, they fail to efficiently support Transformer architectures with attention operations due to the lack of ability to process dynamic full-range tensor multiplication. In this work, we propose a customized high-performance and energy-efficient photonic Transformer accelerator, DOTA. To overcome the fundamental limitation of existing ONNs, we introduce a novel photonic tensor core, consisting of a crossbar array of interference-based optical vector dot-product engines, that supports highly-parallel, dynamic, and full-range matrix-matrix multiplication. Our comprehensive evaluation demonstrates that DOTA achieves a >4x energy and a >10x latency reduction compared to prior photonic accelerators, and delivers over 20x energy reduction and 2 to 3 orders of magnitude lower latency compared to the electronic Transformer accelerator. Our work highlights the immense potential of photonic computing for efficient hardware accelerators, particularly for advanced machine learning workloads.Comment: The short version is accepted by Next-Gen AI System Workshop at MLSys 202

    High-Performance and Low-Power Magnetic Material Memory Based Cache Design

    Get PDF
    Magnetic memory technologies are very promising candidates to be universal memory due to its good scalability, zero standby power and radiation hardness. Having a cell area much smaller than SRAM, magnetic memory can be used to construct much larger cache with the same die footprint, leading to siginficant improvement of overall system performance and power consumption especially in this multi-core era. However, magnetic memories have their own drawbacks such as slow write, read disturbance and scaling limitation, making its usage as caches challenging. This dissertation comprehensively studied these two most popular magnetic memory technologies. Design exploration and optimization for the cache design from different design layers including the memory devices, peripheral circuit, memory array structure and micro-architecture are presented. By leveraging device features, two major micro-architectures -multi-retention cache hierarchy and process-variation-aware cache are presented to improve the write performance of STT-RAM. The enhancement in write performance results in the degradation of read operations, in terms of both speed and data reliability. This dissertation also presents an architecture to resolve STT-RAM read disturbance issue. Furthermore, the scaling of STT-RAM is hindered due to the required size of switching transistor. To break the cell area limitation of STT-RAM, racetrack memory is studied to achieve an even higher memory density and better performance and lower energy consumption. With dedicated elaboration, racetrack memory based cache design can achieve a siginificant area reduction and energy saving when compared to optimized STT-RAM

    Gestión de jerarquías de memoria híbridas a nivel de sistema

    Get PDF
    Tesis inédita de la Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadoras y Automática y de Ku Leuven, Arenberg Doctoral School, Faculty of Engineering Science, leída el 11/05/2017.In electronics and computer science, the term ‘memory’ generally refers to devices that are used to store information that we use in various appliances ranging from our PCs to all hand-held devices, smart appliances etc. Primary/main memory is used for storage systems that function at a high speed (i.e. RAM). The primary memory is often associated with addressable semiconductor memory, i.e. integrated circuits consisting of silicon-based transistors, used for example as primary memory but also other purposes in computers and other digital electronic devices. The secondary/auxiliary memory, in comparison provides program and data storage that is slower to access but offers larger capacity. Examples include external hard drives, portable flash drives, CDs, and DVDs. These devices and media must be either plugged in or inserted into a computer in order to be accessed by the system. Since secondary storage technology is not always connected to the computer, it is commonly used for backing up data. The term storage is often used to describe secondary memory. Secondary memory stores a large amount of data at lesser cost per byte than primary memory; this makes secondary storage about two orders of magnitude less expensive than primary storage. There are two main types of semiconductor memory: volatile and nonvolatile. Examples of non-volatile memory are ‘Flash’ memory (sometimes used as secondary, sometimes primary computer memory) and ROM/PROM/EPROM/EEPROM memory (used for firmware such as boot programs). Examples of volatile memory are primary memory (typically dynamic RAM, DRAM), and fast CPU cache memory (typically static RAM, SRAM, which is fast but energy-consuming and offer lower memory capacity per are a unit than DRAM). Non-volatile memory technologies in Si-based electronics date back to the 1990s. Flash memory is widely used in consumer electronic products such as cellphones and music players and NAND Flash-based solid-state disks (SSDs) are increasingly displacing hard disk drives as the primary storage device in laptops, desktops, and even data centers. The integration limit of Flash memories is approaching, and many new types of memory to replace conventional Flash memories have been proposed. The rapid increase of leakage currents in Silicon CMOS transistors with scaling poses a big challenge for the integration of SRAM memories. There is also the case of susceptibility to read/write failure with low power schemes. As a result of this, over the past decade, there has been an extensive pooling of time, resources and effort towards developing emerging memory technologies like Resistive RAM (ReRAM/RRAM), STT-MRAM, Domain Wall Memory and Phase Change Memory(PRAM). Emerging non-volatile memory technologies promise new memories to store more data at less cost than the expensive-to build silicon chips used by popular consumer gadgets including digital cameras, cell phones and portable music players. These new memory technologies combine the speed of static random-access memory (SRAM), the density of dynamic random-access memory (DRAM), and the non-volatility of Flash memory and so become very attractive as another possibility for future memory hierarchies. The research and information on these Non-Volatile Memory (NVM) technologies has matured over the last decade. These NVMs are now being explored thoroughly nowadays as viable replacements for conventional SRAM based memories even for the higher levels of the memory hierarchy. Many other new classes of emerging memory technologies such as transparent and plastic, three-dimensional(3-D), and quantum dot memory technologies have also gained tremendous popularity in recent years...En el campo de la informática, el término ‘memoria’ se refiere generalmente a dispositivos que son usados para almacenar información que posteriormente será usada en diversos dispositivos, desde computadoras personales (PC), móviles, dispositivos inteligentes, etc. La memoria principal del sistema se utiliza para almacenar los datos e instrucciones de los procesos que se encuentre en ejecución, por lo que se requiere que funcionen a alta velocidad (por ejemplo, DRAM). La memoria principal está implementada habitualmente mediante memorias semiconductoras direccionables, siendo DRAM y SRAM los principales exponentes. Por otro lado, la memoria auxiliar o secundaria proporciona almacenaje(para ficheros, por ejemplo); es más lenta pero ofrece una mayor capacidad. Ejemplos típicos de memoria secundaria son discos duros, memorias flash portables, CDs y DVDs. Debido a que estos dispositivos no necesitan estar conectados a la computadora de forma permanente, son muy utilizados para almacenar copias de seguridad. La memoria secundaria almacena una gran cantidad de datos aun coste menor por bit que la memoria principal, siendo habitualmente dos órdenes de magnitud más barata que la memoria primaria. Existen dos tipos de memorias de tipo semiconductor: volátiles y no volátiles. Ejemplos de memorias no volátiles son las memorias Flash (algunas veces usadas como memoria secundaria y otras veces como memoria principal) y memorias ROM/PROM/EPROM/EEPROM (usadas para firmware como programas de arranque). Ejemplos de memoria volátil son las memorias DRAM (RAM dinámica), actualmente la opción predominante a la hora de implementar la memoria principal, y las memorias SRAM (RAM estática) más rápida y costosa, utilizada para los diferentes niveles de cache. Las tecnologías de memorias no volátiles basadas en electrónica de silicio se remontan a la década de1990. Una variante de memoria de almacenaje por carga denominada como memoria Flash es mundialmente usada en productos electrónicos de consumo como telefonía móvil y reproductores de música mientras NAND Flash solid state disks(SSDs) están progresivamente desplazando a los dispositivos de disco duro como principal unidad de almacenamiento en computadoras portátiles, de escritorio e incluso en centros de datos. En la actualidad, hay varios factores que amenazan la actual predominancia de memorias semiconductoras basadas en cargas (capacitivas). Por un lado, se está alcanzando el límite de integración de las memorias Flash, lo que compromete su escalado en el medio plazo. Por otra parte, el fuerte incremento de las corrientes de fuga de los transistores de silicio CMOS actuales, supone un enorme desafío para la integración de memorias SRAM. Asimismo, estas memorias son cada vez más susceptibles a fallos de lectura/escritura en diseños de bajo consumo. Como resultado de estos problemas, que se agravan con cada nueva generación tecnológica, en los últimos años se han intensificado los esfuerzos para desarrollar nuevas tecnologías que reemplacen o al menos complementen a las actuales. Los transistores de efecto campo eléctrico ferroso (FeFET en sus siglas en inglés) se consideran una de las alternativas más prometedores para sustituir tanto a Flash (por su mayor densidad) como a DRAM (por su mayor velocidad), pero aún está en una fase muy inicial de su desarrollo. Hay otras tecnologías algo más maduras, en el ámbito de las memorias RAM resistivas, entre las que cabe destacar ReRAM (o RRAM), STT-RAM, Domain Wall Memory y Phase Change Memory (PRAM)...Depto. de Arquitectura de Computadores y AutomáticaFac. de InformáticaTRUEunpu

    A Phase Change Memory and DRAM Based Framework For Energy-Efficient and High-Speed In-Memory Stochastic Computing

    Get PDF
    Convolutional Neural Networks (CNNs) have proven to be highly effective in various fields related to Artificial Intelligence (AI) and Machine Learning (ML). However, the significant computational and memory requirements of CNNs make their processing highly compute and memory-intensive. In particular, the multiply-accumulate (MAC) operation, which is a fundamental building block of CNNs, requires enormous arithmetic operations. As the input dataset size increases, the traditional processor-centric von-Neumann computing architecture becomes ill-suited for CNN-based applications. This results in exponentially higher latency and energy costs, making the processing of CNNs highly challenging. To overcome these challenges, researchers have explored the Processing-In Memory (PIM) technique, which involves placing the processing unit inside or near the memory unit. This approach reduces data migration length and utilizes the internal memory bandwidth at the memory chip level. However, developing a reliable PIM-based system with minimal hardware modifications and design complexity remains a significant challenge. The proposed solution in the report suggests utilizing different memory technologies, such as Dynamic RAM (DRAM) and phase change memory (PCM), with Stochastic arithmetic and minimal add-on logic. Stochastic computing is a technique that uses random numbers to perform arithmetic operations instead of traditional binary representation. This technique reduces hardware requirements for CNN\u27s arithmetic operations, making it possible to implement them with minimal add-on logic. The report details the workflow for performing arithmetical operations used by CNNs, including MAC, activation, and floating-point functions. The proposed solution includes designs for scalable Stochastic Number Generator (SNG), DRAM CNN accelerator, non-volatile memory (NVM) class PCRAM-based CNN accelerator, and DRAM-based stochastic to binary conversion (StoB) for in-situ deep learning. These designs utilize stochastic computing to reduce the hardware requirements for CNN\u27s arithmetic operations and enable energy and time-efficient processing of CNNs. The report also identifies future research directions for the proposed designs, including in-situ PCRAM-based SNG, ODIN (A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-Situ Neural Network Processing in Phase Change RAM), ATRIA (Bit-Parallel Stochastic Arithmetic Based Accelerator for In-DRAM CNN Processing), and AGNI (In-Situ, Iso-Latency Stochastic-to-Binary Number Conversion for In-DRAM Deep Learning), and presents initial findings for these ideas. In summary, the proposed solution in the report offers a comprehensive approach to address the challenges of processing CNNs, and the proposed designs have the potential to improve the energy and time efficiency of CNNs significantly. Using Stochastic Computing and different memory technologies enables the development of reliable PIM-based systems with minimal hardware modifications and design complexity, providing a promising path for the future of CNN-based applications
    corecore