
    Letter from the Special Issue Editor

    Editorial work for DEBULL on a special issue devoted to data management on Storage Class Memory (SCM) technologies.

    Application-Specific Memory Subsystems

    The disparity in performance between processors and main memories has led computer architects to incorporate large cache hierarchies in modern computers. These cache hierarchies are designed to be general-purpose in that they strive to provide the best possible performance across a wide range of applications. However, such a memory subsystem does not necessarily provide the best possible performance for a particular application. Although general-purpose memory subsystems are desirable when the workload is unknown and the memory subsystem must remain fixed, when this is not the case a custom memory subsystem may be beneficial. For example, in an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) designed to run a particular application, a custom memory subsystem optimized for that application would be desirable. In addition, when there are tunable parameters in the memory subsystem, it may make sense to change these parameters depending on the application being run. Such a situation arises today with FPGAs and, to a lesser extent, GPUs, and it is plausible that general-purpose computers will begin to support greater flexibility in the memory subsystem in the future. In this dissertation, we first show that it is possible to create application-specific memory subsystems that provide much better performance than a general-purpose memory subsystem. In addition, we show a way to discover such memory subsystems automatically using a superoptimization technique on memory address traces gathered from applications. This allows one to generate a custom memory subsystem with little effort. We next show that our memory subsystem superoptimization technique can be used to optimize for objectives other than performance. As an example, we show that it is possible to reduce the number of writes to the main memory, which can be useful for main memories with limited write durability, such as flash or Phase-Change Memory (PCM). Finally, we show how to superoptimize memory subsystems for streaming applications, which are a class of parallel applications. In particular, we show that, through the use of ScalaPipe, we can author and deploy streaming applications targeting FPGAs with superoptimized memory subsystems. ScalaPipe is a domain-specific language (DSL) embedded in the Scala programming language for generating streaming applications that can be implemented on CPUs and FPGAs. Using the ScalaPipe implementation, we are able to demonstrate actual performance improvements using the superoptimized memory subsystem with applications implemented in hardware.
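    The core loop of trace-driven superoptimization can be illustrated with a small sketch. The configuration space, latencies, and exhaustive search below are illustrative assumptions rather than the dissertation's actual tool or cost model: candidate cache configurations are evaluated against a recorded address trace and the cheapest one is kept.

```python
# Illustrative sketch (not the dissertation's actual tool): evaluate candidate
# cache configurations against a recorded memory address trace and keep the
# one with the lowest estimated access cost. Latencies are assumed values.
from itertools import product

HIT_CYCLES, MISS_CYCLES = 1, 100  # assumed hit/miss latencies for illustration

def simulate_direct_mapped(trace, line_size, num_lines):
    """Count hits and misses of a direct-mapped cache on an address trace."""
    tags = [None] * num_lines
    hits = misses = 0
    for addr in trace:
        block = addr // line_size
        index = block % num_lines
        if tags[index] == block:
            hits += 1
        else:
            misses += 1
            tags[index] = block
    return hits, misses

def superoptimize(trace, line_sizes=(16, 32, 64), line_counts=(64, 256, 1024)):
    """Exhaustively search a (tiny) design space for the cheapest configuration."""
    best = None
    for line_size, num_lines in product(line_sizes, line_counts):
        hits, misses = simulate_direct_mapped(trace, line_size, num_lines)
        cost = hits * HIT_CYCLES + misses * MISS_CYCLES
        if best is None or cost < best[0]:
            best = (cost, line_size, num_lines)
    return best

if __name__ == "__main__":
    trace = [i * 8 % 4096 for i in range(10000)]  # toy strided access pattern
    print(superoptimize(trace))
```

    A real superoptimizer would explore a far larger space of multi-level hierarchies, and typically with a stochastic search rather than enumeration, but the evaluate-against-a-trace loop is the same.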

    Bit-Flip Aware Data Structures for Phase Change Memory

    Large, byte-addressable, low-cost, and fast non-volatile memories like Phase Change Memory are appearing in the marketplace. They have the capability to unify both memory and storage and allow us to rethink the present memory hierarchy. An important drawback to Phase Change Memory is limited write endurance. In addition, Phase Change Memory shares with other Non-Volatile Random Access Memories an asymmetry in the energy costs of writes and reads. Best use of Non-Volatile Random Access Memories limits the number of times a Non-Volatile Random Access Memory cell changes contents, called a bit-flip. While the future of main memory is still unknown, we should already start to create data structures for these memories in order to shape that future. This thesis investigates the creation of bit-flip aware data structures. The thesis first considers general ways in which a data structure can save bit-flips by smart overwrites and by using the exclusive-or of pointers. It then shows how a simple content-dependent encoding can reduce bit-flips for web corpora. It then shows how to build hash-based dictionary structures for Linear Hashing and Spiral Storage. Finally, the thesis presents Gray counters, close to bit-flip-optimal counters that even enable age-based wear leveling, with counters managed by the Non-Volatile Random Access Memories themselves instead of by the operating system.
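    A minimal sketch of the Gray-counter idea, under the simplifying assumption that every differing bit costs one cell write: a Gray-coded counter flips exactly one bit per increment, whereas a plain binary counter flips roughly two bits on average, so counters kept in PCM wear out more slowly when Gray-encoded.

```python
# Illustrative sketch: compare the total bit-flips of a plain binary counter
# against a Gray-coded counter over the same number of increments. Fewer
# flips means fewer cell writes, which matters for write-limited PCM.
def to_gray(n: int) -> int:
    """Standard binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def bit_flips(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def total_flips(encode, steps: int) -> int:
    """Total flips incurred by incrementing an encoded counter `steps` times."""
    return sum(bit_flips(encode(i), encode(i + 1)) for i in range(steps))

if __name__ == "__main__":
    steps = 1 << 16
    print("binary counter flips:", total_flips(lambda n: n, steps))
    print("gray counter flips:  ", total_flips(to_gray, steps))
```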

    A Modern Primer on Processing in Memory

    Modern computing systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in computing that cause performance, scalability and energy bottlenecks: (1) data access is a key bottleneck as many important applications are increasingly data-intensive, and memory bandwidth and energy do not scale well, (2) energy consumption is a key limiter in almost all computing platforms, especially server and mobile systems, (3) data movement, especially off-chip to on-chip, is very expensive in terms of bandwidth, energy and latency, much more so than computation. These trends are especially severely felt in the data-intensive server and energy-constrained mobile systems of today. At the same time, conventional memory technology is facing many technology scaling challenges in terms of reliability, energy, and performance. As a result, memory system architects are open to organizing memory in different ways and making it more intelligent, at the expense of higher cost. The emergence of 3D-stacked memory plus logic, the adoption of error correcting codes inside the latest DRAM chips, the proliferation of different main memory standards and chips specialized for different purposes (e.g., graphics, low-power, high bandwidth, low latency), and the necessity of designing new solutions to serious reliability and security issues, such as the RowHammer phenomenon, are evidence of this trend. This chapter discusses recent research that aims to practically enable computation close to data, an approach we call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside the memory chips, in the logic layer of 3D-stacked memory, or in the memory controllers), so that data movement between the computation units and memory is reduced or eliminated.
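    A back-of-the-envelope sketch of why moving computation to the data pays off. The per-byte and per-operation energy numbers below are assumed, order-of-magnitude placeholders (off-chip data movement is widely reported to cost far more energy than an arithmetic operation), not figures from the chapter.

```python
# Illustrative model with assumed energy numbers: compare the cost of a
# reduction when the host CPU must pull every element across the off-chip
# bus versus when a PIM unit next to the data performs the adds and returns
# only the final scalar.
E_BYTE_OFFCHIP_PJ = 100.0   # assumed energy to move one byte off-chip (pJ)
E_ADD_PJ = 1.0              # assumed energy of one add operation (pJ)

def host_sum_cost(n_elements: int, elem_bytes: int = 8) -> float:
    """Host-side sum: every operand crosses the memory bus, then is added."""
    moved_bytes = n_elements * elem_bytes
    return moved_bytes * E_BYTE_OFFCHIP_PJ + n_elements * E_ADD_PJ

def pim_sum_cost(n_elements: int, elem_bytes: int = 8) -> float:
    """PIM-side sum: adds happen near the data; only the result crosses the bus."""
    return n_elements * E_ADD_PJ + elem_bytes * E_BYTE_OFFCHIP_PJ

if __name__ == "__main__":
    n = 1_000_000
    print(f"host: {host_sum_cost(n) / 1e6:.1f} uJ, PIM: {pim_sum_cost(n) / 1e6:.1f} uJ")
```

    With these placeholder numbers the host-side reduction is dominated almost entirely by the cost of streaming the operands across the bus, which is the bottleneck PIM is meant to remove.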

    2022 roadmap on neuromorphic computing and engineering

    Modern computation based on von Neumann architecture is now a mature cutting-edge science. In the von Neumann architecture, processing and memory units are implemented as separate blocks interchanging data intensively and continuously. This data transfer is responsible for a large part of the power consumption. The next generation computer technology is expected to solve problems at the exascale with 10^18 calculations each second. Even though these future computers will be incredibly powerful, if they are based on von Neumann type architectures, they will consume between 20 and 30 megawatts of power and will not have intrinsic physically built-in capabilities to learn or deal with complex data as our brain does. These needs can be addressed by neuromorphic computing systems which are inspired by the biological concepts of the human brain. This new generation of computers has the potential to be used for the storage and processing of large amounts of digital information with much lower power consumption than conventional processors. Among their potential future applications, an important niche is moving the control from data centers to edge devices. The aim of this roadmap is to present a snapshot of the present state of neuromorphic technology and provide an opinion on the challenges and opportunities that the future holds in the major areas of neuromorphic technology, namely materials, devices, neuromorphic circuits, neuromorphic algorithms, applications, and ethics. The roadmap is a collection of perspectives where leading researchers in the neuromorphic community provide their own view about the current state and the future challenges for each research area. We hope that this roadmap will be a useful resource by providing a concise yet comprehensive introduction to readers outside this field, for those who are just entering the field, as well as providing future perspectives for those who are well established in the neuromorphic computing community.

    Bio-inspired learning and hardware acceleration with emerging memories

    Machine Learning has permeated many aspects of engineering, ranging from Internet of Things (IoT) applications to big data analytics. While computing resources available to implement these algorithms have become more powerful, both in terms of the complexity of problems that can be solved and the overall computing speed, the huge energy costs involved remain a significant challenge. The human brain, which has evolved over millions of years, is widely accepted as the most efficient control and cognitive processing platform. Neuro-biological studies have established that information processing in the human brain relies on impulse-like signals emitted by neurons, called action potentials. Motivated by these facts, Spiking Neural Networks (SNNs), which are a bio-plausible version of neural networks, have been proposed as an alternative computing paradigm where the timing of spikes generated by artificial neurons is central to their learning and inference capabilities. This dissertation demonstrates the computational power of SNNs using conventional CMOS and emerging nanoscale hardware platforms. The first half of this dissertation presents an SNN architecture which is trained using a supervised spike-based learning algorithm for the handwritten digit classification problem. This network achieves an accuracy of 98.17% on the MNIST test dataset, with about 4X fewer parameters compared to the state-of-the-art neural networks achieving over 99% accuracy. In addition, a scheme for parallelizing and speeding up the SNN simulation on a GPU platform is presented. The second half of this dissertation presents an optimal hardware design for accelerating SNN inference and training with SRAM (Static Random Access Memory) and nanoscale non-volatile memory (NVM) crossbar arrays. Three prominent NVM devices are studied for realizing hardware accelerators for SNNs: Phase Change Memory (PCM), Spin Transfer Torque RAM (STT-RAM) and Resistive RAM (RRAM). The analysis shows that a spike-based inference engine with crossbar arrays of STT-RAM bit-cells is 2X and 5X more efficient compared to PCM and RRAM memories, respectively. Furthermore, the STT-RAM design has nearly 6X higher throughput per unit Watt per unit area than that of an equivalent SRAM-based design. A hardware accelerator with on-chip learning on an STT-RAM memory array is also designed, requiring 16 bits of floating-point synaptic weight precision to reach the baseline SNN algorithmic performance on the MNIST dataset. The complete design with an STT-RAM crossbar array achieves nearly 20X higher throughput per unit Watt per unit mm^2 than an equivalent design with SRAM memory. In summary, this work demonstrates the potential of spike-based neuromorphic computing algorithms and their efficient realization in hardware based on conventional CMOS as well as emerging technologies. The schemes presented here can be further extended to design spike-based systems that can be ubiquitously deployed for energy- and memory-constrained edge computing applications.
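    One timestep of such a crossbar-mapped spiking layer can be sketched as follows. Layer sizes, threshold, and leak factor are assumed values for illustration, and the analog crossbar multiply-accumulate is modelled simply as a matrix-vector product; this is not the dissertation's actual design.

```python
# Illustrative sketch (assumed parameters): one timestep of a spiking layer
# whose synaptic weights are stored as conductances in an NVM crossbar. The
# crossbar's analog MAC is modelled here as a plain matrix-vector product.
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_OUT = 784, 10          # e.g. MNIST pixels -> output neurons (assumed)
V_THRESH, LEAK = 1.0, 0.9      # assumed neuron threshold and leak factor

weights = rng.normal(0.0, 0.1, size=(N_OUT, N_IN))   # crossbar conductances
membrane = np.zeros(N_OUT)                           # neuron membrane state

def step(input_spikes: np.ndarray) -> np.ndarray:
    """Advance the layer by one timestep and return the output spike vector."""
    global membrane
    current = weights @ input_spikes        # MAC performed by the crossbar
    membrane = LEAK * membrane + current    # leaky integration
    out_spikes = (membrane >= V_THRESH).astype(float)
    membrane[out_spikes > 0] = 0.0          # reset neurons that fired
    return out_spikes

if __name__ == "__main__":
    spikes_in = (rng.random(N_IN) < 0.05).astype(float)  # sparse input spikes
    print(step(spikes_in))
```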

    Non-volatile FPGA architecture using resistive switching devices

    This dissertation reports the research work that was conducted to propose a non-volatile architecture for FPGAs using resistive switching devices. This is achieved by designing a Configurable Memristive Logic Block (CMLB). The CMLB comprises memristive logic cells (MLCs) interconnected to each other using memristive switch matrices. In the MLC, a novel memristive D flip-flop (MDFF), a 6-bit non-volatile look-up table (NVLUT), and CMOS-based multiplexers are used. Other than the MDFF, a non-volatile D-latch (NVDL) was also designed. The MDFF and the NVDL are proposed to replace CMOS-based D flip-flops and D-latches to improve energy consumption. The CMLB shows a reduction of 8.6% in device area and a 1.094 times lower critical path delay against the SRAM-based FPGA architecture. Against similar CMOS-based circuits, the MDFF provides switching speeds 1.08 times faster; the NVLUT reduces power consumption by 6.25 nW and reduces device area by 128 transistors; and the memristive logic cells reduce overall device area by 60.416 μm^2. The NVLUT is constructed using novel 2TG1M memory cells, which have the fastest switching time of 12.14 ns compared to other similar memristive memory cells. This is due to the usage of transmission gates, which improves voltage transfer from the input to the memristor. The novel 2TG1M memory cell also has lower energy consumption than the CMOS-based 6T SRAM cell. The memristive switch matrices that interconnect the MLCs comprise novel 7T1M SRAM cells, which have the lowest energy-delay-area-product value of 1.61 among other memristive SRAM cells. Two memristive logic gates (MLGs), OR and AND, were also designed, introducing non-volatility into conventional logic gates. All the above circuits and design simulations were performed on an enhanced SPICE memristor model, improved from a previously published memristor model. The previously published model was found not to be in good agreement with memristor theory and the physical model of memristors; the enhanced SPICE model corrects this and is used throughout this research work.
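    The role of the look-up table that the NVLUT makes non-volatile can be sketched in a few lines: a k-input LUT stores the 2^k truth-table bits of an arbitrary Boolean function in memory cells (memristors in the NVLUT), and the inputs merely select which stored bit to read out. The class and sizes below are illustrative assumptions, not the dissertation's circuit design.

```python
# Illustrative sketch: an FPGA look-up table realizes any k-input Boolean
# function by storing its truth table in configuration cells; making those
# cells memristive is what lets the configuration survive power-off.
class LUT:
    def __init__(self, k: int, truth_table: list[int]):
        assert len(truth_table) == 1 << k, "need 2**k truth-table bits"
        self.k = k
        self.cells = list(truth_table)   # would be memristor cells in an NVLUT

    def evaluate(self, inputs: list[int]) -> int:
        """Return the stored bit addressed by the input vector."""
        index = 0
        for bit in inputs:               # inputs drive the read multiplexer
            index = (index << 1) | (bit & 1)
        return self.cells[index]

if __name__ == "__main__":
    # Configure a 2-input LUT as XOR: truth-table bits for inputs 00, 01, 10, 11.
    xor_lut = LUT(2, [0, 1, 1, 0])
    print([xor_lut.evaluate([a, b]) for a in (0, 1) for b in (0, 1)])
```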