11 research outputs found

    Data-intensive Systems on Modern Hardware : Leveraging Near-Data Processing to Counter the Growth of Data

    Get PDF
    Over the last decades, a tremendous change toward using information technology in almost every daily routine of our lives can be perceived in our society, entailing an incredible growth of data collected day-by-day on Web, IoT, and AI applications. At the same time, magneto-mechanical HDDs are being replaced by semiconductor storage such as SSDs, equipped with modern Non-Volatile Memories, like Flash, which yield significantly faster access latencies and higher levels of parallelism. Likewise, the execution speed of processing units increased considerably as nowadays server architectures comprise up to multiple hundreds of independently working CPU cores along with a variety of specialized computing co-processors such as GPUs or FPGAs. However, the burden of moving the continuously growing data to the best fitting processing unit is inherently linked to today’s computer architecture that is based on the data-to-code paradigm. In the light of Amdahl's Law, this leads to the conclusion that even with today's powerful processing units, the speedup of systems is limited since the fraction of parallel work is largely I/O-bound. Therefore, throughout this cumulative dissertation, we investigate the paradigm shift toward code-to-data, formally known as Near-Data Processing (NDP), which relieves the contention on the I/O bus by offloading processing to intelligent computational storage devices, where the data is originally located. Firstly, we identified Native Storage Management as the essential foundation for NDP due to its direct control of physical storage management within the database. Upon this, the interface is extended to propagate address mapping information and to invoke NDP functionality on the storage device. As the former can become very large, we introduce Physical Page Pointers as one novel NDP abstraction for self-contained immutable database objects. Secondly, the on-device navigation and interpretation of data are elaborated. Therefore, we introduce cross-layer Parsers and Accessors as another NDP abstraction that can be executed on the heterogeneous processing capabilities of modern computational storage devices. Thereby, the compute placement and resource configuration per NDP request is identified as a major performance criteria. Our experimental evaluation shows an improvement in the execution durations of 1.4x to 2.7x compared to traditional systems. Moreover, we propose a framework for the automatic generation of Parsers and Accessors on FPGAs to ease their application in NDP. Thirdly, we investigate the interplay of NDP and modern workload characteristics like HTAP. Therefore, we present different offloading models and focus on an intervention-free execution. By propagating the Shared State with the latest modifications of the database to the computational storage device, it is able to process data with transactional guarantees. Thus, we achieve to extend the design space of HTAP with NDP by providing a solution that optimizes for performance isolation, data freshness, and the reduction of data transfers. In contrast to traditional systems, we experience no significant drop in performance when an OLAP query is invoked but a steady and 30% faster throughput. Lastly, in-situ result-set management and consumption as well as NDP pipelines are proposed to achieve flexibility in processing data on heterogeneous hardware. As those produce final and intermediary results, we continue investigating their management and identified that an on-device materialization comes at a low cost but enables novel consumption modes and reuse semantics. Thereby, we achieve significant performance improvements of up to 400x by reusing once materialized results multiple times

    Energy-efficient architectures for chip-scale networks and memory systems using silicon-photonics technology

    Full text link
    Today's supercomputers and cloud systems run many data-centric applications such as machine learning, graph algorithms, and cognitive processing, which have large data footprints and complex data access patterns. With computational capacity of large-scale systems projected to rise up to 50GFLOPS/W, the target energy-per-bit budget for data movement is expected to reach as low as 0.1pJ/bit, assuming 200bits/FLOP for data transfers. This tight energy budget impacts the design of both chip-scale networks and main memory systems. Conventional electrical links used in chip-scale networks (0.5-3pJ/bit) and DRAM systems used in main memory (>30pJ/bit) fail to provide sustained performance at low energy budgets. This thesis builds on the promising research on silicon-photonic technology to design system architectures and system management policies for chip-scale networks and main memory systems. The adoption of silicon-photonic links as chip-scale networks, however, is hampered by the high sensitivity of optical devices towards thermal and process variations. These device sensitivities result in high power overheads at high-speed communications. Moreover, applications differ in their resource utilization, resulting in application-specific thermal profiles and bandwidth needs. Similarly, optically-controlled memory systems designed using conventional electrical-based architectures require additional circuitry for electrical-to-optical and optical-to-electrical conversions within memory. These conversions increase the energy and latency per memory access. Due to these issues, chip-scale networks and memory systems designed using silicon-photonics technology leave much of their benefits underutilized. This thesis argues for the need to rearchitect memory systems and redesign network management policies such that they are aware of the application variability and the underlying device characteristics of silicon-photonic technology. We claim that such a cross-layer design enables a high-throughput and energy-efficient unified silicon-photonic link and main memory system. This thesis undertakes the cross-layer design with silicon-photonic technology in two fronts. First, we study the varying network bandwidth requirements across different applications and also within a given application. To address this variability, we develop bandwidth allocation policies that account for application needs and device sensitivities to ensure power-efficient operation of silicon-photonic links. Second, we design a novel architecture of an optically-controlled main memory system that is directly interfaced with silicon-photonic links using a novel read and write access protocol. Such a system ensures low-energy and high-throughput access from the processor to a high-density memory. To further address the diversity in application memory characteristics, we explore heterogeneous memory systems with multiple memory modules that provide varied power-performance benefits. We design a memory management policy for such systems that allocates pages at the granularity of memory objects within an application

    ENERGY-AWARE OPTIMIZATION FOR EMBEDDED SYSTEMS WITH CHIP MULTIPROCESSOR AND PHASE-CHANGE MEMORY

    Get PDF
    Over the last two decades, functions of the embedded systems have evolved from simple real-time control and monitoring to more complicated services. Embedded systems equipped with powerful chips can provide the performance that computationally demanding information processing applications need. However, due to the power issue, the easy way to gain increasing performance by scaling up chip frequencies is no longer feasible. Recently, low-power architecture designs have been the main trend in embedded system designs. In this dissertation, we present our approaches to attack the energy-related issues in embedded system designs, such as thermal issues in the 3D chip multiprocessor (CMP), the endurance issue in the phase-change memory(PCM), the battery issue in the embedded system designs, the impact of inaccurate information in embedded system, and the cloud computing to move the workload to remote cloud computing facilities. We propose a real-time constrained task scheduling method to reduce peak temperature on a 3D CMP, including an online 3D CMP temperature prediction model and a set of algorithm for scheduling tasks to different cores in order to minimize the peak temperature on chip. To address the challenging issues in applying PCM in embedded systems, we propose a PCM main memory optimization mechanism through the utilization of the scratch pad memory (SPM). Furthermore, we propose an MLC/SLC configuration optimization algorithm to enhance the efficiency of the hybrid DRAM + PCM memory. We also propose an energy-aware task scheduling algorithm for parallel computing in mobile systems powered by batteries. When scheduling tasks in embedded systems, we make the scheduling decisions based on information, such as estimated execution time of tasks. Therefore, we design an evaluation method for impacts of inaccurate information on the resource allocation in embedded systems. Finally, in order to move workload from embedded systems to remote cloud computing facility, we present a resource optimization mechanism in heterogeneous federated multi-cloud systems. And we also propose two online dynamic algorithms for resource allocation and task scheduling. We consider the resource contention in the task scheduling

    Development of phase change memory cell electrical circuit model for non-volatile multistate memory device

    Get PDF
    Phase change memory (PCM) is an emerging non-volatile memory technology that demonstrates promising performance characteristics. The presented research aims to study the feasibility of using resistive non-volatile PCM in embedded memory applications, and in bridging the performance gap in traditional memory hierarchy between volatile and non-volatile memories. The research studies the operation dynamics of PCM, including its electrical, thermal and physical properties; in order to determine its behaviour. A PCM cell circuit model is designed and simulated with the aid of SPICE tools (LTSPICE IV). The first step in the modelling process was to design a single-level PCM (SLPCM) cell circuit model that stores a single bit of data. To design the PCM circuit model; crystallization theory and heat transfer equation were utilized. The developed electrical circuit model evaluates the physical transformations that a PCM cell undergoes in response to an input pulse. Furthermore, the developed model accurately simulated the temperature profile, the crystalline fraction, and the resistance of the cell as a function of the programming pulse. The circuit model is then upgraded into a multilevel phase change memory (MLPCM) cell circuit model. The upgraded MLPCM circuit model stores two bits of data, and incorporates resistance drift with time. The multiple resistance levels were achieved by controlling the programming pulse width in the range of 10ns to 200ns. Additionally, the drift behaviour was precisely evaluated; by using statistical data of drift exponents, and evaluating the exact drift duration. Moreover, the simulation results for the designed SLPCM and MLPCM cell models were found to be in close agreement with experimental data. The simulated I-V characteristics for both SLPCM and MLPCM mimicked the experimentally produced I-V curves. Furthermore, the simulated drift resistance levels matched the experimental data for drift durations up to 103 seconds; which is the available experimental data duration in technical literature. Furthermore, the simulation results of MLPCM showed that the deviation between the programmed and drifted resistance can reach 6x106Ω in less than 1010 seconds. This resistance deviation leads to reading failures in less than 100 seconds after programming, if standard fixed sensing thresholds method was used. Therefore, to overcome drift reliability issues, and retain the density advantage offered by multilevel operation; a time-aware sensing scheme is developed. The designed sensing scheme compensates for the drift caused resistance deviation; by using statistical data of drift coefficients to forecast adaptive sensing thresholds. The simulation results showed that the use of adaptive time-aware sensing thresholds completely eliminated drift reliability issues and read errors. Furthermore, PCM based nanocrossbar memory structure performance in terms of delay and energy consumption is studied in simulation environment. The nanocrossbar is constructed with a grid of connecting wires; and the designed PCM cell circuit model is used as memory element and placed at junction points of the grid. Then the effect of connecting nanowires resistance in PCM nanocrossbar performance is studied in passive crossbars. The resistance of a connecting wire segment was evaluated with physical formulas that calculate nanoscaled conductors’ resistance. Then a resistor that is equivalent to each wire segment resistance is placed in the tested crossbar structure. Simulation results showed that due to connecting wires resistance; the PCM cells are not truly biased to programming voltage and ground. This leads to 40% deviation in the programed low resistive state from the targeted levels. Thus, affecting PCM reliability and decreasing the high to low resistance ratio by 90%. Therefore, programming and architectural solutions to wire resistance related reliability issue ar presented. Where dissipated power across wire resistance is compensated for; by controlling programming pulse duration. The programming solution retained reliability however; it increased programming energy consumption and delay by an average of 40pJ and 60ns respectively per operation. Additionally, the effects of leakage energy in PCM based nanocrossbars were studied in simulation environment. Then, a structural solution was developed and designed. In the designed structure; leakage sneak paths are eliminated by introducing individual word lines to each memory element. This method led to 30% reduction in reading delay, and consumed only about sixth the leakage energy consumed by the standard structure. Moreover, a sensing scheme that aims to reduce energy consumption in PCM based nanocrossbars during reading process was explored. The sensing method is developed using AC current in contrast to the standard DC current reading circuits. In the designed sensing circuit, a low pass filter is utilized. Accordingly, the filter attenuation of the applied AC reading signal indicates the stored state. The proposed circuit design of the AC sensing scheme was constructed and studied in simulation environment. Simulation results showed that AC sensing has reduced reading energy consumption by over 50%; compared to standard DC sensing scheme. Furthermore, the use of SLPCM and MLPCM in memory applications as crossbar memory elements, and in logic applications i.e. PCM based LUTs was explored and tested in simulation environment. The PCM performance in crossbar memory was then compared to current Static Random Access Memory (SRAM) technology and against one of the main emerging resistive non-volatile memory technologies i.e. Memristors. Simulation results showed that programming and reading energy consumption of PCM based crossbars were five orders of magnitude more than SRAM based crossbars. And reading delay of SRAM based crossbars was only 38% of reading delay of PCM based counterparts. However, PCM cells occupies less than 60% of the area required by SRAM and can store multiple bit in a single cell. Moreover, Memristor based nanocrossbars outperformed PCM based ones; in terms of delay and energy consumption. With PCM consuming 2 orders of magnitude more energy during programming and reading. PCM also required 10 times the programming delay. However, PCM crossbars offered higher switching resistance range i.e. 170kΩ compared to the 20kΩ offered by memristors; which support PCM multibit storage capability and higher density

    Development of phase change memory cell electrical circuit model for non-volatile multistate memory device

    Get PDF
    Phase change memory (PCM) is an emerging non-volatile memory technology that demonstrates promising performance characteristics. The presented research aims to study the feasibility of using resistive non-volatile PCM in embedded memory applications, and in bridging the performance gap in traditional memory hierarchy between volatile and non-volatile memories. The research studies the operation dynamics of PCM, including its electrical, thermal and physical properties; in order to determine its behaviour. A PCM cell circuit model is designed and simulated with the aid of SPICE tools (LTSPICE IV). The first step in the modelling process was to design a single-level PCM (SLPCM) cell circuit model that stores a single bit of data. To design the PCM circuit model; crystallization theory and heat transfer equation were utilized. The developed electrical circuit model evaluates the physical transformations that a PCM cell undergoes in response to an input pulse. Furthermore, the developed model accurately simulated the temperature profile, the crystalline fraction, and the resistance of the cell as a function of the programming pulse. The circuit model is then upgraded into a multilevel phase change memory (MLPCM) cell circuit model. The upgraded MLPCM circuit model stores two bits of data, and incorporates resistance drift with time. The multiple resistance levels were achieved by controlling the programming pulse width in the range of 10ns to 200ns. Additionally, the drift behaviour was precisely evaluated; by using statistical data of drift exponents, and evaluating the exact drift duration. Moreover, the simulation results for the designed SLPCM and MLPCM cell models were found to be in close agreement with experimental data. The simulated I-V characteristics for both SLPCM and MLPCM mimicked the experimentally produced I-V curves. Furthermore, the simulated drift resistance levels matched the experimental data for drift durations up to 103 seconds; which is the available experimental data duration in technical literature. Furthermore, the simulation results of MLPCM showed that the deviation between the programmed and drifted resistance can reach 6x106Ω in less than 1010 seconds. This resistance deviation leads to reading failures in less than 100 seconds after programming, if standard fixed sensing thresholds method was used. Therefore, to overcome drift reliability issues, and retain the density advantage offered by multilevel operation; a time-aware sensing scheme is developed. The designed sensing scheme compensates for the drift caused resistance deviation; by using statistical data of drift coefficients to forecast adaptive sensing thresholds. The simulation results showed that the use of adaptive time-aware sensing thresholds completely eliminated drift reliability issues and read errors. Furthermore, PCM based nanocrossbar memory structure performance in terms of delay and energy consumption is studied in simulation environment. The nanocrossbar is constructed with a grid of connecting wires; and the designed PCM cell circuit model is used as memory element and placed at junction points of the grid. Then the effect of connecting nanowires resistance in PCM nanocrossbar performance is studied in passive crossbars. The resistance of a connecting wire segment was evaluated with physical formulas that calculate nanoscaled conductors’ resistance. Then a resistor that is equivalent to each wire segment resistance is placed in the tested crossbar structure. Simulation results showed that due to connecting wires resistance; the PCM cells are not truly biased to programming voltage and ground. This leads to 40% deviation in the programed low resistive state from the targeted levels. Thus, affecting PCM reliability and decreasing the high to low resistance ratio by 90%. Therefore, programming and architectural solutions to wire resistance related reliability issue ar presented. Where dissipated power across wire resistance is compensated for; by controlling programming pulse duration. The programming solution retained reliability however; it increased programming energy consumption and delay by an average of 40pJ and 60ns respectively per operation. Additionally, the effects of leakage energy in PCM based nanocrossbars were studied in simulation environment. Then, a structural solution was developed and designed. In the designed structure; leakage sneak paths are eliminated by introducing individual word lines to each memory element. This method led to 30% reduction in reading delay, and consumed only about sixth the leakage energy consumed by the standard structure. Moreover, a sensing scheme that aims to reduce energy consumption in PCM based nanocrossbars during reading process was explored. The sensing method is developed using AC current in contrast to the standard DC current reading circuits. In the designed sensing circuit, a low pass filter is utilized. Accordingly, the filter attenuation of the applied AC reading signal indicates the stored state. The proposed circuit design of the AC sensing scheme was constructed and studied in simulation environment. Simulation results showed that AC sensing has reduced reading energy consumption by over 50%; compared to standard DC sensing scheme. Furthermore, the use of SLPCM and MLPCM in memory applications as crossbar memory elements, and in logic applications i.e. PCM based LUTs was explored and tested in simulation environment. The PCM performance in crossbar memory was then compared to current Static Random Access Memory (SRAM) technology and against one of the main emerging resistive non-volatile memory technologies i.e. Memristors. Simulation results showed that programming and reading energy consumption of PCM based crossbars were five orders of magnitude more than SRAM based crossbars. And reading delay of SRAM based crossbars was only 38% of reading delay of PCM based counterparts. However, PCM cells occupies less than 60% of the area required by SRAM and can store multiple bit in a single cell. Moreover, Memristor based nanocrossbars outperformed PCM based ones; in terms of delay and energy consumption. With PCM consuming 2 orders of magnitude more energy during programming and reading. PCM also required 10 times the programming delay. However, PCM crossbars offered higher switching resistance range i.e. 170kΩ compared to the 20kΩ offered by memristors; which support PCM multibit storage capability and higher density

    Flash Memory Devices

    Get PDF
    Flash memory devices have represented a breakthrough in storage since their inception in the mid-1980s, and innovation is still ongoing. The peculiarity of such technology is an inherent flexibility in terms of performance and integration density according to the architecture devised for integration. The NOR Flash technology is still the workhorse of many code storage applications in the embedded world, ranging from microcontrollers for automotive environment to IoT smart devices. Their usage is also forecasted to be fundamental in emerging AI edge scenario. On the contrary, when massive data storage is required, NAND Flash memories are necessary to have in a system. You can find NAND Flash in USB sticks, cards, but most of all in Solid-State Drives (SSDs). Since SSDs are extremely demanding in terms of storage capacity, they fueled a new wave of innovation, namely the 3D architecture. Today “3D” means that multiple layers of memory cells are manufactured within the same piece of silicon, easily reaching a terabit capacity. So far, Flash architectures have always been based on "floating gate," where the information is stored by injecting electrons in a piece of polysilicon surrounded by oxide. On the contrary, emerging concepts are based on "charge trap" cells. In summary, flash memory devices represent the largest landscape of storage devices, and we expect more advancements in the coming years. This will require a lot of innovation in process technology, materials, circuit design, flash management algorithms, Error Correction Code and, finally, system co-design for new applications such as AI and security enforcement
    corecore