856 research outputs found

    DC: Small: Energy-aware Coordinated Caching in Cluster-based Storage Systems

    Get PDF
    The main goal of this project is to improve the performance and energy efficiency of I/O (Input/Output) operations of large-scale cluster computing platforms. The major activities include: 1) characterize the memory access workloads; 2) investigate the new and emerging new storage and memory devices, such as SSD and PCM, on I/O performance. (3) study energy-efficient buffer and cache replacement algorithms, (4) leveraging SSD as a new caching device to improve the energy efficiency and performance of I/O performanc

    DC:Small: Energy-aware Coordinated Caching in Cluster-based Storage Systems

    Get PDF
    As the computing capacity increases rapidly in large-scale cluster computing platforms, power management becomes an increasingly important concern. This project focuses on the research of reducing disk and memory power consumption through energy-aware cooperative caching in cluster-based storage systems. The project leverages I/O characteristics of scientific applications and dynamic power management features of disk drives and memory chips to reduce I/O energy consumption. This project involves three components: (1) investigate program context based pattern detection to predict I/O activities in the operating systems, (2) investigate disk energy aware cooperative cache management schemes, and (3) prototype the management schemes and incorporate into cluster-based file systems. This project has broader impact through its contributions to the energy-aware computing, graduate education, and undergraduate education via an existing NSF-REU site award

    Letter from the Special Issue Editor

    Get PDF
    Editorial work for DEBULL on a special issue on data management on Storage Class Memory (SCM) technologies

    Energy-Aware Data Movement In Non-Volatile Memory Hierarchies

    Get PDF
    While technology scaling enables increased density for memory cells, the intrinsic high leakage power of conventional CMOS technology and the demand for reduced energy consumption inspires the use of emerging technology alternatives such as eDRAM and Non-Volatile Memory (NVM) including STT-MRAM, PCM, and RRAM. The utilization of emerging technology in Last Level Cache (LLC) designs which occupies a signifcant fraction of total die area in Chip Multi Processors (CMPs) introduces new dimensions of vulnerability, energy consumption, and performance delivery. To be specific, a part of this research focuses on eDRAM Bit Upset Vulnerability Factor (BUVF) to assess vulnerable portion of the eDRAM refresh cycle where the critical charge varies depending on the write voltage, storage and bit-line capacitance. This dissertation broaden the study on vulnerability assessment of LLC through investigating the impact of Process Variations (PV) on narrow resistive sensing margins in high-density NVM arrays, including on-chip cache and primary memory. Large-latency and power-hungry Sense Amplifers (SAs) have been adapted to combat PV in the past. Herein, a novel approach is proposed to leverage the PV in NVM arrays using Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce the latency and power consumption while maintaining acceptable access time. On the other hand, this dissertation investigates a novel technique to prioritize the service to 1) Extensive Read Reused Accessed blocks of the LLC that are silently dropped from higher levels of cache, and 2) the portion of the working set that may exhibit distant re-reference interval in L2. In particular, we develop a lightweight Multi-level Access History Profiler to effciently identify ERRA blocks through aggregating the LLC block addresses tagged with identical Most Signifcant Bits into a single entry. Experimental results indicate that the proposed technique can reduce the L2 read miss ratio by 51.7% on average across PARSEC and SPEC2006 workloads. In addition, this dissertation will broaden and apply advancements in theories of subspace recovery to pioneer computationally-aware in-situ operand reconstruction via the novel Logic In Interconnect (LI2) scheme. LI2 will be developed, validated, and re?ned both theoretically and experimentally to realize a radically different approach to post-Moore\u27s Law computing by leveraging low-rank matrices features offering data reconstruction instead of fetching data from main memory to reduce energy/latency cost per data movement. We propose LI2 enhancement to attain high performance delivery in the post-Moore\u27s Law era through equipping the contemporary micro-architecture design with a customized memory controller which orchestrates the memory request for fetching low-rank matrices to customized Fine Grain Reconfigurable Accelerator (FGRA) for reconstruction while the other memory requests are serviced as before. The goal of LI2 is to conquer the high latency/energy required to traverse main memory arrays in the case of LLC miss, by using in-situ construction of the requested data dealing with low-rank matrices. Thus, LI2 exchanges a high volume of data transfers with a novel lightweight reconstruction method under specific conditions using a cross-layer hardware/algorithm approach

    High-level services for networks-on-chip

    Get PDF
    Future technology trends envision that next-generation Multiprocessors Systems-on- Chip (MPSoCs) will be composed of a combination of a large number of processing and storage elements interconnected by complex communication architectures. Communication and interconnection between these basic blocks play a role of crucial importance when the number of these elements increases. Enabling reliable communication channels between cores becomes therefore a challenge for system designers. Networks-on-Chip (NoCs) appeared as a strategy for connecting and managing the communication between several design elements and IP blocks, as required in complex Systems-on-Chip (SoCs). The topic can be considered as a multidisciplinary synthesis of multiprocessing, parallel computing, networking, and on- chip communication domains. Networks-on-Chip, in addition to standard communication services, can be employed for providing support for the implementation of system-level services. This dissertation will demonstrate how high-level services can be added to an MPSoC platform by embedding appropriate hardware/software support in the network interfaces (NIs) of the NoC. In this dissertation, the implementation of innovative modules acting in parallel with protocol translation and data transmission in NIs is proposed and evaluated. The modules can support the execution of the high-level services in the NoC at a relatively low cost in terms of area and energy consumption. Three types of services will be addressed and discussed: security, monitoring, and fault tolerance. With respect to the security aspect, this dissertation will discuss the implementation of an innovative data protection mechanism for detecting and preventing illegal accesses to protected memory blocks and/or memory mapped peripherals. The second aspect will be addressed by proposing the implementation of a monitoring system based on programmable multipurpose monitoring probes aimed at detecting NoC internal events and run-time characteristics. As last topic, new architectural solutions for the design of fault tolerant network interfaces will be presented and discussed

    Doctor of Philosophy

    Get PDF
    dissertationThe internet-based information infrastructure that has powered the growth of modern personal/mobile computing is composed of powerful, warehouse-scale computers or datacenters. These heavily subscribed datacenters perform data-processing jobs under intense quality of service guarantees. Further, high-performance compute platforms are being used to model and analyze increasingly complex scientific problems and natural phenomena. To ensure that the high-performance needs of these machines are met, it is necessary to increase the efficiency of the memory system that supplies data to the processing cores. Many of the microarchitectural innovations that were designed to scale the memory wall (e.g., out-of-order instruction execution, on-chip caches) are being rendered less effective due to several emerging trends (e.g., increased emphasis on energy consumption, limited access locality). This motivates the optimization of the main memory system itself. The key to an efficient main memory system is the memory controller. In particular, the scheduling algorithm in the memory controller greatly influences its performance. This dissertation explores this hypothesis in several contexts. It develops tools to better understand memory scheduling and develops scheduling innovations for CPUs and GPUs. We propose novel memory scheduling techniques that are strongly aware of the access patterns of the clients as well as the microarchitecture of the memory device. Based on these, we present (i) a Dynamic Random Access Memory (DRAM) chip microarchitecture optimized for reducing write-induced slowdown, (ii) a memory scheduling algorithm that exploits these features, (iii) several memory scheduling algorithms to reduce the memory-related stall experienced by irregular General Purpose Graphics Processing Unit (GPGPU) applications, and (iv) the Utah Simulated Memory Module (USIMM), a detailed, validated simulator for DRAM main memory that we use for analyzing and proposing scheduler algorithms

    Improving Caches in Consolidated Environments

    Get PDF
    Memory (cache, DRAM, and disk) is in charge of providing data and instructions to a computer’s processor. In order to maximize performance, the speeds of the memory and the processor should be equal. However, using memory that always match the speed of the processor is prohibitively expensive. Computer hardware designers have managed to drastically lower the cost of the system with the use of memory caches by sacrificing some performance. A cache is a small piece of fast memory that stores popular data so it can be accessed faster. Modern computers have evolved into a hierarchy of caches, where a memory level is the cache for a larger and slower memory level immediately below it. Thus, by using caches, manufacturers are able to store terabytes of data at the cost of cheapest memory while achieving speeds close to the speed of the fastest one. The most important decision about managing a cache is what data to store in it. Failing to make good decisions can lead to performance overheads and over- provisioning. Surprisingly, caches choose data to store based on policies that have not changed in principle for decades. However, computing paradigms have changed radically leading to two noticeably different trends. First, caches are now consol- idated across hundreds to even thousands of processes. And second, caching is being employed at new levels of the storage hierarchy due to the availability of high-performance flash-based persistent media. This brings four problems. First, as the workloads sharing a cache increase, it is more likely that they contain dupli- cated data. Second, consolidation creates contention for caches, and if not managed carefully, it translates to wasted space and sub-optimal performance. Third, as contented caches are shared by more workloads, administrators need to carefully estimate specific per-workload requirements across the entire memory hierarchy in order to meet per-workload performance goals. And finally, current cache write poli- cies are unable to simultaneously provide performance and consistency guarantees for the new levels of the storage hierarchy. We addressed these problems by modeling their impact and by proposing solu- tions for each of them. First, we measured and modeled the amount of duplication at the buffer cache level and contention in real production systems. Second, we created a unified model of workload cache usage under contention to be used by administrators for provisioning, or by process schedulers to decide what processes to run together. Third, we proposed methods for removing cache duplication and to eliminate wasted space because of contention for space. And finally, we pro- posed a technique to improve the consistency guarantees of write-back caches while preserving their performance benefits
    • …
    corecore