
    CONTINUOUS ONLINE MEMORY DIAGNOSTIC

    Today’s computers have gigabytes of main memory due to improved DRAM density. As density increases, smaller bit cells become more susceptible to errors, which in turn increases the need for memory resiliency. Self-testing of memory health can proactively check for errors to improve resiliency. Developing a memory diagnostic is challenging because it must be transparent and scalable and must impose low performance overhead. In my thesis, I developed a software-only self-test that continuously tests memory. I present the challenges and the design of two approaches, called COMeT and Asteroid, which are built on a common software framework for memory diagnostics and target chip multiprocessors. COMeT tests memory health concurrently with single-threaded and multi-threaded application execution, in anticipation of memory allocation requests. The approach guarantees that memory is tested within a fixed time interval to limit exposure to lurking errors. On the SPEC CPU2006 and PARSEC benchmarks, COMeT has a low average performance overhead of 4%. Despite these promising results, COMeT scaled poorly in multi-programmed workload environments with high memory pressure. I therefore developed another novel approach, Asteroid, which adapts at runtime to workload behavior and resource availability to maximize test quality while reducing performance overhead. Asteroid is designed to support control policies that dynamically configure a diagnostic. Asteroid is seamlessly integrated with a hierarchical memory allocator in modern operating systems and is optimized to achieve a higher memory test speed than COMeT. Using an adaptive policy on a 16-core server, Asteroid has a modest overhead of 1% to 4% for workloads with low to high memory demand. For these workloads, Asteroid’s adaptive policy shows good error coverage and can thoroughly test memory.
Thorough evaluation of my techniques provides experimental justification that a transparent, online, software-based strategy for memory diagnostics is achievable by utilizing over-provisioned system resources.
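The abstract does not say which test patterns COMeT or Asteroid apply. As a minimal sketch, assuming a classic March C- test (a standard memory-test algorithm, swapped in here purely for illustration) run over a hypothetical simulated memory with injected stuck-at faults, the code below shows what one self-test pass over a region might look like. It also includes a hypothetical round-robin rule for meeting a fixed test interval; the names SimulatedMemory, march_c_minus, and pages_per_tick are illustrative, not from the thesis.

```python
from typing import Iterable, List, Optional, Set


class SimulatedMemory:
    """Hypothetical bit-cell memory model with optional stuck-at-0/1 faults.

    Illustration only: a real software diagnostic would test physical pages
    obtained from the OS allocator, not a Python list.
    """

    def __init__(self, size: int, stuck_at_0: Optional[Set[int]] = None,
                 stuck_at_1: Optional[Set[int]] = None) -> None:
        self.cells: List[int] = [0] * size
        self.stuck_at_0 = stuck_at_0 or set()
        self.stuck_at_1 = stuck_at_1 or set()

    def write(self, addr: int, bit: int) -> None:
        self.cells[addr] = bit

    def read(self, addr: int) -> int:
        # A stuck-at cell always returns its stuck value, whatever was written.
        if addr in self.stuck_at_0:
            return 0
        if addr in self.stuck_at_1:
            return 1
        return self.cells[addr]


def march_c_minus(mem: SimulatedMemory) -> List[int]:
    """Run March C- over every cell and return sorted addresses that failed a read."""
    n = len(mem.cells)
    up = range(n)
    down = range(n - 1, -1, -1)
    failed: Set[int] = set()

    def element(order: Iterable[int], expect: Optional[int],
                write_val: Optional[int]) -> None:
        # One march element: per address, optionally read-and-check, then optionally write.
        for a in order:
            if expect is not None and mem.read(a) != expect:
                failed.add(a)
            if write_val is not None:
                mem.write(a, write_val)

    element(up, None, 0)   # M0: w0 (any order)
    element(up, 0, 1)      # M1: ascending r0, w1
    element(up, 1, 0)      # M2: ascending r1, w0
    element(down, 0, 1)    # M3: descending r0, w1
    element(down, 1, 0)    # M4: descending r1, w0
    element(up, 0, None)   # M5: r0 (any order)
    return sorted(failed)


def pages_per_tick(total_pages: int, interval_ticks: int) -> int:
    """Hypothetical rule: pages to test per tick so a round-robin sweep
    covers every page within interval_ticks."""
    return -(-total_pages // interval_ticks)  # ceiling division
```

For example, a hypothetical system with 1000 pages and a 64-tick interval would need to test 16 pages per tick, and a 16-cell region with cell 3 stuck at 0 and cell 5 stuck at 1 reports failures at addresses 3 and 5.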

    IMPROVING THE PERFORMANCE AND ENERGY EFFICIENCY OF EMERGING MEMORY SYSTEMS

    Modern main memory is primarily built using dynamic random access memory (DRAM) chips. As DRAM chips scale to higher density, three main problems impede DRAM scalability and performance improvement. First, DRAM refresh overhead grows from negligible to severe, which limits DRAM scalability and degrades performance. Second, although memory capacity has increased dramatically in the past decade, memory bandwidth has not kept pace with CPU performance scaling, which has led to the memory wall problem. Third, DRAM dissipates considerable power; it has been reported to account for as much as 40% of total system energy, and this problem worsens as DRAM scales up. To address these problems, 1) we propose Rank-level Piggyback Caching (RPC), which alleviates DRAM refresh overhead by servicing memory requests and refresh operations in parallel; 2) we propose SELF, a high-performance and bandwidth-efficient approach to breaking the memory bandwidth wall by exploiting die-stacked DRAM as part of memory; 3) we propose Dual Role HBM (DR-HBM), a cost-effective and energy-efficient architecture for hybrid memory systems composed of high bandwidth memory (HBM) and phase change memory (PCM). In DR-HBM, hot pages are tracked in a cost-effective way and migrated to the HBM to improve performance, while cold pages are stored in the PCM to save energy.
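The abstract does not describe how DR-HBM tracks hot pages. As a minimal sketch, assuming a simple per-epoch access counter with a hypothetical hotness threshold and an HBM capacity measured in pages (this is not DR-HBM's actual mechanism, and it ignores HBM's second role), the hot/cold placement and the resulting migration decision could look like this. The functions place_pages and migration_plan are illustrative names only.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def place_pages(accesses: Iterable[int], hbm_capacity: int,
                hot_threshold: int) -> Dict[str, Set[int]]:
    """Split the pages touched in one epoch between HBM and PCM.

    Hypothetical policy: count accesses per page, then keep the most-accessed
    pages whose count meets hot_threshold in HBM (at most hbm_capacity pages).
    Every other touched page is treated as cold and stays in PCM.
    """
    counts = Counter(accesses)
    # Rank pages by access count (descending), breaking ties by page number.
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    hot = [p for p in ranked if counts[p] >= hot_threshold][:hbm_capacity]
    hbm = set(hot)
    pcm = set(counts) - hbm
    return {"hbm": hbm, "pcm": pcm}


def migration_plan(current_hbm: Set[int],
                   target_hbm: Set[int]) -> Dict[str, List[int]]:
    """Pages to promote into HBM and demote to PCM to reach the target placement."""
    return {
        "promote": sorted(target_hbm - current_hbm),
        "demote": sorted(current_hbm - target_hbm),
    }
```

With accesses [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5], an HBM capacity of two pages, and a threshold of two, pages 4 and 1 are placed in HBM and pages 2, 3, and 5 stay in PCM; if HBM previously held pages 2 and 4, the plan promotes page 1 and demotes page 2. Counting per epoch keeps tracking state proportional to the number of touched pages, which is one simple way to keep tracking cheap.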

    TLB Shootdown Mitigation for Low-Power Many-Core Servers with L1 Virtual Caches
