12,707 research outputs found

    Affordable techniques for dependable microprocessor design

    Get PDF
    As high computing power is available at an affordable cost, we rely on microprocessor-based systems for much greater variety of applications. This dependence indicates that a processor failure could have more diverse impacts on our daily lives. Therefore, dependability is becoming an increasingly important quality measure of microprocessors.;Temporary hardware malfunctions caused by unstable environmental conditions can lead the processor to an incorrect state. This is referred to as a transient error or soft error. Studies have shown that soft errors are the major source of system failures. This dissertation characterizes the soft error behavior on microprocessors and presents new microarchitectural approaches that can realize high dependability with low overhead.;Our fault injection studies using RISC processors have demonstrated that different functional blocks of the processor have distinct susceptibilities to soft errors. The error susceptibility information must be reflected in devising fault tolerance schemes for cost-sensitive applications. Considering the common use of on-chip caches in modern processors, we investigated area-efficient protection schemes for memory arrays. The idea of caching redundant information was exploited to optimize resource utilization for increased dependability. We also developed a mechanism to verify the integrity of data transfer from lower level memories to the primary caches. The results of this study show that by exploiting bus idle cycles and the information redundancy, an almost complete check for the initial memory data transfer is possible without incurring a performance penalty.;For protecting the processor\u27s control logic, which usually remains unprotected, we propose a low-cost reliability enhancement strategy. We classified control logic signals into static and dynamic control depending on their changeability, and applied various techniques including commit-time checking, signature caching, component-level duplication, and control flow monitoring. Our schemes can achieve more than 99% coverage with a very small hardware addition. Finally, a virtual duplex architecture for superscalar processors is presented. In this system-level approach, the processor pipeline is backed up by a partially replicated pipeline. The replication-based checker minimizes the design and verification overheads. For a large-scale superscalar processor, the proposed architecture can bring 61.4% reduction in die area while sustaining the maximum performance

    Analyzing and Predicting Processor Vulnerability to Soft Errors Using Statistical Techniques

    Get PDF
    The shrinking processor feature size, lower threshold voltage and increasing on-chip transistor density make current processors highly vulnerable to soft errors. Architectural Vulnerability Factor (AVF) reflects the probability that a raw soft error eventually causes a visible error in the program output, indicating the processor’s susceptibility to soft errors at architectural level. The awareness of the AVF, both at the early design stage and during program runtime, is greatly useful for designing reliable processors. However, measuring the AVF is extremely costly, resulting in large overheads in hardware, computation, and power. The situation is further exacerbated in a multi-threaded processor environment where resource contention and data sharing exist among different threads. Consequently, predicting the AVF from other easily-measured metrics becomes extraordinarily attractive to computer designers. We propose a series of AVF modeling and prediction works via using advanced statistical techniques. First, we utilize the Boosted Regression Trees (BRT) scheme to dynamically predict the AVF during program execution from a variety of performance metrics. This correlation is generalized to be across different workloads, program phases, and processor configurations on a single-threaded superscalar processor. Second, the AVF prediction is extended to multi-threaded processors where the inter-thread resource contention shows significant and non-uniform impacts on different programs; we propose a two-level predictive mechanism using BRT as building blocks to characterize the contention behavior. Finally, we employ a rule search strategy named Patient Rule Induction Method (PRIM) to explore a large processor design space at the early design stage. We are capable of generating selective rules on important configuration parameters. These rules quantify the design space subregion yielding lowest values of the response, thereby providing useful guidelines for designing reliable processors while achieving high performance

    PROFET: modeling system performance and energy without simulating the CPU

    Get PDF
    The approaching end of DRAM scaling and expansion of emerging memory technologies is motivating a lot of research in future memory systems. Novel memory systems are typically explored by hardware simulators that are slow and often have a simplified or obsolete abstraction of the CPU. This study presents PROFET, an analytical model that predicts how an application's performance and energy consumption changes when it is executed on different memory systems. The model is based on instrumentation of an application execution on actual hardware, so it already takes into account CPU microarchitectural details such as the data prefetcher and out-of-order engine. PROFET is evaluated on two real platforms: Sandy Bridge-EP E5-2670 and Knights Landing Xeon Phi platforms with various memory configurations. The evaluation results show that PROFET's predictions are accurate, typically with only 2% difference from the values measured on actual hardware. We release the PROFET source code and all input data required for memory system and application profiling. The released package can be seamlessly installed and used on high-end Intel platforms.Peer ReviewedPostprint (author's final draft

    Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories

    Full text link
    Modern computing systems are embracing hybrid memory comprising of DRAM and non-volatile memory (NVM) to combine the best properties of both memory technologies, achieving low latency, high reliability, and high density. A prominent characteristic of DRAM-NVM hybrid memory is that it has NVM access latency much higher than DRAM access latency. We call this inter-memory asymmetry. We observe that parasitic components on a long bitline are a major source of high latency in both DRAM and NVM, and a significant factor contributing to high-voltage operations in NVM, which impact their reliability. We propose an architectural change, where each long bitline in DRAM and NVM is split into two segments by an isolation transistor. One segment can be accessed with lower latency and operating voltage than the other. By introducing tiers, we enable non-uniform accesses within each memory type (which we call intra-memory asymmetry), leading to performance and reliability trade-offs in DRAM-NVM hybrid memory. We extend existing NVM-DRAM OS in three ways. First, we exploit both inter- and intra-memory asymmetries to allocate and migrate memory pages between the tiers in DRAM and NVM. Second, we improve the OS's page allocation decisions by predicting the access intensity of a newly-referenced memory page in a program and placing it to a matching tier during its initial allocation. This minimizes page migrations during program execution, lowering the performance overhead. Third, we propose a solution to migrate pages between the tiers of the same memory without transferring data over the memory channel, minimizing channel occupancy and improving performance. Our overall approach, which we call MNEME, to enable and exploit asymmetries in DRAM-NVM hybrid tiered memory improves both performance and reliability for both single-core and multi-programmed workloads.Comment: 15 pages, 29 figures, accepted at ACM SIGPLAN International Symposium on Memory Managemen

    Energy-Aware Data Movement In Non-Volatile Memory Hierarchies

    Get PDF
    While technology scaling enables increased density for memory cells, the intrinsic high leakage power of conventional CMOS technology and the demand for reduced energy consumption inspires the use of emerging technology alternatives such as eDRAM and Non-Volatile Memory (NVM) including STT-MRAM, PCM, and RRAM. The utilization of emerging technology in Last Level Cache (LLC) designs which occupies a signifcant fraction of total die area in Chip Multi Processors (CMPs) introduces new dimensions of vulnerability, energy consumption, and performance delivery. To be specific, a part of this research focuses on eDRAM Bit Upset Vulnerability Factor (BUVF) to assess vulnerable portion of the eDRAM refresh cycle where the critical charge varies depending on the write voltage, storage and bit-line capacitance. This dissertation broaden the study on vulnerability assessment of LLC through investigating the impact of Process Variations (PV) on narrow resistive sensing margins in high-density NVM arrays, including on-chip cache and primary memory. Large-latency and power-hungry Sense Amplifers (SAs) have been adapted to combat PV in the past. Herein, a novel approach is proposed to leverage the PV in NVM arrays using Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce the latency and power consumption while maintaining acceptable access time. On the other hand, this dissertation investigates a novel technique to prioritize the service to 1) Extensive Read Reused Accessed blocks of the LLC that are silently dropped from higher levels of cache, and 2) the portion of the working set that may exhibit distant re-reference interval in L2. In particular, we develop a lightweight Multi-level Access History Profiler to effciently identify ERRA blocks through aggregating the LLC block addresses tagged with identical Most Signifcant Bits into a single entry. Experimental results indicate that the proposed technique can reduce the L2 read miss ratio by 51.7% on average across PARSEC and SPEC2006 workloads. In addition, this dissertation will broaden and apply advancements in theories of subspace recovery to pioneer computationally-aware in-situ operand reconstruction via the novel Logic In Interconnect (LI2) scheme. LI2 will be developed, validated, and re?ned both theoretically and experimentally to realize a radically different approach to post-Moore\u27s Law computing by leveraging low-rank matrices features offering data reconstruction instead of fetching data from main memory to reduce energy/latency cost per data movement. We propose LI2 enhancement to attain high performance delivery in the post-Moore\u27s Law era through equipping the contemporary micro-architecture design with a customized memory controller which orchestrates the memory request for fetching low-rank matrices to customized Fine Grain Reconfigurable Accelerator (FGRA) for reconstruction while the other memory requests are serviced as before. The goal of LI2 is to conquer the high latency/energy required to traverse main memory arrays in the case of LLC miss, by using in-situ construction of the requested data dealing with low-rank matrices. Thus, LI2 exchanges a high volume of data transfers with a novel lightweight reconstruction method under specific conditions using a cross-layer hardware/algorithm approach

    The 1990 progress report and future plans

    Get PDF
    This document describes the progress and plans of the Artificial Intelligence Research Branch (RIA) at ARC in 1990. Activities span a range from basic scientific research to engineering development and to fielded NASA applications, particularly those applications that are enabled by basic research carried out at RIA. Work is conducted in-house and through collaborative partners in academia and industry. Our major focus is on a limited number of research themes with a dual commitment to technical excellence and proven applicability to NASA short, medium, and long-term problems. RIA acts as the Agency's lead organization for research aspects of artificial intelligence, working closely with a second research laboratory at JPL and AI applications groups at all NASA centers

    Dependable Computing on Inexact Hardware through Anomaly Detection.

    Full text link
    Reliability of transistors is on the decline as transistors continue to shrink in size. Aggressive voltage scaling is making the problem even worse. Scaled-down transistors are more susceptible to transient faults as well as permanent in-field hardware failures. In order to continue to reap the benefits of technology scaling, it has become imperative to tackle the challenges risen due to the decreasing reliability of devices for the mainstream commodity market. Along with the worsening reliability, achieving energy efficiency and performance improvement by scaling is increasingly providing diminishing marginal returns. More than any other time in history, the semiconductor industry faces the crossroad of unreliability and the need to improve energy efficiency. These challenges of technology scaling can be tackled by categorizing the target applications in the following two categories: traditional applications that have relatively strict correctness requirement on outputs and emerging class of soft applications, from various domains such as multimedia, machine learning, and computer vision, that are inherently inaccuracy tolerant to a certain degree. Traditional applications can be protected against hardware failures by low-cost detection and protection methods while soft applications can trade off quality of outputs to achieve better performance or energy efficiency. For traditional applications, I propose an efficient, software-only application analysis and transformation solution to detect data and control flow transient faults. The intelligence of the data flow solution lies in the use of dynamic application information such as control flow, memory and value profiling. The control flow protection technique achieves its efficiency by simplifying signature calculations in each basic block and by performing checking at a coarse-grain level. For soft applications, I develop a quality control technique. The quality control technique employs continuous, light-weight checkers to ensure that the approximation is controlled and application output is acceptable. Overall, I show that the use of low-cost checkers to produce dependable results on commodity systems---constructed from inexact hardware components---is efficient and practical.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113341/1/dskhudia_1.pd

    High-Performance Energy-Efficient and Reliable Design of Spin-Transfer Torque Magnetic Memory

    Get PDF
    In this dissertation new computing paradigms, architectures and design philosophy are proposed and evaluated for adopting the STT-MRAM technology as highly reliable, energy efficient and fast memory. For this purpose, a novel cross-layer framework from the cell-level all the way up to the system- and application-level has been developed. In these framework, the reliability issues are modeled accurately with appropriate fault models at different abstraction levels in order to analyze the overall failure rates of the entire memory and its Mean Time To Failure (MTTF) along with considering the temperature and process variation effects. Design-time, compile-time and run-time solutions have been provided to address the challenges associated with STT-MRAM. The effectiveness of the proposed solutions is demonstrated in extensive experiments that show significant improvements in comparison to state-of-the-art solutions, i.e. lower-power, higher-performance and more reliable STT-MRAM design
    • …
    corecore