    Balancing soft error coverage with lifetime reliability in redundantly multithreaded processors

    Silicon reliability is a key challenge facing the microprocessor industry. Processors need to be designed such that they are resilient against both soft errors and lifetime reliability phenomena. However, techniques developed to address one class of reliability problems may impact other aspects of silicon reliability. In this paper, we show that Redundant Multi-Threading (RMT), which provides soft error protection, exacerbates lifetime reliability problems. We then explore two different architectural approaches to tackle this problem, namely, Dynamic Voltage Scaling (DVS) and partial RMT. We show that each approach has certain strengths and weaknesses with respect to performance, soft error coverage, and lifetime reliability. We then propose and evaluate a hybrid approach that combines DVS and partial RMT. We show that this approach provides better improvement in lifetime reliability than DVS or partial RMT alone, buys back a significant amount of the performance that is lost due to DVS, and provides nearly complete soft error coverage.

    An Estimator for the Sensitivity to Perturbations of Deep Neural Networks

    For Deep Neural Networks (DNNs) to become useful in safety-critical applications, such as self-driving cars and disease diagnosis, they must be stable to perturbations in input and model parameters. Characterizing the sensitivity of a DNN to perturbations is necessary to determine the minimal bit-width precision that may be used to safely represent the network. However, no general result exists that is capable of predicting the sensitivity of a given DNN to round-off error, noise, or other perturbations in input. This paper derives an estimator that can predict such quantities. The estimator is derived via inequalities and matrix norms, and the resulting quantity is roughly analogous to a condition number for the entire neural network. An approximation of the estimator is tested on two Convolutional Neural Networks, AlexNet and VGG-19, using the ImageNet dataset. For each of these networks, the tightness of the estimator is explored via random perturbations and adversarial attacks. Comment: Actual work and paper concluded in January 201

    Recovery Boosting: A Technique to Enhance NBTI Recovery in SRAM Arrays

    Negative Bias Temperature Instability (NBTI) is an important lifetime reliability problem in microprocessors. SRAM-based structures within the processor are especially susceptible to NBTI since one of the PMOS devices in the memory cell always has an input of ‘0’. Previously proposed recovery techniques for SRAM cells aim to balance the degradation of the two PMOS devices by attempting to keep their inputs at a logic ‘0’ exactly 50% of the time. However, one of the devices is always in the negative bias condition at any given time. In this paper, we propose a technique called Recovery Boosting that allows both PMOS devices in the memory cell to be put into the recovery mode by slightly modifying the design of conventional SRAM cells. We present the circuit-level design of an issue queue that uses such cells and perform SPICE-level simulations to verify its functionality and quantify area and power consumption. We then conduct an architecture-level evaluation of the performance and reliability of using an area-neutral design of such an issue queue using the M5 simulator and the SPEC CPU2000 benchmark suite. We show that recovery boosting provides a 56% improvement in the static noise margin for the issue queue while having very little impact on power consumption and a negligible loss in performance.

    Understanding the Performance-Temperature Interactions in Disk I/O of Server Workloads

    This paper describes the first infrastructure for integrated studies of the performance and thermal behavior of storage systems. Using microbenchmarks running on this infrastructure, we first gain insight into how I/O characteristics can affect the temperature of disk drives. We use this analysis to identify the most promising, yet simple, “knobs” for temperature optimization of high-speed disks, which can be implemented on existing disks. We then analyze the thermal profiles of real workloads that use such disk drives in their storage systems, pointing out which knobs are most useful for dynamic thermal management when pushing the performance envelope.