2,979 research outputs found

    Balancing soft error coverage with lifetime reliability in redundantly multithreaded processors

    Get PDF
    Silicon reliability is a key challenge facing the microprocessor industry. Processors need to be designed such that they are resilient against both soft errors and lifetime reliability phenomena. However, techniques developed to address one class of reliability problems may impact other aspects of silicon reliability. In this paper, we show that Redundant Multi-Threading (RMT), which provides soft error protection, exacerbates lifetime reliability. We then explore two different architectural approaches to tackle this problem, namely, Dynamic Voltage Scaling (DVS) and partial RMT. We show that each approach has certain strengths and weaknesses with respect to performance, soft error coverage, and lifetime reliability. We then propose and evaluate a hybrid approach that combines DVS and partial RMT. We show that this approach provides better improvement in lifetime reliability than DVS or partial RMT alone, buys back a significant amount of performance that is lost due to DVS, and provides nearly complete soft error coverage. I

    Principles in Patterns (PiP) : Institutional Approaches to Curriculum Design Institutional Story

    Get PDF
    The principal outputs of the PiP Project surround the Course and Class Approval (C-CAP) system. This web-based system built on Microsoft SharePoint addresses and resolves many of the issues identified by the project. Generally well received by both academic and support staff, the system provides personalised views, adaptive forms and contextualised support for all phases of the approval process. Although the system deliberately encapsulates and facilitates existing approval processes thus achieving buy-in, it is already achieving significant improvements over the previous processes, not only in reducing the administrative overheads but also in supporting curriculum design and academic quality. The system is now embedded across three faculties and is now considered by the University of Strathclyde to be a "core institutional service". Alongside the C-CAP system the PiP Project also cultivated a suite of approaches: an incremental systems development methodology; a structured and replicable evaluation approach, and; Strathclyde's Lean Approach to Efficiencies in Education Kit (SLEEK) business process improvement methodology Each is based on recognised formal techniques, providing the basis for a rigorous approach. This is contextualised within and adapted to the HE institutional context thus building the foundation not only for the project but ultimately for institution wide process improvement. This "institutional story" report summarises the principal outcomes of the Project

    Cross layer reliability estimation for digital systems

    Get PDF
    Forthcoming manufacturing technologies hold the promise to increase multifuctional computing systems performance and functionality thanks to a remarkable growth of the device integration density. Despite the benefits introduced by this technology improvements, reliability is becoming a key challenge for the semiconductor industry. With transistor size reaching the atomic dimensions, vulnerability to unavoidable fluctuations in the manufacturing process and environmental stress rise dramatically. Failing to meet a reliability requirement may add excessive re-design cost to recover and may have severe consequences on the success of a product. %Worst-case design with large margins to guarantee reliable operation has been employed for long time. However, it is reaching a limit that makes it economically unsustainable due to its performance, area, and power cost. One of the open challenges for future technologies is building ``dependable'' systems on top of unreliable components, which will degrade and even fail during normal lifetime of the chip. Conventional design techniques are highly inefficient. They expend significant amount of energy to tolerate the device unpredictability by adding safety margins to a circuit's operating voltage, clock frequency or charge stored per bit. Unfortunately, the additional cost introduced to compensate unreliability are rapidly becoming unacceptable in today's environment where power consumption is often the limiting factor for integrated circuit performance, and energy efficiency is a top concern. Attention should be payed to tailor techniques to improve the reliability of a system on the basis of its requirements, ending up with cost-effective solutions favoring the success of the product on the market. Cross-layer reliability is one of the most promising approaches to achieve this goal. Cross-layer reliability techniques take into account the interactions between the layers composing a complex system (i.e., technology, hardware and software layers) to implement efficient cross-layer fault mitigation mechanisms. Fault tolerance mechanism are carefully implemented at different layers starting from the technology up to the software layer to carefully optimize the system by exploiting the inner capability of each layer to mask lower level faults. For this purpose, cross-layer reliability design techniques need to be complemented with cross-layer reliability evaluation tools, able to precisely assess the reliability level of a selected design early in the design cycle. Accurate and early reliability estimates would enable the exploration of the system design space and the optimization of multiple constraints such as performance, power consumption, cost and reliability. This Ph.D. thesis is devoted to the development of new methodologies and tools to evaluate and optimize the reliability of complex digital systems during the early design stages. More specifically, techniques addressing hardware accelerators (i.e., FPGAs and GPUs), microprocessors and full systems are discussed. All developed methodologies are presented in conjunction with their application to real-world use cases belonging to different computational domains

    Register File Reliability Analysis Through Cycle-Accurate Thermal Emulation

    Get PDF
    Continuous transistor scaling due to improvements in CMOS devices and manufacturing technologies is increasing processor power densities and temperatures; thus, creating challenges when trying to maintain manufacturing yield rates and devices which will be reliable throughout their lifetime. New microarchitectures require new reliability-aware design methods that can face these challenges without significantly increasing cost and performance. In this paper we present a complete analysis of reliability for the register file architecture of the Leon 3 processor. The analysis conducted is supported by the use of an accurate HW/SW FPGA-based emulation platform that enables a complete design space exploration of thermal and reliability metrics during the execution of an extended set of benchmarks, in a very limited amount of time. The effect of various compiler optimizations and register assignments on the reliability of the register file is then analyzed. Our results quantify the respective effects of these different factors and enable us to design a reliability-aware register file assignment policy that consistently improves the Mean-Time-To-Failure figure (20% on average) for the various types of applications

    Power and Reliability Management of SoCs

    Get PDF
    Today's embedded systems integrate multiple IP cores for processing, communication, and sensing on a single die as systems-on-chip (SoCs). Aggressive transistor scaling, decreased voltage margins and increased processor power and temperature have made reliability assessment a much more significant issue. Although reliability of devices and interconnect has been broadly studied, in this work, we study a tradeoff between reliability and power consumption for component-based SoC designs. We specifically focus on hard error rates as they cause a device to permanently stop operating. We also present a joint reliability and power management optimization problem whose solution is an optimal management policy. When careful joint policy optimization is performed, we obtain a significant improvement in energy consumption (40%) in tandem with meeting a reliability constraint for all SoC operating temperatures
    • 

    corecore