6 research outputs found

    Approaches to multiprocessor error recovery using an on-chip interconnect subsystem

    Get PDF
    For future multicores, a dedicated interconnect subsystem for on-chip monitors was found to be highly beneficial in terms of scalability, performance and area. In this thesis, such a monitor network (MNoC) is used for multicores to support selective error identification and recovery and maintain target chip reliability in the context of dynamic voltage and frequency scaling (DVFS). A selective shared memory multiprocessor recovery is performed using MNoC in which, when an error is detected, only the group of processors sharing an application with the affected processors are recovered. Although the use of DVFS in contemporary multicores provides significant protection from unpredictable thermal events, a potential side effect can be an increased processor exposure to soft errors. To address this issue, a flexible fault prevention and recovery mechanism has been developed to selectively enable a small amount of per-core dual modular redundancy (DMR) in response to increased vulnerability, as measured by the processor architectural vulnerability factor (AVF). Our new algorithm for DMR deployment aims to provide a stable effective soft error rate (SER) by using DMR in response to DVFS caused by thermal events. The algorithm is implemented in real-time on the multicore using MNoC and controller which evaluates thermal information and multicore performance statistics in addition to error information. DVFS experiments with a multicore simulator using standard benchmarks show an average 6% improvement in overall power consumption and a stable SER by using selective DMR versus continuous DMR deployment

    Cross-layer reliability evaluation, moving from the hardware architecture to the system level: A CLERECO EU project overview

    Get PDF
    Advanced computing systems realized in forthcoming technologies hold the promise of a significant increase of computational capabilities. However, the same path that is leading technologies toward these remarkable achievements is also making electronic devices increasingly unreliable. Developing new methods to evaluate the reliability of these systems in an early design stage has the potential to save costs, produce optimized designs and have a positive impact on the product time-to-market. CLERECO European FP7 research project addresses early reliability evaluation with a cross-layer approach across different computing disciplines, across computing system layers and across computing market segments. The fundamental objective of the project is to investigate in depth a methodology to assess system reliability early in the design cycle of the future systems of the emerging computing continuum. This paper presents a general overview of the CLERECO project focusing on the main tools and models that are being developed that could be of interest for the research community and engineering practice

    Multicore soft error rate stabilization using adaptive dual modular redundancy

    No full text

    Cross layer reliability estimation for digital systems

    Get PDF
    Forthcoming manufacturing technologies hold the promise to increase multifuctional computing systems performance and functionality thanks to a remarkable growth of the device integration density. Despite the benefits introduced by this technology improvements, reliability is becoming a key challenge for the semiconductor industry. With transistor size reaching the atomic dimensions, vulnerability to unavoidable fluctuations in the manufacturing process and environmental stress rise dramatically. Failing to meet a reliability requirement may add excessive re-design cost to recover and may have severe consequences on the success of a product. %Worst-case design with large margins to guarantee reliable operation has been employed for long time. However, it is reaching a limit that makes it economically unsustainable due to its performance, area, and power cost. One of the open challenges for future technologies is building ``dependable'' systems on top of unreliable components, which will degrade and even fail during normal lifetime of the chip. Conventional design techniques are highly inefficient. They expend significant amount of energy to tolerate the device unpredictability by adding safety margins to a circuit's operating voltage, clock frequency or charge stored per bit. Unfortunately, the additional cost introduced to compensate unreliability are rapidly becoming unacceptable in today's environment where power consumption is often the limiting factor for integrated circuit performance, and energy efficiency is a top concern. Attention should be payed to tailor techniques to improve the reliability of a system on the basis of its requirements, ending up with cost-effective solutions favoring the success of the product on the market. Cross-layer reliability is one of the most promising approaches to achieve this goal. Cross-layer reliability techniques take into account the interactions between the layers composing a complex system (i.e., technology, hardware and software layers) to implement efficient cross-layer fault mitigation mechanisms. Fault tolerance mechanism are carefully implemented at different layers starting from the technology up to the software layer to carefully optimize the system by exploiting the inner capability of each layer to mask lower level faults. For this purpose, cross-layer reliability design techniques need to be complemented with cross-layer reliability evaluation tools, able to precisely assess the reliability level of a selected design early in the design cycle. Accurate and early reliability estimates would enable the exploration of the system design space and the optimization of multiple constraints such as performance, power consumption, cost and reliability. This Ph.D. thesis is devoted to the development of new methodologies and tools to evaluate and optimize the reliability of complex digital systems during the early design stages. More specifically, techniques addressing hardware accelerators (i.e., FPGAs and GPUs), microprocessors and full systems are discussed. All developed methodologies are presented in conjunction with their application to real-world use cases belonging to different computational domains

    A framework for SLA-centric service-based Utility Computing

    Get PDF
    Nicht angegebenService oriented Utility Computing paves the way towards realization of service markets, which promise metered services through negotiable Service Level Agreements (SLA). A market does not necessarily imply a simple buyer-seller relationship, rather it is the culmination point of a complex chain of stake-holders with a hierarchical integration of value along each link in the chain. In service value chains, services corresponding to different partners are aggregated in a producer-consumer manner resulting in hierarchical structures of added value. SLAs are contracts between service providers and service consumers, which ensure the expected Quality of Service (QoS) to different stakeholders at various levels in this hierarchy. \emph{This thesis addresses the challenge of realizing SLA-centric infrastructure to enable service markets for Utility Computing.} Service Level Agreements play a pivotal role throughout the life cycle of service aggregation. The activities of service selection and service negotiation followed by the hierarchical aggregation and validation of services in service value chain, require SLA as an enabling technology. \emph{This research aims at a SLA-centric framework where the requirement-driven selection of services, flexible SLA negotiation, hierarchical SLA aggregation and validation, and related issues such as privacy, trust and security have been formalized and the prototypes of the service selection model and the validation model have been implemented. } The formal model for User-driven service selection utilizes Branch and Bound and Heuristic algorithms for its implementation. The formal model is then extended for SLA negotiation of configurable services of varying granularity in order to tweak the interests of the service consumers and service providers. %and then formalizing the requirements of an enabling infrastructure for aggregation and validation of SLAs existing at multiple levels and spanning % along the corresponding service value chains. The possibility of service aggregation opens new business opportunities in the evolving landscape of IT-based Service Economy. A SLA as a unit of business relationships helps establish innovative topologies for business networks. One example is the composition of computational services to construct services of bigger granularity thus giving room to business models based on service aggregation, Composite Service Provision and Reselling. This research introduces and formalizes the notions of SLA Choreography and hierarchical SLA aggregation in connection with the underlying service choreography to realize SLA-centric service value chains and business networks. The SLA Choreography and aggregation poses new challenges regarding its description, management, maintenance, validation, trust, privacy and security. The aggregation and validation models for SLA Choreography introduce concepts such as: SLA Views to protect the privacy of stakeholders; a hybrid trust model to foster business among unknown partners; and a PKI security mechanism coupled with rule based validation system to enable distributed queries across heterogeneous boundaries. A distributed rule based hierarchical SLA validation system is designed to demonstrate the practical significance of these notions

    Reliable Software for Unreliable Hardware - A Cross-Layer Approach

    Get PDF
    A novel cross-layer reliability analysis, modeling, and optimization approach is proposed in this thesis that leverages multiple layers in the system design abstraction (i.e. hardware, compiler, system software, and application program) to exploit the available reliability enhancing potential at each system layer and to exchange this information across multiple system layers
    corecore