620 research outputs found

    Scheduling and Voltage Scaling for Energy/Reliability Trade-offs in Fault-Tolerant Time-Triggered Embedded Systems

    Get PDF
    In this paper we present an approach to the scheduling and voltage scaling of low-power fault-tolerant hard real-time applications mapped on distributed heterogeneous embedded systems. Processes and messages are statically scheduled, and we use process re-execution for recovering from multiple transient faults. Addressing simultaneously energy and reliability is especially challenging because lowering the voltage to reduce the energy consumption has been shown to increase the transient fault rates. In addition, time-redundancy based fault-tolerance techniques such as re-execution and dynamic voltage scaling-based low-power techniques are competing for the slack in the schedules. Our approach decides the voltage levels and start times of processes and the transmission times of messages, such that the transient faults are tolerated, the timing constraints of the application are satisfied and the energy is minimized. We present a constraint logic programming-based approach which is able to find reliable and schedulable implementations within limited energy and hardware resources

    Low-energy standby-sparing for hard real-time systems

    No full text
    Time-redundancy techniques are commonly used in real-time systems to achieve fault tolerance without incurring high energy overhead. However, reliability requirements of hard real-time systems that are used in safety-critical applications are so stringent that time-redundancy techniques are sometimes unable to achieve them. Standby sparing as a hardwareredundancy technique can be used to meet high reliability requirements of safety-critical applications. However, conventional standby-sparing techniques are not suitable for lowenergy hard real-time systems as they either impose considerable energy overheads or are not proper for hard timing constraints. In this paper we provide a technique to use standby sparing for hard real-time systems with limited energy budgets. The principal contribution of this work is an online energymanagement technique which is specifically developed for standby-sparing systems that are used in hard real-time applications. This technique operates at runtime and exploits dynamic slacks to reduce the energy consumption while guaranteeing hard deadlines. We compared the low-energy standby-sparing (LESS) system with a low-energy timeredundancy system (from a previous work). The results show that for relaxed time constraints, the LESS system is more reliable and provides about 26% energy saving as compared to the time-redundancy system. For tight deadlines when the timeredundancy system is not sufficiently reliable (for safety-critical application), the LESS system preserves its reliability but with about 49% more energy consumptio

    3E: Energy-Efficient Elastic Scheduling for Independent Tasks in Heterogeneous Computing Systems

    Get PDF
    Reducing energy consumption is a major design constraint for modern heterogeneous computing systems to minimize electricity cost, improve system reliability and protect environment. Conventional energy-efficient scheduling strategies developed on these systems do not sufficiently exploit the system elasticity and adaptability for maximum energy savings, and do not simultaneously take account of user expected finish time. In this paper, we develop a novel scheduling strategy named energy-efficient elastic (3E) scheduling for aperiodic, independent and non-real-time tasks with user expected finish times on DVFS-enabled heterogeneous computing systems. The 3E strategy adjusts processors’ supply voltages and frequencies according to the system workload, and makes trade-offs between energy consumption and user expected finish times. Compared with other energy-efficient strategies, 3E significantly improves the scheduling quality and effectively enhances the system elasticity

    A Survey of Fault-Tolerance Techniques for Embedded Systems from the Perspective of Power, Energy, and Thermal Issues

    Get PDF
    The relentless technology scaling has provided a significant increase in processor performance, but on the other hand, it has led to adverse impacts on system reliability. In particular, technology scaling increases the processor susceptibility to radiation-induced transient faults. Moreover, technology scaling with the discontinuation of Dennard scaling increases the power densities, thereby temperatures, on the chip. High temperature, in turn, accelerates transistor aging mechanisms, which may ultimately lead to permanent faults on the chip. To assure a reliable system operation, despite these potential reliability concerns, fault-tolerance techniques have emerged. Specifically, fault-tolerance techniques employ some kind of redundancies to satisfy specific reliability requirements. However, the integration of fault-tolerance techniques into real-time embedded systems complicates preserving timing constraints. As a remedy, many task mapping/scheduling policies have been proposed to consider the integration of fault-tolerance techniques and enforce both timing and reliability guarantees for real-time embedded systems. More advanced techniques aim additionally at minimizing power and energy while at the same time satisfying timing and reliability constraints. Recently, some scheduling techniques have started to tackle a new challenge, which is the temperature increase induced by employing fault-tolerance techniques. These emerging techniques aim at satisfying temperature constraints besides timing and reliability constraints. This paper provides an in-depth survey of the emerging research efforts that exploit fault-tolerance techniques while considering timing, power/energy, and temperature from the real-time embedded systems’ design perspective. In particular, the task mapping/scheduling policies for fault-tolerance real-time embedded systems are reviewed and classified according to their considered goals and constraints. Moreover, the employed fault-tolerance techniques, application models, and hardware models are considered as additional dimensions of the presented classification. Lastly, this survey gives deep insights into the main achievements and shortcomings of the existing approaches and highlights the most promising ones

    Energy-aware scheduling under reliability and makespan constraints

    Get PDF
    International audienceWe consider a task graph mapped on a set of homogeneous processors. We aim at minimizing the energy consumption while enforcing two constraints: a prescribed bound on the execution time (or makespan), and a reliability threshold. Dynamic voltage and frequency scaling (DVFS) is an approach frequently used to reduce the energy consumption of a schedule, but slowing down the execution of a task to save energy is decreasing the reliability of the execution. In this work, to improve the reliability of a schedule while reducing the energy consumption, we allow for the re-execution of some tasks. We assess the complexity of the tri-criteria scheduling problem (makespan, reliability, energy) of deciding which task to re-execute, and at which speed each execution of a task should be done, with two different speed models: either processors can have arbitrary speeds (continuous model), or a processor can run at a finite number of different speeds and change its speed during a computation (VDD model). We propose several novel tri-criteria scheduling heuristics under the continuous speed model, and we evaluate them through a set of simulations. The two best heuristics turn out to be very efficient and complementary

    EnSuRe: Energy & Accuracy Aware Fault-tolerant Scheduling on Real-time Heterogeneous Systems

    Get PDF
    This paper proposes an energy efficient real-time scheduling strategy called EnSuRe, which (i) executes real-time tasks on low power consuming primary processors to enhance the system accuracy by maintaining the deadline and (ii) provides reliability against a fixed number of transient faults by selectively executing backup tasks on high power consuming backup processor. Simulation results reveal that EnSuRe consumes nearly 25% less energy, compared to existing techniques, while satisfying the fault tolerance requirements. EnSuRe is also able to achieve 75% system accuracy with 50% system utilisation. Further, the obtained simulation outcomes are validated on benchmark tasks via a fault injection framework on Xilinx ZYNQ APSoC heterogeneous dual core platform

    Real-Time Scheduling Algorithm Design on Stochastic Processors

    Get PDF
    Recent studies have shown that significant power savings are possible with the use of in- exact processors, which may contain a small percentage of errors in computation. However, use of such processors in time-sensitive systems is challenging as these processors significantly hamper the system performance. In this thesis, a design framework is developed for real-time applications running on stochastic processors. To identify hardware error pat- terns, two methods are proposed to predict the occurrence of hardware errors. In addition, an algorithm is designed that uses knowledge of the hardware error patterns to judiciously schedule real-time jobs in order to maximize real-time performance. Both analytical and simulation results show that the proposed approach provides significant performance improvements when compared to an existing real-time scheduling algorithm and is efficient enough for online use

    RAKSHA:Reliable and Aggressive frameworK for System design using High-integrity Approaches

    Get PDF
    Advances in the fabrication technology have been a major driving force in the unprecedented increase in computing capabilities over the last several decades. Despite huge reductions in the switching energy of the transistors, two major issues have emerged with decreasing fabrication technology scales. They are: 1) increased impact of process, voltage, and temperature (PVT) variation on transistor performance, and 2) increased susceptibility of transistors to soft errors induced by high energy particles. In presence of PVT variation, as transistor sizes continue to decrease, the design margins used to guarantee correct operation in the presence of worst-case scenarios have been increasing. Systems run at a clock frequency, which is determined by accounting the worst-case timing paths, operating conditions, and process variations. Timing speculation based reliable and aggressive clocking advocates going beyond worst-case limits to achieve best performance while not avoiding, but detecting and correcting a modest number of timing errors. Such design methodology exploits the fact that timing critical paths are rarely exercised in a design, and typical execution happens much faster than the timing requirements dictated by worst-case scenarios. Better-than-worst-case design methodology is advocated by several recent research pursuits, which propose to exploit in-built fault tolerance mechanisms to enhance computer system performance. Recent works have also shown that the performance lose due to over provisioning base on worst-case design margins is upward of 20\% in terms operating frequency and upward of 50\% in terms of power efficiency. The threat of soft error induced system failure in computing systems has become more prominent as we adopt ultra-deep submicron process technologies. With respect to soft error susceptibility, decreasing transistor geometries lower the energy threshold needed by high-energy particles to induce errors. As this trend continues, the need for fault tolerance mechanisms to counteract this effect has moved from a nice to have, to be a requirement in current and future systems. In this dissertation, RAKSHA (meaning to protect and save in Sanskrit), we take a multidimensional look at the challenges of system design built with scaled-technologies using high integrity techniques. In RAKSHA, to mitigate soft errors, we propose lightweight high-integrity mechanisms as basic system building blocks which allow system to offer performance levels comparable to a non-fault tolerant system. In addition, we also propose to effectively exploit and use the availability of fault tolerant mechanisms to allow and tolerate data-dependent failures, thus setting systems to operate at typical case circuit delays and enhance system performance. We also propose the use of novel high-integrity cells for increasing system energy efficiency and also potentially increasing system security by combating power-analysis-based side channel attacks. Such an approach allows balancing of performance, power, and security with no further overhead over the resources needed to incorporate fault tolerance. Using our framework, instead of designing circuits to meet worst-case requirements, circuits can be designed to meet typical-case requirements. In RAKSHA, we propose two efficient soft error mitigation schemes, namely Soft Error Mitigation (SEM) and Soft and Timing Error Mitigation (STEM), using the approach of multiple clocking of data for protecting combinational logic blocks from soft errors. Our first technique, SEM, based on distributed and temporal voting of three registers, unloads the soft error detection overhead from the critical path of the systems. SEM is also capable of ignoring false errors and recovers from soft errors using in-situ fast recovery avoiding recomputation. Our second technique, STEM, while tolerating soft errors, adds timing error detection capability to guarantee reliable execution in aggressively clocked designs that enhance system performance by operating beyond worst-case clock frequency. We also present a specialized low overhead clock phase management scheme that ably supports our proposed techniques. Timing annotated gate level simulations, using 45nm libraries, of a pipelined adder-multiplier and DLX processor show that both our techniques achieve near 100% fault coverage. For DLX processor, even under severe fault injection campaigns, SEM achieves an average performance improvement of 26.58% over a conventional triple modular redundancy voter based soft error mitigation scheme, while STEM outperforms SEM by 27.42%. We refer to systems built with SEM and STEM cells as reliable and aggressive systems. Energy consumption minimization in computing systems has attracted a great deal of attention and has also become critical due to battery life considerations and environmental concerns. To address this problem, many task scheduling algorithms are developed using dynamic voltage and frequency scaling (DVFS). Majority of these algorithms involve two passes: schedule generation and slack reclamation. Using this approach, linear combination of frequencies has been proposed to achieve near optimal energy for systems operating with discrete and traditional voltage frequency pairs. In RAKSHA, we propose a new slack reclamation algorithm, aggressive dynamic and voltage scaling (ADVFS), using reliable and aggressive systems. ADVFS exploits the enhanced voltage frequency spectrum offered by reliable and aggressive designs for improving energy efficiency. Formal proofs are provided to show that optimal energy for reliable and aggressive designs is either achieved by using single frequency or by linear combination of frequencies. ADVFS has been evaluated using random task graphs and our results show 18% reduction in energy when compared with continuous DVFS and over more than 33% when compared with scheme using linear combination of traditional voltage frequency pairs. Recent events have indicated that attackers are banking on side-channel attacks, such as differential power analysis (DPA) and correlation power analysis (CPA), to exploit information leaks from physical devices. Random dynamic voltage frequency scaling (RDVFS) has been proposed to prevent such attacks and has very little area, power, and performance overheads. But due to the one-to-one mapping present between voltage and frequency of DVFS voltage-frequency pairs, RDVFS cannot prevent power attacks. In RAKSHA, we propose a novel countermeasure that uses reliable and aggressive designs to break this one-to-one mapping. Our experiments show that our technique significantly reduces the correlation for the actual key and also reduces the risk of power attacks by increasing the probability for incorrect keys to exhibit maximum correlation. Moreover, our scheme also enables systems to operate beyond the worst-case estimates to offer improved power and performance benefits. For the experiments conducted on AES S-box implemented using 45nm CMOS technology, our approach has increased performance by 22% over the worst-case estimates. Also, it has decreased the correlation for the correct key by an order and has increased the probability by almost 3.5X times for wrong keys when compared with the original key to exhibit maximum correlation. Overall, RAKSHA offers a new way to balance the intricate interplay between various design constraints for the systems designed using small scaled-technologies
    • …
    corecore