    An optimal fixed-priority assignment algorithm for supporting fault-tolerant hard real-time systems

    The main contribution of this paper is twofold. First, we present an appropriate schedulability analysis, based on response time analysis, for supporting fault-tolerant hard real-time systems. We consider systems that make use of error-recovery techniques to carry out fault tolerance. Second, we propose a new priority assignment algorithm which can be used, together with the schedulability analysis, to improve system fault resilience. These achievements come from the observation that traditional priority assignment policies may no longer be appropriate when faults are being considered. The proposed schedulability analysis takes into account the fact that the recoveries of tasks may be executed at higher priority levels. This characteristic is very important since, after an error, a task certainly has a shorter period of time in which to meet its deadline. The proposed priority assignment algorithm, which exploits some properties of the analysis, is very efficient. We show that the method used to find an appropriate priority assignment reduces the search space from O(n!) to O(n²), where n is the number of task recovery procedures. Also, we show that the priority assignment algorithm is optimal in the sense that it maximizes the fault resilience of task sets with respect to the proposed analysis. The effectiveness of the proposed approach is evaluated by simulation.
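
    As a worked illustration of this kind of analysis, the sketch below iterates a fixed-priority response-time recurrence extended with a recovery term charged once per fault inter-arrival window. The task set, the parameter names (C, T, C_rec, t_fault), and the choice of charging the costliest recovery at or above the task's priority are illustrative assumptions, not the paper's exact analysis.

```python
import math

def response_time(tasks, i, t_fault):
    """Fixed-point response-time recurrence with a recovery term.

    tasks: list of (C, T, C_rec) tuples, highest priority first.
    t_fault: minimum inter-arrival time between faults (assumed).
    Returns the worst-case response time of task i, or None if the
    recurrence grows past the task's period (deadline miss).
    """
    C, T, _ = tasks[i]
    R = C
    while True:
        # Preemption by higher-priority tasks released in [0, R).
        interference = sum(math.ceil(R / Tj) * Cj
                           for Cj, Tj, _ in tasks[:i])
        # Each fault in the busy window may trigger the costliest
        # recovery among tasks at this priority level or above.
        worst_rec = max(rec for _, _, rec in tasks[:i + 1])
        recovery = math.ceil(R / t_fault) * worst_rec
        R_next = C + interference + recovery
        if R_next == R:
            return R          # fixed point reached
        if R_next > T:
            return None       # diverged past the deadline
        R = R_next

# Hypothetical task set: (C, T, recovery cost), highest priority first.
tasks = [(1, 5, 1), (2, 10, 2), (3, 20, 3)]
print([response_time(tasks, i, t_fault=15) for i in range(len(tasks))])
# -> [2, 5, 10]: all tasks meet their deadlines despite recoveries
```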

    Timing analysis for embedded systems using non-preemptive EDF scheduling under bounded error arrivals

    Embedded systems consist of one or more processing units which are completely encapsulated by the devices under their control, and they often have stringent timing constraints associated with their functional specification. Previous research has considered the performance of different types of task scheduling algorithms and developed associated timing analysis techniques for such systems. Although preemptive scheduling techniques have traditionally been favored, rapid increases in processor speeds combined with improved insights into the behavior of non-preemptive scheduling techniques have led to increased interest in their use for real-time applications such as multimedia, automation and control. However, when non-preemptive scheduling techniques are employed, there is a potential lack of error confinement should any timing errors occur in individual software tasks. In this paper, the focus is upon adding fault tolerance in systems using non-preemptive deadline-driven scheduling. Schedulability conditions are derived for fault-tolerant periodic and sporadic task sets experiencing bounded error arrivals under non-preemptive deadline scheduling. A timing analysis algorithm is presented based upon these conditions and its run-time properties are studied. Computational experiments show it to be highly efficient in terms of run-time complexity and competitive ratio when compared to previous approaches.
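
    A rough sketch of this style of schedulability test is below: a Jeffay-style demand condition for non-preemptive EDF with an extra demand term for bounded error arrivals. The parameters t_err (minimum error inter-arrival time) and c_rec (worst-case recovery cost), and the exact inequalities, are assumptions for illustration rather than the conditions derived in the paper.

```python
import math

def np_edf_schedulable(tasks, t_err, c_rec):
    """Jeffay-style non-preemptive EDF test with an added error-demand
    term (illustrative; the paper's derived conditions differ in detail).

    tasks: list of (C, T) with deadline == period.
    t_err: minimum inter-arrival time of errors (assumed).
    c_rec: worst-case recovery cost per error (assumed).
    """
    tasks = sorted(tasks, key=lambda ct: ct[1])   # by period
    # Condition 1: total utilization including error-recovery load.
    if sum(c / t for c, t in tasks) + c_rec / t_err > 1.0:
        return False
    # Condition 2: at every instant L before task i's deadline, its
    # non-preemptive block plus shorter-period demand plus error
    # recovery demand must fit within L.
    for i, (c_i, t_i) in enumerate(tasks):
        for L in range(tasks[0][1], t_i):
            demand = c_i + sum(((L - 1) // t_j) * c_j
                               for c_j, t_j in tasks[:i])
            demand += math.ceil(L / t_err) * c_rec
            if demand > L:
                return False
    return True

# Hypothetical task set: (C, T) pairs.
print(np_edf_schedulable([(1, 4), (1, 5), (2, 10)], t_err=20, c_rec=1))
# -> True under these assumed error parameters
```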

    Leveraging Weakly-hard Constraints for Improving System Fault Tolerance with Functional and Timing Guarantees

    Many safety-critical real-time systems operate in harsh environments and are subject to soft errors caused by transient or intermittent faults. It is critical, and yet often very challenging, to apply fault tolerance techniques in these systems, due to their resource limitations and stringent constraints on timing and functionality. In this work, we leverage the concept of weakly-hard constraints, which allows task deadline misses in a bounded manner, to improve the system's capability to accommodate fault tolerance techniques while ensuring timing and functional correctness. In particular, we 1) quantitatively measure control cost under different deadline hit/miss scenarios and identify weakly-hard constraints that guarantee control stability, 2) employ typical worst-case analysis (TWCA) to bound the number of deadline misses and approximate system control cost, 3) develop an event-based simulation method to check the task execution pattern and evaluate system control cost for any given solution, and 4) develop a meta-heuristic algorithm that consists of heuristic methods and a simulated annealing procedure to explore the design space. Our experiments on an industrial case study and a set of synthetic examples demonstrate the effectiveness of our approach.
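
    The weakly-hard notion itself is straightforward to check against a simulated execution trace. The sketch below tests an (m, K) constraint, at most m deadline misses in any K consecutive jobs, over a hit/miss trace such as one an event-based simulation step might produce; the trace and parameters are hypothetical.

```python
from collections import deque

def satisfies_weakly_hard(miss_trace, m, K):
    """Check an (m, K) weakly-hard constraint on a hit/miss trace:
    at most m deadline misses in any K consecutive jobs.
    miss_trace: iterable of booleans, True meaning a missed deadline.
    """
    window = deque()
    misses = 0
    for missed in miss_trace:
        if len(window) == K:
            misses -= window.popleft()   # slide the K-job window
        window.append(missed)
        misses += missed
        if misses > m:                   # some K-window is violated
            return False
    return True

# Traces from a hypothetical event-based simulation run.
trace = [False, True, False, False, False, True, False, False]
print(satisfies_weakly_hard(trace, m=1, K=4))                 # True
print(satisfies_weakly_hard([True, True, False], m=1, K=4))   # False
```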

    Computer Simulation of PMSM Motor with Five Phase Inverter Control using Signal Processing Techniques

    Signal processing techniques and computer simulation play an important role in the fault diagnosis and fault tolerance of all types of machines at the first design stage. A permanent magnet synchronous motor (PMSM) driven by a five-phase inverter with a sine-wave pulse width modulation (SPWM) strategy is developed, and the PMSM speed is controlled by vector control. In this work, a fault-tolerant control (FTC) system for the PMSM based on wavelet switching is introduced. Exploiting the feature-extraction property of wavelet analysis, the error between the measured signal and its wavelet de-noised version is fed to a decision unit that judges whether the system is healthy. The diagnosis algorithm, which relies on both wavelet analysis and vector control to generate the PWM current commands, handles any parameter variation. An open-end phase PMSM has a larger speed-regulation range than a normal PMSM. Simulation results confirm the validity and effectiveness of the switching strategy.
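
    To illustrate the de-noising error idea, the sketch below uses the PyWavelets package to soft-threshold the detail coefficients of a phase-current signal and treats the energy that survives thresholding as a fault indicator. The wavelet, decomposition level, threshold rule, and synthetic signal are all assumptions, not the paper's tuned design.

```python
import numpy as np
import pywt   # PyWavelets

def fault_indicator(signal, wavelet="db4", level=4):
    """Energy of the noise-thresholded wavelet detail coefficients.
    A smooth healthy current leaves almost nothing after soft
    thresholding, while sharp fault transients survive it."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Universal threshold estimated from the finest-detail band.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(signal)))
    details = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return sum(float(np.sum(d * d)) for d in details)

# Synthetic 50 Hz phase current sampled at 10 kHz, with and without
# an injected transient (both entirely hypothetical).
t = np.linspace(0, 0.2, 2000, endpoint=False)
rng = np.random.default_rng(1)
healthy = np.sin(2 * np.pi * 50 * t) + 0.05 * rng.standard_normal(t.size)
faulty = healthy.copy()
faulty[1000:1020] += 1.5          # hypothetical fault transient
print(fault_indicator(healthy) < fault_indicator(faulty))   # True
```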

    Improving Performance of Iterative Methods by Lossy Checkpointing

    Iterative methods are commonly used approaches for solving large, sparse linear systems, which are fundamental operations in many modern scientific simulations. When large-scale iterative methods run with a large number of ranks in parallel, they have to checkpoint their dynamic variables periodically in case of unavoidable fail-stop errors, requiring fast I/O systems and large storage space. To this end, significantly reducing the checkpointing overhead is critical to improving the overall performance of iterative methods. Our contribution is fourfold. (1) We propose a novel lossy checkpointing scheme that can significantly improve the checkpointing performance of iterative methods by leveraging lossy compressors. (2) We formulate a lossy checkpointing performance model and theoretically derive an upper bound on the extra number of iterations caused by the distortion of data in lossy checkpoints, in order to guarantee the performance improvement under the lossy checkpointing scheme. (3) We analyze the impact of lossy checkpointing (i.e., the extra iterations caused by lossy checkpointing files) for multiple types of iterative methods. (4) We evaluate the lossy checkpointing scheme with optimal checkpointing intervals in a high-performance computing environment with 2,048 cores, using the well-known scientific computation package PETSc and a state-of-the-art checkpoint/restart toolkit. Experiments show that our optimized lossy checkpointing scheme can significantly reduce the fault tolerance overhead of iterative methods by 23% to 70% compared with traditional checkpointing and by 20% to 58% compared with lossless-compressed checkpointing, in the presence of system failures.
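
    The core mechanism, checkpointing a solver's dynamic variables through an error-bounded lossy compressor and absorbing the bounded distortion as extra iterations, can be sketched as follows. The uniform quantizer stands in for a real compressor such as SZ, and the Jacobi solver, failure point, and checkpoint interval are illustrative.

```python
import numpy as np

def compress(x, tol=1e-4):
    """Stand-in for an error-bounded lossy compressor (e.g. SZ):
    uniform quantization with point-wise error at most tol."""
    return np.round(x / (2 * tol)).astype(np.int64), tol

def decompress(q, tol):
    return q * (2 * tol)

def jacobi_lossy_ckpt(A, b, iters=200, ckpt_every=20, fail_at=95):
    """Jacobi iteration that checkpoints its iterate lossily; on a
    simulated fail-stop error it restarts from the last checkpoint,
    paying only some extra iterations for the bounded distortion."""
    D = np.diag(A)
    R = A - np.diag(D)
    x = np.zeros_like(b)
    ckpt = compress(x)
    for k in range(iters):
        if k == fail_at:
            x = decompress(*ckpt)        # recover the lossy state
        x = (b - R @ x) / D
        if k % ckpt_every == 0:
            ckpt = compress(x)
    return x

# Hypothetical diagonally dominant system.
rng = np.random.default_rng(0)
A = rng.random((50, 50)) + 50 * np.eye(50)
b = rng.random(50)
x = jacobi_lossy_ckpt(A, b)
print(np.linalg.norm(A @ x - b) < 1e-8)  # converged despite the failure
```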

    Parallelizing Deadlock Resolution in Symbolic Synthesis of Distributed Programs

    Previous work has shown that there are two major complexity barriers in the synthesis of fault-tolerant distributed programs: (1) generation of the fault-span, the set of states reachable in the presence of faults, and (2) resolving deadlock states, from which the program has no outgoing transitions. Of these, the former closely resembles model checking and, hence, techniques for efficient verification are directly applicable to it. We therefore focus on expediting the latter with the use of multi-core technology. We present two approaches to parallelization based on different design choices. The first approach is based on the computation of equivalence classes of program transitions (called group computation) that are needed due to the issue of distribution (i.e., the inability of processes to atomically read and write all program variables). We show that in most cases the speedup of this approach is close to the ideal speedup, and in some cases it is superlinear. The second approach uses the traditional technique of partitioning deadlock states among multiple threads. However, our experiments show that the speedup for this approach is small. Consequently, our analysis demonstrates that the simple approach of parallelizing the group computation is likely to be the more effective method for using multi-core computing in the context of deadlock resolution.
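
    A toy explicit-state rendering of the group computation shows why it parallelizes so well: the group of each candidate transition can be expanded independently of all others. The state encoding, variable domains, and thread pool below are illustrative; the actual synthesis works symbolically on BDDs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def group_of(transition, unreadable, domains):
    """Group computation: a process that cannot read some variables
    must add, with each chosen recovery transition, every variant of
    it over the unreadable variables (which it also cannot change).
    States are dicts mapping variable name -> value."""
    src, dst = transition
    group = []
    for values in product(*(domains[v] for v in unreadable)):
        s, d = dict(src), dict(dst)
        for var, val in zip(unreadable, values):
            s[var] = d[var] = val
        group.append((s, d))
    return group

def parallel_groups(transitions, unreadable, domains, workers=4):
    """Groups of distinct transitions are independent, which is why
    parallelizing this step scales so well (processes or a partitioned
    BDD would replace threads in a real implementation)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda tr: group_of(tr, unreadable, domains), transitions))

# Toy example: variable 'x' is unreadable by the acting process.
domains = {"x": [0, 1], "y": [0, 1]}
deadlock_fix = ({"y": 0}, {"y": 1})      # recovery transition on y
print(parallel_groups([deadlock_fix], ["x"], domains))
# -> [[({'y': 0, 'x': 0}, {'y': 1, 'x': 0}),
#      ({'y': 0, 'x': 1}, {'y': 1, 'x': 1})]]
```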

    Algorithmic Based Fault Tolerance Applied to High Performance Computing

    We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance technique (Huang and Abraham, 1984) to the needs of parallel distributed computation. We obtain a strongly scalable mechanism for fault tolerance that can also detect and correct errors (bit flips) on the fly during a computation. To assess the viability of our approach, we have developed a fault-tolerant matrix-matrix multiplication subroutine, and we propose models to predict its running time. Our parallel fault-tolerant matrix-matrix multiplication achieves 1.4 TFLOPS on 484 processors (cluster jacquard.nersc.gov) and returns a correct result even when a process failure occurs. This represents 65% of the machine's peak efficiency and less than 12% overhead with respect to the fastest failure-free implementation. We predict, and have observed, that as we increase the processor count, the overhead of the fault tolerance drops significantly.
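
    The underlying checksum idea fits in a few lines: multiply a column-checksum-augmented A by a row-checksum-augmented B, and a single corrupted entry of the product is located by the unique inconsistent row/column pair and repaired from its row checksum. This dense NumPy sketch ignores the blocked, distributed setting of the paper.

```python
import numpy as np

def abft_matmul(A, B, tol=1e-6):
    """Huang-Abraham checksum multiply: compute the full-checksum
    product, then locate and fix a single corrupted entry of C via
    the inconsistent row/column checksum pair."""
    n = A.shape[0]
    Ac = np.vstack([A, A.sum(axis=0)])                  # column checksums
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])   # row checksums
    Cf = Ac @ Br                                        # full-checksum C

    Cf[1, 2] += 42.0    # simulate a bit-flip corrupting one entry

    bad_rows = np.where(np.abs(Cf[:n, :n].sum(axis=1) - Cf[:n, n]) > tol)[0]
    bad_cols = np.where(np.abs(Cf[:n, :n].sum(axis=0) - Cf[n, :n]) > tol)[0]
    if bad_rows.size == 1 and bad_cols.size == 1:
        i, j = bad_rows[0], bad_cols[0]
        # Recompute the entry from its (uncorrupted) row checksum.
        Cf[i, j] = Cf[i, n] - (Cf[i, :n].sum() - Cf[i, j])
    return Cf[:n, :n]

A = np.arange(9.0).reshape(3, 3)
B = np.ones((3, 3))
print(np.allclose(abft_matmul(A, B), A @ B))   # True: error corrected
```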

    Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

    Improvements and advances in the development of computer architecture now provide innovative technology for recasting traditional sequential solutions into high-performance, low-cost, parallel systems to increase system performance. Research conducted on the development of a specialized computer architecture for the algorithmic execution of an avionics guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on critical path analysis. The final stage is the design and development of hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and experimental results are presented. The design of the data-driven computer architecture, customized for the execution of this particular algorithm, is discussed.
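
    A common concrete form of critical-path-based allocation is list scheduling by bottom level: rank each task by its longest path to a sink, then repeatedly place the most critical ready task on the earliest-free processing element. The sketch below, including the guidance-loop task names, is illustrative rather than the paper's allocation algorithm.

```python
def critical_path_allocate(tasks, deps, num_pes):
    """List scheduling by critical path: rank each task by its longest
    path to a sink (bottom level), then assign the most critical ready
    task to the earliest-free processing element.

    tasks: dict name -> execution cost; deps: list of (pred, succ)."""
    succs = {t: [] for t in tasks}
    preds = {t: [] for t in tasks}
    for a, b in deps:
        succs[a].append(b)
        preds[b].append(a)

    rank = {}
    def bottom_level(t):        # memoized longest path to an exit task
        if t not in rank:
            rank[t] = tasks[t] + max(
                (bottom_level(s) for s in succs[t]), default=0)
        return rank[t]
    for t in tasks:
        bottom_level(t)

    pe_free = [0.0] * num_pes   # time at which each PE becomes free
    finish = {}
    ready = [t for t in tasks if not preds[t]]
    schedule = []
    while ready:
        t = max(ready, key=rank.get)          # most critical first
        ready.remove(t)
        pe = pe_free.index(min(pe_free))      # earliest-free PE
        start = max([pe_free[pe]] + [finish[p] for p in preds[t]])
        finish[t] = start + tasks[t]
        pe_free[pe] = finish[t]
        schedule.append((t, pe, start, finish[t]))
        for s in succs[t]:                    # newly released tasks
            if all(p in finish for p in preds[s]):
                ready.append(s)
    return schedule

# Hypothetical guidance-loop task graph.
tasks = {"sense": 2, "estimate": 4, "guide": 3, "actuate": 1}
deps = [("sense", "estimate"), ("estimate", "guide"),
        ("guide", "actuate")]
for row in critical_path_allocate(tasks, deps, num_pes=2):
    print(row)   # (task, PE, start, finish)
```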