134,511 research outputs found

    An Analytical Solution for Probabilistic Guarantees of Reservation Based Soft Real-Time Systems

    Full text link
    We show a methodology for the computation of the probability of deadline miss for a periodic real-time task scheduled by a resource reservation algorithm. We propose a modelling technique for the system that reduces the computation of such a probability to that of the steady state probability of an infinite state Discrete Time Markov Chain with a periodic structure. This structure is exploited to develop an efficient numeric solution where different accuracy/computation time trade-offs can be obtained by operating on the granularity of the model. More importantly we offer a closed form conservative bound for the probability of a deadline miss. Our experiments reveal that the bound remains reasonably close to the experimental probability in one real-time application of practical interest. When this bound is used for the optimisation of the overall Quality of Service for a set of tasks sharing the CPU, it produces a good sub-optimal solution in a small amount of time.Comment: IEEE Transactions on Parallel and Distributed Systems, Volume:27, Issue: 3, March 201

    A Parallel Divide-and-Conquer based Evolutionary Algorithm for Large-scale Optimization

    Full text link
    Large-scale optimization problems that involve thousands of decision variables have extensively arisen from various industrial areas. As a powerful optimization tool for many real-world applications, evolutionary algorithms (EAs) fail to solve the emerging large-scale problems both effectively and efficiently. In this paper, we propose a novel Divide-and-Conquer (DC) based EA that can not only produce high-quality solution by solving sub-problems separately, but also highly utilizes the power of parallel computing by solving the sub-problems simultaneously. Existing DC-based EAs that were deemed to enjoy the same advantages of the proposed algorithm, are shown to be practically incompatible with the parallel computing scheme, unless some trade-offs are made by compromising the solution quality.Comment: 12 pages, 0 figure

    Automating Fault Tolerance in High-Performance Computational Biological Jobs Using Multi-Agent Approaches

    Get PDF
    Background: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more computing cores on which they execute fail. This places not only a cost on the maintenance of the job, but also a cost on the time taken for reinstating the job and the risk of losing data and execution accomplished by the job before it failed. Approaches which can proactively detect computing core failures and take action to relocate the computing core's job onto reliable cores can make a significant step towards automating fault tolerance. Method: This paper describes an experimental investigation into the use of multi-agent approaches for fault tolerance. Two approaches are studied, the first at the job level and the second at the core level. The approaches are investigated for single core failure scenarios that can occur in the execution of parallel reduction algorithms on computer clusters. A third approach is proposed that incorporates multi-agent technology both at the job and core level. Experiments are pursued in the context of genome searching, a popular computational biology application. Result: The key conclusion is that the approaches proposed are feasible for automating fault tolerance in high-performance computing systems with minimal human intervention. In a typical experiment in which the fault tolerance is studied, centralised and decentralised checkpointing approaches on an average add 90% to the actual time for executing the job. On the other hand, in the same experiment the multi-agent approaches add only 10% to the overall execution time.Comment: Computers in Biology and Medicin
    • …
    corecore