134,511 research outputs found
An Analytical Solution for Probabilistic Guarantees of Reservation Based Soft Real-Time Systems
We show a methodology for the computation of the probability of deadline miss
for a periodic real-time task scheduled by a resource reservation algorithm. We
propose a modelling technique for the system that reduces the computation of
such a probability to that of the steady state probability of an infinite state
Discrete Time Markov Chain with a periodic structure. This structure is
exploited to develop an efficient numeric solution where different
accuracy/computation time trade-offs can be obtained by operating on the
granularity of the model. More importantly we offer a closed form conservative
bound for the probability of a deadline miss. Our experiments reveal that the
bound remains reasonably close to the experimental probability in one real-time
application of practical interest. When this bound is used for the optimisation
of the overall Quality of Service for a set of tasks sharing the CPU, it
produces a good sub-optimal solution in a small amount of time.Comment: IEEE Transactions on Parallel and Distributed Systems, Volume:27,
Issue: 3, March 201
A Parallel Divide-and-Conquer based Evolutionary Algorithm for Large-scale Optimization
Large-scale optimization problems that involve thousands of decision
variables have extensively arisen from various industrial areas. As a powerful
optimization tool for many real-world applications, evolutionary algorithms
(EAs) fail to solve the emerging large-scale problems both effectively and
efficiently. In this paper, we propose a novel Divide-and-Conquer (DC) based EA
that can not only produce high-quality solution by solving sub-problems
separately, but also highly utilizes the power of parallel computing by solving
the sub-problems simultaneously. Existing DC-based EAs that were deemed to
enjoy the same advantages of the proposed algorithm, are shown to be
practically incompatible with the parallel computing scheme, unless some
trade-offs are made by compromising the solution quality.Comment: 12 pages, 0 figure
Automating Fault Tolerance in High-Performance Computational Biological Jobs Using Multi-Agent Approaches
Background: Large-scale biological jobs on high-performance computing systems
require manual intervention if one or more computing cores on which they
execute fail. This places not only a cost on the maintenance of the job, but
also a cost on the time taken for reinstating the job and the risk of losing
data and execution accomplished by the job before it failed. Approaches which
can proactively detect computing core failures and take action to relocate the
computing core's job onto reliable cores can make a significant step towards
automating fault tolerance.
Method: This paper describes an experimental investigation into the use of
multi-agent approaches for fault tolerance. Two approaches are studied, the
first at the job level and the second at the core level. The approaches are
investigated for single core failure scenarios that can occur in the execution
of parallel reduction algorithms on computer clusters. A third approach is
proposed that incorporates multi-agent technology both at the job and core
level. Experiments are pursued in the context of genome searching, a popular
computational biology application.
Result: The key conclusion is that the approaches proposed are feasible for
automating fault tolerance in high-performance computing systems with minimal
human intervention. In a typical experiment in which the fault tolerance is
studied, centralised and decentralised checkpointing approaches on an average
add 90% to the actual time for executing the job. On the other hand, in the
same experiment the multi-agent approaches add only 10% to the overall
execution time.Comment: Computers in Biology and Medicin
- …