A recovery mechanism for errors caused by a late subjob in a system handling SLA-based Grid workflows
Supporting SLAs (Service Level Agreements) for Grid-based
workflows requires providing mechanisms for handling errors (i.e., the
failures of subjobs). In this paper, we propose an error
recovery mechanism which can handle one failed subjob of a workflow. The
error recovery mechanism has a maximum of three phases, depending on the
impact of the error. In each phase, we use a dedicated algorithm to remap
the subjobs of the workflow to the resources. The main contributions of the
paper are the error recovery mechanism for SLA-based workflows and
the mapping algorithm G-map, which is used in the first phase of the recovery
mechanism. G-map remaps the groups of subjobs that are directly
affected by an error. The efficiency of the proposed algorithm is validated
through simulation results.
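
As a rough illustration of what "subjobs directly affected by an error" could mean, the sketch below propagates the delay of one late subjob through a workflow graph and lists the subjobs whose planned time slots no longer hold. It is not the G-map algorithm from the paper, which the abstract does not specify. The function name, the data layout, the time units, and the rule that a delayed subjob keeps its planned duration are all assumptions made for the example.

```python
from collections import deque


def affected_subjobs(successors, start, finish, late_job, new_finish):
    """Return the subjobs whose planned slot is violated by a late subjob.

    successors: dict mapping a subjob to the subjobs that depend on it
    start, finish: dicts with the planned start and finish time of each subjob
    late_job: the subjob that finishes late
    new_finish: the actual (delayed) finish time of late_job

    Assumption: a delayed subjob keeps its planned duration and starts as
    soon as its late predecessor has finished.
    """
    actual_finish = {late_job: new_finish}
    queue = deque([late_job])
    while queue:
        job = queue.popleft()
        for child in successors.get(job, []):
            earliest = actual_finish[job]
            if earliest <= start[child]:
                # The planned slack absorbs the delay; child is unaffected.
                continue
            duration = finish[child] - start[child]
            new_end = earliest + duration
            # Record the child only if this makes it finish later than
            # previously known, then propagate the delay further.
            if new_end > actual_finish.get(child, finish[child]):
                actual_finish[child] = new_end
                queue.append(child)
    return sorted(j for j in actual_finish if j != late_job)


# Hypothetical diamond-shaped workflow: A -> B, A -> C, B -> D, C -> D.
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
start = {"A": 0, "B": 10, "C": 12, "D": 20}
finish = {"A": 10, "B": 18, "C": 16, "D": 30}

small_delay = affected_subjobs(successors, start, finish, "A", 11)
large_delay = affected_subjobs(successors, start, finish, "A", 13)
print(small_delay)  # ['B']
print(large_delay)  # ['B', 'C', 'D']
```

In this toy example a small delay touches only one subjob, while a larger one reaches the end of the workflow. That is one way to read the abstract's statement that the number of recovery phases depends on the impact of the error.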