research

Fault-tolerant parallel scheduling of arbitrary length jobs on a shared channel

Abstract

We study the problem of scheduling jobs on fault-prone machines communicating via a shared channel, also known as multiple-access channel. We have nn arbitrary length jobs to be scheduled on mm identical machines, ff of which are prone to crashes by an adversary. A machine can inform other machines when a job is completed via the channel without collision detection. Performance is measured by the total number of available machine steps during the whole execution. Our goal is to study the impact of preemption (i.e., interrupting the execution of a job and resuming later in the same or different machine) and failures on the work performance of job processing. The novelty is the ability to identify the features that determine the complexity (difficulty) of the problem. We show that the problem becomes difficult when preemption is not allowed, by showing corresponding lower and upper bounds, the latter with algorithms reaching them. We also prove that randomization helps even more, but only against a non-adaptive adversary; in the presence of more severe adaptive adversary, randomization does not help in any setting. Our work has extended from previous work that focused on settings including: scheduling on multiple-access channel without machine failures, complete information about failures, or incomplete information about failures (like in this work) but with unit length jobs and, hence, without considering preemption

    Similar works