Group Communication in Amoeba and its Applications
Unlike many other operating systems, Amoeba is a distributed operating system that provides group communication (i.e., one-to-many communication). We wil
Fault tolerant software technology for distributed computing system
Issued as Monthly reports [nos. 1-23], Interim technical report, Technical guide books [nos. 1-2], and Final report, Project no. G-36-64
Determining the Last Process to Fail
A total failure occurs whenever all processes cooperatively executing a distributed task fail before the task's completion. A frequent prerequisite for recovery from a total failure is the identification of the last group (LAST) of processes concurrently failing. Herein, we derive necessary and sufficient conditions for computing LAST from the local failure data of recovered processes. These conditions are easily translated into decision procedures for LAST membership using either complete or incomplete failure data. The choice of failure data itself is dictated by two requirements: (1) it can be cheaply maintained, and (2) maximum fault-tolerance is afforded in the sense that the expected number of recoveries required for identifying LAST is minimized.