Constructing fail-controlled nodes for distributed systems: a software approach
PhD Thesis
Designing and implementing distributed systems which continue to provide specified services
in the presence of processing site and communication failures is a difficult task. To facilitate
their development, distributed systems have been built assuming that their underlying hardware
components are fail-controlled, i.e. present a well defined failure mode. However, if conventional
hardware cannot provide the assumed failure mode, there is a need to build processing sites
or nodes, and communication infrastructure that present the assumed fail-controlled behaviour.
Coupling a number of redundant processors within a replicated node is a well known way
of constructing fail-controlled nodes. Computation is replicated and executed simultaneously at
each processor, and by applying suitable validation techniques to the outputs generated by processors
(e.g. majority voting, comparison), outputs from faulty processors can be prevented from
appearing at the application level.
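
The two validation techniques named above can be illustrated with a minimal sketch. It is not taken from the thesis: the function names `majority_vote` and `compare_pair` are hypothetical, replica outputs are assumed to be hashable values already collected in one place, and `None` is assumed never to be a legitimate output, so it can stand for "no output released".

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of replicas.

    Returns None (the node stays silent) when no value has a strict majority.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority means more than half of all replicas agree.
    return value if count > len(outputs) // 2 else None


def compare_pair(first: Hashable, second: Hashable) -> Optional[Hashable]:
    """Two-replica comparison: release the output only if both replicas agree."""
    return first if first == second else None
```

With three replicas, `majority_vote` masks one faulty output; with two replicas, `compare_pair` can only detect a disagreement and suppress the output, which corresponds to fail-silent rather than fault-masking behaviour.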
One way of constructing replicated nodes is by introducing hardwired mechanisms to
couple replicated processors with specialised validation hardware circuits. Processors are tightly
synchronised at the clock cycle level, and have their outputs validated by reliable validation
hardware. Another approach is to use software mechanisms to perform synchronisation of processors
and validation of the outputs. The main advantage of hardware based nodes is the minimum
performance overhead incurred. However, the introduction of special circuits may increase
the complexity of the design tremendously. Further, every new microprocessor architecture requires
considerable redesign overhead. Software based nodes do not present these problems; on
the other hand, they introduce much larger performance overheads to the system.
In this thesis we investigate alternative ways of constructing efficient fail-controlled, software
based replicated nodes. In particular, we present much more efficient order protocols, which
are necessary for the implementation of these nodes. Our protocols, unlike others published to
date, do not require processors' physical clocks to be explicitly synchronised. The main contribution
of this thesis is the precise definition of the semantics of a software based fail-silent node,
along with its efficient design, implementation and performance evaluation.
The Brazilian National Research Council (CNPq/Brasil)