The computational power of self-stabilizing distributed systems is examined. Assuming the availability of any number of processors, each with a (small) constant-size memory, we show that any computable problem can be realized in a self-stabilizing fashion.
Introduction
Our motivation to explore the power of interconnected processors with constant size memory was initially triggered by the following question: What is the relation between the computational power of a single powerful computer and that of a distributed system of processors with limited power and memory that are subject to transient faults? The approach is different from the one taken by the parallel algorithm community. The concern in this work is the fault tolerance of the algorithm rather than the time it takes to execute its task. We view a distributed system as a stand-alone system (as opposed to a single-site parallel machine that can be locally controlled) that runs on-going tasks and is able to overcome faults.
In particular we are interested in self-stabilizing systems. A self-stabilizing system is a system that can be started in any possible global state. A transient fault is a fault that causes the state of a processor to change arbitrarily. Self-stabilizing systems can tolerate transient faults. When the intermediate period between two successive transient faults is long enough, the system stabilizes. Following its stabilization the system demonstrates its desired predefined behavior.
In this paper we consider a distributed system of processors with a constant amount of memory. In order to understand the inherent behavior of the system we examine the extreme case where each processor is equipped with only a few bits of memory. The reference powerful computer is modeled by a Turing machine. Theoretically there is no upper bound on the amount of memory needed for storing the program that a computer needs to execute. This fact is also true in terms of Turing machines, where the program corresponds to the transition table. In order to eliminate the table size factor we consider only a specific deterministic universal Turing machine (denoted in the sequel by TM).
Input and output are given in a distributed fashion. Roughly speaking, each processor receives part of the input and outputs part of the output (e.g. [FLP-85]). Since the processors have constant memory size, the input of each processor consists of no more than a constant number of bits. This form of input is the common case in distributed systems where either many users at different sites specify the inputs (e.g. resource allocation, consensus) or the input contains information about the status of the local neighborhood (e.g. topology update, distributed snapshot). The output of the processors must (eventually) be correct with respect to the inputs. Note that the input can be changed during the regular execution of the algorithm. In this paper we focus only on long enough periods of time in which the input is fixed, and require that the output corresponds to the input some time after each such period begins.
Our distributed system is connected in a chain topology. The chain is only an abstraction of a predefined marked chain over any general graph. In particular, a system with a predefined ring (as is the case for token ring protocols) and a predefined leader fits. Each processor in the chain could be a communication port processor with limited resources of memory and computation. The number of processors in the chain is not a priori restricted. Obviously, any existing hardware is finite and the number of processors in the system is also finite. However, in order to have a base for comparison with the infinite tape of a Turing machine we do not a priori restrict the number of processors. For any given n we construct a self-stabilizing distributed system with n constant memory processors.
Certainly, a TM can simulate the execution of any distributed system using the same amount of memory (up to a constant factor). Interestingly enough, we show that processors with a (small) constant amount of memory can tolerate transient faults and obtain the same result as a fault-free execution of a Turing machine. In particular we show that a distributed system of interconnected constant size memory processors can simulate the computation of a TM in the presence of transient faults. The total amount of memory required by the distributed system during the computation with the input word w is equal to the memory used by the TM with the input word w (up to a constant factor).
The study of self-stabilizing algorithms started with the fundamental paper of Dijkstra, [Dij-74], where three self-stabilizing algorithms for the mutual exclusion problem were presented. Dijkstra's algorithms work for a system whose communication graph is a ring. All the processors in the ring are identical except for one special processor. Dijkstra proved the necessity of the special processor for breaking symmetry.
Recently, an extensive effort has been directed towards finding time- and memory-efficient self-stabilizing algorithms. The use of distinct identifiers yields a lower bound of $\Omega(\log n)$ bits for the size of memory per processor. Thus, those solutions do not apply to systems with constant memory size processors. Other recent works use randomization in order to break symmetry (cf. [IJ-90a], [DIM-91b], [MO+92]). However, there is no randomized self-stabilizing mutual exclusion or leader election algorithm that uses only a constant amount of memory per processor (footnote 1). In this paper we neither assume distinct identifiers nor use randomization. We restrict the system topology to be a directed chain with a leader processor at one endpoint and a tail processor at the other.
The remainder of the paper is organized as follows. In the next section we formalize the assumptions and requirements. Section 3 contains the description of our algorithm. In Section 4 we present a method to accelerate the algorithm when a bound on the time complexity is known. Concluding remarks are in Section 5.
Distributed System
We consider distributed systems that consist of processors $P_1, P_2, \ldots, P_n$ that are connected in a chain. The processors are anonymous; the subscripts 1 to n are used only for convenience. No processor knows n, the number of processors. Processors have a sense of direction, i.e. for $1 < j < n$, $P_{j-1}$ is the left neighbor of $P_j$ and $P_{j+1}$ is the right neighbor of $P_j$. Using the same conventions, $P_2$ is the right neighbor of $P_1$ and $P_{n-1}$ is the left neighbor of $P_n$. $P_1$ is the leader processor, $P_n$ is the tail processor and the rest of the processors are intermediate processors.
Footnote 1: In [MO+92] some synchrony is assumed in order to avoid deadlocks.
As in earlier work, processors communicate by the use of shared communication registers. Two neighboring processors $P_i$ and $P_j$ communicate by two shared registers $r_{ij}$ and $r_{ji}$. $P_i$ ($P_j$) writes in $r_{ij}$ ($r_{ji}$) and reads from $r_{ji}$ ($r_{ij}$). The shared registers of a processor are the registers in which the processor writes. In addition to access to its neighbors' communication registers, each processor $P_i$ can repeatedly read one symbol of input from its input register $I_i$ and repeatedly write one symbol of output to its output register $O_i$. The content of $I_i$ and $O_i$ is either 0, 1 or ⊥. We view the concatenation of the input symbols as a fixed word in $\{0,1\}^l \bot^{n-l}$ where $n \ge l$.
Processors are modeled by finite state machines. We denote the set of states of $P_i$ by $S_i$. The number of states of each finite state machine is less than k, where k is a constant.
A state $s \in S_i$ of a processor $P_i$ fully describes its internal state and the values written in its registers, including the output symbol in $O_i$ and the input symbol in $I_i$. A configuration, $c \in S_1 \times S_2 \times \cdots \times S_n$, is a vector of the states of all processors. Processors execute atomic steps. An atomic step consists of some local computation followed by either a read from a communication register and the input symbol, or a write in a communication register and the output symbol. Processor activity is managed by a scheduler. In any given configuration the scheduler activates a single processor, which executes a single atomic step. To ensure correctness of the algorithms, we regard the scheduler as an adversary. An execution of the system is a finite or infinite sequence of configurations $E = (c_1, c_2, \ldots)$ such that for $i = 1, 2, \ldots$, $c_{i+1}$ is reached from $c_i$ by a single atomic step of some processor. A fair execution is an infinite execution in which every processor executes atomic steps infinitely often.
In a distributed system each processor may execute atomic steps at any constant or nonconstant rate. Various processors might be slow in various parts of the execution. The following definition of round complexity captures the rate of action of the slowest processor in any segment of the execution. Given an execution E, the first round of E is finished immediately after each processor has executed one cycle; the second round is finished after each processor has executed one cycle following the termination of the first round, and so on. For any given execution E, the round complexity (which is sometimes called the execution time) of E is the number of rounds in E.
The requirements for self-stabilizing algorithms state the conditions under which the system has to stabilize when started in an arbitrary configuration and specify the required behavior of the system following the stabilization period. Next we define the self-stabilization requirements for our distributed algorithm A. Let w be a word in $\{0,1\}^l$. An algorithm A is self-stabilizing if for any finite n, when A is executed by a system of n processors and is started in any possible configuration c with input word $w\bot^{n-l}$ ($n \ge l$), then: (1) any fair execution that starts with c has a suffix in which the output of every processor $P_i$ is constant, and (2) this constant output is 1 (0, respectively) if the TM accepts (rejects) w using no more than n working tape cells; otherwise the output is ⊥.
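The round definition above can be illustrated with a small sketch. It is only an illustration under a simplifying assumption: one "cycle" of a processor is treated as a single activation by the scheduler, and the function name `count_rounds` and the list-of-indices encoding of a schedule are hypothetical, not taken from the paper.

```python
def count_rounds(schedule, n):
    """Count the completed rounds in a finite schedule.

    schedule: sequence of processor indices (0..n-1), in activation order.
    Assumption: one activation stands for one full cycle of that processor.
    """
    rounds = 0
    seen = set()  # processors that already acted in the current round
    for p in schedule:
        seen.add(p)
        if len(seen) == n:  # every processor acted: the round is finished
            rounds += 1
            seen.clear()
    return rounds
```

For example, with three processors the schedule `[0, 1, 2, 0, 0, 1, 2]` contains two rounds: the repeated activations of processor 0 do not complete a round until the slowest processors have also acted.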
The Reduction
We use a self-stabilizing mutual exclusion algorithm for a distributed system in a directed chain topology. In fact this is almost the setting for one of the algorithms in [Dij-74]. However, as discussed in [DIM-90], the algorithms in [Dij-74] make a strong assumption on the atomicity and the scheduling of the operations. In order to relax this assumption we adopt either the self-stabilizing mutual exclusion algorithm for tree topologies of [DIM-90] or the self-stabilizing repeated tree coloring algorithm presented in [DIM-91b] (footnote 3). We apply one of these algorithms to the special case of a directed chain with a leader at one of the endpoints. In each of these algorithms each processor uses a constant number of states.
The self-stabilizing mutual exclusion algorithm guarantees that, starting with any possible configuration, after a finite number of rounds every configuration contains exactly one processor that is executing the critical section. For simplicity we consider a processor that executes the critical section as holding a token. The algorithms in [DIM-90] and [DIM-91b] ensure that in every fair execution following the stabilization period the single token repeatedly "travels" from the leader to the tail and back. We use the terms send token and receive token to indicate transfer of the critical section from one processor to another. Note that before a processor P transfers the privilege to execute the critical section, P can write in its shared communication register a "content" for the token. Thus, we view the token as an entity with a value that is transferred from one processor to another.
The formal description of our algorithm is presented in Figures 1 to 4. Informally, we use the (eventual) behavior of the token to ensure that the chain of processors will repeatedly write only the correct output. After the stabilization period the chain implements a virtual TM. Each processor $P_i$ maintains a symbol of the virtual TM working tape, WrkSym_i. $P_i$ also maintains a flag HdMrk_i to indicate whether $P_i$ has the head of the virtual TM. When a processor $P_i$ with HdMrk=T receives a token from the leader direction, $P_i$ uses the values of the current state of the TM, Tkn.TMSta, and the working tape symbol, WrkSym_i, to deduce new values for Tkn.TMSta and WrkSym_i, as well as to find the direction of the TM head movement, HdMov. In the code we denote these operations by Tkn.TMSta, WrkSym, HdMov:=TM(Tkn.TMSta,WrkSym). In case the direction is towards the tail, $P_i$ sets Tkn.HdMrk:=T and the TM head moves to the right neighbor. Otherwise, when the direction of the head movement is towards the leader, the transition of the TM head is delayed until the token arrives from the direction of the tail.
Footnote 3: The tree coloring is faster in terms of stabilization time, O(n) rounds (while the algorithm in [DIM-90] is O(dn) rounds), and is in fact a mutual exclusion algorithm when applied to a chain of processors.
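The head-passing rule described informally above can be sketched as follows. This is a hedged illustration, not the code of Figures 1 to 4: it assumes a stabilized system with a single token, assumes that exactly one TM step is applied per round trip of the token, and introduces hypothetical names (`Proc`, `round_trip`, the local flag `mov_left`, and a transition function `delta` returning a new state, a new symbol and a move `"L"` or `"R"`). Treating a move to the left of the leader as leaving the chain is also an assumption of the sketch.

```python
from dataclasses import dataclass

@dataclass
class Proc:
    wrk_sym: str            # WrkSym_i: one cell of the virtual working tape
    hd_mrk: bool = False    # HdMrk_i: True if the virtual head is here
    mov_left: bool = False  # hypothetical local flag: a left move is pending

def round_trip(chain, tm_sta, delta):
    """Move the token leader -> tail -> leader, applying one TM step.

    Returns (new TM state, True if the head left the chain)."""
    tkn_hd_mrk = False  # Tkn.HdMrk: the head is handed to the next processor
    for p in chain:     # the token travels towards the tail
        if tkn_hd_mrk:
            p.hd_mrk, tkn_hd_mrk = True, False
        elif p.hd_mrk:
            tm_sta, p.wrk_sym, hd_mov = delta(tm_sta, p.wrk_sym)
            if hd_mov == "R":
                p.hd_mrk, tkn_hd_mrk = False, True
            else:
                p.mov_left = True  # delayed until the token returns
    if tkn_hd_mrk:
        return tm_sta, True  # the head tried to move to the right of the tail
    for i in range(len(chain) - 1, -1, -1):  # the token travels back
        p = chain[i]
        if tkn_hd_mrk:
            p.hd_mrk, tkn_hd_mrk = True, False
        elif p.mov_left:
            p.mov_left, p.hd_mrk = False, False
            if i == 0:
                return tm_sta, True  # assumed: no cell to the left of the leader
            tkn_hd_mrk = True
    return tm_sta, False
```

The delayed left move is the reason for the `mov_left` flag: a processor can hand the head to its left neighbor only while the token is travelling in that direction.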
Our algorithm uses a distributed binary counter to ensure correct output. This distributed binary counter counts the number of TM configurations in the execution. Each processor maintains two bits of the distributed counter. The bits are ordered: the leader, $P_1$, maintains the most significant bits and the tail, $P_n$, maintains the least significant bits. Roughly speaking, each time the token arrives at the tail, the tail starts incrementing the distributed counter by one. The tail, $P_n$, computes the new value for CntBits_n and the carry. Then $P_n$ writes the carry value in Tkn.Cr for its neighbor $P_{n-1}$. Whenever the leader, $P_1$, detects a counter overflow, the leader resets the chain to an initial state.
The leader initiates a new computation whenever it detects a counter overflow or receives the token with Tkn.TMSta that indicates the acceptance or rejection of the input word. The initialization starts with the assignments: WrkSym_1:=I_1, HdMrk_1:=T, CntBits_1:=00. Then, if the last computation resulted in a counter overflow, the leader sets Tkn.TMSta:=⊥ and thereby notifies every processor $P_i$ to write ⊥ in $O_i$. Otherwise, Tkn.TMSta is assigned the initial state of the TM and (if the initial state is not an accepting or rejecting state) no processor writes to its output. Note that this last assignment appears in the code as Tkn.TMSta:=TM(Initial). After assigning Tkn.TMSta the right value, the leader sends the token to its right neighbor. When an intermediate processor $P_i$ receives a token with Tkn.Rst=T, $P_i$ assigns WrkSym_i:=I_i, HdMrk_i:=F, CntBits_i:=00 and sends the token to the right. The tail processor, $P_n$, executes the same assignments as an intermediate processor does; in addition $P_n$ assigns Tkn.Rst:=F to indicate the completion of the reset.
Whenever the leader or the tail receives a token with Tkn.TMSta that indicates acceptance (rejection) of w, it writes 1 (0, respectively) to its output register. When an intermediate processor, P, receives a token from its right neighbor with Tkn.TMSta that indicates acceptance (rejection) of w, then P writes 1 (0, respectively) to its output register. The leader assigns Tkn.TMSta:=⊥ and O_1:=⊥ whenever either (1) a counter overflow occurs, or (2) the reading head of the TM falls off the right end of the working tape, or (3) Tkn.Rst=T. The tail assigns O_n:=⊥ whenever it receives Tkn.TMSta=⊥; the tail assigns Tkn.Rst:=T whenever the head of the TM attempts to move to the right of the tail. An intermediate processor, P, writes ⊥ to the output register whenever P receives a token from its right neighbor with Tkn.TMSta=⊥. The function TM(Tkn.TMSta) returns 1 (0) if Tkn.TMSta is an accepting (rejecting, respectively) state, and ⊥ if Tkn.TMSta=⊥.
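The carry propagation of the distributed counter can be sketched as below. The sketch assumes a stabilized system in which one increment is carried by the token from the tail to the leader; the function name `increment` and the encoding of each CntBits_i as an integer in 0..3 are hypothetical conveniences, not the paper's notation.

```python
def increment(cnt_bits):
    """Add one to the distributed counter, in place.

    cnt_bits[0] belongs to the leader (most significant two bits) and
    cnt_bits[-1] to the tail (least significant two bits); each entry is
    an integer 0..3 standing for the two bits 00..11.
    Returns True if the leader observes a counter overflow.
    """
    carry = 1  # the tail starts the increment
    for i in range(len(cnt_bits) - 1, -1, -1):  # token moves towards the leader
        total = cnt_bits[i] + carry
        cnt_bits[i] = total % 4
        carry = total // 4  # 1 only when Tkn.Cr=1 and CntBits was 11
    return carry == 1
```

With n = 2 processors the counter has four bits, so starting from 00 00 the overflow is reported on the 16th increment, which is the value $2^{2n}$ used in the stabilization bound.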
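The output rule at the end of the paragraph above amounts to a three-way case distinction. The following sketch states it explicitly; the function name `tm_output`, the string encoding of states, and the explicit sets of accepting and rejecting states are assumptions made for illustration only.

```python
BOTTOM = "⊥"  # the undefined symbol used for inputs, outputs and Tkn.TMSta

def tm_output(tm_sta, accepting, rejecting):
    """Value a processor writes to its output register for a given Tkn.TMSta.

    Returns "1" for an accepting state, "0" for a rejecting state, BOTTOM
    when Tkn.TMSta is BOTTOM, and None when nothing is written because
    the computation is still in progress.
    """
    if tm_sta == BOTTOM:
        return BOTTOM
    if tm_sta in accepting:
        return "1"
    if tm_sta in rejecting:
        return "0"
    return None
```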
Correctness Proof
In this extended abstract we only state the lemmas and give the main ideas for their proofs. The correctness hinges on the existence of a self-stabilizing mutual exclusion algorithm. In particular, the coloring algorithm of [DIM-91b] guarantees that in any fair execution, after O(n) rounds, a safe configuration c is reached such that, in any configuration that appears after c, there exists at most one processor that executes the critical section. Moreover, following c the processors repeatedly execute the critical section in a fixed order, from the leader to the tail and back. Note that before the safe configuration c is reached there can be many tokens or none. In this period of time our algorithm does not operate correctly. Thus, when the mutual exclusion algorithm stabilizes, the other part of the algorithm (which assumes the existence of a token that travels nicely from the leader to the tail and back) is in an arbitrary state. For example, in such an arbitrary state more than one processor can have HdMrk=T. We prove that this part of the algorithm stabilizes too.
Lemma 3.1 In every fair execution that starts with a safe configuration c of the mutual exclusion algorithm, the leader assigns Tkn.Rst:=T at least once in every $4n2^{2n}$ rounds.
The proof of the lemma uses the fact that the binary counter is incremented by one at least once in every 4n rounds.
It is easy to see that, following c, whenever the leader sends a token with Tkn.Rst:=T the token initializes the working tape bits, the counter bits, the position of the Turing machine's head, and the Turing machine state.
Lemma 3.2 In any fair execution that starts with a safe configuration c of the mutual exclusion algorithm, after the leader assigns Tkn.Rst:=T and sends the token, the token travels to the tail and every processor that receives the token initializes its variables.
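The bound in Lemma 3.1 can be traced with a short calculation. The sketch below assumes, as stated above, that each of the n processors holds two counter bits and that consecutive increments are at most 4n rounds apart. The symbols $v$ (the arbitrary counter value in the safe configuration) and $R$ (the number of rounds until the next overflow) are introduced here only for illustration.

```latex
\[
  \underbrace{2^{2n} - v}_{\text{increments until overflow}} \;\le\; 2^{2n},
  \qquad 0 \le v < 2^{2n},
\]
\[
  R \;\le\; 4n \cdot \left(2^{2n} - v\right) \;\le\; 4n\,2^{2n}.
\]
```

Since the leader resets the chain whenever it detects an overflow, a reset occurs within any window of that many rounds.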
Let C be the number of states of a TM. Note that by [HU-79] pp. 173 and [Sh56], $C \le 56$. The next lemma is proved by a simple counting argument.
Lemma 3.3 The number of different TM configurations with a tape of size n and $\{0,1\}$ alphabet is at most $C \cdot n \cdot 2^{2n}$.
Define a reset initialization configuration to be a configuration that immediately follows an atomic step of the leader in which it assigns Tkn.Rst:=T. For every reset initialization configuration, $c_{init}$, we define a reset termination configuration, $c_{term}$, to be the first configuration after $c_{init}$ that follows an atomic step of the leader in which the leader receives the token.
Lemma 3.4 In the second reset termination configuration after c, all the output registers are identical. Moreover the outputs are 1 (0, respectively) iff the TM accepts (rejects, respectively) w with a working tape of size n. Otherwise, the output is ⊥.
The proof of the lemma is implied by the correct behavior of the system following the first reset termination. The next theorem uses the fact that the behavior of the simulated Turing machine is identical starting with any reset termination configuration.
Theorem 3.5 In every fair execution, after $O(n2^{2n})$ rounds, in every configuration the output of each processor is accept (reject) if the TM accepts (rejects) w with a working tape of size n. Otherwise, the output is ⊥.
Proof: By Lemma 3.1 the first reset that follows the stabilization time of the mutual exclusion algorithm occurs during the first $4n2^{2n}$ rounds. After this reset the algorithm starts the TM computation correctly. By Lemma 3.4, during an additional $4n2^{2n}$ rounds each processor receives a token either with Tkn.TMSta that indicates acceptance or rejection or with Tkn.TMSta=⊥. Since any further computation that begins with a later reset is identical, the output is not changed.
Accelerating the Algorithm
The algorithm presented in the previous section requires $O(n2^{2n})$ rounds to stabilize. In this section we accelerate the algorithm by the use of an upper bound on the execution time of the TM. If there is an upper bound t(w) on the execution time of the TM for an input word w, then we can accelerate our algorithm to stabilize within O(n t(w)) time. The acceleration uses the assumption that t(w) is part of the input. The input of each processor consists of three symbols: one is a symbol from the input word w as before, and the other two are related to t(w). The two most significant bits of the binary representation of t(w) are inputs to the leader, $P_1$, the next two are inputs to $P_2$, and so on. We assume that ⊥ indicates the end of the binary representation of t(w) (in case $|t(w)| < 2n$).
Denote the two symbols of processor $P_i$ that are related to t(w) by $t(w)_i$. Every time the leader receives the token, the leader assigns Tkn.Ovr:=Cmpr. While the token travels from the leader towards the tail, each processor $P_i$ that receives the token with Tkn.Ovr=Cmpr checks whether its two binary counter symbols are equal to $t(w)_i$. If $P_i$ finds a difference then $P_i$ assigns Tkn.Ovr:=False. A processor $P_j$ for which one of the symbols of $t(w)_j$ is ⊥ assigns Tkn.Ovr:=True when $P_j$ receives a token with Tkn.Ovr=Cmpr and the two symbols of the binary counter of $P_j$ are equal to $t(w)_j$. The tail assigns Tkn.Ovr:=True upon receiving Tkn.Ovr=Cmpr and finding that $t(w)_n$ equals its counter symbols. Thus, the leader receives an indication of whether the binary counter is equal to t(w) and can initiate a reset on time.
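The layout of t(w) across the chain can be illustrated with a small sketch. It covers only the input encoding described above (two symbols per processor, most significant bits at the leader, ⊥ marking the end); the function name `split_bound` and the tuple-of-characters representation are hypothetical.

```python
def split_bound(t, n):
    """Distribute the binary representation of t over n processors.

    Returns a list of n pairs; entry 0 is the leader's pair and holds the
    two most significant bits. Unused positions are filled with "⊥".
    """
    bits = bin(t)[2:]  # binary digits of t, most significant first
    if len(bits) > 2 * n:
        raise ValueError("t needs more than 2n bits")
    symbols = list(bits) + ["⊥"] * (2 * n - len(bits))
    return [tuple(symbols[2 * i:2 * i + 2]) for i in range(n)]
```

For instance, t(w) = 5 (binary 101) on a chain of three processors gives the leader the pair (1, 0), the second processor the pair (1, ⊥), and the tail the pair (⊥, ⊥).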
Concluding Remarks
In this paper we investigated the computational power of self-stabilizing systems with constant memory size processors. Interestingly, interconnected processors with a (small) constant amount of memory can tolerate transient faults and obtain the same result as a fault-free execution of a Turing machine. This implies that when there is an embedded ring with a leader in a system with constant memory size processors, the system copes with transient faults and still has the computational power of a Turing machine with the same total amount of memory (up to a constant factor).
[Figures 1 to 4: formal description of the algorithm. Surviving code fragment: if (Tkn.Cr=1 and CntBits=11)]
