A Universal RAM Machine Resistant to Isolated Bursts of Faults by Capuni, Ilir & Dervishaj, Ervin
Towards a Universal RAM Machine Resistant to
Isolated Bursts of Faults
Ilir Çapuni Ervin Dervishaj
Computer Engineering Department
Epoka University
{icapuni, edervishaj10}@epoka.edu.al
Abstract—The most natural question of reliable computation, in every computation model and noise model, is whether, given a certain level of noise, there exists a machine of that model that can perform arbitrarily complex computations under noise of that level. This question has positive answers for circuits, cellular automata, and, recently, Turing machines [3], [4].
Here, we raise the question of the existence of a random access machine that, with some moderate slowdown, can simulate any other random access machine even if the simulator is subjected to constant-size bursts of faults separated from each other by a certain minimum number of steps.
We analyze and spell out the problems and difficulties that need to be addressed in such a construction.
Index Terms—Random access machines, faults, reliability.
I. Introduction
The problem of constructing fault-proof machines from components that can fail was first considered by von Neumann in [12], who addressed it in the Boolean circuit model. Further advances along this path were made in [9], [10]. The question has been considered in uniform models of computation as well. A simple rule for two-dimensional cellular automata that keeps one bit forever, even though each cell can fail with some small probability, was given in [11]. A three-dimensional reliably computing cellular automaton using Toom's rule was constructed in [7]. Alas, all simple one-dimensional cellular automata appear to be “ergodic” (they forget everything about their initial configuration in time independent of the size). The first nonergodic one-dimensional cellular automaton, a complex one, was constructed in [5] and improved upon in [6]. Surprisingly, even non-ergodic Turing machines exist [3], [4].
Computing reliably with RAM machines has been considered in a myriad of papers under various assumptions on the noise. A similar problem, in which only the memory is subject to a limited amount of noise, is considered in [8]. That work initiated an entire line of research approaching the problem from the data-structure and algorithmic perspective; a recent result in this line is [2].
Our approach diers from the papers above for that faults
can perturb the central unit and for that that the noise comes
in bursts that are spaced out of each other. The reason for
The rst author’s dedication to this work was made possible by a research
grant RD 01-2013 of Zanus ltd, Ulqin Montenegro
these assumptions on noise is that — similarly as in [4],
[6]— we hope to use this construction as a building block
for a hierarchically organized RAM that can withstand faults
that occur independently of each other with some small
probability.
A. The random access machine model
We view the random access machine as an extension of the Turing machine in that the tape consists of cells able to store an arbitrary integer, and the head can jump to any cell of the memory M specified by its address.
Register PC is a special register called the program
counter. Let Γ = {PC,R0, . . . ,Rk } be the set of registers of
the control unit. Register R0 will be called the accumulator.
The program is a sequence of instructions from Table I.
Denition I.1. A Random Access Machine is a pair (k,Π)
where k > 0 is the number of registers, and Π is the program
(i.e., a nite sequence of instructions). y
The operation of a RAM machine is described below.
1) Initially, the input x is loaded into the first m cells of the memory M, where m is the length of x. The registers PC, R0, ..., Rk and the remaining memory cells M[m], M[m+1], ... are initialized to 0.
2) At each step of the execution, the random access machine executes the program line pointed to by the program counter PC. After executing the instruction, the value of PC is incremented by 1, unless the instruction is jump, jzero, or jpos. If the instruction is jzero or jpos and its condition is not satisfied, PC is also incremented by 1.
Symbols are encoded by integers, and we assume that the
blank symbol is encoded by 0.
Example I.2. If Σ = {#, a, b}, then we might assign # = 0, a = 1, b = 2. Then the input abba would be represented in the first four cells of the memory as 1, 2, 2, 1.
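A minimal sketch of this input encoding, assuming the symbol assignment of Example I.2. The name `load_input` and the use of a `defaultdict` to stand for the unbounded, zero-initialized memory are illustrative choices, not part of the model.

```python
from collections import defaultdict

# Symbol-to-integer assignment from Example I.2; the blank '#' must be encoded by 0.
ENCODING = {"#": 0, "a": 1, "b": 2}


def load_input(x: str) -> defaultdict:
    """Place input x into cells M[0..m-1]; every other cell reads as 0, the blank."""
    memory = defaultdict(int)  # cells never written default to 0
    for address, symbol in enumerate(x):
        memory[address] = ENCODING[symbol]
    return memory


M = load_input("abba")
print([M[i] for i in range(6)])  # [1, 2, 2, 1, 0, 0]
```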
A conguration of a Random Access Machine (k,Π) is a
k + 2-tuple
(PC,R0, . . . ,Rk ,M ),
where M ∈ ZZ is the memory conguration.
ISCIM 2013, pp. 21-23 © 2013 Authors
TABLE I
Instructions of a RAM machine
Instruction | Semantics | Description
load n | R0 ← n | Put the value n into R0
load Rk | R0 ← Rk | Put the value of Rk into R0
store Rk | Rk ← R0 | Put the value of R0 into Rk
read Rk | R0 ← M[Rk] | Copy the value at the memory location whose address is in Rk into R0
write Rk | M[Rk] ← R0 | Write the value of R0 to the memory location whose address is in Rk
add n | R0 ← R0 + n | Add the value n to R0
add Rk | R0 ← R0 + Rk | Add the value of Rk to R0
mult Rk | R0 ← R0 ∗ Rk | Multiply the value of R0 by the value of Rk
jump n | PC ← n | Set the program counter to n
jzero n | if R0 = 0 then PC ← n | If R0 is zero, set PC to n
jpos n | if R0 > 0 then PC ← n | If R0 is positive, set PC to n
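A minimal sketch, in Python, of one possible interpreter for Table I together with the operation rules of Section I-A. The tuple representation of instructions, the names `step` and `run`, and the stopping convention are our assumptions: the model itself has no halting instruction, so here the run simply stops when PC leaves the program or after a step budget.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Instr = Tuple[str, str]  # hypothetical textual form: (opcode, operand), e.g. ("add", "R2")


def operand(arg: str, regs: List[int]) -> int:
    """'Ri' denotes the content of register Ri; any other operand is a literal integer n."""
    return regs[int(arg[1:])] if arg.startswith("R") else int(arg)


def reg_index(arg: str) -> int:
    """Index i of a register operand 'Ri'."""
    return int(arg[1:])


def step(program: List[Instr], pc: int, regs: List[int], mem: DefaultDict[int, int]) -> int:
    """Execute the instruction at PC according to Table I and return the next PC."""
    op, arg = program[pc]
    if op == "load":
        regs[0] = operand(arg, regs)
    elif op == "store":
        regs[reg_index(arg)] = regs[0]
    elif op == "read":
        regs[0] = mem[regs[reg_index(arg)]]
    elif op == "write":
        mem[regs[reg_index(arg)]] = regs[0]
    elif op == "add":
        regs[0] += operand(arg, regs)
    elif op == "mult":
        regs[0] *= regs[reg_index(arg)]
    elif op == "jump":
        return int(arg)
    elif op == "jzero" and regs[0] == 0:
        return int(arg)
    elif op == "jpos" and regs[0] > 0:
        return int(arg)
    return pc + 1  # ordinary instructions and untaken jzero/jpos


def run(program: List[Instr], k: int, x: List[int], max_steps: int = 10_000):
    """Run (k, program) on input x; stopping when PC leaves the program is our convention."""
    mem: DefaultDict[int, int] = defaultdict(int)  # every cell not holding input is 0
    for address, value in enumerate(x):
        mem[address] = value
    pc, regs = 0, [0] * (k + 1)  # PC and R0..Rk start at 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            break
        pc = step(program, pc, regs, mem)
    return pc, regs, mem


# Multiply M[0] by M[1] and write the product back to M[0].
PRODUCT = [("load", "0"), ("store", "R1"), ("read", "R1"), ("store", "R2"),
           ("load", "1"), ("store", "R1"), ("read", "R1"), ("mult", "R2"),
           ("store", "R3"), ("load", "0"), ("store", "R1"), ("load", "R3"),
           ("write", "R1")]
_, _, memory = run(PRODUCT, k=3, x=[6, 7])
print(memory[0])  # 42
```

Note that `read` and `write` use the register only as a pointer, which is what lets the head "jump" to an arbitrary address in one step.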
The work of the machine can be described as a sequence of configurations C0, C1, C2, ..., where Ct is the configuration at time t.
The program Π tells us how to compute the next configuration from the present one.
Denition I.3 (Fault). We say that a fault occurs at time
t if conguration Ct+1 is obtained from Ct by violating the
transition function specied by the program. y
B. What faults can do
By denition, if a fault occurred at time t , then, at time t+1
a random cell and the content of the registers in the control
unit are arbitrary. This means that unlike the eects of bursts
in a Turing machine — where bursts produce islands of cells
of diameter β close to the head, in RAM model, these islands
are scattered in a memory in a random way. This is a major
diculty in construction of a fault-tolerant RAM since this
means that the information stored in the memory decays.
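The contrast can be illustrated with a toy simulation. It assumes, purely for illustration, that a burst consists of β faults, each corrupting one memory cell, and it samples the hit cells at random, whereas in the setting above their placement is arbitrary. The functions `tm_burst` and `ram_burst` are hypothetical names.

```python
import random
from typing import Set

# Hypothetical burst model: beta faults, each corrupting a single memory cell.


def tm_burst(head: int, beta: int, rng: random.Random) -> Set[int]:
    """Turing machine: the damaged cells form an island within distance beta of the head."""
    return {head + rng.randint(-beta, beta) for _ in range(beta)}


def ram_burst(space: int, beta: int, rng: random.Random) -> Set[int]:
    """RAM: each fault may hit any of the `space` cells, regardless of where the machine works."""
    return {rng.randrange(space) for _ in range(beta)}


rng = random.Random(2013)
beta, space, head = 3, 1_000_000, 500
print("TM damage: ", sorted(tm_burst(head, beta, rng)))
print("RAM damage:", sorted(ram_burst(space, beta, rng)))
```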
For example, consider a computation that uses space s. Suppose further that, for a time proportional to s², the computation is carried out only in the first half of the computation space. Then, within this time, the second half of the computation space may be completely ruined by faults that occurred during this period.
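To make the example quantitative, assume (our simplification, not a formal noise model) that each burst corrupts at most β memory cells and that consecutive bursts start at least Δ steps apart, where Δ is a hypothetical name for the minimum separation. During the c·s² steps in which the second half is not visited:

```latex
\#\{\text{bursts}\} \le \frac{c\,s^{2}}{\Delta} + 1,
\qquad
\#\{\text{corrupted cells}\} \le \beta\left(\frac{c\,s^{2}}{\Delta} + 1\right),
\qquad
\beta\,\frac{c\,s^{2}}{\Delta} \ge \frac{s}{2}
\iff \Delta \le 2\beta c\,s .
```

In the worst case every fault lands in a distinct cell of the second half, so if Δ is a constant independent of s, then for large s enough faults can occur to overwrite all s/2 cells there.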
II. The Desired Result
In this section we spell out the desired result. First, we need the following definition.
Denition II.1 (Codes). Let Σ1,Σ2 be two subsets of Z. A
block code is given by a positive integer Q— called the block
size — and a pair of functions
ψ∗ : Σ2 → ΣQ1 , ψ ∗ : ΣQ1 → Σ2
with the property ψ ∗ (ψ∗ (x )) = x . y
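The simplest instance of this definition is a repetition code with majority decoding, taking Σ1 = Σ2 = ℤ. The sketch below is only an illustration of the definition under that choice (with a hypothetical block size Q = 3); it is not the code the construction would necessarily use.

```python
from collections import Counter
from typing import List, Sequence

Q = 3  # hypothetical block size; majority decoding tolerates (Q - 1) // 2 bad cells


def encode(x: int) -> List[int]:
    """psi_*: Sigma_2 -> Sigma_1^Q, here simply Q copies of the symbol."""
    return [x] * Q


def decode(block: Sequence[int]) -> int:
    """psi^*: Sigma_1^Q -> Sigma_2, by majority vote over the block."""
    value, _count = Counter(block).most_common(1)[0]
    return value


assert decode(encode(5)) == 5  # the defining property psi^*(psi_*(x)) = x
corrupted = encode(5)
corrupted[1] = 99              # a single fault hits one cell of the block
assert decode(corrupted) == 5  # the majority still recovers x
```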
Now we can spell out the desired statement.
Let the RAM machine M2 start on an input x from a starting configuration $\xi_0$, and suppose that it halts within T steps, writing the result in the memory location with address 0. Let S be the amount of space that M2 uses during its computation.
Then the following can be constructed:
1) A RAM machine M1 with k registers that does not have a halting state.
2) A constant Q and a block code $(\psi_*, \psi^*)$ of block size Q.
3) A constant T′ depending on T and S, and a constant k′ = O(k).
Suppose M1 starts from the initial configuration $\xi'_0 = \xi'_0(x)$ and operates under noise that consists of isolated bursts of size at most β.
Then, at any time t > T′, the result of M2 can be decoded, by applying $\psi^*$, from the memory block k′, ..., k′ + Q − 1.
III. Solutions and difficulties
A. Solving the problem by simulating the fault-tolerant Turing machine
It is natural to ask whether we can achieve the needed fault tolerance simply by simulating the fault-tolerant Turing machine from [3] or [4]. Alas, the answer is negative. The fault-tolerant Turing machines constructed in [3] and [4] rely on the assumption that the information on the tape does not decay. Here, however, faults can ruin portions of the memory far from the position of the simulated Turing machine's head.
B. Redundancy and memory updates
Let us consider a dierent solution. We will encode the
control unit of M2 using some error-correcting code into
a xed constant size portion of the memory of M1. Using
the same code we will encode the memory of M2 onto the
memory of M1.
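One possible memory layout for this encoding is sketched below, assuming a repetition code of block size Q = 3. The layout (block 0 for PC, block j + 1 for register Rj, block k + 2 + i for memory cell i of M2) and all function names are hypothetical; the paper does not fix a layout. With this choice, cell 0 of M2 starts at address (k + 2)Q of M1, which is O(k) for constant Q, in line with the constant k′ = O(k) of Section II.

```python
from collections import Counter
from typing import Dict, List

Q = 3  # hypothetical block size of a repetition code (majority decoding)


def encode(x: int) -> List[int]:
    """psi_*: Q copies of the symbol."""
    return [x] * Q


def decode(block: List[int]) -> int:
    """psi^*: majority vote over the block."""
    return Counter(block).most_common(1)[0][0]


def register_block(j: int) -> int:
    """Hypothetical layout: block 0 holds PC, block j + 1 holds register Rj of M2."""
    return j + 1


def memory_block(i: int, k: int) -> int:
    """Cell i of M2's memory lives in block k + 2 + i, after PC, R0, ..., Rk."""
    return k + 2 + i


def write_block(mem1: Dict[int, int], block: int, value: int) -> None:
    """Store psi_*(value) in the Q consecutive cells of the given block of M1."""
    for offset, symbol in enumerate(encode(value)):
        mem1[block * Q + offset] = symbol


def read_block(mem1: Dict[int, int], block: int) -> int:
    """Decode the given block of M1; unwritten cells read as 0."""
    return decode([mem1.get(block * Q + offset, 0) for offset in range(Q)])


k = 2
mem1: Dict[int, int] = {}
write_block(mem1, memory_block(0, k), 42)    # M2's cell 0 holds 42
mem1[memory_block(0, k) * Q + 1] = -7        # a fault corrupts one cell of that block
print(read_block(mem1, memory_block(0, k)))  # 42
print(memory_block(0, k) * Q)                # 12, i.e. (k + 2) * Q
```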
Then, similarly to the simulations in [3], [4], we will simulate one step of M2 by many steps of M1.
Since information “decays” in the memory, we need to nd
a way to constantly update and check all the parts of the
computation space in the memory. Since RAM is a sequential
machine, we need to space out the bursts enough such that
the machine can “catch up” with the decay of the information
in the memory. However, the minimal distance between two
consecutive bursts will now depend on the amount of space
that M2 needs during the computation.
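The refreshing step can be sketched as a sweep that decodes and re-encodes every block in turn. This assumes a repetition code of block size Q = 3 with majority decoding and a flat list as M1's memory; the name `refresh_sweep` is hypothetical. Its cost grows linearly with the number of blocks, which is why the required spacing between bursts grows with the space of M2.

```python
from collections import Counter
from typing import List

Q = 3  # hypothetical block size (repetition code, majority decoding)


def refresh_sweep(mem: List[int], num_blocks: int) -> int:
    """Decode and re-encode every block in turn; return the number of cell accesses.

    Damage that lands in a block after it has been refreshed is only repaired on the
    next sweep, so, roughly, bursts must be further apart than the duration of a sweep.
    """
    accesses = 0
    for b in range(num_blocks):
        block = mem[b * Q:(b + 1) * Q]
        value = Counter(block).most_common(1)[0][0]  # majority decoding
        mem[b * Q:(b + 1) * Q] = [value] * Q         # re-encoding
        accesses += 2 * Q                            # Q reads and Q writes
    return accesses


mem = [7, 7, 7, 1, 9, 1, 4, 4, 0]  # blocks 1 and 2 each carry one corrupted cell
print(refresh_sweep(mem, 3), mem)  # 18 [7, 7, 7, 1, 1, 1, 4, 4, 4]
```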
C. A hierarchical organization with digests
In the previous subsection we saw that, in order to keep the information in the memory from decaying, we need to refresh it constantly by decoding and re-encoding it with an error-correcting code. Doing this for the entire space s of the computation may be time-consuming.
Using the approach of [1], we may restrict this refreshing to a part of the memory of size O(log s), which will be considered more reliable.
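To illustrate the idea of keeping only a small digest in reliable storage, the sketch below uses a generic hash tree (Merkle tree) over the memory cells: only the root needs to be trusted, and checking one cell reads O(log s) stored hashes. This is a stand-in we chose for illustration, not a claim about the exact construction of [1] or the one the authors plan; it detects corruption but does not correct it, and the function names are hypothetical.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(cells: List[int]) -> List[List[bytes]]:
    """All levels of a hash tree over the cells, leaves first (len(cells) a power of 2)."""
    level = [h(str(c).encode()) for c in cells]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def verify(cells: List[int], levels: List[List[bytes]], index: int, trusted_root: bytes) -> bool:
    """Recompute the root from cell `index` and its O(log s) sibling hashes."""
    node = h(str(cells[index]).encode())
    for level in levels[:-1]:
        sibling = level[index ^ 1]
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == trusted_root


cells = [3, 1, 4, 1, 5, 9, 2, 6]
levels = build_tree(cells)
root = levels[-1][0]  # the only value kept in the small, more reliable part of memory
print(verify(cells, levels, 5, root))  # True
cells[5] = 0                           # a fault corrupts cell 5
print(verify(cells, levels, 5, root))  # False
```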
Giving a fully fledged construction based on this idea and proving the desired result spelled out in Section II will be the subject of our forthcoming research.
References
[1] Blum, M., Evans, W., Gemmell, P., Kannan, S., Naor, M.: Checking the Correctness of Memories. In: Algorithmica (1995) 90-99.
[2] Christiano, P., Demaine, E. D., Kishore, S.: Lossless Fault-Tolerant Data Structures with Additive Overhead. In: Dehne, F., Iacono, J., Sack, J.-R. (eds.) Proceedings of the 12th International Conference on Algorithms and Data Structures (WADS'11), Springer-Verlag, Berlin, Heidelberg (2011) 243-254.
[3] Çapuni, I., Gács, P.: A Turing machine resisting isolated bursts of faults. Chicago Journal of Theoretical Computer Science, vol. 2013.
[4] Çapuni, I.: A Fault-Tolerant Turing Machine. Ph.D. Dissertation, Boston University, Boston, MA, USA (2013). Advisor: P. Gács. AAI3536949.
[5] Gács, P.: Reliable computation with cellular automata. Journal of Computer and System Sciences 32/1 (1986) 15-78.
[6] Gács, P.: Reliable cellular automata with self-organization. Journal of Statistical Physics 103/1-2 (2001) 45-267.
[7] Gács, P., Reif, J. H.: A simple three-dimensional real-time reliable cellular array. J. Comput. Syst. Sci. 36/2 (1988) 125-147.
[8] Finocchi, I., Grandoni, F., Italiano, G. F.: Designing Reliable Algorithms in Unreliable Memories. Computer Science Review 1/2 (2007) 77-87.
[9] Pippenger, N.: On networks of noisy gates. In: Proc. of the 26th IEEE FOCS Symposium (1985) 30-38.
[10] Spielman, D.: Highly fault-tolerant parallel computation. In: Proc. of the 37th IEEE FOCS Symposium (1996) 154-163.
[11] Toom, A.: Stable and attractive trajectories in multicomponent systems. In: Dobrushin, R. L. (ed.) Multicomponent Systems, Advances in Probability 6, Dekker, New York (1980) 549-575.
[12] von Neumann, J.: Probabilistic logics and the synthesis of reliable organisms from unreliable components. In: Shannon, C., McCarthy, J. (eds.) Automata Studies, Princeton University Press, Princeton, NJ (1956).
