Abstract. Abstract register machines are an important formal model of computation widely used for the modeling of many classes of computational algorithms and the analysis of their complexity. Despite the significance of this model, formal verification of general-purpose programs for abstract register machines has not been properly covered in the existing literature. To fill this gap, we provide a formal specification for one version of abstract register machines: the random-access machine invented by Aho, Hopcroft and Ullman. Executions of this machine are formalized by a transition system in the language of the verification system PVS. We also specify in PVS a simple search program for this architecture and its correctness property. The property is proved using the interactive proof checker of PVS. We were able to prove not only the functional correctness of the program, but also its time complexity, which shows the novelty of our approach.
Introduction
Abstract register machines (ARMs) [9, 2], together with Turing machines and recursive functions, belong to the most important formal models of computation. However, unlike Turing machines and recursive functions, they are much less abstract and much closer to the operations of actual computers.
Indeed, although ARMs are not directly based on any computer architecture used in practice, they can closely imitate the structure of modern CPUs and their arithmetical, logical and input/output commands. For this reason, the formalism of ARMs has been used for several decades for the formal, mathematical study of computational algorithms and programs.
Perhaps the best-known example of this approach is shown in [7], where a highly detailed version of an abstract register machine is used to define rigorously many important classes of computational algorithms and to prove their complexity measures.
Although a lot of research has been dedicated to the verification of programs written in high-level languages such as C and Java (and also to the correctness of some register-based processor designs), we are not aware of any works that systematically study the formal verification of computational programs for ARMs. This is unfortunate, because formal verification can greatly improve our understanding of programs and algorithms, and help to find and remove subtle errors in them. Such verification is especially rigorous and convincing when it uses some form of automated support, for example an interactive theorem prover. Since ARMs have been successfully used to analyze the complexity measures for many types of computer algorithms, it is an interesting challenge to create a formal framework that allows proving not only the functional correctness of programs for them, but also their time and space complexity.
In the dissertation [3], a promising general method was presented for the specification and verification of distributed protocols. In [3], it was used to verify several non-trivial examples from the field of databases, and in [4], it was successfully applied to the Sliding Window protocol. Here we show how a modification of our method can be used to provide a formal framework for the verification of programs for ARMs. We consider one particular version of ARMs: the random-access machine invented by Aho, Hopcroft and Ullman [1], which we call here RAM-AHU. In our method, we represent the data structures of RAM-AHU in the language of the verification system PVS [10] and define the effect of its commands on these data structures. After that, the executions of RAM-AHU are formalized as finite or infinite traces of a transition system generated by the effect predicate of its commands; the correctness properties of programs for RAM-AHU are defined by logical formulas on traces of this transition system.
We use our method to verify a simple search program for RAM-AHU.
A transition system is generated, and for that system the correctness property of the program is defined as some relation between the input sequence in the initial state and the output sequence in the final state. This allows us to verify formally the correctness property using the interactive theorem prover of PVS, which leads to a proof of correctness that is both rigorous and intuitively understandable. We were able to prove not only the functional correctness of the search program, but also its best-case and worst-case execution time. We hope that this fact demonstrates the novelty of our specification method, because the functional correctness of programs and their time complexity are usually analyzed and proved using different formal frameworks.
The rest of the paper is organized as follows. In Section 2, we give a brief introduction to the PVS system. Section 3 describes RAM-AHU on the basis of [1]. In Section 4, we present our formalization of the executions of abstract register machines, which can also be applied to RAM-AHU. In Section 5, the RAM-AHU data structures are specified in PVS, and in Section 6 its commands are formalized in PVS. Section 7 presents a specification of a simple search program for RAM-AHU and verification of its correctness property. Finally, Section 8 gives some remarks on related works and possible future work.
In the prover of PVS, every goal or subgoal is displayed in the following form:
. . .
This display is a sequent: formulas above the dashed line (A1, A2, A3, ...) are called antecedents and those below (B1, B2, B3, ...) are called consequents. The sequent is interpreted as follows: the conjunction of the antecedents implies the disjunction of the consequents. The lists of antecedents and consequents may both be empty (an empty antecedent is equivalent to true, and an empty consequent is equivalent to false).
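The interpretation of a sequent, including the conventions for empty lists, can be illustrated with a minimal Python sketch. The function name sequent_holds is hypothetical and is not part of PVS; the formulas are assumed to be already evaluated to truth values.

```python
from typing import Sequence


def sequent_holds(antecedents: Sequence[bool], consequents: Sequence[bool]) -> bool:
    """Truth value of a sequent: AND of antecedents implies OR of consequents."""
    # all([]) is True and any([]) is False, which matches the conventions
    # for an empty antecedent list and an empty consequent list.
    return (not all(antecedents)) or any(consequents)
```

For instance, a sequent with no antecedents and no consequents reads "true implies false", so it does not hold.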
The proof of every theorem in PVS begins with a single consequent (representing the theorem). The objective of the proof is to create a proof tree of sequents in which all leaves are trivially true. The prover is always attempting to prove some unproved leaf in the tree. It can accomplish this task by invoking one of its commands, which either proves the current sequent (usually by applying some of the decision procedures) or splits it into several easier subgoals. When there are no more unproven branches in the tree, the prover notifies the user that the proof is complete. The resulting proof is automatically stored in a file and can be run again later. The PVS system has extensive facilities for managing the proofs and displaying information about them.
In our PVS specification of RAM-AHU, we mostly use natural numbers and integers to represent some variables in the data structures of the machine, and an abstract datatype to model the commands of the machine.
Additional datatypes are constructed from these basic types by applying records, finite and infinite sequences and predicate subtypes. Many predicates and lambda functions are also used to generate the whole specification and a search program for RAM-AHU. Verification of the program relies on the PVS decision procedures for Boolean logic and arithmetical operations on natural numbers and integers.
RAM-AHU
The random-access machine invented by Aho, Hopcroft and Ullman [1], which we call here RAM-AHU, is a computing device with one adder in which the program cannot change itself (the so-called Harvard architecture).
It consists of three parts: the input tape, the main (computational) part, and the output tape.
The input tape is a sequence of cells of an unlimited length. Each cell contains a symbol; it is only possible to read symbols from the input tape but not to write them. At any moment, the reading head of the tape points to some cell. After reading a symbol from that cell, the head moves one cell right.
The output tape is also an unlimited sequence of cells, with each cell containing a symbol. It is only possible to write symbols to the output tape but not to read them. At any moment, the writing head of the tape points to some cell. After writing a symbol to that cell, the head moves one cell right. It is not possible to change symbols that have already been written to the output tape. For this version of the machine, all symbols that can appear on the input or output tape are integers.
The computational part of RAM-AHU consists of a program, a program counter, and memory. The program for RAM-AHU is a finite sequence of commands; each command can have a label. It is assumed that the program is not stored in the memory, so it cannot change itself during its execution (which corresponds to the so-called Harvard architecture). There are commands for arithmetical operations, conditional and unconditional jumps, input/output operations and some others.
At any moment during the program execution, the program counter points to the command that is to be executed at the next step of the computation. After the command with some index k is performed, the counter automatically moves to the command with the index k + 1 (i.e. the next command). The only exceptions are conditional and unconditional jumps, as well as the command HALT, which stops the computation.
If the counter no longer points to any command (i.e. exceeds the length of the program), this means that there are no more commands to be executed, so the computation is over.
The memory of RAM-AHU is a sequence of registers $r_0, r_1, \ldots, r_i, \ldots$; each register can store an arbitrary integer. It is assumed that there is no upper limit to the number of registers that can be used. This idealization is reasonable when the size of the task is small enough to fit in the main memory of the machine. The first register $r_0$, called the adder, participates in all arithmetical operations (it can also store an arbitrary integer).
The initial state of RAM-AHU is determined by the chosen program and its input data. In any initial state, there are some symbols on its input tape (i.e. the input data), all registers are empty, the output tape is also empty, and the program counter points to the first command of the program. After the execution of each command, the program counter changes as described above until it eventually exceeds the length of the program and the computation stops. It is also possible that this event never happens (i.e. there is always some command waiting to be executed), and this leads to a non-terminating computation.
Each command of RAM-AHU consists of two parts: its operation code and its address. The command's address is either an operand or a label of some command in the program; in some cases it can also be empty. An operand a can be of one of three types:
1. The expression =i means the integer i itself and is called a literal.
If some command has an operand a, we can define the value v(a) of this operand. The definition of the function v uses another function c: for each natural number i, c(i) is the content of the register i. Using the informal definition of the expressions =i, i and *i given above, we define the value of an arbitrary operand a as follows:
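In the standard presentation of [1], the three cases are v(=i) = i, v(i) = c(i) and v(*i) = c(c(i)), where *i denotes indirect addressing. A minimal Python sketch of this function is given below. It assumes that registers are kept in a dictionary in which absent entries read as 0; the name operand_value and the kind tags "=", "" and "*" are hypothetical and do not come from the PVS specification.

```python
def operand_value(kind: str, i: int, c: dict) -> int:
    """Value v(a) of an operand a, given the register contents c.

    kind is "=" for a literal =i, "" for a direct operand i,
    and "*" for an indirect operand *i (hypothetical tags).
    """
    if kind == "=":
        return i                      # v(=i) = i
    if i < 0:
        raise ValueError("register index must be a natural number")
    if kind == "":
        return c.get(i, 0)            # v(i) = c(i)
    if kind == "*":
        j = c.get(i, 0)
        if j < 0:
            raise ValueError("indirect address must be a natural number")
        return c.get(j, 0)            # v(*i) = c(c(i))
    raise ValueError("unknown operand kind")
```

With c = {1: 5, 5: 42}, the literal =7 has value 7, the operand 1 has value 5, and the operand *1 has value 42.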
The following list defines the effect of each command. Here the sign ← denotes an assignment, and the function floor(x) gives the greatest integer that is less than or equal to x. Undefined commands and commands with an illegal value of the address are equivalent to the command HALT.
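The effect of the arithmetical commands on the adder can be illustrated with a short Python sketch. The operation names ADD, SUB, MULT and DIV are those used in [1] and are assumed to match this version of the machine; the function name arithmetic_effect is hypothetical, and treating division by zero as a HALT-like case is an assumption of the sketch.

```python
from typing import Optional


def arithmetic_effect(op: str, adder: int, value: int) -> Optional[int]:
    """New content of the adder, or None if the command behaves like HALT."""
    if op == "ADD":
        return adder + value
    if op == "SUB":
        return adder - value
    if op == "MULT":
        return adder * value
    if op == "DIV":
        if value == 0:
            return None           # assumption: treated like an illegal command
        return adder // value     # Python's // rounds down, like floor(x)
    return None                   # undefined operation code: equivalent to HALT
```

Note that floor rounds toward minus infinity, so dividing -7 by 2 gives -4 rather than -3; Python's integer division has the same behavior.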
Our formalization of abstract register machines
Our methods of specification are not specific to RAM-AHU but can be used in principle to model the behavior of any abstract register machine. They have significant similarities to the methods we previously used to specify and verify distributed protocols in [3] and [4]. In our approach, the behavior of an abstract register machine is defined by the notion of a state, 
Data structures of the machine in PVS
To model RAM-AHU in PVS, we need to define the structure of its states.
The state should include the program of the machine, the value of its registers and of the program counter, the input and output tapes. Since any program is a sequence of commands, we need to specify the structure of the machine commands.
In the informal definition of a program, only some of its commands have labels, and these labels are represented by words in a natural language. In PVS, it is much more convenient to have a label for every command, and to represent labels by natural numbers. A label equal to 0 is interpreted as the absence of a label, and real labels are modeled by positive natural numbers.
The meaning of the fields in the type RAMstates is rather obvious: the program is represented by a finite sequence of commands, the field pCounter models the program counter, the field registers represents the infinite sequence of registers, where each register can hold an integer. The input tape and the output tape are also modeled as infinite sequences of integers. The field inputHead points to the cell of the input tape that should be read during the next read command, and the field outputHead points to the cell of the output tape that is due to be written during the next write command.
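The shape of such a state record can be sketched in Python as follows. This is only an illustration and not the PVS type RAMstates itself: the class name RAMState, the field names program and inputTape, and the representation of commands as (label, opcode, address) triples are assumptions. Infinite sequences of integers are approximated by dictionaries whose missing entries read as 0.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List, Optional, Tuple

# Hypothetical command shape: (label, operation code, address).
# Label 0 means "no label"; positive labels are real labels.
Command = Tuple[int, str, Optional[int]]


def zero_sequence() -> DefaultDict[int, int]:
    """An 'infinite' integer sequence in which every cell reads as 0."""
    return defaultdict(int)


@dataclass
class RAMState:
    program: List[Command]                                   # finite sequence of commands
    inputTape: DefaultDict[int, int] = field(default_factory=zero_sequence)
    outputTape: DefaultDict[int, int] = field(default_factory=zero_sequence)
    registers: DefaultDict[int, int] = field(default_factory=zero_sequence)
    pCounter: int = 0                                        # index of the next command
    inputHead: int = 0                                       # next input cell to read
    outputHead: int = 0                                      # next output cell to write
```

Using default_factory gives every state its own tapes and registers, so modifying one state never affects another.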
Suppose that we have a program SomeProg (i.e. a finite sequence of the type Commands) and an input tape SomeInputTape for it (i.e. a sequence of integers of unlimited length). The initial state for SomeProg and SomeInputTape is defined in a rather obvious way: they are included in the state, an empty sequence EmptyIntSeq (i.e. a sequence consisting of only zeros) is assigned to the fields registers and outputTape, and 0 is assigned to the program counter and the variables inputHead and outputHead.
Even for such a simple task, the resulting program is not particularly short. It consists of 9 commands numbered from com0 to com8, which are given below.
We also defined the input tape SearchSeq on which the computation of SearchProg should begin. It contains arbitrary integers int0 and int1 followed by a string of zeros. The definition of SearchSeq is given below. Since int0 and int1 are arbitrary constants of the type integer, it is clear from this definition of SearchSeq that it models any possible input tape for the program SearchProg.
It is easy to see how the program SearchProg computes the larger of int0 and int1. The command com0 reads the integer int0 and places it into the register with index 0 (the adder). After that, the command com1 reads the integer int1 and places it into the register with index 1. Since another copy of int0 will be needed shortly, the command com2 stores int0 into the register with index 2.
After that, the command com3 subtracts int1 from int0 and places the result into the adder. If it is greater than 0, then int0 > int1, so the command com4 moves the program counter to the command with label 1, i.e. the command com7. The command com7 writes int0 from the register with index 2 to the output tape, and the command com8 terminates the computation. However, if int0 ≤ int1, the program counter moves to the command com5. The command com5 writes int1 from the register with index 1 to the output tape. After that, the command com6 unconditionally moves the program counter to the command com8. Again, the command com8 terminates the computation.
The initial state SearchIni of RAM-AHU for SearchProg and SearchSeq is defined in the same way as was presented in Section 5: they are included in the state, an empty sequence EmptyIntSeq (i.e. a sequence consisting of only zeros) is assigned to the fields registers and outputTape, and 0 is assigned to the program counter and the variables inputHead and outputHead. After that, we can use our definitions from Section 4 and obtain the set of complete runs for SearchIni.
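The behavior of the nine commands described above can be reproduced with a small Python simulation. This is a sketch under stated assumptions, not the PVS specification: the operation names READ, STORE, SUB, JGTZ, WRITE, JUMP and HALT are taken from [1] and matched to com0 through com8 according to the prose description; the label 2 on com8 is hypothetical (only the label 1 on com7 is stated in the text); and HALT is modeled by moving the program counter past the end of the program.

```python
from collections import defaultdict


def run_search(int0: int, int1: int):
    """Simulate the search program; return (output tape, executed commands)."""
    # (label, opcode, address); label 0 means "no label".
    program = [
        (0, "READ", 0),     # com0: adder <- int0
        (0, "READ", 1),     # com1: r1 <- int1
        (0, "STORE", 2),    # com2: r2 <- adder
        (0, "SUB", 1),      # com3: adder <- adder - r1
        (0, "JGTZ", 1),     # com4: if adder > 0, jump to label 1 (com7)
        (0, "WRITE", 1),    # com5: output r1
        (0, "JUMP", 2),     # com6: jump to label 2 (com8, assumed label)
        (1, "WRITE", 2),    # com7: output r2
        (2, "HALT", None),  # com8: stop
    ]
    target = {lab: k for k, (lab, _, _) in enumerate(program) if lab > 0}
    tape = defaultdict(int, {0: int0, 1: int1})  # int0, int1, then zeros
    regs = defaultdict(int)                      # all registers start at 0
    out = []
    pc = in_head = steps = 0
    while pc < len(program):                     # counter inside the program
        _, op, addr = program[pc]
        steps += 1
        if op == "READ":
            regs[addr] = tape[in_head]
            in_head += 1
            pc += 1
        elif op == "STORE":
            regs[addr] = regs[0]
            pc += 1
        elif op == "SUB":
            regs[0] -= regs[addr]
            pc += 1
        elif op == "WRITE":
            out.append(regs[addr])
            pc += 1
        elif op == "JUMP":
            pc = target[addr]
        elif op == "JGTZ":
            pc = target[addr] if regs[0] > 0 else pc + 1
        else:                                    # HALT
            pc = len(program)
    return out, steps
```

In this sketch, run_search(5, 3) executes 7 commands and run_search(3, 5) executes 8; in both cases the output tape holds the single value 5. Such a simulation only tests finitely many inputs, whereas the deductive proof covers all pairs of integers.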
Specification and verification of the correctness property
The correctness property for the program SearchProg is as follows: it terminates for any values of int0 and int1 (i.e. the numbers in the beginning of its input tape in the initial state) and, in the last state of its complete run, there is exactly one number written on its output tape equal to the maximum of int0 and int1. If crun is an arbitrary complete run, the correctness property for it is defined as follows (here the function last gives the last element of a finite sequence):
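A formula of this shape can be sketched as below. It is a reconstruction from the prose, not the PVS text: the predicate name Correct is hypothetical, and indexing the first output cell as outputTape(...)(0) is an assumption.

```latex
\mathit{Correct}(crun) \;\equiv\;
  \mathit{fin?}(crun)
  \;\wedge\; \mathit{outputHead}(\mathit{last}(crun)) = 1
  \;\wedge\; \mathit{outputTape}(\mathit{last}(crun))(0) = \max(\mathit{int0}, \mathit{int1})
```

The first conjunct expresses termination, the second says that exactly one number has been written, and the third says that this number is the maximum of the two inputs.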
We proved in PVS the following theorem, called Main, which establishes the correctness property for every complete run of SearchProg.
Below we present the proof itself.
Proof of the theorem Main. Like all PVS proofs, our proof is structured as a tree. The root of our tree is the theorem Main, and most of its leaves are lemmas InfIniLem, InfELem, FinIniLem, FinELem and FinLastLem, which will be given below. These lemmas, which we call elementary lemmas, follow directly from the definition of complete runs as it was given in Section 4. We only need to replace in that general definition the initial state Ini by its instance SearchIni for the program SearchProg.
The lemma FinELem means that in any finite complete run each state should be obtained from the previous state according to the effect predicate.
It follows directly from clause 2 in the definition of finite complete runs.
Finally, the elementary lemma FinLastLem expresses that the last state of any finite complete run should be final (in the sense defined in Section 4).
It follows from clause 3 in the definition of finite complete runs.
∀crun : fin?(crun) ⇒ isFinal(last(crun)) (FinLastLem)
Now we continue with the proof. Let crun be an arbitrary complete run which can be either infinite or finite. Below we consider both possible cases.
The case of an infinite complete run. If crun is infinite, our goal is to prove that this is impossible, i.e. obtain a contradiction. This is done by showing that the program counter in an infinite run will eventually exceed the length of the program. We proved the following lemma BadCounter which expresses that the program counter will reach the value of 9 either in the state with index 7 or in the state with index 8:
∀crun : inf?(crun) ⇒ (pCounter(crun(7)) = 9 OR pCounter(crun(8)) = 9) (BadCounter)
The case of a finite complete run. If crun is finite, our aim is to prove that eventually the final state will be reached in which the output tape contains either int0 or int1, depending on which of these numbers is larger. If int0 > int1, such a state will be reached after executing exactly 7 commands, and if int0 ≤ int1, it will be reached after exactly 8 commands.
We proved the following lemmas ShortPathLem and LongPathLem which describe both possible cases:
We do not discuss here the proofs of lemmas ShortPathLem and LongPathLem because of their large size and complexity. Using these lemmas, we can easily prove the theorem Main. Indeed, if int0 > int1, we apply the lemma ShortPathLem and obtain: pCounter(crun(7)) = 9, outputHead(crun(7)) = 1 and outputTape(crun(7) 
Conclusion
Abstract register machines (ARMs), which include counter machines [8] and pointer machines [11] , as well as more realistic models of hardware such as random-access machines [5] and random-access stored-program machines [6] , are an important model aimed at rigorous analysis of computer algorithms.
We presented here a formal framework for the specification and verification of computational programs for ARMs, something not presented in [5, 6, 8, 11] and subsequent works on this model. As we already mentioned, our framework allows proving not only the functional correctness of such programs, but also their best-case and worst-case time complexity.
The version of ARMs considered here (based on the book [1]) has not only significant similarities to the random-access machine from [5], but also some differences. For example, the only primitive arithmetic commands in [5] are addition and subtraction, but there is a mechanism that allows creating arrays. However, our version has little in common with pointer machines from [11]. The computational part of pointer machines is so primitive that they cannot perform arbitrary arithmetic operations, and this makes them unsuitable for programming of complex algorithms.
To illustrate our method of verification, we used it to verify a search program which computes the larger of two arbitrary integers. Despite the apparent simplicity of this example, we believe that it is far from trivial. Indeed, since the initial data for our program belong to an infinite domain Z × Z, we verified its correctness for an unlimited number of possible executions. This is something that is rather challenging for fully automated techniques such as model-checking, but can be done using deductive verification. An additional advantage of our approach is the fact that we were able to prove the exact time complexity of the program: its executions consist of at least 7 and at most 8 commands.
Naturally, the search program considered in this paper is only the first step in our investigation of programs for abstract register machines and their formal verification. In our future work, we would like to verify a program for RAM-AHU that performs a search in an array of an arbitrary size. Another interesting possibility is to investigate how to efficiently sort large arrays on this architecture, and also to verify formally such sorting programs. It is well-known that programs that process data structures of arbitrary sizes usually cannot be verified fully automatically. Therefore, it seems completely appropriate to use deductive verification in order to ensure their correctness.
We also plan to extend our framework so that it would allow us to prove not only the time complexity of programs, but also their space complexity.
