ABSTRACT. A systematic approach to model microprocessors and their
Introduction
This paper analyses the nature of initialization, data abstraction and time abstraction in pipelined microprocessors. Building on preliminaries on pipelines and algebra, it presents an effective algebraic model of correctness for the design and formal verification of non-superscalar pipelined microprocessors. The model is not restricted to specific reasoning software (such as particular term rewriting systems or theorem provers), and we do not address work on microprocessor verification that is tied to specific software tools. The algebraic model is a general method that may be represented in a range of machine reasoning systems. It forms a basis for uniform theoretical frameworks for modeling microprocessors, and simplifies the actual process of formal verification.
We apply this algebraic method to the specification, implementation and verification of a system. A microprocessor can be seen as a system determined by iterated maps, in which data abstractions evolve over time from some initial state, at different levels of temporal and data abstraction. Our interest is in algebraic models of time and data abstraction, and in the complex temporal relations that arise when a system evolves from state to state at different levels of abstraction. This involves temporal logic and state machines. We emphasize the application of this method to an abstract non-superscalar pipeline with dynamic stalling.
Published by Francis Academic Press, UK
In this paper, we introduce the method of correctness for modeling and verifying microprocessors with algebraic tools, and clarify some points whose meaning was vague. Our work is based on the theory presented in [1]. In [3], Harman began to use the algebraic method to model digital systems, with an emphasis on their specification. Prior to this, much work had been done on temporal logic and formal specification [4] [5] [6]. [6] presents the theory of time-consistency, which is applied in the correctness proof. [7] introduces a model of temporal logic and abstraction for synchronous digital hardware. [8] [9] [10] [11] systematically present the algebraic method for modeling microprocessors, together with the correctness equation used in proofs. This method introduced temporal abstraction, the relationship between time at different levels of abstraction, and the concept of the correctness of an implementation with respect to its specification; the theory was also applied to a case study of a microprocessor. [12] [13] apply the method to the formal verification of superscalar computers. To check the method in a proof system, [14] used HOL to prove an application. [15] gives an overview of progress on the formal specification and verification of a commercial processor, the ARM6, applying the algebraic theory of this method in the HOL proof system. [16] applies Maude to a simple pipelined microprocessor. [17] introduces an algebraic framework, using HOL, for verifying the correctness of hardware with input and output. In [18] and [19], Harman extends the model of correctness for non-superscalar microprocessors to SMT and CMT processors, and to multithreaded and multi-core processors, respectively.
The work above concerns the algebraic method for the formal verification of digital systems. There is much other interesting work on pipelined microprocessors. The concept used in [20] [21] comes from [22], whose verification targets a simple pipelined processor. In [22], specification and implementation evolve as streams of states, but time is not explicitly present; multiple copies of specification states must be inserted to synchronize the specification with the implementation.
[23] presents a new HOL4 formalization of the current ARM instruction set architecture, ARMv7, which is a modern RISC architecture with many advanced features. [24] presents an approach to producing detailed models of Instruction Set Architectures (ISAs).
Algebra Preliminary
In this section, we present the basic algebraic theory for modeling time and computer systems. We omit the details of universal algebra for computer science, for which we refer the reader to [25]. First, we present the model of correctness of an implementation with respect to a functional specification.
[1] defines the concept of the correctness of a mapping between microprocessors at two levels of abstraction. The programmer's model PM can be regarded as a microprocessor's functional or requirements specification, or architecture. The abstract circuit AC is the implementation of a design and describes the main features of an actual circuit. Definition 2.1 (see [7] [8]) introduces the correctness model for the implementation AC with respect to a functional or requirements specification PM. This definition of correctness with respect to data and temporal abstractions specifies exactly when an implementation is correct with respect to a specific design. In the practical formal verification of pipelined microprocessors, F: T×A→A is the state function of the functional specification and G: S×B→B is the state function of the abstract circuit, where A and B are the corresponding state sets. The state-dependent retiming λ is a state-dependent time abstraction from clock S to clock T. In addition, we assume that S is faster than T, or as fast as T, because S counts the clock cycles of the abstract circuit and T counts instruction cycles.
Temporal Abstraction: Retimings and Immersions
[29] defines a method of time abstraction and iterated mapping. It models time using clocks, which divide time into discrete clock cycles; see Definition 2.2. The purpose of a clock is to denote discrete time intervals, or clock cycles. A clock need not represent a constant subdivision of time, but should denote the intervals between significant states. For example, we might use an instruction cycle to represent the execution of instructions in a microprocessor. In reality, the cycles of such a clock often last different amounts of real time, because instruction execution times vary in many processor implementations.
In order to relate multiple clocks, the method of retimings and immersions is introduced. We first introduce the retiming mapping λ: S→T, which has two properties: (i) cycle 0 of one clock is always mapped to cycle 0 of the other; (ii) the mapping is surjective and monotonic. Monotonicity ensures that the temporal ordering of states is never violated by abstraction, because for all s, s′∈S, s ≤ s′ implies λ(s) ≤ λ(s′).
Definition 2.5 The immersion λ̄∈Imm (T, S) of a retiming λ∈Ret (S, T) is defined by
λ̄ (t) = (least s∈S) [λ(s) = t].
The set of all immersions from clock T to clock S is represented by Imm (T, S). The meaning of λ̄ is to search for the first s∈S such that λ(s) = t. We can give another definition of λ as follows:
We also regard λ̄ as an inverse of λ, in the sense that λ(λ̄ (t)) = t for all t∈T.
From Definition 2.5, the notion of start is defined as follows: Definition 2.6 Given a retiming λ∈Ret (S, T) and a time s∈S, the function start, parameterized by λ and s, returns the first time s′∈S such that λ(s′) = λ(s); that is, start (λ, s) = λ̄ (λ(s)).
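The relationship between a retiming, its immersion, and the start and length functions can be made concrete on a finite prefix of two clocks. The following Python sketch is illustrative only: the helper names (make_retiming, immersion, start, length), the horizon bound and the example durations are assumptions introduced here, and the length function is taken to be the difference of two consecutive immersion values.

```python
from typing import Callable, List

Retiming = Callable[[int], int]


def make_retiming(durations: List[int]) -> Retiming:
    """Build a retiming lam: S -> T on a finite prefix of clock S.

    durations[t] is the assumed number of S-cycles that cycle t of the
    slower clock T lasts. Every entry must be at least 1, which makes the
    resulting map surjective and monotonic with lam(0) == 0.
    """
    if any(d < 1 for d in durations):
        raise ValueError("every T-cycle must last at least one S-cycle")
    table = [t for t, d in enumerate(durations) for _ in range(d)]
    table.append(len(durations))  # first S-cycle of the following T-cycle
    return lambda s: table[s]


def immersion(lam: Retiming, t: int, horizon: int) -> int:
    """Return the least s < horizon with lam(s) == t."""
    for s in range(horizon):
        if lam(s) == t:
            return s
    raise ValueError("t is not reached within the horizon")


def start(lam: Retiming, s: int, horizon: int) -> int:
    """Return the first S-cycle mapped to the same T-cycle as s."""
    return immersion(lam, lam(s), horizon)


def length(lam: Retiming, t: int, horizon: int) -> int:
    """Return the number of S-cycles that T-cycle t lasts."""
    return immersion(lam, t + 1, horizon) - immersion(lam, t, horizon)


durations = [2, 1, 3]
horizon = sum(durations) + 1
lam = make_retiming(durations)

retimed = [lam(s) for s in range(horizon)]
immersed = [immersion(lam, t, horizon) for t in range(4)]
lengths = [length(lam, t, horizon) for t in range(3)]

print(retimed)                 # [0, 0, 1, 2, 2, 2, 3]
print(immersed)                # [0, 2, 3, 6]
print(start(lam, 4, horizon))  # 3
print(lengths)                 # [2, 1, 3]
```

In this example the immersion picks out the S-cycles 0, 2, 3 and 6 at which a new T-cycle begins, and applying the retiming to each of them returns the original T-cycle.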
Data Abstraction and Iterated Maps
Microprocessors can be modeled as evolving systems of states from a set A, generated by the recursive application of a next-state function f: A→A, starting from some initial state a∈A. A state function F: T×A→A, for some clock T, computes the state of a microprocessor at time t∈T, given starting state a∈A. The meaning of A depends on the level of data abstraction of the microprocessor. Typically A will be a Cartesian product of component abstractions representing registers and memories. The clock T depends on the level of time abstraction. For example, if each cycle of a clock T corresponds with an instruction, then T is suitable for an architecture, or programmer's model PM; if each cycle of T corresponds with a system clock cycle, T is suitable for an implementation, or abstract circuit model AC.
Definition 2.8 Given clock T, non-empty set A, and primitive recursive function f: A→A, an iterated map F: T×A→A is a primitive recursive function defined as follows, for all t∈T and a∈A:
F (a, 0) = a,
F (a, t+1) = f (F (a, t)).
The above definition takes the starting state as given. We also consider iterated maps generalized by an initialization function h: A→A:
F (a, 0) = h (a),
F (a, t+1) = f (F (a, t)).
ISSN 2616-5775 Vol.
Section 4.1.1 of [1] notes that the purpose of initialization functions is to eliminate unwanted starting states, not to describe the initial behavior of a system.
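An iterated map is straightforward to express as a program. The sketch below is a minimal illustration, assuming that an initialization function h, when present, is applied once at time 0 before the next-state function f is iterated. The 3-bit counter step and the initialization function wrap are hypothetical examples, and the argument order F(a, t) follows the formulas in the text.

```python
from typing import Callable, Optional, TypeVar

A = TypeVar("A")


def iterated_map(
    f: Callable[[A], A], h: Optional[Callable[[A], A]] = None
) -> Callable[[A, int], A]:
    """Return the state function F generated by next-state function f.

    Assumed reading: F(a, 0) = a (or h(a) if h is given) and
    F(a, t + 1) = f(F(a, t)).
    """

    def F(a: A, t: int) -> A:
        state = a if h is None else h(a)
        for _ in range(t):
            state = f(state)
        return state

    return F


def step(a: int) -> int:
    """Hypothetical next-state function: a 3-bit counter."""
    return (a + 1) % 8


def wrap(a: int) -> int:
    """Hypothetical initialization: map any integer to a legal 3-bit state."""
    return a % 8


F_plain = iterated_map(step)
F_init = iterated_map(step, wrap)

print(F_plain(5, 4))  # 1
print(F_init(13, 0))  # 5
print(F_init(13, 4))  # 1
```

Here wrap removes the unwanted starting state 13 by replacing it with the legal state 5; it says nothing about how the counter behaves afterwards.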
Data Abstraction and Iterated Maps
System states can be 'abstracted' or 'specified' by an abstraction mapping ψ. For example, if b represents a state of a microprocessor's micro-architecture, ψ (b) can represent a state of the processor's architecture. State transition and abstraction together induce a notion of temporal abstraction. For example, if the mapping
is applied to the state sequence
We consider that time is determined by the transitions between distinct states; if no state transition occurs, or states cease to change, time is indeed redundant.
This example shows that temporal abstraction may occur whenever there is data abstraction and state transition. We consider time to be determined by events, where an event is the occurrence of something significant at the level of abstraction under consideration. For example, we may consider only the start and end of machine instructions to be events at the level of a microprocessor abstraction, and register or memory transfer operations to be events at a lower level.
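The way a data abstraction induces a time abstraction can be illustrated with a short program. The sketch below assumes that an event at the abstract level is a change in the abstracted state ψ(b); the function name induced_retiming and the example mapping (three low-level steps per abstract step) are hypothetical. A real system in which the same abstract state legitimately repeats would need a finer notion of event.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

B = TypeVar("B")
A = TypeVar("A")


def induced_retiming(
    states: Sequence[B], psi: Callable[[B], A]
) -> Tuple[List[A], List[int]]:
    """Abstract a low-level state sequence and derive the induced retiming.

    Returns the abstract state sequence (consecutive repeats collapsed) and
    a list lam in which lam[s] is the abstract time of low-level cycle s.
    """
    if not states:
        raise ValueError("need at least one state")
    abstract = [psi(states[0])]
    lam = [0]
    for b in states[1:]:
        a = psi(b)
        if a != abstract[-1]:  # an event: the abstract state changed
            abstract.append(a)
        lam.append(len(abstract) - 1)
    return abstract, lam


low_level = list(range(8))  # hypothetical micro-architectural states


def psi(b: int) -> int:
    """Hypothetical abstraction: three low-level states per abstract state."""
    return b // 3


abstract, lam = induced_retiming(low_level, psi)
print(abstract)  # [0, 1, 2]
print(lam)       # [0, 0, 0, 1, 1, 1, 2, 2]
```

The list lam starts at 0, never decreases and reaches every abstract time, so on this prefix it has the properties required of a retiming.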
State Iterated Maps and Time Abstraction
for all a∈A and t₁, t₂∈T.
Definition 3.1 has no initialization function. We now consider the case in which an initialization function is present; see also Section 2.3.1 of [13]. F(a, t) ).
State-Dependent and Uniform Retimings
For each state of an implementation there will be an associated state-dependent retiming according to the theory of Section 2.3. 
The set of all immersions relative to retimings in the set Ret (B, S, T) is denoted by Imm (A, S, T).
Following Definitions 2.6 and 2.7, we obtain the state-dependent function start and the length function len. For b∈B and s∈S,
start (b, λ, s) = λ̄ (b, λ(b, s)).
Based on the above definition, the length function, which gives the length of cycle t of the retimed clock T measured in cycles of clock S, is defined as follows:
len (a, λ, t) = λ̄ (a, t+1) - λ̄ (a, t).
We obtain one further definition and two lemmas as follows.
According to Definition 3.1 and Corollary 3.1, Definition 3.6 is given. 
Lemma 3.1 Let F: T×A→A be an iterated map with next-state function f: A→A and initialization function h: A→A. The map F is time-consistent with respect to λ∈Ret (B, S, T) if and only if, for all a∈A and t∈T, F (a, λ̄ (a, t)) = h (F (a, λ̄ (a, t))).
Lemma 3.2 All iterated maps that do not have initialization functions are time-consistent.
Now we introduce the concept of uniformity. Uniformity concerns the relation between the length len (a, λ, t) at some time t∈T and the initial state a∈A. A retiming λ∈Ret (B, S, T) is uniform if the length len (a, λ, t) is a function of the state a∈A and its retiming λ only, independent of the time t∈T. We call this property of a retiming uniformity (see Section 4.4 of [1]).
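Lemmas 3.1 and 3.2 can be explored on small finite examples. The sketch below assumes that time-consistency is read as the condition F(a, t₁+t₂) = F(F(a, t₁), t₂), and it tests that condition exhaustively over a small range rather than proving it. The counter step and the initialization functions wrap and reset are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, Optional


def iterated_map(
    f: Callable[[int], int], h: Optional[Callable[[int], int]] = None
) -> Callable[[int, int], int]:
    """State function with F(a, 0) = a (or h(a)) and F(a, t+1) = f(F(a, t))."""

    def F(a: int, t: int) -> int:
        state = a if h is None else h(a)
        for _ in range(t):
            state = f(state)
        return state

    return F


def is_time_consistent(
    F: Callable[[int, int], int], states: Iterable[int], max_t: int
) -> bool:
    """Check F(a, t1 + t2) == F(F(a, t1), t2) for all sampled a, t1, t2."""
    return all(
        F(a, t1 + t2) == F(F(a, t1), t2)
        for a, t1, t2 in product(states, range(max_t), range(max_t))
    )


def step(a: int) -> int:
    return (a + 1) % 8


def wrap(a: int) -> int:
    """Initialization that leaves every reachable state unchanged."""
    return a % 8


def reset(a: int) -> int:
    """Initialization that discards the current state."""
    return 0


states = range(8)
no_init = is_time_consistent(iterated_map(step), states, 6)
with_wrap = is_time_consistent(iterated_map(step, wrap), states, 6)
with_reset = is_time_consistent(iterated_map(step, reset), states, 6)

print(no_init)     # True
print(with_wrap)   # True
print(with_reset)  # False
```

The map without an initialization function passes, as Lemma 3.2 predicts. With reset, restarting from an intermediate state throws that state away, so the condition fails; with wrap, every reachable state is a fixed point of the initialization function and the condition holds, which is the situation described by Lemma 3.1.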
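We define a uniform retiming in terms of its immersion, using a duration function dur: A→ℕ⁺.
Definition 3.7 Let T and S be clocks with clock S faster than clock T, G: S×B→B be an iterated map, and ψ: B→A be a data abstraction map. A retiming λ∈Ret (B, S, T) is uniform with respect to G if there exists a duration function dur: A→ℕ⁺ such that, for all b∈B and t∈T,
λ̄ (b, 0) = 0,
λ̄ (b, t+1) = dur (ψ (G (b, λ̄ (b, t)))) + λ̄ (b, t).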
According to the definition, dur plays the same role as len with respect to a retiming λ and its associated immersion λ̄.
Suppose that G represents the implementation of some system over a clock S, and F represents the specification of this system over a clock T, where S is faster than T.
Then specification clock cycle t lasts dur (x) cycles of clock S, where x = F (ψ (b), t) = ψ (G (b, λ̄ (b, t))) is the state of F at clock cycle t∈T.
Note that dur is a function of states only, because the data abstraction ψ and the immersion λ̄ depend only on the state; consequently, the number of cycles corresponding to any state is independent of the numerical value of t∈T.
In practice, uniformity tells us the number of states b∈B that correspond to one state a∈A, because the clock S counts the state transitions of B.
With a statically defined dur, we can make concrete statements about how many cycles of the implementation clock S correspond to one cycle of the specification clock T for any possible initial state b∈B. Because clock S corresponds to state transitions at the micro-programmed level and clock T to state transitions at the programmer level, we can use uniformity and dur to obtain the number of micro-programmed-level states that correspond to each programmer-level state.
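To make the role of dur concrete, the following sketch builds an immersion from a duration function, assuming the recurrence λ̄(b, 0) = 0 and λ̄(b, t+1) = dur(ψ(G(b, λ̄(b, t)))) + λ̄(b, t). The toy implementation is hypothetical: an abstract state is an integer accumulator, and one abstract step takes two implementation cycles when the accumulator is even and three when it is odd.

```python
from typing import List, Tuple

ImplState = Tuple[int, int]  # (accumulator, micro-step within the instruction)


def dur(a: int) -> int:
    """Hypothetical duration function on abstract states."""
    return 2 if a % 2 == 0 else 3


def psi(b: ImplState) -> int:
    """Data abstraction: forget the micro-step counter."""
    return b[0]


def g(b: ImplState) -> ImplState:
    """One implementation clock cycle."""
    acc, phase = b
    if phase + 1 == dur(acc):
        return (acc + 1, 0)  # the instruction completes on this cycle
    return (acc, phase + 1)


def G(b: ImplState, s: int) -> ImplState:
    """Implementation state function: iterate g for s cycles."""
    for _ in range(s):
        b = g(b)
    return b


def immersion(b: ImplState, t: int) -> int:
    """Immersion built from dur by the assumed recurrence."""
    s = 0
    for _ in range(t):
        s = dur(psi(G(b, s))) + s
    return s


b0: ImplState = (0, 0)
imm: List[int] = [immersion(b0, t) for t in range(5)]
lens = [imm[t + 1] - imm[t] for t in range(4)]
durs = [dur(psi(G(b0, imm[t]))) for t in range(4)]

print(imm)   # [0, 2, 5, 7, 10]
print(lens)  # [2, 3, 2, 3]
print(durs)  # [2, 3, 2, 3]
```

The lengths of successive specification cycles coincide with dur evaluated on the abstract state at the start of each cycle. Specification cycles 0 and 2 both last two implementation cycles because their abstract states are both even, not because of the value of t.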
One-Step Theory for Simplifying Verification
[1] specifies the one-step theory for non-superscalar microprocessors in detail; we briefly describe it here.
The role of time-consistent iterated maps and uniform retimings is to support a one-step theorem that simplifies formal verification by eliminating induction over time. The fundamental idea is that, in real hardware, future state evolution does not depend on time, but only on the current state (and on the inputs at the current time, if any). Briefly, given two time-consistent iterated maps F: T×A→A and G: S×B→B, related by a surjective data abstraction map ψ: B→A and a uniform retiming λ∈Ret (B, S, T), we can simplify the verification of G with respect to F by considering correctness only at specification times t = 0 and t = 1, that is, at implementation times s = 0 and s = λ̄ (b, 1).
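Definition 3.8 Let F: T×A→A and G: S×B→B be iterated maps, λ∈Ret (B, S, T) be a uniform retiming with respect to G, and ψ: B→A be a surjective data abstraction map. If, for all b∈B,
F (ψ (b), 0) = ψ (G (b, 0)) and F (ψ (b), 1) = ψ (G (b, λ̄ (b, 1))),
then G is a correct implementation of F with respect to ψ and λ; that is, F (ψ (b), t) = ψ (G (b, λ̄ (b, t))) for all b∈B and t∈T.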
Thus, when we formally verify an abstract circuit AC with respect to a design PM at the programmer level, we need only verify AC at times s = λ̄ (b, 0) = 0 and s = λ̄ (b, 1). [1, 3, 15] apply the above concepts to an abstract pipeline case study, and the authors have verified the abstract pipeline using HOL or Maude. We now extend the concepts of [1] to a more general abstract non-superscalar pipeline.
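The practical effect of the one-step approach is that a check at specification times 0 and 1 replaces a check at every time. The sketch below compares the two on a hypothetical system: the specification adds 3 per cycle, the implementation adds 1 per cycle and tracks its progress in a counter k, the initialization function h discards partial progress, and every specification cycle lasts three implementation cycles. All names are assumptions, and agreement of the two checks on sampled states illustrates the idea without proving anything.

```python
from typing import Callable, Iterable, List, Tuple

ImplState = Tuple[int, int]  # (value, micro-step counter k)


def F(a: int, t: int) -> int:
    """Specification state function: add 3 per specification cycle."""
    return a + 3 * t


def psi(b: ImplState) -> int:
    """Data abstraction: the last value completed at the specification level."""
    n, k = b
    return n - k


def h(b: ImplState) -> ImplState:
    """Initialization: discard partial progress so that k == 0."""
    n, k = b
    return (n - k, 0)


def make_G(g: Callable[[ImplState], ImplState]):
    """Implementation state function generated by g with initialization h."""

    def G(b: ImplState, s: int) -> ImplState:
        state = h(b)
        for _ in range(s):
            state = g(state)
        return state

    return G


def immersion(b: ImplState, t: int) -> int:
    """Uniform immersion for a constant duration of 3 cycles."""
    return 3 * t


def one_step_ok(G, initial_states: Iterable[ImplState]) -> bool:
    """Check the correctness equation at specification times 0 and 1 only."""
    return all(
        F(psi(b), 0) == psi(G(b, 0))
        and F(psi(b), 1) == psi(G(b, immersion(b, 1)))
        for b in initial_states
    )


def all_steps_ok(G, initial_states: Iterable[ImplState], max_t: int) -> bool:
    """Check the correctness equation at every specification time below max_t."""
    return all(
        F(psi(b), t) == psi(G(b, immersion(b, t)))
        for b in initial_states
        for t in range(max_t)
    )


def g_good(b: ImplState) -> ImplState:
    n, k = b
    return (n + 1, (k + 1) % 3)


def g_bad(b: ImplState) -> ImplState:
    """Faulty variant: the micro-step counter wraps after 4 instead of 3."""
    n, k = b
    return (n + 1, (k + 1) % 4)


samples: List[ImplState] = [(n, k) for n in range(5, 9) for k in range(3)]

print(one_step_ok(make_G(g_good), samples))       # True
print(all_steps_ok(make_G(g_good), samples, 10))  # True
print(one_step_ok(make_G(g_bad), samples))        # False
```

For the correct implementation the two checks agree, and the faulty counter is already caught at specification time 1.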
An Abstract Pipeline Example
We introduce an abstract pipeline with four stages to sufficiently demonstrate the functionality of pipelined designs. The abstract pipeline is illustrated as follows.
Figure 3. An Abstract Pipeline
The function of the abstract pipeline is to transfer data from the source memory src to the destination memory dst. The memory source register msr and the memory destination register mdr address the memories.
Functionality Specification
From the perspective of programmers, the abstract pipeline system has two memories and two memory-address registers. The system transfers the data of src at address msr to dst at address mdr. The memory state-space is M = [MAR→W], where W is any non-empty set, and the memory-address register state-space is MAR, with src∈M, msr∈MAR, mdr∈MAR and dst∈M. The next-state function fs updates the destination memory dst at location mdr with f (src (msr)):
fs (src, msr, mdr, dst) = (src, msr+1, mdr+1, dst [f (src (msr)) / mdr]).
The expression dst [f (src (msr)) / mdr] is derived from the following memory substitution function:
memory [data / a_address] (x) = data if x = a_address, and memory (x) otherwise,
where the data at a_address∈MAR is denoted memory (a_address); and if data∈W is stored at address a_address, the resultant memory is denoted memory [data / a_address].
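The programmer's-level behavior can be written down directly. The sketch below is a minimal illustration that represents memories as Python dictionaries; the function substitute plays the role of the substitution memory [data / address], and the data function f is a hypothetical placeholder, since the model leaves f abstract.

```python
from typing import Dict, Tuple

Memory = Dict[int, int]
SpecState = Tuple[Memory, int, int, Memory]  # (src, msr, mdr, dst)


def substitute(memory: Memory, data: int, address: int) -> Memory:
    """Return a copy of memory in which address now holds data."""
    updated = dict(memory)
    updated[address] = data
    return updated


def f(w: int) -> int:
    """Hypothetical data function applied to each transferred word."""
    return 2 * w + 1


def fs(state: SpecState) -> SpecState:
    """Next-state function of the specification: transfer one word."""
    src, msr, mdr, dst = state
    return (src, msr + 1, mdr + 1, substitute(dst, f(src[msr]), mdr))


src: Memory = {0: 10, 1: 20, 2: 30}
state: SpecState = (src, 0, 5, {})
state = fs(fs(state))

print(state[1], state[2])  # 2 7
print(state[3])            # {5: 21, 6: 41}
```

Each application of fs reads one word from src at msr, writes the transformed word to dst at mdr, and advances both address registers; src itself is never modified.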
Implementation Specification without Dynamic Stalling
We can divide the computation f performed by the recursive function fs into four computations f₁, f₂, f₃ and f₄:
f = f₄ ∘ f₃ ∘ f₂ ∘ f₁,
where f₁: W→W₁, f₂: W₁→W₂, f₃: W₂→W₃ and f₄: W₃→W together complete the computation of fs in four steps.
Previous articles [1, 12, 18] have modeled several kinds of microprocessors, such as pipelined, superscalar pipelined, and SMT/CMT processors. Our interest here is to extend the basic notions of those methods to build a more universal algebraic model for the formal verification of non-superscalar pipelines. First, we model an abstract pipelined implementation P₁ without dynamic stalling. We use a counter ctr∈{1, 2, 3, 4}. ctr=1 means that only f₄ is idle and w₁, w₂, w₃ store valid data; ctr=2 means that f₃ and f₄ are idle and only w₃ stores junk data; ctr=3 means that f₂, f₃ and f₄ are idle and w₂, w₃ store junk data; ctr=4 means that f₁, f₂, f₃ and f₄ are all idle and w₁, w₂, w₃ all store junk data [3].
The state-space State_P1 of P₁ is
It can also be denoted as follows:
Expression ⑨ denotes that if ctr(s)>1, the pipeline fetches instructions to fill the pipeline. The role of 1 (σ) is to forward the data computed in the pipeline; when ctr>1 at some clock cycle, ctr is decremented by 1 in the next cycle.
There is no dynamic stalling, so P₁ fetches one instruction in each cycle of the abstract circuit clock, and msr is incremented by 1 to fetch the next instruction in the next clock cycle. If ctr=m<4, the state of wᵢ (1≤i≤4-m) corresponds to the source data from memory address (msr-i), after the appropriate operations fⱼ for all j≤i; the component(s) after wᵢ store junk data. For example, if the pipeline is empty or partly empty, w₃ stores junk data; if the pipeline is full, w₃ = f₃ (f₂ (f₁ (src (msr-3)))). For the concept and proof of this definition, see [1]. In the next section, we extend this model to a non-superscalar pipeline with dynamic stalling.
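The behavior of the pipeline without stalling can be simulated and compared with the specification. The sketch below is one possible reading of the description above, not the formal definition of P₁: junk data is represented by None, ctr = 1 is taken to mean that the next cycle writes a result to dst, the data abstraction psi rewinds msr by the number of words in flight, and the immersion counts cycles until t results have been written. The stage functions f1 to f4 and all other names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def f1(x: int) -> int: return x + 1
def f2(x: int) -> int: return x * 2
def f3(x: int) -> int: return x - 3
def f4(x: int) -> int: return x * x


def f(x: int) -> int:
    """Specification data function: the composition of the four stages."""
    return f4(f3(f2(f1(x))))


def spec_step(a):
    src, msr, mdr, dst = a
    new_dst = dict(dst)
    new_dst[mdr] = f(src[msr])
    return (src, msr + 1, mdr + 1, new_dst)


def F(a, t: int):
    for _ in range(t):
        a = spec_step(a)
    return a


def impl_step(b):
    """One clock cycle: every stage register is updated from the old values."""
    src, msr, mdr, dst, w1, w2, w3, ctr = b
    if ctr == 1:  # w3 is valid, so stage 4 writes a result
        new_dst = dict(dst)
        new_dst[mdr] = f4(w3)
        new_mdr, new_ctr = mdr + 1, 1
    else:         # still filling: no write, one more stage becomes valid
        new_dst, new_mdr, new_ctr = dst, mdr, ctr - 1
    new_w3: Optional[int] = f3(w2) if w2 is not None else None
    new_w2: Optional[int] = f2(w1) if w1 is not None else None
    new_w1 = f1(src[msr])
    return (src, msr + 1, new_mdr, new_dst, new_w1, new_w2, new_w3, new_ctr)


def G(b, s: int):
    for _ in range(s):
        b = impl_step(b)
    return b


def psi(b):
    """Data abstraction: drop stage registers, rewind msr by words in flight."""
    src, msr, mdr, dst, _w1, _w2, _w3, ctr = b
    return (src, msr - (4 - ctr), mdr, dst)


def immersion(b, t: int) -> int:
    """Least number of cycles after which t results have been written."""
    s, done = 0, 0
    while done < t:
        if b[-1] == 1:
            done += 1
        b = impl_step(b)
        s += 1
    return s


src = tuple(range(10, 30))
empty = (src, 0, 0, {}, None, None, None, 4)
full = G(empty, 3)

fill_times = [immersion(empty, t) for t in range(4)]
full_times = [immersion(full, t) for t in range(4)]
correct = all(
    F(psi(b), t) == psi(G(b, immersion(b, t)))
    for b in (empty, full)
    for t in range(8)
)

print(fill_times)  # [0, 4, 5, 6]
print(full_times)  # [0, 1, 2, 3]
print(correct)     # True
```

Starting from an empty pipeline, the first result appears after four cycles and every later result after one more cycle; starting from a full pipeline, each specification cycle lasts exactly one implementation cycle. In both cases the abstracted implementation state agrees with the specification at the retimed instants sampled here.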
A Model of Non-Superscalar Pipelines with Dynamic Stalling
We now introduce our new model of pipelined microprocessors with dynamic stalling. The modelling method is the same as in [1], with some improvements made in this article. We do not discuss the details of stalling in pipelines here; readers can find them in the literature. The notion of a pipeline with stalling is illustrated in Figure 4.
Figure 4. A Pipeline with Stalling
In this paper, our principal concern is mathematical models, not practical verification. However, our method and model help to reduce the practical work of formal verification: the one-step theorem is of practical benefit in reducing the workload of verification.
According to the concept of pipelines with dynamic stalling, the stalled component and the components before it suspend their state evolution until an event cancels the stall. We can conclude that, when a source of stalling exists, the pipeline stops fetching instructions until the stall is canceled. The above equation can be derived from the notions of pipeline and stalling. We now define the correct implementation equation. From the difference between Definitions 4.1 and 4.2, we can conclude that, if we ignore the clock cycles in which the stages of the system do not change, then in this example the length of a state in the functional specification is unchanged.
In our example, which may stall in computation f₃, we analyse this property and model a correct implementation of a specification in order to simplify practical verification.
Further Considerations
Our next work concerns the modeling of more complex microprocessors, such as multi-core or many-core processors, superscalar pipelines and other parallel microprocessors, and of complex components such as caches and memories in many-core systems, which must satisfy cache coherency and memory consistency. The complex relation between time and data is the greatest challenge to overcome.
