This paper demonstrates an analogy between classical electrical circuits and computer systems with executing programs. The analogy is based on a Markov process framework for the mathematical formulation. The transition weights represent a set of measure one, although they are not necessarily probabilities in the stochastic sense. Examples are included to show the analogies at a level that should be easily understood by an electrical engineer.
INTRODUCTION
The intent of establishing an analogy between electrical circuits and programs executing on a computer system is to provide a simple mechanism for understanding and designing digital systems that must satisfy specifications to be expressed in a dynamic form, e.g. instructions per second (IPS or MIPS), floating point operations per second (FLOPS or MFLOPS), including transient conditions, e.g. reach processing power of X MFLOPS in T seconds.
It is important to note that the terms MFLOPS and MIPS are used as a part of the presentation. These are not maximum specifications as traditionally used, but rather they are the dimensions of variables with dynamic properties.
The usefulness of the electrical/mechanical analogy is in its ability to model virtually any configuration of electrical or mechanical elements in order to convert from one domain to the other. The primary mechanisms are the force/current or force/voltage analogies of the equations which preserve physical quantities through common differential equations.
The analogy demonstrated in this paper links the electrical/computer elements through the finite state descriptions common to the sampled data electrical systems and the state transitional descriptions of the computer (digital) configuration. This is the explicit point at which the two types of systems are analogous. In both cases, the discrete difference equations can be in turn related to differential equations which are more of an implicit description than a convenient form for an element by element analogy.
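The link between a difference equation and a differential equation can be made concrete with a first-order example. The sketch below is an illustration only and is not taken from the paper: it assumes a unit-step input, a sampling interval T and a time constant TAU (both hypothetical values), and checks that the recursion y[n+1] = a*y[n] + (1 - a) reproduces the samples of the continuous solution of tau*dy/dt + y = u(t) when a = exp(-T/tau).

```python
import math


def discrete_step_response(a, n_steps):
    """Iterate y[n+1] = a*y[n] + (1 - a)*u[n] with u[n] = 1 and y[0] = 0."""
    y = 0.0
    samples = [y]
    for _ in range(n_steps):
        y = a * y + (1.0 - a)
        samples.append(y)
    return samples


def continuous_step_response(tau, t):
    """Solution of tau*dy/dt + y = u(t) for a unit step with y(0) = 0."""
    return 1.0 - math.exp(-t / tau)


# Illustrative values only (assumptions, not taken from the paper).
T = 1.0    # fixed interval between samples
TAU = 5.0  # time constant of the continuous system

a = math.exp(-T / TAU)
discrete = discrete_step_response(a, 20)
continuous = [continuous_step_response(TAU, n * T) for n in range(21)]
max_gap = max(abs(d - c) for d, c in zip(discrete, continuous))
print(f"largest sample mismatch: {max_gap:.2e}")
```

The two sequences agree to within rounding error, which is the sense in which a sampled description and a continuous description can describe the same first-order behaviour.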
The difference here is that the digital system not only contains physical elements, but it also exhibits a particular behaviour that can vary from one physical architecture to another. This point is important in understanding the significant impact of the analogy discussed in this paper. When two mechanical (electrical) systems are identical, they exhibit identical dynamic behaviour.
Two identical computer (digital) configurations exhibit identical behaviour if and only if the executing software exhibits identical behaviour. The block diagram of the computer platform to be used as an example is shown in Figure 1. P is the processor, M0 is the cache nearest to the processor, M1 is the next level of cache, etc. The main memory is M. The platform example in this paper will include two levels of cache, M0 = CC (on-chip cache) and M1 = BC (board cache). This represents an actual motherboard on which many experiments have been run.
Previous work [1] has examined and compared the transient response of the Markov analysis for a processor/cache/board cache/memory configuration as shown in Figure 2. Figure 2 assumes that the computer has been turned on and that the first instruction has been fetched and is in the instruction register. There is a platform-dependent sequence that is a transient to get to this state, but it is program independent. The Markov analysis with the mathematical relationship is given in Equation (1), where the vector, π(n), is the probability of being in any one of the four states.
The transient response for the execution state, π_E(t), is given in Figure 3 for two different conditions. Consider first the dark circles (lower curve) of Figure 3.
In Figure 3, the total response (transient plus steady-state) is given mathematically in Equation (2), where the response is in a sampled data form [2, 3] to suit the discrete transitions of the process of Figure 2.
The result of (2) in a continuous form is given in Equation (3).
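A sampled response of this kind can be generated by repeatedly applying a transition matrix to a state vector. The following is a minimal sketch under stated assumptions: the four states follow the text (E, CC, BC, M), but the transition weights in P and the starting vector are hypothetical values chosen for illustration, not the measured weights behind Figure 3.

```python
STATES = ["E", "CC", "BC", "M"]

# Hypothetical transition weights; each row sums to one.
P = [
    [0.1, 0.9, 0.0, 0.0],  # E: keep executing, or fetch from the on-chip cache
    [0.7, 0.0, 0.3, 0.0],  # CC: hit returns to E, miss goes to the board cache
    [0.8, 0.0, 0.0, 0.2],  # BC: hit returns to E, miss goes to main memory
    [1.0, 0.0, 0.0, 0.0],  # M: always returns to E
]


def step(pi, matrix):
    """One transition: pi(n+1) = pi(n) * P, with pi a row vector."""
    size = len(pi)
    return [sum(pi[i] * matrix[i][j] for i in range(size)) for j in range(size)]


def transient(pi0, matrix, n_steps):
    """Return the list [pi(0), pi(1), ..., pi(n_steps)]."""
    history = [pi0]
    for _ in range(n_steps):
        history.append(step(history[-1], matrix))
    return history


pi0 = [1.0, 0.0, 0.0, 0.0]  # assumed start: first instruction already in E
history = transient(pi0, P, 100)
pi_e = [pi[0] for pi in history]  # sampled response of the execute state
steady = history[-1]
for name, value in zip(STATES, steady):
    print(f"{name}: {value:.4f}")
```

The list pi_e holds the transient of the execute state at each transition, and the final vector approximates the steady state; different weights in P would give a different curve.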
The response of Equation (3) is of the same form as the response, v_o(t), of the circuit shown in Figure 4. Note that the initial voltage across the capacitor, v_C(0), is assumed to be zero (0).
The input v_i(t), in Figure 4, is a unit step input, u(t). Consider first the steady-state response of the circuit of Figure 4. The steady-state output is given in Equation (4).
A simple equating of parameters yields the result shown in Equation (5).
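To illustrate the kind of first-order response involved, the sketch below evaluates the unit-step response of one possible circuit. The topology is an assumption, since Figure 4 is not reproduced here: a series resistor R1 feeding an output node that has R2 and C in parallel to ground, with the capacitor initially uncharged. The component values are hypothetical and were picked so that the steady-state output is 0.15, the value quoted in the text for the lower curve of Figure 3.

```python
import math


def rc_step_response(r1, r2, c, t):
    """Unit-step response of the assumed divider: R1 in series, R2 || C at the output."""
    v_ss = r2 / (r1 + r2)            # steady-state divider ratio
    tau = c * r1 * r2 / (r1 + r2)    # time constant seen by the capacitor
    return v_ss * (1.0 - math.exp(-t / tau))


# Illustrative component values (assumptions): chosen so the steady state is 0.15.
R1, R2, C = 85.0, 15.0, 1.0e-3
TAU = C * R1 * R2 / (R1 + R2)

for k in range(6):
    t = k * TAU
    print(f"t = {k} tau: v_o = {rc_step_response(R1, R2, C, t):.4f}")
```

In this assumed circuit the resistor ratio alone fixes the steady state, while the capacitor only affects how quickly it is reached.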
FIGURE 4. Electrical circuit with v_o(t).
Inspection of the left-hand side of Equation (5) suggests a circuit of the form shown in Figure 5, to give more detail to the analogy.
The additional detail for the steady-state is shown in Equation (6).
From Equation (6), consider the transient responses given in the two curves shown in Figure 3: the solid circles (lower curve) and the solid squares (upper curve). To go from the circles to the squares, the only change in the executing program was a reduction in the width of the executing instructions by a factor of two (2). Therefore, more instructions were fetched on each fetch cycle. This in turn increases the probability of being in the execute state, E, so that π_E(t) increases (in the limit as t → ∞) from a value of 0.15 to about 0.25. In electrical terms for a linear system, this can be accomplished by increasing the current through R_E or the value of R_E. For a linear system, increasing the current would require increasing v_i(t).
However, for a set of measure one (1), v_i(t) must have a maximum value of one (1). Thus in the current analogy, resistance is inversely proportional to instruction width. In other words, instruction width is proportional to conductance. In Figure 5, the output voltage, v_o(t), is proportional to the MIPS developed by the computing configuration of Figure 2. For example, if CPI is the average clocks per instruction of the executing code, and Hz is the frequency of the clock, then IPS is given by Equation (7) or MIPS in Equation (8).
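As a small worked example of the quantities named above, the sketch below uses the standard textbook relation IPS = Hz / CPI and divides by 10^6 for MIPS. This is assumed to be what Equations (7) and (8) express, but those equations are not reproduced here, so the relation and the operating point are labelled as assumptions.

```python
def instructions_per_second(clock_hz, cpi):
    """IPS from the clock frequency (Hz) and the average clocks per instruction."""
    if cpi <= 0:
        raise ValueError("CPI must be positive")
    return clock_hz / cpi


def mips(clock_hz, cpi):
    """The same quantity expressed in millions of instructions per second."""
    return instructions_per_second(clock_hz, cpi) / 1.0e6


# Hypothetical operating point: a 33 MHz clock and an average of 2 clocks per instruction.
print(mips(33.0e6, 2.0))
```

Because CPI depends on the executing code, the same clock frequency yields different MIPS values for different programs, which is why the text treats MIPS as a dynamic variable.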
The total response (transient plus steady state) for the remaining three states is given below in Equations (9-11). The state space form [4] for the system is given in Equation (12).
The current methodology can also be used to analyse details of a specific architecture or processor. Consider the following parameters of a given processor:
1. Processor memory bus width in bits (W). This is a readily available specification as well as a parameter in design.
2. Size of cache in bytes (S), as with Equation (1). This is a readily available specification as well as a parameter in design.
3. Average instruction width in bits (I). This is a measure dependent on the code to be executed and the mechanism as to how the cache is updated. Both of these are design parameters that are provided for in the analogy and the design considerations.
4. Miss Ratio (m). This is a program/system-dependent measure based on specific code and parameters such as Equation (2).
5. Number of instructions in the current instruction stream (n). This is totally dependent on the code to be executed and the operating system.
With the Markov analysis [5], it is possible to calculate directly the following relationship for the probability of the processor being in the execute state, π_E. The result is a simple optimization problem where the objective function to be maximized is given in Equation (13).
CURRENT/FUTURE AGGRESSIVE PROCESSORS
The research reported in this paper is focused on a number of processors including three traditional 8-bit microprocessors (8085, 8051 and 6800) due to their similarity in target markets yet with instruction set architectures that can be compared on a common basis.
In addition, to demonstrate evolution from a single processor architecture, the target processors used have included the Intel 80386, 80486, 80486DX2, 80486DX4 and Pentium. To show the applicability from CISC to RISC, generic RISC processors (in the spirit of a simple load/store architecture [6]) have been used for RISC comparisons. The results in the section on multiple pipes, processors and threads illustrate the similarities/differences among three instruction set architectures operating on the identical problem on three separate processors. All other system architectural aspects are held constant. The results were analysed for both RISC and CISC processors for each instruction set architecture.
RECIPROCITY
The basis of the analogy, derived initially from the Markov process analysis, is the common form of the differential equations that model both the electrical circuits and the computer architectures (systems). From this common thread there is reciprocity between the electrical and computer analogy. The concept here of reciprocity confirms the execution of the computer analogy of the electrical circuit. The only question that cannot be answered is the question of realizability [4]. An arbitrary electrical circuit may suggest a computer analogy that does not make sense, in that the resulting program for a given platform may not make any sense.
DYNAMICS
The analogy above is based on a discrete time system with a fixed time interval for each state-to-state transition. This is one point at which the analogy with the Markov process has a deviation in interpretation, not in validity. The fixed time interval of the executing program in the Markov matrix model is independent of the speed of the processor, cache, main memory, etc. It is a function of the non-dynamic elements of the computer architecture, e.g. the program structure, cache size, memory size, cache line size, etc. The time basis of this initial analysis is simply in a warp space that can be transformed to real time. This conversion is simply based on the probability of being in any state weighted by the clock cycles in that state. The total of all of these represents the total time expanded from the warp time. Dividing each weighted product by the total time gives the fraction of real time spent in each state.
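The conversion from warp time to real time described above can be sketched as follows. The clock-cycle counts are the one, one, two and nine cycles quoted later in this section; assigning the first one-cycle figure to the execute state is an assumption, and the state probabilities are hypothetical values used only to make the arithmetic concrete.

```python
def real_time_shares(probabilities, cycles):
    """Weight each state's probability by its clock cycles, then normalise."""
    weighted = {s: probabilities[s] * cycles[s] for s in probabilities}
    total = sum(weighted.values())  # average clock cycles per transition
    shares = {s: w / total for s, w in weighted.items()}
    return total, shares


# Hypothetical steady-state probabilities for E, CC, BC and M (they sum to one).
probabilities = {"E": 0.15, "CC": 0.45, "BC": 0.25, "M": 0.15}
# Clock cycles per state; assigning the first one-cycle figure to E is an assumption.
cycles = {"E": 1, "CC": 1, "BC": 2, "M": 9}

total, shares = real_time_shares(probabilities, cycles)
print(f"average cycles per transition: {total:.2f}")
for state, share in shares.items():
    print(f"{state}: {share:.3f}")
```

With these illustrative numbers the main memory state accounts for the largest share of real time even though it is visited no more often than the execute state, which shows how slow states stretch the time axis.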
In the example of Figure 2, the board cache and main memory can be varied in speed. The initial hardware with results plotted in Figure 3 was based on a board cache speed of 15 ns and a main memory speed of 100 ns.
The transitions of Figure 3 can be assumed to require one (1) clock cycle, the cache one (1) clock cycle, the board cache two (2) clock cycles, and the main memory nine (9) clock cycles. Based on these values, the time axis of Figure 3 can be adjusted to show the amount of real time (wall clock time) of the transients. This time is indicated by the D point of Figure 6. This is the worst case. In Figure 6, only the transient portion of the response is shown, with all steady-state components subtracted out of the response. Thus, it is clear that memory speed has an analogy to a resistance/capacitance combination. Table 1 gives the data on memory for the four cases. From Table 1 and Figure 6, it is clear that the memory speed is a factor in the exponent giving the dynamics for the RC circuit analogy. The speed of the memory enters into the transient solution. The steady-state of Figure 5 is a function of the resistance, which is to be separated from the capacitance. The details such as memory speed can thus be accounted for by the capacitors of Figure 5.
BLOCK (LINE) SIZE
In some situations, the entire block (line) of executing code may be contained in the nearest cache, M0. There is a condition under which several of the states are not visited. For example, assume the program is very simple, in which case the first line or block in M0 contains the entire program. In this case, the Miss Ratio = 0, as t → ∞. The fundamental model of Figure 2 holds with the probabilities from CC to BC, and from BC to M, being either zero or an impulse function. The system stays in the E or CC states with the resulting curve as shown in Figure 7, as CASE E. This curve is particularly important in evaluating the steady-state effect of the program under evaluation. In such cases, the diagram of Figure 2 is reduced to the diagram of Figure 8.
Case E is the best that can be obtained with no misses, but not necessarily the best case. The absolute best case occurs when the program consists of the shortest possible instructions, CASE F. In CASE F, the shortest instruction is assumed to be 1 byte wide with a 32 bit (4 byte) fetch on the bus. This, for example, is approached with NOP instructions in a LOOP on an Intel 80486DX.
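One simple way to see where a CASE F figure could come from is a two-state chain over E and CC. This is a hypothetical model, not necessarily the one behind Figure 8: it assumes each fetch takes one transition and delivers bus width divided by instruction width instructions, each of which then takes one transition in E. Under that assumption, a 32 bit fetch of 1 byte instructions gives a limiting execute probability of 4/5, which agrees with the 0.800 quoted for CASE F in the text.

```python
def steady_state_execute(instr_per_fetch, n_steps=200):
    """Iterate a two-state chain (E, CC) and return the limiting probability of E."""
    if instr_per_fetch <= 1:
        # With one instruction per fetch the chain alternates and never settles.
        raise ValueError("need more than one instruction per fetch")
    leave = 1.0 / instr_per_fetch       # chance that E must go back to CC to fetch
    p = [[1.0 - leave, leave],          # from E: keep executing, or fetch
         [1.0, 0.0]]                    # from CC: the fetch always returns to E
    pi = [0.0, 1.0]                     # assumed start: the first fetch is in progress
    for _ in range(n_steps):
        pi = [pi[0] * p[0][0] + pi[1] * p[1][0],
              pi[0] * p[0][1] + pi[1] * p[1][1]]
    return pi[0]


BUS_BITS = 32          # fetch width from the text
INSTRUCTION_BITS = 8   # 1-byte instructions from the text
pi_e = steady_state_execute(BUS_BITS // INSTRUCTION_BITS)
print(f"pi_E(infinity) = {pi_e:.3f}")
```

In this sketch the limit is k/(k + 1) for k instructions per fetch, so wider instructions (smaller k) lower the execute probability, in line with the trend the paper describes.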
In CASE E, π_E(∞) = 0.353. In CASE F, π_E(∞) = 0.800. Thus the steady-state improvement achieved by the program/block size is:
MULTIPLE PIPES, PROCESSORS AND THREADS
This section demonstrates the applicability of the modelling technique to more sophisticated architectures (superscalar) and sophisticated combinations (multiple processors). Figure 9 shows a 2 processor/2 memory (2P/2M) example that is sufficiently general to demonstrate the applicability of the method beyond single processor single thread cases. P_1 and P_2 may be two pipes in which there may or may not be memory conflicts or hazards, two processors that may be independent and/or executing two independent threads. The processors of Figure 9 may be any type with the same or different instruction sets or different functions. The Markov diagram (Figure 10) is an illustration of two independent processors, each executing a benchmark program of moving data from one location to another giving high memory access for both code and data. The 2P/2M system of Figure 9 is also modelled with an instruction cache included in the processors, P_1 and P_2. The notation 2P/2C/2M is used for this type of a system.
FIGURE 9. Two (2) processor/pipeline architecture.
In initially choosing the processors, three 8-bit off-the-shelf microprocessors were chosen for one set of comparisons. The manufacturer's data sheets were used to obtain the actual number of cycles of the clock that are required for memory access and execution which leaves the memory bus free. The second set assumes the instructions are implemented on a RISC-type architecture. The processors were then analysed without and with an instruction cache. Table 2 shows the effect of cache memory for both the CISC and RISC processors in terms of instructions executed per clock cycle. The results are shown in Table 3, in terms of CPI, clocks per instruction.
The platforms of the two previous examples were two pipes or alternatively two independent processors. The primary emphasis of the analysis in this case was the effect of cache. The results are essentially the same for two threads on a single processor where the results are shown in Table 4.
SUMMARY AND CONCLUSIONS
An analogy between classical electrical circuits and computer systems with executing programs has been established. The analogy provides the basis for analysis of computers with executing programs in terms of both a transient and a steady-state response. Earlier work on multitasking displays similar dynamic responses [7], although the analogy in multitasking assumes a different conceptual form. The analogy presented in this paper is very powerful and more fundamental in providing a conceptual framework for understanding and designing optimum performance systems. Examples of circuits and corresponding computing platforms are included to demonstrate the application of the concepts involved.
