INTRODUCTION
Many electrical and computer engineering departments in universities across Canada teach and/or perform research on digital system design and development. A set of courses offered in succession provides students with the fundamental theory behind digital systems as well as the knowledge for designing and developing such systems.
The Department of Electrical and Computer Engineering at the University of Manitoba offers a stream of digital system courses. Within this stream, the Microprocessing Systems course provides students with the fundamental theory behind a basic microprocessor [1]. The course begins with a review of the binary and hexadecimal number systems. This review helps students realize that a digital system understands only the binary language, i.e., 0 V and +5 V. A binary number wheel is also introduced, which helps students visualize what the digital system is doing when it performs addition and subtraction of unsigned and signed numbers.
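The number wheel can be illustrated with a short sketch. It assumes an 8-bit wheel and a two's-complement reading of signed values; the function names are hypothetical and are not taken from the course material.

```python
BITS = 8
MODULUS = 2 ** BITS  # 256 positions on an 8-bit number wheel


def wheel_add(a: int, b: int) -> int:
    """Step b positions clockwise from a, wrapping around the wheel."""
    return (a + b) % MODULUS


def wheel_sub(a: int, b: int) -> int:
    """Step b positions counter-clockwise from a, wrapping around the wheel."""
    return (a - b) % MODULUS


def as_signed(pattern: int) -> int:
    """Read an 8-bit pattern as a two's-complement signed value (assumption)."""
    return pattern - MODULUS if pattern & 0x80 else pattern


if __name__ == "__main__":
    pattern = wheel_add(0xFF, 0x01)
    # The hardware produces one bit pattern; only the interpretation differs.
    print(hex(pattern), as_signed(0xFF), as_signed(pattern))
```

The same wrap-around step serves both interpretations: 0xFF plus 1 lands on 0x00, which reads as 255 + 1 overflowing when unsigned and as -1 + 1 = 0 when signed.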
The course then goes on to introduce the structure and operation of a tiny operation set calculator (TOC). The TOC is wired up on a breadboard in the lab. Thus, students get hands-on experience with the basic concepts of binary encoded instructions, the stored program concept, data paths, macro- and micro-instructions, control vectors, control memory, and interfacing a data processor with a memory. The TOC is used as a stepping-stone to explain the difficult-to-visualize concepts of a micro-processor. After working with the TOC, students are presented with the structure and operation of a basic 8-bit micro-processor, roughly modeled after Motorola's 6800 family [2]. Students are then presented with assembly language and the instruction set. The fundamentals of assembly language programming are supported by three labs: multiple-precision unsigned arithmetic, square roots of integers, and bubble sorting, the last of which also gives students experience in writing a subroutine and passing data via the stack. After dealing with assembly language, including computer decisions and subroutines, the course provides an introduction to physical and procedural interfacing of the basic micro-processor with memories and I/O controllers, such as the Peripheral Interface Adapter (PIA). This is followed by I/O fundamentals, including polling, interrupts, and Direct Memory Access (DMA) using the DMA controller (DMAC).
Finally, a few design examples are given, including a speech recorder and player, a home security system, a TV packet router, a motion tracking system, and a keypad entry and LED display system. Students find the description of the structure, operation, and architecture of the basic micro-processor the most difficult part of the course to comprehend. Although the lectures are very informative, students find the level of abstraction high, and thus find it very difficult to visualize the data flow, the triggering of data at different levels and edges of the clock, and micro-instruction sequencing. Clearly, some sort of simulation or animation would make these ideas easier to communicate.
The remainder of this paper discusses some related work in the area of micro-processing system education, simulators, and animators. Next, a description of experiments and their analysis is followed by a summary and future work.
RELATED WORK
A typical method of teaching micro-processing systems is to provide the theory in class and reinforce this theory in the laboratory [1]. The laboratory equipment typically consists of development hardware supported by an integrated development environment, such as Freescale's MCUSLK together with CodeWarrior [3]-[4]. The development hardware and software offer register and memory viewing and modification, breakpoints, and tracing capability. A limitation is that these systems cannot trace through micro-programs, nor can they be used to create or modify micro-programs.
Another approach is to use a software simulator [5]-[6], which provides source code assembly, register and memory modification, and macro-instruction execution through tracing and breakpoints. However, engineers need to understand the deeper concepts of micro-operations, micro-code, control vectors, and control memory. Yurcik has provided useful simulation surveys [7]-[9]. What is lacking in the current offering of laboratory tools is a visual aid for the design and comprehension of micro-programmability. Electrical and computer engineering students need to learn how to create micro-operations, how to configure micro-operations by establishing control vectors, and how to add new macro-instructions by defining sequences of new micro-instructions. One of the best ways of teaching this subject matter is through an effective graphical aid or tool, such as an animation that shows the flow of addresses and data through the components of the micro-processor, together with an animation of the clock that shows the synchronization of address and data flows with clock events. This paper reports on an animation tool that enhances the design and comprehension of a micro-program based simple 8-bit micro-processor.
SYSTEM REQUIREMENTS
This section states the requirements, and provides the rationale, for animation software for a basic micro-processor (refer to Fig. 1). Although micro-processors in today's market run at much higher speeds, have wider data paths, and access much larger memory spaces, the basic 8-bit processor retains many advantages. About 75% of the micro-controller applications currently deployed do not need to run at more than a few million cycles per second, because most of them deal with human interaction or slow mechatronic devices. Furthermore, these applications do not need more than 8-bit resolution, and most of them do not require a large amount of memory. Typically, these controllers do not even fill a 64K memory space.
The structural representation of the central processing unit (CPU) should allow a conceptual categorization of the different components. In particular, the control functionality, data paths, and address generation should be arranged in separate units, called the computer control unit (CCU), the data path unit (DPU), and the address generator unit (AGU).
Since the size of the machine is 8-bit, an 8-bit arithmetic logic unit (ALU) shall be used. The output of the ALU shall connect to a bus, called the result bus (RBUS), capable of routing data to other components in the AGU, the DPU, and the devices in the memory space. To facilitate data provision, two 8-bit data registers shall be used, one for each side of the ALU. One of these registers shall act as an accumulator to accumulate intermediate results via the RBUS, while the other shall act as a buffer between the memory and the CPU; we therefore call this register the memory data register (MDR). The MDR also needs to buffer data between the memory and the CCU, in order to provide the CCU with the opcode of an instruction. While one accumulator may be sufficient for a very simple micro-processor, a second accumulator is needed to help in data manipulation. Therefore, we have two accumulators, ACCumulator A (ACCA) and ACCumulator B (ACCB). Finally, in order to route data along the different paths of the DPU, several data path switches, called multiplexors, are required.
To allow for micro-programmability, the CCU shall provide a control memory to store the vectors for the micro-instructions. The CCU shall be equipped with a basic set of micro-instructions to implement a small set of macro-instructions. These micro-instructions shall be modifiable, and the creation of new micro-instructions shall be possible. For each macro-instruction, the CCU shall contain a linked list of control vectors, each of which implements a micro-instruction. A linked list is chosen so that no micro-instruction control vector needs to be duplicated. Since these control vectors will be stored in a control memory within the CCU, the opcode of the instruction shall be used as the key to the control memory. Furthermore, a micro-program counter shall be implemented to point to the next micro-instruction to be executed in the linked list.
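The control memory requirement can be sketched as a small data structure. This is only an illustration under stated assumptions, not the animation software's implementation: the class names, the opcodes 0x01 and 0x02, and the control signal names in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MicroInstruction:
    """One named control vector, stored exactly once in the control memory."""
    name: str
    control_vector: Dict[str, bool]


@dataclass
class SequenceNode:
    """Linked-list node that refers to a shared micro-instruction by name."""
    micro_name: str
    next: Optional["SequenceNode"] = None


class ControlMemory:
    def __init__(self) -> None:
        self.micro_instructions: Dict[str, MicroInstruction] = {}
        self.sequences: Dict[int, SequenceNode] = {}  # keyed by opcode

    def define_micro(self, name: str, control_vector: Dict[str, bool]) -> None:
        """Create or modify a micro-instruction."""
        self.micro_instructions[name] = MicroInstruction(name, control_vector)

    def build_instruction(self, opcode: int, micro_names: List[str]) -> None:
        """Link existing micro-instructions into a macro-instruction."""
        if not micro_names:
            raise ValueError("a macro-instruction needs a micro-instruction")
        head: Optional[SequenceNode] = None
        for name in reversed(micro_names):
            if name not in self.micro_instructions:
                raise KeyError(f"undefined micro-instruction: {name}")
            head = SequenceNode(name, head)
        self.sequences[opcode] = head

    def run(self, opcode: int) -> List[str]:
        """Walk the list with a micro-program counter; return names in order."""
        executed = []
        upc = self.sequences[opcode]  # micro-program counter
        while upc is not None:
            executed.append(self.micro_instructions[upc.micro_name].name)
            upc = upc.next
        return executed


if __name__ == "__main__":
    cm = ControlMemory()
    cm.define_micro("fetch", {"PC_TO_MAR": True})   # hypothetical signal
    cm.define_micro("clearA", {"ACCA_CLR": True})   # hypothetical signal
    cm.build_instruction(0x01, ["fetch", "clearA"])
    print(cm.run(0x01))
```

Because each sequence node holds only a reference, a micro-instruction such as "fetch" can appear in many macro-instructions while its control vector is stored once, which is the stated reason for choosing a linked list.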
The triggering of the machine shall be implemented using two clocks, identical except that they shall be 180 degrees out of phase. By having two clocks, more edges and levels are obtained, and therefore more concurrent operations may be performed. Furthermore, to minimize cross-talk and inter-symbol interference, the clocks shall be non-overlapping, so that while one clock is changing, the other clock is held level. Call these clocks phase-1 and phase-2, or φ-1 and φ-2.
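A two-phase non-overlapping clock can be sketched as two sampled waveforms. The tick counts (`high`, `gap`) are illustrative assumptions; the paper does not specify the clock timing.

```python
def two_phase_clock(cycles: int, high: int = 3, gap: int = 1):
    """Return (phi1, phi2) as lists of 0/1 samples.

    Each cycle is: phi1 high for `high` ticks, both low for `gap` ticks,
    phi2 high for `high` ticks, both low for `gap` ticks. phi2 is therefore
    phi1 delayed by half a period.
    """
    if not (high >= 1 and gap >= 1):
        raise ValueError("high and gap must each be at least one tick")
    phi1_cycle = [1] * high + [0] * (gap + high + gap)
    phi2_cycle = [0] * (high + gap) + [1] * high + [0] * gap
    return phi1_cycle * cycles, phi2_cycle * cycles


def edges(signal):
    """List (tick, 'rise' or 'fall') for every transition in a waveform."""
    found = []
    for t in range(1, len(signal)):
        if signal[t] != signal[t - 1]:
            found.append((t, "rise" if signal[t] else "fall"))
    return found


if __name__ == "__main__":
    phi1, phi2 = two_phase_clock(cycles=2)
    print("phi-1 edges:", edges(phi1))
    print("phi-2 edges:", edges(phi2))
```

With a gap of at least one tick, the two waveforms are never high together and never change on the same tick, so each cycle offers four distinct edges on which operations can be triggered.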
The AGU shall contain five registers, whose purpose shall be to generate addresses for memory and I/O controller register access. Each of these registers shall be 16-bit, to implement a 64K memory space. A stack pointer (SP) shall be required for implementing a stack. The machine shall allow for any method of stack access, i.e., growing towards lower or higher memory locations, and pushing or pulling followed by a decrement/increment, or vice versa. An index register (IX) shall be implemented to allow for the index addressing mode. There shall be a program counter (PC) register, whose purpose shall be to point to the next macro-instruction to be executed, as well as to the next byte of an instruction during micro-instruction execution. A temporary holding register (THR) is required for intermediate calculations of the effective address. Finally, a memory address register (MAR) shall be required to act as a port register between the other address generators and the devices in the memory space. All of the address generators shall have the capability of counting up or down (increment or decrement). The counting capability shall be triggered on an edge prior to the latching of data, to make concurrent operations possible. Furthermore, the address generators shall have the capability of sign extension, to allow for signed offsets in an addressing mode, as well as individual high byte or low byte access from/to the RBUS.
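The register capabilities listed above (counting, sign extension, and byte-wise access) can be sketched as follows. The class and method names are hypothetical, and clock-edge timing is deliberately left out of this sketch.

```python
class AddressRegister:
    """A 16-bit address generator such as SP, IX, PC, THR, or MAR."""

    MASK = 0xFFFF  # 16 bits address a 64K memory space

    def __init__(self, value: int = 0) -> None:
        self.value = value & self.MASK

    def increment(self) -> None:
        self.value = (self.value + 1) & self.MASK

    def decrement(self) -> None:
        self.value = (self.value - 1) & self.MASK

    def load_low(self, byte: int) -> None:
        """Load only the low byte, e.g. from the 8-bit RBUS."""
        self.value = (self.value & 0xFF00) | (byte & 0xFF)

    def load_high(self, byte: int) -> None:
        """Load only the high byte, e.g. from the 8-bit RBUS."""
        self.value = (self.value & 0x00FF) | ((byte & 0xFF) * 0x100)

    def load_sign_extended(self, byte: int) -> None:
        """Load a signed 8-bit offset, copying bit 7 into the high byte."""
        byte &= 0xFF
        self.value = (byte | 0xFF00) if (byte & 0x80) else byte

    @property
    def low(self) -> int:
        return self.value & 0xFF

    @property
    def high(self) -> int:
        return self.value // 0x100


if __name__ == "__main__":
    thr = AddressRegister()
    thr.load_sign_extended(0xFE)  # the 8-bit pattern for -2
    print(hex(thr.value))
```

Sign extension is what lets an 8-bit offset such as 0xFE act as -2 in a 16-bit address calculation: the register then holds 0xFFFE.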
To facilitate address generation, a 16-bit adder/subtractor (16-bit AddSub) shall be contained within the AGU. One input to the AddSub shall come from the SP, IX, or PC, while the other input is fixed and comes from the THR. The output of the AddSub shall be connected to the 16-bit input of each AGU register. Thus, an effective address may be calculated by simple addition or subtraction, and the result may be loaded into any one of the AGU registers.
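The effective address calculation reduces to one 16-bit addition or subtraction. The sketch below assumes the THR already holds a 16-bit value; the function name and the example register contents are hypothetical.

```python
def add_sub_16(base: int, thr: int, subtract: bool = False) -> int:
    """16-bit adder/subtractor: `base` comes from SP, IX, or PC, and `thr`
    from the THR. The result wraps within the 64K address space."""
    result = base - thr if subtract else base + thr
    return result & 0xFFFF


if __name__ == "__main__":
    ix = 0x0010
    # Adding 0xFFFE (a sign-extended -2) and subtracting 0x0002 both
    # produce the same effective address.
    print(hex(add_sub_16(ix, 0xFFFE)), hex(add_sub_16(ix, 0x0002, subtract=True)))
```

Masking to 16 bits models the carry out of the adder being discarded, which is why adding a sign-extended negative offset is equivalent to subtracting its magnitude.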
To route the different address generators to the port address register, i.e., the MAR, a switch is required: call it the address generator multiplexor (AMUX). The AMUX shall have the capability of routing any AGU register to the input of the MAR or to one of the inputs of the AddSub.
To facilitate a visualization of instruction fetch and execute, a memory shall be present and connected to the micro-processor through the system bus, i.e., the address bus (ABUS), the data bus (DBUS), and control bus (CBUS).
Each component within the animation software's depiction of a basic 8-bit micro-processor contains control signals, which control the functionality of the device. For example, MUX2 needs the capability of either providing high impedance (choose nothing, via the MUX2_CN control signal) or routing data. When routing data, the MUX2 can take input from either the MDR or the RBUS and send the data to either one of the accumulators, ACCA or ACCB. The switching of the data path within the MUX2 is controlled by the signals MUX2_S1 and MUX2_S0, which take effect when MUX2_CN is false.
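The MUX2 behaviour can be sketched as a small function. The paper does not give the encoding of MUX2_S1 and MUX2_S0, so the assignment used here (S1 selects the source, S0 selects the destination) is an assumption made purely for illustration.

```python
from typing import Optional, Tuple


def mux2(mdr: int, rbus: int, cn: bool, s1: bool, s0: bool) -> Optional[Tuple[str, int]]:
    """Return (destination, byte), or None when the output is high impedance.

    Assumed encoding (not from the paper):
      s1: False selects the MDR as source, True selects the RBUS
      s0: False routes to ACCA, True routes to ACCB
    """
    if cn:  # MUX2_CN true: choose nothing
        return None
    source = rbus if s1 else mdr
    destination = "ACCB" if s0 else "ACCA"
    return destination, source & 0xFF


if __name__ == "__main__":
    print(mux2(mdr=0x12, rbus=0x34, cn=False, s1=True, s0=False))
    print(mux2(mdr=0x12, rbus=0x34, cn=True, s1=True, s0=False))
```

Whatever the real encoding, the three signals together form the MUX2's portion of a control vector: one bit to disable the device and two bits to pick the path.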
To configure a micro-instruction, the state of each control signal for every device within the system must be established. The set of control signals required to implement a micro-instruction is called a control vector. To facilitate control vector configuration, a table shall be provided in the Configure μOPs tab, which shows all of the components of the control vector, with the name of the micro-instruction as the key. The resulting list of control vectors shall be stored in the CCU's control memory. Thus, a micro-instruction set may be established for the machine.
To configure a macro-instruction, a Build Instruction tab shall be provided. This tab shall provide a table and a means to enter information about the instruction, such as the opcode, the number of micro-instructions, and the name of each micro-instruction. This table shall be stored in the CCU's control memory.
The machine shall allow for using either the Big-Endian or the Little-Endian word format. This will facilitate comprehension of these different data word formats.
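The difference between the two word formats can be shown by storing one 16-bit word in a byte-addressed memory both ways. The function names, the memory size, and the example word 0x1234 are hypothetical.

```python
def store_word(memory: bytearray, address: int, word: int, big_endian: bool) -> None:
    """Store a 16-bit word as two bytes in the chosen byte order."""
    order = "big" if big_endian else "little"
    memory[address:address + 2] = (word & 0xFFFF).to_bytes(2, order)


def load_word(memory: bytearray, address: int, big_endian: bool) -> int:
    """Read two bytes back as a 16-bit word in the chosen byte order."""
    order = "big" if big_endian else "little"
    return int.from_bytes(memory[address:address + 2], order)


if __name__ == "__main__":
    memory = bytearray(4)
    store_word(memory, 0, 0x1234, big_endian=True)   # high byte at lower address
    store_word(memory, 2, 0x1234, big_endian=False)  # low byte at lower address
    print(memory.hex())
```

In Big-Endian format the most significant byte sits at the lower address; in Little-Endian format the least significant byte does.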
EXPERIMENTS AND ANALYSIS
A number of experiments were conducted to verify that the animation software meets its requirements. One of the more interesting experiments is described in this paper.
INDEX ADDRESSING WITH POST/PRE INC/DEC
Consider improving the simple loop shown on the left in Table 1. The purpose of this experiment is to design a more efficient addressing mode that mitigates loop overhead and thus increases the speed at which a loop is executed. This new addressing mode (IND X+) is the same as the INDexed X addressing mode (IND X), except that it increments X concurrently while an instruction using it performs its main function.
To test the IND X+ addressing mode, a loop was created to clear a block of memory. In the original method (left side of Table 1), the CLR operation uses the IND X addressing mode. This loop requires 13 machine cycles to execute. The right side of Table 1 shows the implementation using the IND X+ addressing mode, for which the loop requires only 11 cycles to execute. This is because the INX instruction is not needed when the IND X+ addressing mode is used in the CLR operation. The loop overhead of incrementing the IX register is performed concurrently with the instruction CLR $00,X+.
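The saving can be put in numbers with a short calculation. It assumes that the 13 and 11 machine cycles are the cost of one pass through the loop and ignores any setup outside the loop; the 256-byte block size is a hypothetical example.

```python
IND_X_CYCLES = 13       # per pass, from the text (CLR with IND X, plus INX)
IND_X_PLUS_CYCLES = 11  # per pass, from the text (CLR with IND X+)


def loop_cycles(passes: int, cycles_per_pass: int) -> int:
    """Total machine cycles spent in the loop body (setup excluded)."""
    return passes * cycles_per_pass


def relative_saving(old: int, new: int) -> float:
    """Fraction of cycles saved by moving from `old` to `new`."""
    return (old - new) / old


if __name__ == "__main__":
    block_size = 256  # hypothetical block size in bytes
    print(loop_cycles(block_size, IND_X_CYCLES))
    print(loop_cycles(block_size, IND_X_PLUS_CYCLES))
    print(f"{relative_saving(IND_X_CYCLES, IND_X_PLUS_CYCLES):.1%}")
```

Under these assumptions, removing two cycles from a 13-cycle pass saves 2/13 of the loop time, or roughly 15%, regardless of the block size.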
To implement and thus verify the IND X+ addressing mode, the CLR $00,X+ instruction was created and animated (using three steps) in the animation software.
(Step 1) First, the macro-instruction was built and entered into the animation software via the Build Instruction tab (see Fig. 2).
(Step 2) Next, the addressing mode IND X was analyzed, and it was determined that the X register may be concurrently incremented only after the effective address has been calculated, because the present value of X cannot change during the calculation. In the instruction "CLR $00,X+", the last micro-instruction is the only micro-instruction in which X can be incremented. This is because the effective address has already been generated and stored in the THR, so the value of the X register may be changed without affecting the desired "CLR" operation of the macro-instruction. Therefore, we created four micro-operations: "fetch", "loadOffset", "formEA", and "clear (EA)", as shown in Fig. 2. To complete the configuration of the instruction, the "Configure μOPs" tab was chosen, and four control vectors were created and entered into the table (Fig. 3). Note that the control vector for a micro-operation has 80 components, of which only 13 are shown because of space constraints.
(Step 3) Finally, the Animate tab was chosen and the instruction was animated. The binary encoding of the instruction "CLR $00,X+", which consists of the opcode (0x8F) and the offset (0x00), was written into memory at locations 0x0000:0x0001. The IX register was initialized to 0x0003, so the instruction was to clear memory location 0x0003. A non-zero value was written to location 0x0003 beforehand. Figure 4 shows the animation of the fourth micro-operation, in which the X register was incremented (on the falling edge of φ-1) concurrently with the clearing of memory location 0x0003 (on the falling edge of φ-2). Note that in this micro-instruction, the ALU was configured to output 0x00, which was used as the data to write to location 0x0003.
Other aspects of the animation are also worth noting. In particular, the "water flowing through pipes" animation method provides an element of visual learning that is simple and effective. The animation visually shows how the micro-processor controls memory devices by issuing an address, which is followed by data flow. In addition, juxtaposing the animation of the clock with the "water flowing through pipes" animation shows graphically that a time-ordered sequence of events is required to execute a macro-instruction. The animation software makes it possible to visually correlate address and data flow with certain clock edges and levels.
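The experiment in Step 3 can be replayed in a few lines. This is a behavioural sketch only, not the animation software: it ignores control vectors and clock edges, treats the offset byte as unsigned (the offset 0x00 used here is the same under either interpretation), and uses a hypothetical 16-byte memory with the hypothetical non-zero value 0xAA at location 0x0003.

```python
CLR_IND_X_PLUS = 0x8F  # opcode of CLR offset,X+ as given in the text


def run_clr_ind_x_plus(memory: bytearray, pc: int, ix: int):
    """Execute one CLR offset,X+ and return (pc, ix, trace)."""
    trace = []

    # fetch: read the opcode addressed by the PC, then advance the PC
    opcode = memory[pc]
    pc = (pc + 1) & 0xFFFF
    if opcode != CLR_IND_X_PLUS:
        raise ValueError(f"unexpected opcode {opcode:#04x}")
    trace.append("fetch")

    # loadOffset: read the offset byte into the THR, then advance the PC
    thr = memory[pc]
    pc = (pc + 1) & 0xFFFF
    trace.append("loadOffset")

    # formEA: effective address = IX + offset, kept in the THR
    thr = (ix + thr) & 0xFFFF
    trace.append("formEA")

    # clear (EA): IX is incremented in the same micro-operation that writes
    # 0x00 to the effective address. This is safe because the address is
    # already held in the THR.
    ix = (ix + 1) & 0xFFFF
    memory[thr] = 0x00
    trace.append("clear (EA)")

    return pc, ix, trace


if __name__ == "__main__":
    memory = bytearray(16)
    memory[0x0000] = CLR_IND_X_PLUS
    memory[0x0001] = 0x00  # offset
    memory[0x0003] = 0xAA  # hypothetical non-zero value to be cleared
    pc, ix, trace = run_clr_ind_x_plus(memory, pc=0x0000, ix=0x0003)
    print(trace, hex(pc), hex(ix), hex(memory[0x0003]))
```

After the instruction completes, location 0x0003 holds 0x00 and IX holds 0x0004, so the next pass of a clearing loop would address the following byte without a separate INX instruction.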
SUMMARY AND FUTURE WORK
This paper describes an animation tool for enhancing design and comprehension of a micro-program based micro-processor. The animation tool provides the ability to create and modify macro-instructions, micro-instructions, and control vectors. The "water flowing through pipes" animation method effectively demonstrates concepts otherwise difficult to communicate using standard teaching modalities. Future work includes addition of new features, such as the animation of instruction and data caches, pipelines, and branch prediction.
ACKNOWLEDGMENTS
This paper has its foundations built from Dr. Kinsner's lectures on computer architecture in the Department of Electrical and Computer Engineering, University of Manitoba. Dr. Poskar assisted in the development of the animation software.
