I. Introduction
A quick survey of the Internet reveals that microprogramming is taught in many universities, including our own (Université Laval). In fact, microprogramming is generally included in digital circuit, computer architecture or microprogramming courses in electrical and computer engineering programs and also in computer science programs [1]-[3]. Sometimes (as at Université Laval), it is combined with a study of microcontrollers to form a complete one-term course.
Currently, there are two broad classes of processor architectures: hardwired and microprogrammed. Both have advantages and drawbacks. Microprogramming is the programming, at the lowest level, of the hardware architecture defining a given processor (microprogramming is also known as firmware). Once the architecture is defined, microprogramming provides flexibility and backward compatibility, since it is "just a matter" of reprogramming the given architecture to update the chip (by changing the firmware). On the other hand, the decoding associated with microprogramming degrades overall processor performance. The well-known Intel 8051 (and later 80251) microcontroller is an example of a microprogrammed processor designed to be compatible with the earlier Intel 8048 microcontroller.
In hardwired architectures, subsystems are based on decoding logic and complex sequential and combinational networks. Advantages of this approach include custom design, smaller (minimal) size, and higher speed. It is generally considered that hardwired architectures are good for fixed (stable) designs and high production volumes. Reduced instruction set computers (RISCs) have hardwired architectures and are less flexible but faster than microprogrammed computers.
The concerns in teaching microprogramming concepts are especially related to the practical aspects. Since, for a given chip, microprogramming is "buried in the silicon" by the processor manufacturer, it is not accessible. However, the development of embedded logic such as in large field programmable gate arrays (FPGAs) makes it possible to configure these circuits as microprogrammed processors. This approach was selected for the present project.

The authors are with the Department of Electrical and Computer Engineering, Université Laval, Quebec City, Quebec G1K 7P4. E-mail: maldagx@gel.ulaval.ca
The selected microprogrammed architecture is the classical Am2900 described in [4] and [5]. The Am2900 family of products is no longer commercially available. However, it is still "alive" in emulators, FPGAs and code since, as pointed out by D.E. White [4], "it was just too cool to lose as far as the design community is concerned." Readers are referred to the references for more details on the discussed microprogrammed architecture. The proposed teaching tool was designed and implemented in FPGA, allowing students to define instruction sets, experiment with a microprogrammable processor, and generate real signal activities at the FPGA outputs [6]-[8].
In this text, the architecture of the microprogrammable-processor teaching tool and its FPGA implementation are reviewed. One goal of this paper is to present the tool, which is available at no cost (contact the authors) and could be an interesting addition to electrical and computer engineering and computer science curricula when microprogramming concepts are taught. Finally, it is worthwhile to mention that the tool was developed in the popular Xilinx Foundation™ environment commonly found in many universities (more on that in Section III). This is considered a standard approach when compared with the transistor-transistor logic (TTL) discrete-components approach. Surprisingly, such TTL discrete-components approaches are still sometimes used in teaching microprogramming.
II. Processor architecture
In a microprogrammed processor, the main (macro) program consisting of several macro-instructions is translated into hardware activity through micro-instructions. Thus, each macro-instruction corresponds to a sequence of micro-instructions (see, for example, Fig. 1). All these microsequences constitute the instruction set to which the microprogrammable processor responds.
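The relation between macro-instructions and microsequences can be illustrated with a small behavioral sketch. It is only an illustration under stated assumptions: the step descriptions paraphrase the comments of Fig. 1, the wording of the last FETCHINSTR step is assumed, and the names `FETCHINSTR`, `MICROSEQUENCES` and `expand` are hypothetical.

```python
# Shared subroutine that ends a microsequence by fetching the next
# macro-instruction (step wording paraphrased from the Fig. 1 comments).
FETCHINSTR = [
    "PC on bus, read macromemory, increment PC",
    "load opcode part of IR (1st byte)",
    "PC on bus, read macromemory, finish incrementing PC",
    "load operand part of IR (2nd byte)",   # assumed wording
]

# One entry per macro-instruction: its own micro-steps, before FETCHINSTR.
MICROSEQUENCES = {
    "ADD": ["add the register contents together"],
}


def expand(macroprogram):
    """Return the flat list of micro-steps executed for a macroprogram."""
    steps = []
    for mnemonic in macroprogram:
        steps.extend(MICROSEQUENCES[mnemonic])   # instruction-specific part
        steps.extend(FETCHINSTR)                 # then fetch the next one
    return steps


micro_steps = expand(["ADD", "ADD"])
```

Each ADD expands into one specific micro-step followed by the four FETCHINSTR steps, so two ADD macro-instructions yield ten micro-steps in this model.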
The microprogrammable processor used in this project is classical and is based on a pipelined central processing unit (CPU) with scratch pad registers [1]-[4]. Fig. 2(a) presents the general design: the main parts are the computer control unit (CCU), the arithmetic logic unit (ALU) with its associated scratch pad registers, the program counter (PC), the memory address register (MAR), and the BUS, which moves signals between units. A display unit is added to show FPGA outputs on LEDs (see Section IV, below). In such a microprogrammed architecture, all activities are managed by the CCU. The CCU loads the macro-instruction from the macroprogram, located in the macromemory, into the instruction register and then decodes and translates it into a sequence of micro-instructions located in the micromemory (Fig. 1). A micro-instruction is made up of various fields, each one addressing a particular structure within the processor. In fact, the micro-instruction fields contain all the bits needed to control these structures. Since a pipeline is used here, the micromemory and the ALU are active simultaneously, which shortens the microcycle. The pipeline is a register that holds the signals stable so that the ALU operates on stable data while the micromemory simultaneously fetches the next micro-instruction. The microcycle is the period of time needed to execute one micro-instruction.

Figs. 2(b) and 2(c) and Table 1 show the processor architecture in greater detail. The BUS serves for both address and data transfers. It is three-state-protected to avoid conflicts, since multiple units are connected to it as shown. The MAR, which is loaded from the program counter, is the 8-bit register containing the address pointing to the macromemory. The memory buffer register (MBR) holds the 8-bit data read from or to be written to the macromemory at the address contained in the MAR. The macromemory is 8 bits wide and 32 bytes deep, a size we consider sufficient for teaching.

Restrictions on memory size are necessary to fit the design within the 4010XL FPGA (see Section III). In the present system only the five least significant bits of the MAR are used because of the macromemory size (2^5 = 32); the other bits are ignored. Reads are made without delay, while writes are executed at the clock falling edge since the address in the MAR is available only after a clock rising edge. The instruction register (IR) is a 16-bit register holding an 8-bit operation code (opcode) and an 8-bit operand. For a typical macro-instruction, the PC points to the opcode while the operand is located at the PC+1 address. As shown in Fig. 3, the 8-bit operand is divided into two 4-bit nibbles corresponding to the two scratch pad register addresses possibly used by the instruction (2^4 = 16 registers are available). To ease manipulation, the PC is directly implemented as one of the scratch pad registers (PC = Register #0). Subroutine FETCHINSTR (Fig. 1 and Fig. 4) loads the IR and updates the PC so that it points to the next macro-instruction (PC + 2 → PC).

[Fig. 1, comment column of the microsequence: 1. add the register contents together; steps 2 to 5 are part of the FETCHINSTR subroutine (Fig. 4): 2. PC on bus and read macro-memory, increment PC; 3. load read address in opcode part of IR (1st byte); 4. PC on bus and read macro-memory, finish incrementing PC; 5. (comment truncated).]
Block 2910 is the CCU, which is based on the Am2910 supersequencer from AMD [4], [5] (Fig. 2(b)). The 2910 recognizes 16 instructions (for example, Continue, Jump to subroutine, Three-way branch [5]). Essentially, the purpose of the 2910 is to generate an address to the microprogram memory, based on the current macro-instruction. It comprises the following structures: (Fig. 2(c)). Computations take place on the clock rising edge (when the pipeline settles), and results are saved in a register at the following clock falling edge. Block 2901 is made up of the following structures:
RAM shift (to shift 1 bit to the left or right; useful for multiply- and divide-by-two operations), Q shift (similar to the RAM shift; enables dual-precision arithmetic in conjunction with the RAM shift), Q register (an ALU register at the output of the Q-shift unit), scratch pad registers (register selection either comes from the macro-instruction operand (Fig. 3) or is imposed directly by the micro-instruction), ALU data source selector (selects the source of operands from a scratch pad register, the BUS, or other ALU units), 8-function ALU (operations: +, -, swapped - (so that both operand orders of the subtraction are implemented), AND, inverted AND, OR, exclusive OR, identity; results can be fed back to a scratch pad register, and status bits (Zero, Sign, Overflow, Carry out) can be transferred to the micro-macro flag register), output data selector (transfers either the ALU output or register content to the BUS), and ALU logic (controls the ALU).
Block 2904 is based on the Am2904 from AMD [5] and serves to determine the condition value during a branch instruction. It comprises the following structures: micro-macro flag register (memorizes ALU status bits and makes them available for a decision based on their values; macroflags are available at the macro-instruction level, and microflags are available at the micro-instruction level), conditional MUX (to select the condition of interest, for instance based on the zero flag value), and polarity block (enables the NOT operation by inverting the condition value).
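The path from the ALU status bits to the branch condition can be summarized by a short behavioral sketch. It is a simplified model, not the authors' design: the 8-bit width, the function names `alu_add` and `branch_condition`, and the dictionary of flags are assumptions made for illustration, and the micro-macro flag register is not modeled.

```python
def alu_add(a, b, width=8):
    """Add two operands and return (result, status flags).

    The 8-bit default width is an assumption of this sketch.
    """
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    flags = {
        "zero": result == 0,
        "sign": bool(result & sign_bit),
        "carry": total > mask,
        # Signed overflow: both operands share a sign the result does not.
        "overflow": bool(~(a ^ b) & (a ^ result) & sign_bit),
    }
    return result, flags


def branch_condition(flags, select, invert=False):
    """Conditional MUX followed by the polarity block."""
    condition = flags[select]      # MUX: pick the flag of interest
    return condition != invert     # polarity: optional NOT of the condition


result, flags = alu_add(0xFF, 0x01)
take_branch = branch_condition(flags, "zero")
skip_branch = branch_condition(flags, "zero", invert=True)
```

Here the addition wraps to zero with a carry out, so a branch on the zero flag is taken, and the same condition with inverted polarity is not.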
III. FPGA implementation
The microprogrammable processor described in the previous section is implemented in the Xilinx Foundation software environment used in many digital circuit classes, including ours [9]. The three input methods available in Xilinx Foundation are used in this project. In the design, we relied mostly on the schematic editor for easy access to signals (through labels) and direct visualization of the overall architecture by students. Also, a Hardware Description Language (HDL) editor and especially Advanced Boolean Expression Language (ABEL™) were employed, since students become familiar with ABEL faster than with VHSIC Hardware Description Language (VHDL).¹ Typically, we offer students a three-hour crash course in ABEL, and this has proved to be sufficient for them to feel comfortable with this language. VHDL is more complex than ABEL and requires a full-term course to grasp. Moreover, ABEL sources can be encapsulated within blocks called "macros" that are included in the overall schematic design. Finally, Logiblox modules in VHDL were also written to implement the MUX and the memories.
In order to run the processor, the first step consists of updating three files to implement the micromemory, macromemory, and mapping PROM.
Most of the student work deals with writing the micromemory file defining the macro-instructions. In fact, since example files are already provided to students, they need only update these after becoming familiar with the system. As mentioned previously, micro-instructions must provide the control bits for all structures present in the processor. Table 2 shows how this is organized with the different fields for all the discussed structures. Students edit ABEL files according to these fields, which implement the ¼ control bits of the processor. Student work essentially concerns the development of new opcodes and thus the elaboration of the micro-instruction sequences implementing them. Such a study of microprogramming concepts is obviously the purpose of the project.

¹ VHSIC stands for very high-speed integrated circuits.
ABEL template files are made available to students so that no extensive knowledge of this language is needed. For instance, the following ABEL structure is employed to define micro-instructions:
where XX is the address of the micro-instruction within the micromemory (the micromemory has a capacity of ¾ microinstructions) and ... corresponds to the micro-instruction body (there is one micro-instruction per micromemory address). As an example, Fig. 4 shows the ABEL code for the FETCHINSTR subroutine. FETCHINSTR uses micro-addresses 5 to 8. It is thus a subroutine with a length of 4 micro-instructions. A few other microsequences are provided to students to help them with the system; for instance, ADD R1, R2 (to add two registers together, Fig. 1), LOAD R imm (load register with immediate data, Fig. 5), and so on.

[Figure caption fragment: ... for macro-instructions LOAD R imm and ADD USIERA USIERB, respectively. USIERx means the operand comes from the macro-instruction (x = A or B).]
The macroprogram makes use of the available opcodes defined in the micromemory to perform a given task. The minimum macro-instruction is 16 bits long and is coded on two consecutive macromemory addresses (since the macromemory is 8-bit). As mentioned previously, the first byte defines the opcode, and the second, the operands (Fig. 3). Moreover, only the last 5 bits of the opcode are used, so that a maximum of 2^5 = 32 opcodes can be defined. Although the minimum macro-instruction length is 2 bytes, there is no maximum. This minimum figure comes from the fact that the FETCHINSTR subroutine increments the PC de facto by 2 since "normally" an instruction is made up of both an opcode and operands such as ADD R1, R2. Of course, FETCHINSTR could be redefined differently. For longer macro-instructions (i.e., instructions of more than 2 bytes), it is the user's responsibility to increment the PC as required. For instance, macro-instruction LOAD R imm is defined in 3 bytes (Fig. 5): opcode (1 byte), operand (1 byte; one nibble, in fact, since only one register is involved here) and data (1 byte). Fig. 6 lists a macroprogram example: macroprograms are created as VHDL Logiblox, and templates are available to help students write these.
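The byte layout and PC handling described above can be sketched with a small behavioral model. It is a sketch under stated assumptions rather than the authors' implementation: the opcode numbers `OP_ADD` and `OP_LOAD_IMM`, the nibble order within the operand byte, the choice of the first register as the ADD destination, and the 8-bit register width are all hypothetical.

```python
MACROMEMORY_SIZE = 32              # 2**5 bytes, as stated in Section II
MAR_MASK = MACROMEMORY_SIZE - 1    # only the five LSBs of the MAR are used
OPCODE_MASK = 0x1F                 # only the last five opcode bits are used

# Hypothetical opcode numbers, chosen for this sketch only.
OP_ADD = 0x01
OP_LOAD_IMM = 0x02


def fetch_instr(memory, registers):
    """Mimic FETCHINSTR: read opcode and operand bytes, then PC + 2 -> PC."""
    pc = registers[0]                             # the PC is register #0
    opcode = memory[pc & MAR_MASK] & OPCODE_MASK
    operand = memory[(pc + 1) & MAR_MASK]
    registers[0] = pc + 2
    return opcode, operand


def step(memory, registers):
    """Execute one macro-instruction (behavioral model, not cycle accurate)."""
    opcode, operand = fetch_instr(memory, registers)
    reg_a, reg_b = operand >> 4, operand & 0x0F   # nibble order is assumed
    if opcode == OP_ADD:                          # 2-byte instruction
        registers[reg_a] = (registers[reg_a] + registers[reg_b]) & 0xFF
    elif opcode == OP_LOAD_IMM:                   # 3-byte instruction
        registers[reg_a] = memory[registers[0] & MAR_MASK]
        registers[0] += 1                         # step over the data byte
    else:
        raise ValueError(f"undefined opcode {opcode:#04x}")


memory = [0] * MACROMEMORY_SIZE
memory[0:8] = [OP_LOAD_IMM, 0x10, 5,   # LOAD R1, 5
               OP_LOAD_IMM, 0x20, 7,   # LOAD R2, 7
               OP_ADD, 0x12]           # ADD R1, R2
registers = [0] * 16                   # 2**4 scratch pad registers
for _ in range(3):
    step(memory, registers)
```

The extra PC increment in the LOAD branch mirrors the remark that, for instructions longer than 2 bytes, the microsequence itself must advance the PC past the additional bytes.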
Finally, the mapping PROM file performs the correspondence between a given defined opcode and the address of the first corresponding micro-instruction within the micromemory. This look-up table (LUT) takes the five least significant bits of the macro-instruction opcode and translates them to an -bit-long address. Here too an ABEL structure is employed to define the mapping PROM LUT. For instance, when (A == ^h01) THEN S = ^h19; makes the correspondence between opcode number 1 and the microsequence starting at micro-address 19 in hexadecimal ("A" stands for opcode and "S" for address). A mapping PROM file example is listed in Fig. 7.

Students can follow the progression of the signals directly on the screen (see Fig. 8) since the screen matches the architecture layout of Fig. 2 (thanks to the schematic editor available in Xilinx Foundation). This is particularly useful at the debugging stage. Moreover, double-clicking on blocks enlarges them, allowing students to view their detailed content descriptions (for example, a listing of the ABEL file or VHDL Logiblox). Timing diagrams of simulations can also be saved and printed for inclusion within a lab report.
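The mapping PROM look-up described in this section can be sketched as a simple table. Only the entry for opcode 1 comes from the text; the names `MAPPING_PROM` and `map_opcode` are hypothetical, and raising an error for undefined opcodes is a choice of this sketch, not a statement about the hardware behavior.

```python
OPCODE_MASK = 0x1F   # five least significant bits of the opcode byte

# opcode -> address of the first micro-instruction of its microsequence
MAPPING_PROM = {
    0x01: 0x19,      # from the text: when (A == ^h01) THEN S = ^h19;
}


def map_opcode(opcode_byte):
    """Return the micro-address at which the opcode's microsequence starts."""
    return MAPPING_PROM[opcode_byte & OPCODE_MASK]   # KeyError if undefined


start_address = map_opcode(0x01)
same_address = map_opcode(0xE1)   # upper three bits are ignored
```

Because only the five least significant bits are decoded, opcode bytes that differ in their upper three bits select the same microsequence.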
Once the simulation is satisfactory, it is possible to download the design to the 4010XL FPGA. Here again, the Xilinx Foundation environment is used to provide the ".bit" configuration file to be downloaded to the FPGA (with a utility program such as gxsload, available from Xess Corporation [11]). The 4010XL FPGA is mounted on an XS40-010XL board available from Xess Corporation and connected to a Windows computer through the parallel port [11]. Eight inputs (for example, dip switches) and eight outputs (for example, LEDs) are connected to the 4010XL FPGA (see Display structure in Fig. 2(a)). Two of the inputs are reserved (so that six are available for selection): one serves for the reset and the other for the clock. The clock can be connected to a debounced switch for manual step-by-step testing or to a square-wave signal for automatic progression (e.g., 1 Hz). The six remaining input bits select which value is displayed on the output LEDs, such as the content of any scratch pad register, the content of a macromemory location, or the output of the 2910 MUX (Section II). This last possibility provides a way to check which micro-instruction is being executed. After five clock rising edges at start-up, a reset is completed (thanks to the RESET microroutine located at micromemory address 0), and the processor is ready to start its task. Execution then proceeds at each rising edge of the clock. Of course, with only six inputs and eight outputs, very little "useful" work can be done with this implementation, but such work is not our purpose. In fact, for the purpose of teaching microprogramming concepts, simulation in the Xilinx Foundation environment is sufficient. However, the ability to configure the FPGA and observe real hardware activity can be seen as the "ultimate microprogramming challenge."
Knowing that this capability is available in the tool thus makes it attractive and "complete," since the hardware can be exercised by executing "real software" (as defined by the macroprogram), although on a small scale.
V. In-class deployment
Since the system is based on the same FPGA platform (the XS40-010XL board) and software tools (the Xilinx Foundation environment) that are employed by our digital design course, substantial savings are realized. This approach also spares us the requirement to define and develop our own support environment. Moreover, since students in the microprogramming course have already taken the digital design course, they are familiar with both the FPGA platform and the Xilinx Foundation environment. Thus they are able to go deeper into microprogramming concepts. The system was tested repeatedly within our -student microprogramming class through both in-class demonstrations and homework (students are grouped in teams of two). They were asked to complete several exercises in order to become acquainted with the system and to develop several new macro-instructions. Students found their experience with the system to be quite positive.
It is worthwhile to note that availability of the supporting hardware (the XS40-010XL board) was found to be useful, although, as mentioned previously, it was not essential since the simulation performed in the Xilinx Foundation environment already provides the capability for exhaustive testing of the microprogrammable design.
VI. Conclusion
In this paper, a microprogrammable processor and its FPGA implementation were described. This system was developed as a teaching tool for microprogramming in our electrical and computer engineering curriculum. Since the supporting hardware/software combination is common in many universities, it is believed that the presented microprogramming teaching tool will be well received, especially since it is available at no cost. Finally, it is important to point out that although for this discussion the design was implemented in a Xilinx 4010XL FPGA, it could of course be implemented in other architectures as well, provided the number of available gates is sufficient.
