Abstract-The rapid increase in complexity and size of digital systems has reduced the effectiveness of old design methodologies based on physical prototyping. Prototyping via simulation must be used to achieve design cost and time-to-market goals when designing large digital systems. This virtual prototyping design methodology often permits the first physical prototype to be a manufacturable product. A two-course sequence has been developed to introduce students to this design paradigm. These courses teach virtual prototyping techniques and allow the students to use these techniques to develop a simple computer. The students simulate their designs, and then they implement their designs in hardware using field programmable hardware. This allows the students to complete an entire design cycle from idea to actual hardware implementation and compare their physical results to their simulated results.
Index Terms-Computer design, computer-aided design, register transfer level, VHSIC Hardware Description Language (VHDL), virtual prototyping.
I. INTRODUCTION

DIGITAL systems have increased in complexity and size to the point that design methodologies based on physical prototyping are no longer cost effective or efficient. A design paradigm based on prototyping via simulation, called virtual prototyping, is becoming more prevalent. Virtual prototyping significantly reduces design cost and time-to-market while allowing a greater exploration of the design space [1].
Various forms of virtual prototyping for computer design education have been used widely for several years [2]-[4]. Hardware description using simple languages, simulation, and hardware prototyping using field-programmable hardware have been used with reported success in different programs. Earlier efforts reported use of simple, often locally designed, computer-aided design (CAD) tools. As commercial CAD systems have improved and become more accessible in the academic environment, programs have made greater use of standards such as the VHSIC Hardware Description Language (VHDL). The two-course sequence reported here makes use of VHDL and commercial CAD tools for design capture, simulation, and synthesis. Several CAD vendors now offer their tools with very favorable terms to academic institutions for educational use, and the course sequence reported here took advantage of such an opportunity to use commercial tools exclusively.
The Computer Engineering group of the University of Virginia has developed and implemented a sequence of two senior-level courses that fully embrace the concept of virtual prototyping. The first of these two courses, ECE435 -Computer Organization and Design, introduces the students to top-down design philosophy, CAD, and simulation. The second course, ECE436 -Advanced Digital Design, imposes practical implementation constraints and culminates with the implementation of designs on a physical prototype board. Time constraints of the classes dictate that only one physical prototype may be constructed, and the virtual prototyping techniques are, therefore, essential to permit success.
These two courses endeavor to provide the students with a range of practical prototyping experiences while balancing the needs that are unique to the educational environment. The exclusive use of commercial CAD tools offers design experience that may be transferable to future employment as it also exposes the students to the capabilities and limitations of the tools that are currently on the market. The final design involves both commercial off-the-shelf (COTS) components, and custom designs implemented in field programmable devices. This combination exposes the students to practical interface issues. The major effort to design a complete digital processor is effectively completed twice: once as an individual effort during the first course and again as a team effort during the second course. This arrangement permits the students to experience team dynamics while it improves the capabilities of everyone on the team to contribute. The first design pass uses only design capture and simulation and serves to convey the basics of design and the tools. The second pass involves design refinement and synthesis, and it clarifies the function of design iteration and tradeoff analysis.
This paper offers a description of the two-class sequence with ECE 435 described in Section II and ECE 436 described in Section III. Section IV presents the results of a sample project done in the two-class sequence, and Section V presents conclusions.
II. ECE 435 -COMPUTER ORGANIZATION AND DESIGN
The first class in the sequence is typically taken during the fall of the undergraduate student's final year of study. Students entering this class have already completed a class in basic logic design and a class in computer architecture. This class is assigned 4.5 semester hours of credit.
A. ECE435 Lectures
The goal of this course is to teach the students the concepts of computer design at the register transfer level. Successful existing architecture elements are presented along with computer design techniques and tools. Simulation-based design and the use of a Hardware Description Language (HDL) are central to this class.
VHDL [5] is used as the hardware description language in this course sequence. The introduction to VHDL is accomplished by first presenting VHDL language syntax and semantics. Next, techniques for behavioral modeling of register transfer level components are introduced. The next step is to develop basic VHDL structural modeling techniques to interconnect these components. With the ability to model simple behavior and structure, the students are next challenged to describe more complex sequential devices such as state machines and controllers. Finally, techniques for including accurate timing in VHDL models are presented. At this point, the students have enough knowledge of VHDL to write behavioral descriptions of the components needed for their designs and to generate VHDL structural descriptions to combine these components into a working processor model.
Once the students understand the basics of hardware description, the lectures shift to computer design and organization. One of the first topics discussed in this area is that of bus modeling and design. This subject introduces various bus arrangements and the modeling of bus operation in VHDL. Next, there is a discussion of data path topology and timing disciplines with a focus toward management of complexity in data path design. Several different control unit architectures are then outlined including hardwired and microcoded designs. Pipelines are presented as devices to exploit parallelism in a single instruction stream. Memory topologies constitute the next focal point for the class. The final area for lectures considers various successful, historical, and future computer architectures including Reduced Instruction Set Computers (RISC), Complex Instruction Set Computers, Superscalar, and Very Long Instruction Word (VLIW) architectures.
B. ECE 435 Labs
Most of the labs demonstrate how to write, compile, and simulate VHDL code. The major focus of these labs is the design of a data path and control unit that can implement the instruction set architecture (ISA) of a small 8-bit computer called the 35VEE8 [6]. The ISA was designed specifically for this two-course sequence, and it is intended to offer a simple architecture that can be designed and implemented at the register transfer level by a single designer in a single semester. Significant limitations of the 35VEE8 architecture include a small instruction set, minimal addressing modes, an 8-bit data word, and a 64-Kbyte address space. The instruction set for the 35VEE8 was given to the students at the beginning of the semester in the form of a programmer's reference manual [6] so that they knew the functionality objectives from the start of their design.
A listing of the laboratory assignments is given here. Each student working alone performs these assignments.
1) Design, describe in VHDL, and simulate a 1-bit arithmetic and logic unit (ALU). Cascaded together, these form an ALU of arbitrary bit width.
2) Using the 1-bit ALU from the previous laboratory exercise, compose and simulate an 8-bit ALU.
3) Design, describe in VHDL, and simulate an 8-bit two-to-one multiplexor and an 8-bit transparent latch.
4) Design a data path on paper that is capable of performing all the specifications for the 35VEE8.
5) Implement and simulate the data path design.
6) Design and simulate a control unit.
7) Connect the data path and the control unit together.
8) Run a sample program and determine the maximum clock frequency, number of clock cycles, and the average execution time per instruction.
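The cascading idea behind the first two exercises can be illustrated with a short behavioral sketch. It is written in Python rather than VHDL, and the operation set (ADD, AND, OR, XOR) and the names `alu_slice` and `alu` are assumptions made for illustration; the actual operations are defined by the 35VEE8 programmer's reference manual.

```python
def alu_slice(op, a, b, carry_in):
    """One-bit ALU slice: returns (result_bit, carry_out)."""
    if op == "ADD":
        total = a + b + carry_in
        return total & 1, total >> 1
    if op == "AND":
        return a & b, 0
    if op == "OR":
        return a | b, 0
    if op == "XOR":
        return a ^ b, 0
    raise ValueError(f"unknown op: {op}")


def alu(op, a, b, width=8, carry_in=0):
    """Cascade `width` one-bit slices, rippling the carry from LSB to MSB."""
    result = 0
    carry = carry_in
    for i in range(width):
        bit, carry = alu_slice(op, (a >> i) & 1, (b >> i) & 1, carry)
        result |= bit << i
    return result, carry
```

Because the slice is parameterized only by its carry connection, the same function yields an ALU of any width by changing `width`, which mirrors the "arbitrary bit width" remark in the first exercise.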
C. Methodology
The design methodology is a mixture of top down and bottom up. The students first create a library of register-transfer-level components. Next, the students return to the top-level specification, the programmer's reference manual, and proceed to design the data path and control unit for their processor in a top-down manner.
D. Tools
The Mentor Graphics Corporation CAD tools are used throughout the class. Design Architect is used for basic VHDL development work. Renoir is used for graphical system composition. This tool produces VHDL as its output for simulation. The final tool used in this course is Mentor Graphics' QuickHDL. These tools are used on UNIX workstations, but similar alternatives exist for the PC environment.
E. Functional Testing
As with any design of a complex system, testing is necessary to demonstrate that a design is functionally correct. For the individual laboratory exercises, the students must provide simulation results. Timing analysis is also required for some of the labs.
For the overall design, the students are required to develop their own functional testing methodology. Most students choose to test each functional block of their processor independently through exhaustive simulations. After these individual blocks have been verified, they are then placed together into larger units and simulated.
Most students simulate their data path independently from the control unit to make sure that each performs at least a portion of the instructions correctly. Next the data path is integrated with the control unit, and individual instruction execution is tested. For the final test, the students are given a small standard program.
F. Results
In each year that the course has been offered, approximately 90% of the class has produced a working processor at the completion of the class.
III. ECE 436 -ADVANCED DIGITAL DESIGN
The second class in the sequence is typically taken in the spring of the undergraduate student's final year of study. Students entering this class must have completed ECE435. This class is also assigned 4.5 semester hours of credit. The extra semester hours are associated with this class because of its significant laboratory component.
A. ECE436 Lectures
The objective of this second course is to implement designs in hardware. To achieve this goal, the students must be provided additional instruction in design. The first new topic is logic decomposition. This topic includes functional decomposition and multiplexor implementation of logic functions. The students use these algorithms to partition their designs for efficient implementation in different field programmable gate array (FPGA) or programmable logic device (PLD) architectures.
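Multiplexor implementation of a logic function rests on Shannon's expansion: fixing the select inputs leaves a residual function of the remaining variable, and that residual becomes the data input. The following Python sketch is one way to illustrate the idea for a 3-input function and a 4-to-1 multiplexor. The function names and the choice of (a, b) as select lines are assumptions for illustration, not the course's algorithm.

```python
def mux_data_inputs(f):
    """For a 3-input Boolean function f(a, b, c) returning 0 or 1, return the
    four data inputs of a 4-to-1 multiplexor whose select lines are (a, b).
    Each data input is one of the strings "0", "1", "c", "not c"."""
    residual_names = {(0, 0): "0", (1, 1): "1", (0, 1): "c", (1, 0): "not c"}
    inputs = []
    for a in (0, 1):
        for b in (0, 1):
            cofactor = (f(a, b, 0), f(a, b, 1))  # residual function of c
            inputs.append(residual_names[cofactor])
    return inputs


def mux_eval(inputs, a, b, c):
    """Evaluate the multiplexor implementation for one input combination."""
    selected = inputs[2 * a + b]
    return {"0": 0, "1": 1, "c": c, "not c": 1 - c}[selected]


def majority(a, b, c):
    """Carry-out of a full adder, used here as the example function."""
    return (a & b) | (a & c) | (b & c)


carry_inputs = mux_data_inputs(majority)  # ["0", "c", "c", "1"]
```

For the majority function the data inputs reduce to a constant 0, the variable c twice, and a constant 1, so the function fits in a single 4-to-1 multiplexor with no additional gates.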
The next topic introduces the students to the differences among logic families and the advantages and disadvantages of each. This introduction prepares the students to determine the best components for an application based on characteristics such as size, speed, power, etc.
Digital system implementation alternatives are next discussed. Options included are full custom and standard cell integrated circuit design, gate arrays, FPGAs, and PLDs. The emphasis on this subject is placed on FPGAs. The varieties of architectures for PLDs are presented next.
Testing and design for testability are central to the material in ECE436. The class covers topics such as fault models and the generation and reduction of fault tables. Different types of test generation and fault simulation algorithms are discussed and compared. Several types of scan techniques are presented. The benefits of design for testability for functional testing are discussed, and the students are encouraged to include some design for testability techniques within their designs to assist with debugging.
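As a small illustration of fault-table generation and reduction, the Python sketch below applies the single stuck-at fault model to a hypothetical two-gate circuit, y = (a AND b) OR c. The circuit, the net names, and the greedy covering heuristic are assumptions chosen for brevity; they are not taken from the course material.

```python
from itertools import product

NETS = ["a", "b", "c", "n1", "y"]  # n1 is the output of the AND gate


def simulate(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c with an optional (net, stuck_value) fault."""
    def force(net, value):
        return fault[1] if fault and fault[0] == net else value

    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)
    return force("y", n1 | c)


def fault_table():
    """Map each test vector to the set of single stuck-at faults it detects."""
    faults = [(net, v) for net in NETS for v in (0, 1)]
    table = {}
    for vec in product((0, 1), repeat=3):
        good = simulate(*vec)
        table[vec] = {f for f in faults if simulate(*vec, fault=f) != good}
    return table


def reduce_tests(table):
    """Greedy covering: repeatedly pick the vector that detects the most
    still-undetected faults until every detectable fault is covered."""
    remaining = set().union(*table.values())
    chosen = []
    while remaining:
        best = max(table, key=lambda v: len(table[v] & remaining))
        chosen.append(best)
        remaining -= table[best]
    return chosen
```

For this circuit, `reduce_tests(fault_table())` selects four of the eight possible input vectors, and those four detect all ten single stuck-at faults.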
The course next considers different levels of design including: register transfer, logic, algorithm and behavioral, and system level. This discussion illustrates the concept of top-down design and shows the utility of system-level performance, dependability, and functional modeling. Although these types of modeling are not used in the class project, this introduction is included as an important part of virtual prototyping.
B. ECE 436 Labs
The laboratory assignments in ECE 436 are intended to familiarize the students with the new tools and the design flow. One assignment was also designed to familiarize the students with the use of the test equipment, such as logic analyzers, oscilloscopes, and logic programmers, that they use to build and debug their designs. A list of these laboratory assignments is provided here.
3) [...] the memory controller state machine to be used in the 35VEE8 system. Synthesize the gate-level implementation using Autologic. Implement that design in a 22V10 programmable array logic (PAL) using the PLDSII tool.
4) Use the logic analyzer and oscilloscope to observe operation and measure timing delays for a test circuit implemented using an Actel FPGA and 22V10 PAL.
C. ECE 436 Project
The project is intended to give the students practical design experience implementing a 35VEE8 computer. The Actel FPGAs were chosen for this project. They offer reasonable speed and enough density to easily fit the entire design into three 84-pin FPGAs. Most designs force the students to consider steps to minimize logic and partition the design into multiple modules while meeting area and pinout constraints. The Actel FPGAs are only one-time programmable. This characteristic is also considered to be a benefit because it forces the students to perform extensive simulations of their designs before committing to hardware.
One assigned goal is to maximize the performance/cost ratio. A list of available parts is provided, and each part is assigned a dollar value. The "cost" of each design is based on the cost of the parts that are used. The goal therefore becomes one of maximizing the rate of instruction execution of the processor while using devices that sum to the lowest cost. One typical tradeoff explored by the students is that of microcoded versus hardwired control.
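The figure of merit described above can be written out as a short calculation. The Python sketch below is only an illustration: the function names, the benchmark figures, and the part prices are hypothetical and are not the values used in the course.

```python
def instruction_rate(instructions, cycles, clock_hz):
    """Average instructions executed per second for a benchmark run."""
    return instructions * clock_hz / cycles


def performance_cost_ratio(instructions, cycles, clock_hz, part_costs):
    """Instruction rate divided by the summed dollar value of the parts used."""
    rate = instruction_rate(instructions, cycles, clock_hz)
    return rate / sum(part_costs.values())


# Hypothetical benchmark figures and part prices, for illustration only.
parts = {"FPGA x3": 90.0, "EPROM x3": 10.0}
ratio = performance_cost_ratio(instructions=1000, cycles=8000,
                               clock_hz=4_000_000, part_costs=parts)
```

A design can raise the ratio either by executing the benchmark in fewer cycles or at a higher clock rate, or by using fewer or cheaper parts, which is the tradeoff the students are asked to explore.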
While the laboratory assignments are completed individually, students working in teams of three or four complete the project. Thus an opportunity is provided for the students to develop their team skills. Since each student completed an individual design for the 35VEE8 in ECE435, they must compare their individual designs and establish a joint design for their group. The students in the group must discuss and compromise on design decisions and work out schemes for information sharing, managing group deadlines, and other similar issues.
There are only two project deadlines to meet. The first of these deadlines is the critical design review. The second deadline is the completion of the working computer.
The critical design review deadline is placed about half way into the course. Each group is required to present their design to a review board consisting of the course instructors and an outside faculty representative who evaluates technical communication skills. The groups are required to develop a presentation that outlines the goals and background of the project, the design that they have developed, the design trade-offs that they investigated, and the overall status of the design. They also must present their overall design methodology and any risk areas that they feel might prevent them from completing the design on time and how they plan to handle these risks. Each student in the group is required to deliver a portion of the presentation. The course instructors attempt to ascertain from the presentation that the group's design is technically correct and that they are on schedule to complete the design. The technical communication representative provides feedback to the students on their presentation and on possible improvements to their group interaction.
The second deadline is the completion of a working computer. The students are responsible for developing programs to debug their processors, but the final demonstration uses a benchmark program. This benchmark is run on each processor and timed to determine the performance of the design. A final written report is also required for the completion of the project.
D. Methodology
The project is intended to provide practical experience in implementation techniques while demonstrating the value of accurate virtual prototyping and teamwork. The initial portion of the design process is performed in a top-down manner. The next step is to implement the required components of their design in the Actel FPGA parts using a minimum number of logic modules while maximizing speed. The components are then wired together to implement the 35VEE8 computer in a bottom-up fashion. Once the design is completed and verified via extensive simulation, the students implement their design by wire wrapping the programmed chips onto a prototyping board. The limitation on the number of available Actel FPGAs and the difficulty in making design changes in a wire wrapped board demands extensive simulation to remove functional bugs before construction starts.
E. Tools
Schematic capture, VHDL entry, and functional simulation are accomplished using the same CAD tools used in ECE435. The Leonardo FPGA synthesis tool is used to synthesize behavioral VHDL descriptions to Actel primitives. The synthesized portions of the design are then combined with the hand-designed schematics in Actel Act1 primitives and then placed and routed using the Actel Designer tool. Typically, VHDL and synthesis are used to design state machines and blocks of combinational logic such as ALUs and decoders, while schematic capture is used to design regular structures such as registers and to lay out data paths. However, the use of synthesis in the design of the 35VEE8 is not required, and some groups choose not to use it. Once placement and routing of the complete FPGA is accomplished, timing values can be extracted and back annotated so that timing simulations of the entire design can be accomplished.
F. Construction
The students are given a set of parts along with a protoboard. The maximum number of parts available to each group is listed in Table I. One of the PALs, along with the RAM and one of the EPROMs, must be used to construct the memory subsystem. This memory subsystem is not considered to be part of the 35VEE8 processor itself. The second PAL, the MC68681 UART, the MAX233ACPP line driver, and the 3.86 MHz crystal must be used for the I/O subsystem. This leaves only the three Actel FPGAs and three of the EPROMs for implementation of the 35VEE8 processor. The three EPROMs can be used to store microcode for those groups that elect to use a microcoded control unit. The FPGAs are used to implement the data paths for the processor and either a microcoded control unit or a hardwired control unit.
The groups that implement their design using a hardwired control unit typically experience greater difficulty trying to partition and fit their designs into the FPGAs. The groups that use a microcoded control unit, in general, do not have the size problems encountered by those using hardwired control. Those using microcode usually encounter a different problem in dealing with the allowed length of their microcode words. The limit of three EPROMs for the microstore requires that the microcode words be no wider than 24 bits. In ECE435, there is no limit on the number of control points or the size of the microstore, and the tendency is to use horizontal coding for ease of design. The 24-bit width limitation forces the groups either to redesign their data paths to require fewer control points, develop a decoding scheme to encode the control bits into a smaller microword, or both.
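One common way to shrink a horizontal microword is to replace a group of mutually exclusive control lines with a binary-encoded field that is decoded in hardware. The Python sketch below illustrates the saving. The function names and the convention that code 0 means "no line asserted" are assumptions for illustration, not a scheme prescribed by the course.

```python
def field_width(n_lines, allow_none=True):
    """Bits needed to encode n mutually exclusive control lines.
    With allow_none, one extra code is reserved for 'no line asserted'."""
    codes = n_lines + (1 if allow_none else 0)
    return max(1, (codes - 1).bit_length())


def decode_field(code, n_lines):
    """Expand an encoded field back into one-hot control lines.
    Code 0 asserts nothing; code k asserts line k - 1."""
    if not 0 <= code <= n_lines:
        raise ValueError("code out of range")
    return [1 if code == i + 1 else 0 for i in range(n_lines)]
```

For example, fourteen mutually exclusive register-select lines would occupy 14 bits in a horizontal microword but only `field_width(14)`, that is 4 bits, once encoded, at the price of a decoder in the FPGA.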
G. Functional Testing
Functional testing is performed at almost every step through the prototyping process. The first broad stage of testing is performed using extensive functional simulations without timing values. Functional-only simulations do not require the placement and routing tools to be run after a redesign. These permit a fast edit-simulate-check cycle.
Upon completion of functional testing, the design is placed and routed, and timing values are back annotated. This process adds both gate delays and routing delays to the design. The simulations are then repeated with full timing data to verify that the timing requirements are satisfied.
The final stage of testing is accomplished on the physical processor prototype board once all the parts have been programmed and wire wrapped together. The board design is tested using standard test bench instruments. 
H. Results
Thus far, only one group has failed to complete a virtual prototype of their design. Further, over 80% of the groups have been able to get their prototypes to function correctly for a significant portion of the instruction set.
One of the most interesting results from this project is the level to which the virtual prototyping process has been successful. In a typical year, of the groups that have produced working designs, none have had any major functional bugs which required redesign or reprogramming of the FPGAs. Most of the groups have experienced wiring errors and even some microcode mistakes, but all of these problems were resolved in a minimum of time and without change to the physical design. The average time from the beginning of construction to working physical prototype is about four days. Some groups have even managed to build their 35VEE8 from start to finish during the final day before the deadline!
IV. AN EXAMPLE FINAL PROJECT
This section provides a brief overview of a design developed by one group. The data path for this design used both an 8-bit bus and a 16-bit bus connected together through 8-bit latches and tristates. The block diagram of the data path is in Fig. 1. The arrows in the block diagram represent bus drivers (implemented as multiplexors) while all the boxes are registers. Smaller boxes represent 8-bit registers and the larger ones represent 16-bit registers.
The 8-bit portion of the data path contains the 8-bit registers, the arithmetic and logic unit (ALU), and the bus drivers. This group built their own 8-bit registers using the Actel 1-bit register macros rather than using the Actel 8-bit register macro, and they found that their registers were 15% smaller than those produced by the simpler approach. This size reduction was possible because the 8-bit macros included functionality that was not needed for this design. Since this design contained fourteen 8-bit registers, this significant reduction in size helped to fit the design into the individual FPGAs. The group chose to implement the ALU using VHDL code. They then used Autologic to synthesize a gate level ALU design from this code. The bus structure was implemented using multiplexors because the Actel FPGAs do not provide internal tristate devices. The outputs from each register were connected to the inputs of a multiplexor and the multiplexor output was used as the source for a resolved "bus."
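The multiplexor-based bus described above can be modeled as an AND-OR structure in which a decoded select line gates exactly one register output onto the shared lines. The following Python sketch is a behavioral illustration only; the function name and the register values are hypothetical.

```python
def mux_bus(register_outputs, select, width=8):
    """Drive a shared 'bus' from one of several registers without tristates.
    Each register output is ANDed with its decoded select line, and the
    gated values are ORed together to form the bus value."""
    mask = (1 << width) - 1
    bus = 0
    for index, value in enumerate(register_outputs):
        gate = mask if index == select else 0  # select line replicated per bit
        bus |= value & gate
    return bus


# Hypothetical register contents, for illustration only.
registers = [0x12, 0x34, 0xAB]
bus_value = mux_bus(registers, select=2)
```

Because the select input names a single source, two registers can never drive the bus at once, so the contention that is possible on a tristate bus cannot occur in this arrangement.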
The 16-bit portion of the data path is simpler with only two 16-bit registers, six 8-bit registers, an incrementor/decrementor, and two constant values. Also included in this data path are the multiplexors to implement the 16-bit "bus." Two 16-bit registers are used as the program counter and the memory address register. The 8-bit registers in the 16-bit data path were chosen because they interact with both buses. The D and E register combination can be loaded and used by the 8-bit bus, but their combination forms the 16-bit X register. The high and low 8-bit registers, SPH and SPL, operate in a similar manner with the combination constituting the 16-bit stack pointer register. The last two 8-bit registers connected in this fashion are the IR1 and the IR2 that hold elements of the fetched instruction. These registers are used to pass data from the 8-bit bus to the 16-bit bus. Separating the registers in this fashion provides easy access from either data path while allowing parallel independent operation of the two separate buses.
This group decided to include a 16-bit incrementor/decrementor in their design to avoid using the 8-bit ALU for these operations. This device permitted the PC to be incremented or the stack pointer incremented or decremented in one clock cycle rather than using the seven clock cycles needed to implement this process using the 8-bit data path and the ALU.
The two constant values available to the 16-bit data path are zero and two. The availability of the zero permits registers to be cleared. The constant two is included because the architecture specifies that the address of the interrupt service routine is found at addresses two and three.
This group chose to design a microcoded control unit. The control unit consists of five main parts: an execution table, a constant table, a micro address register, an incrementor, and a reset cell. The execution table takes the current instruction op-code as its address and provides a pointer into the microprogram where the microcode corresponding to the op-code resides. The constant table performs the same operation to generate microprogram addresses for the operations of reset, interrupt, and fetch. The micro address register stores the address for the next microprogram instruction. The incrementor increments the microaddress. Finally, the reset cell is combinational logic that resets the microcontroller when the reset pin goes low.
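The interaction of the five parts can be sketched behaviorally in Python. The class name, the `next_select` values, and the table contents in the usage example are hypothetical; the paper does not give the group's microaddress assignments or their next-address control encoding.

```python
class MicroSequencer:
    """Behavioral sketch of the microprogram sequencer described above."""

    def __init__(self, execution_table, constant_table):
        self.execution_table = execution_table  # op-code -> microaddress
        self.constant_table = constant_table    # "reset"/"interrupt"/"fetch" -> microaddress
        self.micro_address = constant_table["reset"]  # micro address register

    def step(self, next_select, opcode=None, reset_n=1):
        """Compute and store the next microaddress, then return it."""
        if reset_n == 0:                        # reset cell: active-low reset pin
            self.micro_address = self.constant_table["reset"]
        elif next_select == "increment":        # incrementor
            self.micro_address += 1
        elif next_select == "dispatch":         # execution table lookup
            self.micro_address = self.execution_table[opcode]
        else:                                   # constant table lookup
            self.micro_address = self.constant_table[next_select]
        return self.micro_address


# Hypothetical table contents, for illustration only.
seq = MicroSequencer(execution_table={0x01: 0x20, 0x02: 0x30},
                     constant_table={"reset": 0x00, "fetch": 0x04, "interrupt": 0x08})
trace = [seq.step("fetch"), seq.step("increment"),
         seq.step("dispatch", opcode=0x02), seq.step("increment")]
```

In this sketch the microcode itself would supply `next_select` on each cycle, so sequential microinstructions use the incrementor while the end of a fetch routine dispatches through the execution table.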
The specifications constrained the microstore to only three 8-bit by 2 K EPROMs. This specification dictated a maximum microcode word width of 24 bits. This group used an encoding scheme to pack the 31 bits needed into the 24 bits available. This scheme designated the highest order bit as a control bit which when set specified microcode for the 8-bit data path and when cleared specified microcode for the 16-bit data path. This coding scheme provided the required control lines. Unfortunately, this design decision cost the group most of the potential for parallel operations within their dual bus structure.
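A mode-bit encoding of this kind can be sketched in Python as follows. The split of the 31 control lines into 16 for the 8-bit data path and 15 for the 16-bit data path is an assumption made purely for illustration, as are the function names; the paper does not give the group's actual field layout.

```python
WORD_BITS = 24
PATH8_LINES = 16   # hypothetical share of the 31 control lines
PATH16_LINES = 15  # hypothetical share of the 31 control lines


def encode(path, lines):
    """Pack the control lines for one data path into a 24-bit microword."""
    if path not in (8, 16):
        raise ValueError("path must be 8 or 16")
    limit = PATH8_LINES if path == 8 else PATH16_LINES
    if lines >> limit:
        raise ValueError("too many control lines for this data path")
    mode = 1 if path == 8 else 0  # highest-order bit set selects the 8-bit path
    return (mode << (WORD_BITS - 1)) | lines


def decode(word):
    """Expand a microword into (8-bit path lines, 16-bit path lines)."""
    payload = word & ((1 << (WORD_BITS - 1)) - 1)
    if word >> (WORD_BITS - 1):
        return payload, 0  # the 16-bit path receives no active lines
    return 0, payload      # the 8-bit path receives no active lines
```

The decoder makes the cost of the scheme visible: every microword activates control lines for only one of the two data paths, so the two buses cannot be commanded in the same microcycle.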
This group ran a full array of functional and timing simulations on their design. They tested any VHDL code before automatic synthesis. After completion of VHDL code testing, the group connected the blocks into the subsystems that would reside in each FPGA. The 8-bit data path was assembled and simulated as a unit, as was the 16-bit data path. Finally, the entire computer was assembled and simulated.
The 8-bit data path and the microcontroller were placed and routed into one FPGA using the Actel tools. The 16-bit data path was placed and routed into another FPGA. After these place and route operations, the models were back annotated with timing delays. The modified simulation model was then evaluated to expose timing and critical path information. The group also wrote several small programs to run in simulation to test the operation of the various processor instructions.
Construction of the physical hardware was started only after the successful evaluation of a complete virtual prototype. The hardware prototype was constructed methodically with incremental testing completed as feasible during construction.
V. CONCLUSION
The two-course sequence based on virtual prototyping was considered to be a success. Most of the students completed the ECE435 individual project and produced a working simulation model of a 35VEE8 computer. Most of these students were then able to take their computer designs and actually implement these designs in FPGA technology in the limited time of ECE436. This design process illustrated the benefits of virtual prototyping and simulation-based designs to the students.
Student opinions of the two-course sequence were solicited and were very good, with many students commenting on the great satisfaction that they enjoyed when they were able to run a program on a computer of their own construction. A few commented that they gained a much greater understanding of computer architecture fundamentals through this extended design effort.
The importance of virtual prototyping in reducing design cost and time-to-market while allowing a greater exploration of the design space makes it a required technology for remaining competitive in today's digital system market. The education of students in simulation-based design is therefore also becoming increasingly important.
