Verification of a computer that implements a new architecture is especially difficult since no approved functional test cases are available. The logic design of the ISM RlSC System/SOOO" was verified mainly by a special1 developed random test program generator (RT PY G), which was used from the early stages of the design until its successful completion. APL was chosen for the RlSC System/SOOO RTPG implementation after considering the suitability of this programming language for modelin computer architectures, the very tight schedu Y e, and the highly changeable environment in which RTPG would operate.
T he ultimate goal of design verification is to ensure equivalence between a design and its functional specification. Strictly speaking, we can say that this goal can be achieved by exhaustive simulation or formal proof of correctness. The exhaustive simulation, in which all possible combinations of all inputs and memory elements of the design should be applied, can be done only for very small designs. Also, the state of the art of the formal techniques and the complexity of designs and specifications, usually written in English, do not allow utilization of the formal techniques in most industrial applications. Despite significant progress achieved in recent years in formal verification, it has been pointed out that formal verification is not intended to replace simulation completely and that simulation is presently the major tool for the (partial) validation of hardware designs.
In practical applications only a relatively small subset (as compared to the exhaustive set) of selected test cases is simulated. The challenge, then, is to create a subset that provides high confidence in the correctness of the design. We discuss how biasing techniques, combined with the dynamic approach to random test program generation, help to solve this problem. This paper describes the concepts behind and the implementation of the IBM RISC System/6000* random test program generator (RTPG) developed to assist in the interactive creation of the adequate subset, as well as to automatically produce a vast Wopyright 1991 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computerbased and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.
number of test programs for the comprehensive verification of the design.
At the moment the design was launched, no functional test cases existed for RISC System/6000 architecture. It was obvious that the traditional way of writing test cases could not provide the required level of confidence in the design. The design verification methodology developed at the IBM Haifa Research Group (HRG)3'4 had already been successfully applied to several smaller designs such as floating-point units and a microcontroller. It was decided to adopt this approach for verification of the FUSC Systed6000 computer system. APL was chosen as the programming language for the RISC Systed6000 RTPG after considering the suitability of this language for modeling computer architectures, the very tight design schedule, and the highly changeable environment in which RTPG would operate. Originally the RISC Systed6000 RTPG was developed in vS APL on the virtual machine (VM) operating system. It was later "migrated" to mu on the same system. It is currently being used for verification of follow-on products and is running mainly in batch mode on a cluster of over 30 RISC Systed6000 machines, using IBM's APwm.
The second section of the paper discusses some aspects of processor verification and describes a test program format suitable for this purpose. The subsequent section presents the main RTPG concepts and ways to realize them. The RTPG structure and the basic operation modes are described in the fourth section. Highlights, conclusions, and results of the RTPG experience are summarized in the last section.
The nature of processor verification
The RISC Systed6OOO RTPG and its predecessors. Logic verification of VLSI (very large-scale integrated) designs has always been an intrinsic part of the design process; however, the complexity of verification grows much faster than the complexity of designs. The problem is widely recognized in the case of microprocessors, as they present the leading edge of single-chip design complexity. Verification of microprocessors is considered to be a bottleneck of the entire design process, with crucial impact on the schedule for delivery of new systems.' An automated approach is essential for verification of a system that consists of several VLSI chips including 528 AHARON ET AL. a processor, a floating-point unit, a storage control unit, and caches.
In most applications, a test case for a processor is a program written in assembly language. The main A random approach to automatic test generation has proved to be successful.
goal of the tool called R T P G ,~ which is actually a dynamic biased pseudo-random test program generator, is to make the test program generation process more productive, comprehensive, and efficient.
A random approach to automatic test generation for software7 and hardware' verification has proved to be successful. It was applied to the verification of selected design units such as a floating-point unitg and even a complete processor,' but very strong restrictions were imposed on the generated test programs.
As a result of those restrictions, many parts of the design could not be accessed and, thus, could not be verified.
For some designs, such as a floating-point unit, the main part of the verification task can be fulfilled by programs that consist of only one instruction. The generation of such programs is relatively straightforward: The generator (or the user) selects an instruction and the required controls, generates the operands randomly (or provides them), and then invokes a reference model of the design to get the expected results. The generation of multiple instruction test programs is much more complicated, especially when such features as program control instructions, interrupts, and address translation are to be verified. There are approaches'.'' that present a way to generate multi-instruction test programs. They are based on creating special tables of operands, and each instruction may select operands only from the relevant tables. Although much more productive than manual test writing, these approaches have drawbacks. The tables used must ensure that the generated test programs are worthwhile, that they would create the required instruction stream, and that they would not quickly end up with an interrupt. These conditions imply use of the utmost caution in creating the tables and make this task very cumbersome. The generated test programs are relatively simple, and again, must obey many restrictions. For example:
Some instructions are always preceded by specially inserted instructions, e.g., for initialization of base registers to get the allowed memory addresses. Thus, some sequences of the instructions can never be generated. For the same reason, a register may not be used as a source for different types of activity, such as an operand in an arithmetic instruction and a base register for addressing. To avoid creating endless loops, only branch forward instructions are generated. In branch conditional instructions, where it is not known a p ion' whether the branch is taken or not, instructions for both possible paths must be gen- the complete Systed360* was formally described in A P L .~~ All Boolean and relational functions are supported, and these functions provide very efficient bit-per-bit execution for bit arrays of any length. The language has the ability to individually address each bit in an array. It is often necessary to work with bit fields and subfields within instruction or data words, including doubleprecision floating-point data which are 64 bits long in the RISC Systed6000. APL allows the needed fields to be easily split out, whereas many languages do not support bit operations at all (and especially not in more than 32-bit words). Because bit operations are so common, RTPG keeps values for all of the instruction and data words in Boolean form. Since APL stores a Boolean value as a single bit in the host processor storage, there is no penalty for keeping data in this convenient form. The APL rotate function together with the selection functions (like take and drop) are natural for implementing bit-shifting operations that are required in any computer processor model and are especially powerful in the IBM RISC Systed6000 architecture. For example, the result of a Shift Right Algebraic Immediate (SRAI) in~truction'~ is calculated by the following concise expression:
GPRCRA; 1+32f(SH/GPRCRS;OI ),GPRCRS; 1 and the Carry bit (CA) is:

XBRC~I~GPRCRS;O~~V/(-SH)~GPRCRS;I
Here SH is the shift amount, RS and RA are the numbers of the general-purpose registers involved in the instruction, and the second bit of XBR (kedpoint exception register) contains CA. A description of the same instruction in any other programming language would be much longer and less easily understood. Note that the description of this instruction in English takes 10 lines in the architecture document.
Finally, although RTPG was initially planned to run on an IBM 3090-type processor (under VM), it was recognized early-on that it would also be necessary to run RTPG on workstation platforms. In fact, RTPG is now running under IBMS APL2/6000 on the very platform that it helped to verify. The transfer of the RTPG APL code from APL2 on the V M operating system to APL2/6000 on the Advanced Interactive Executive* (AIX*) operating system was trivial. The only change required was to the four file I/O programs and to the display screen programs. As mentioned earlier, R P G is now running on a cluster of over 30 IBM RISC System/6000 processors to do the work of verifying new processors under development for the RISC Systerd6000 family of computers.
Test programs for processor verification.
The most natural way of processor verification is to run assembly programs through the design model and to compare the simulated results with the expected ones. Usually the test programs are written as selfchecking programs that return only a "go/no-go" flag. This concept is simple; however, its usage faces difficulties since:
It can be used only when the design model is at an advanced stage, or at least when load and compare instructions are implemented. The test programs should obey the restrictions imposed by the supervisor that runs them. It requires more simulation cycles (running time) because of additional load and compare instructions that are simulated. The collection of these three parts is referred to as a testprogrum in this paper. Such test programs are self-contained: they include all information required for their independent and completely predictable execution. This feature enables them to freely migrate between test libraries and to be executed in any order.
A small test program generated by RTPG for the IBM RISC System/6000 processor is shown in Figure 1 .
As usual, asterisk "cards" (or lines) are used for comments. Comments may also be included in any line after the required data. """"""""""-Initialization """"""""""""-
RISC System/6000 RTPG -----* --* ----------t """"""""""- Dynamic test generation. The generation stage starts by inserting the operation code into the instruction word and establishing the rest of the instruction fields. At any point in the process each facility may be either free, which means that no value has been assigned to it yet, or have a value, in which case one is assigned to it by the initialization part of the process or by the execution of previous instructions. All facilities that influence the execution of the generated instruction are inspected, and those that are free are initialized. This principle works regardless of the complexity of a generated instruction and the initialization that it requires. For example, a single store instruction, besides initialization of data, base, and offset registers, may require initialization of an entire address translation path.
As soon as the instruction and all of the associated facilities are defined, the instruction is executed and all facilities that are changed during the exe- Select an instruction and its fields to gain the best effect from the execution (various biasing strategies help to achieve this goal) Bias data and operands, depending on the instruction being generated Reject the instruction if its execution would render the test program invalid Include any number of branch instructions in the test Control eventual interrupts Protect the required areas of memory and certain registers from being used by the generated test Define, on the fly, all required entries in the system tables (such as the page frame table)
A trace of the most recently updated facilities is also available and is used for implementation of some useful RTPG options.
Biasing. The generation of test programs is biased in order to increase the probable occurrence of events that otherwise have very low chances of being created. Biasing is both the strong point and the vulnerable point of R P G . Strong, because it allows the generation of test programs with the required features. Vulnerable, because the selection of biasing strategies cannot be completely formalized and depends on the experience of the RTPG developer and that person's knowledge of the design. The goal of the biasing is not the creation of unique or very rare situations, but rather is to direct the generation process toward selected design areas so that most of the events in these areas are tested when the number of generated test programs is reasonably large.
The biasing functions are employed in the process of selecting instructions, instruction fields, registers, addresses, data, and other components that construct the test program. The starting set of the R P G biasing strategies is derived from the architecture. Each instruction or process (such as interrupt action or address translation) specified in the architecture is represented by a block diagram composed of decision and execution blocks. In every decision block the data affecting the decision are selected in such a way that the subsequent blocks are entered with user-specified or RTPG-controlled 532 AHARON ET AL.
probability. However, in multi-instruction test programs it is not always possible to get the required data. Let us say that the user asked for a 10 percent probability of floating-point overflow, and in the current instruction the decision was made to create it. If all floating-point registers already have values assigned by previous instructions, it may happen that no pair of registers will produce an overflow. Thus, in the generated test cases the actual probability of overflow might be less than the requested one.
To some extent RPG is a system that gathers into its biasing strategies all of the experience gained during the verification of several processor designs.
RTPG gathers into its biasing strategies all of the experience gained during verification of several processor designs.
As an example, consider the strategy of register selection, which is very important, especially for a RISC-type architecture where a large number of general-purpose registers is available and as many as three or four registers may be used in one instruction. The RISC Systend6OOO RTPG allows the selection of any one of the following three strategies:
1. Selection of free registers only. Here RTPG has complete freedom in biasing the instruction operands. Also, the result of each instruction will never be overwritten by the actions of subsequent instructions. Only relatively short test programs can be generated when this option is chosen.
Random selection (the default strategy). A reg-
ister is selected randomly with biasing toward: Increasing the probability to use the same register more than once in an instruction. Preventing usage of a register as a target if it has been a target during its previous usage. This feature increases the test program observability, i.e., the probability to propagate any intermediate errors to the observable expected results. The starting set of biasing strategies is revised based on test coverage analysis of the generated test programs on both the RTPG reference model and the design model^.^ Coverage evaluation helps to detect and remove "holes" in the biasing. The final set of strategies ensures that there is a generation process with a reasonable probability of covering every architectural feature and every design block.
RTPG supports two biasing levels: local and global. The local biasing is involved in selecting immediate fields of instructions and selecting data for the operands. Many local biasing functions, specific for every class of instruction, are implemented in R P G . For example, the generation of operands for add class instructions ensures a high probability of getting long chains of carries. In a "count leading zeros" instruction the biasing ensures the creation of operands with equal probabilities for any num-1 ber of leading zeros. Some more sophisticated local biasing functions are implemented in more complicated cases, e.g., floating-point instructions.
Examples of global biasing control parameters that have a primary effect on the generation process are:
Instruction selection strategy Initial value of the machine state register (MSR) Strategy for selecting general-purpose registers Memory areas that the test program is allowed to use Each global and most of the local biasing strategies may be specified by the user. They are selected randomly by RPG if not provided.
RTPG structure and basic operation modes
Environment. The RTPG design verification environment is shown in Figure 2 . The integuce and biasing blocks are actually parts of RTPG but are shown separately because of their connections to the external world. RTPG The architecture specification document is the primary source of information for the R P G developers, and most of its features are embodied in RTPG. RTPG has to know all of the instruction format, and for each instruction, all of the parameters that influence its execution. This information is required for generating the instruction fields and for checking that all necessary facilities were defined before the execution of the instruction. R P G has to keep a record of all facilities changed during instruction execution. The final state of all of the changed facilities provides the expected results.
The architecture may leave the handling of certain situations to the implementation. For example, unaligned storage access may cause an alignment interrupt in some implementations and may be handled by hardware in others. All such situations are handled in RPG so that the test program that is created is correct for the implementation being tested.
AHARON ET AL
User interface. The RTPG user intefluce includes several screens that allow the user to define the initial state of the processor and to control the test program generation process. The main screen used to generate the test program of Figure 1 is shown in Figure 3 .
All screen parameters are optional, and if not specified, the default values are used (e.g., the default for InsList contains all instructions). The interface provides a way for documenting the generated test programs, selecting biasing strategies, initializing instruction fields, registers, and memory, and executing existing test programs.
The user may initialize any of the required facilities within an X-Init file. The file has the usual test program format. Thus, a prototype of a test program may first be created by RPG and thereafter used for initialization and generation of many new test programs on top of the prototype. The X-Init file may also contain blocks of instructions and data that become parts of the test program. This feature is useful for including interrupt handler routines in the generated test programs.
The Debug Mode is a powerful by-product of the dynamic nature of R P G . Including it in RTPG is almost free since RTPG already scans all changed facilities after every executed instruction. When RTPG offers two modes of operation.
this option is employed, the expected results of each intermediate instruction are included in the generated test program. This option is very useful in locating problems when a test fails. However, it requires much more space for storing the test programs.
R P G offers two modes of operation: the generation mode (Gen) and the execution mode (Run). In the first mode, RTPG is used to generate one test file per invocation. The file contains the requested number of test programs, each program generated according to the control parameters specified on the screens. The batch version of the Gen mode is used for mass production where a large number of test programs is created for predefined sets of initial conditions. Such generation is performed as overnight runs or during weekends. Since the "porting" of RTPG to the IBM RISC System/6000 platform, RTPG runs as a background process concurrently on many RISC System/6000 machines connected in a local area network. The programs generated on the workstations as well as the programs generated on the VM host machines are submitted automatically to various simulators connected to the same local area network. Tests that do not expose any design problems are discarded. Also, the coverage analysis on the architecture level is used in the preparation of a relatively small subset of tests that include test programs for every architectural feature. In the RISC System/6000 RTPG the high-level reference model was implemented within RTPG. The interpretive nature of APL considerably facilitated the implementation of some coverage analysis techniques, including techniques that require fault injection.
One of the simplest coverage techniques is ensuring that each line of the code has been executed at least once. A special function analyzes the character representation of an APL function and splits each labeled line into two lines. The first one contains the label and an assignment statement that sets the corresponding bit in a trace vector associated with this function. The second line created by the split contains the APL statement that was on the line before the split (but without the label). Assignments of the bits of the trace vector are also inserted after each statement with an APL right arrow (branch). A bucket of test programs is then executed (using the Run mode) on the "trace-modified" RPG. Zero values in the trace vector indicate blocks that were not reached.
Another coverage technique called "skip mutation,"14 which requires injection of faults into the code and thus provides much higher confidence in the generated test programs, may also be easily implemented. Skip mutation means that one line of the analyzed function is not executed. To make this technique more sensitive, an original APL function may be replaced by its more detailed version. Skip mutation is performed by another function that takes the character representation of an APL function to be checked and precedes the required line by the comment symbol "R". The information from the previous step (line coverage) is used to select only those test programs that pass through the skipped line. The procedure is repeated for each line in the function.
The coverage analysis is done automatically as soon as both the function to be analyzed and the test file are specified. However, because of performance considerations, only functions for which a low coverage is suspected are analyzed. In the RISC System/6000 RTPG environment, only functions that implement the floating-point unit were analyzed by both techniques.
RTPG implementation.
The RISC Systed6000 RTPG is implemented as a single workspace which is able to create test programs of up to several thousand instructions on a six-to eight-megabyte virtual machine. The user interface was written in REXX to simplify some Conversational Monitor System (CMS) file-related checking that was not so easy to implement in VS APL. The RTPG functions may be grouped into:
Utility and service functions Functions for instruction execution Biasing functions Simulation of interrupts Address translation
The service functions prepare the initial machine state, manage instruction selection, prevent the creation of endless loops in the generated test programs, and mask undefined results. These functions also handle the Run option, including comparing the actual results with the expected ones that are provided in the original test program.
Each FUSC Systed6000 instruction has a corresponding APL function with the same name that operates on the architectural facilities (defined as global variables) to perform the behavior of the instruction. Each "instruction function" is partitioned into a biasing section and an execution section which are used as required for biasing and setup or reference model operation. As soon as the instruction to be generated is selected, the required APL function is invoked by "executing" the character representation of the instruction mnemonic. Thus, when a new instruction is added to the system, or in case of a change in the instruction mnemonic or behavior, only one function must be added or changed. A rich set of utility functions that perform many of the required common tasks, such as incrementing the instruction pointer or adding 536 AHARON ET AL.
two register values, facilitates writing of the "instruction" functions.
The register arrays of the processor are modeled as APL Boolean matrices. The memory is modeled as
The RlSC System16000 RTPG is implemented as a single workspace.
another Boolean matrix with a companion address vector that maps processor memory addresses to APL matrix indices.
When the project was started, very little reusable RTPG software existed from previous projects (ah though the concepts were well understood). At any moment no more than four persons were working on RPG. One of them was dedicated to the user interface written in REXX, and one was involved only part time in R P G development. In less than four months the first version of RTPG, which supported almost all branch and fixed-point instructions, was given to the designers. Floating-point instructions were delivered a month later. From that time only two people on average were involved in RTPG development, working on the storage control unit, on cache modeling, on imbedding architecture changes, and on implementation-dependent features. They also supported RTPG in the field, responding to numerous designers' requests. This activity was completed exactly one year after starting the project, and since then, only one person is involved in RTPG support and enhancements. This person's responsibility includes porting RTPG to APL2/6000 and upgrading it to support follow-on designs.
Concluding remarks
We described a comprehensive procedure for biased random test program generation and the RTPG implementation of the approach. This approach has been adopted as a main technique in the design verification process of several IBM designs. The designs varied from floating-point coprocessors to the complete complex of the IBM RISC Systed6000 computer. In all cases the corresponding VLSI products came out fully functional on the first pass. In the case of the RISC Systend6000, once the final design was completed, no new bugs were found.
RTPG accompanies the design process from its very early stages through its successful completion. At the beginning of the process, RTPG is used by the designers to generate simple test programs directed toward recently developed logic. This use results in a significantly lower error detection rate at the advanced stages of the design. At the system level, RTPG is used mainly in batch mode, where a significant volume of test programs, some of them consisting of up to several thousand instructions, is generated and simulated on the design model. The employed biasing strategies have evolved, based on requests coming from the designers and the feedback from the coverage analysis.
The RTPG development effort is not considered to be negligible. However, it is incomparable with the amount of resources required to achieve similar verification quality with manually written or purely randomly generated test programs. At the moment, RTPG is notably tailored to the architecture it serves. Nevertheless, existing RTPG provide a good groundwork for the development of new ones, even when the architectures are different.
Choosing APL for the implementation of RTPG allowed us to provide this tool to the designers on a timely basis, and it also allowed us to keep up with many changes and modifications to RTPG necessitated by the novelty of the approach and also by frequent changes in the architecture at that time. Shoulder-to-shoulder work with the designers contributed to the success of the tool but required instant response to their requests. Again, APL, with no compilation and linkage overhead, allowed us to quickly respond to these requests. The interpretive nature of APL was used to implement some test coverage evaluation techniques that work on the architecture level directly in RTPG.
In the beginning, RTPG was used mainly in the interactive mode. With a capability to generate 20 to 30 instructions per second, it provided fairly good response time to the users. The design model simulator running on the VM host machine was slower. Porting of the simulator to the RISC System/6000 platform and the use of hardware assist for the simulation required more and more test programs to feed all available simulators. Moving RTPG to
IBM SYSTEMS JOURNAL, VOL 30, NO 4, 1991
APL2/6000 solved the problem of limited available computer time, and now RTPG, running on a cluster of RISC System/6000 machines, is able to produce the required number of test programs.
