simulator that can handle various design error/fault models is presented. The simulator is a vital building block of a promising new method of high-level testing and design validation that aims at explicit design error/fault modeling, design error simulation, and model-directed test pattern generation. We first describe how signals are represented in our concurrent fault simulation and how operations are performed on these signals. We then describe how to handle the challenges of executing conditional statements when the signals used by the statements are augmented by an error/fault list. We further describe how the error models are embedded into the simulator such that the result of a concurrent simulation matches that of a sequence of HDL simulations with the set of errors/faults inserted manually one by one. We finally demonstrate the application of our concurrent design error simulator on a typical Motorola microprocessor. Our simulator was able to detect all detectable modeled design errors/faults for a given test sequence and was able to reveal valuable information about the behavior of erroneous designs.
INTRODUCTION
Modern microprocessor implementations have become dependent on intricate instruction flow techniques and functional units to promote peak instruction throughput. These implementations, however critical for performance, result in processors that are more difficult to validate in their preliminary design stages and more difficult to test after fabrication. As a result of the increase in complexity of modern microprocessor implementations, extensive analysis needs to be done at the preliminary design stages to expose design errors and testing complications prior to design layout and fabrication. Given that preliminary microarchitecture implementations are developed in a high-level hardware description language (HDL) such as VHDL or Verilog HDL, it has become important to develop tools that are capable of effectively validating and analyzing the testability of these high-level implementations.

Hussain Al-Asaad, University of California, One Shields Ave, Davis, CA 95616, halasaad@ece.ucdavis.edu
Mutation-based validation techniques attempt to circumvent the complexity problem introduced by exploring any coverage measure exhaustively by using models for design errors as guidance. Error modeling is used to create an artificial collection of simple design errors that span the corner cases of an implementation. As a consequence of the coupling effect between simple and complex design errors [1], a test sequence that is capable of detecting these known simple errors is implicitly capable of detecting complex design errors as well. Therefore, one application for our concurrent error model simulator is to grade a test sequence's ability to traverse the design space by concurrently and efficiently applying it to the complete set of simple design errors and reporting its coverage. The design error model used for this paper is described later in the paper.
One other promising application of mutation-based circuit simulation is that of mutation-based testing. Testing efforts require a coverage measure that is capable of affecting the maximal set of possible physical fault sites. Once this is defined, an error model can be designed to span the complete coverage measure. These error models, known as physical fault models, can be used in conjunction with our concurrent error model simulator to grade a test sequence's ability to detect possible physical faults and to give an architect valuable statistics on the implementation.
It is the goal of this paper to demonstrate that concurrent simulation of a large set of modeled design errors can be performed on a synthesizable HDL design. The rest of the paper is organized as follows. We first describe related work and then describe how signals are implemented and how operations on signals are executed to support our concurrent design error simulation. Then we introduce the problem encountered when propagating fault lists across condition statements, and we present a technique for handling conditional statements. We then briefly discuss how the simulator is orchestrated, and we discuss how the error models are embedded into the simulation such that concurrent error simulation generates results identical to those of sequential error simulation. Finally, we discuss how error models can be used for system validation and present the results obtained from our simulation experiments. The notion of the affected signal threshold and its relevance to our research is also discussed.
RELATED WORK
Analysis of controllability and observability measures through concurrent simulation methods has been previously investigated via a tag simulation calculus [2]. Under this simulation method, a single tag is propagated throughout the simulation to designate a possible change in a signal value due to an error. This method, however, yields only an estimate of observability, given that mutations on a signal are represented by a Δ tag that only represents a positive or negative polarity. Furthermore, this method requires modification of the hardware description when condition statements are involved in order to compute the effects of the fault model when it causes the wrong path to be taken. The methods in [3] 
FAULT-LIST ENABLED SIGNALS
The initial step in developing our high-level concurrent error simulator consists of determining how a signal should maintain its fault list, and how the basic signal operations should be performed on the complete fault lists. To accomplish this, a signal is first defined as an object that consists of a fault-free value along with a list of mutant values, where each mutant $m$ in the signal $S$ is a result of the corresponding parent error model. We denote the parent error model of a mutant value $m$ by $\Pi(m)$, such that for all $m_i$ in the signal $S$: $\Pi(S) = \bigcup_i \Pi(m_i)$. It is common that aliasing occurs between the fault-free value and one or more mutant values, in which case it is advantageous to collapse the error lists as a means to reduce the memory demand and the number of operations required by each list.
Our simulator takes as input a collection of error models $E$ which are used to generate and insert a mutant into a specific fault site when appropriate. Let $a_i$ be the set of fault values in signal $A$, such that $a_{i=0}$ denotes the fault-free value and $a_{i \neq 0}$ denotes the mutant value associated with the error model $\Pi(a_i)$ that has an ID value $i$. Let $a \circ b$ denote an arbitrary operation on two signal values, and let $A \circ B$ denote the same arbitrary operation performed over all signal values $a_i$ and $b_i$ in signals $A$ and $B$, respectively, such that an operation $a_i \circ b_j$ with $i \neq j$ is not allowed because an operation cannot be performed across design error models. In the case where $\Pi(A) \neq \Pi(B)$, a request for an implicit (non-existent) mutant value $a_i$ results in the generation of the requested value from the fault-free value. We will denote the generation process by $a_{0 \Delta i}$. In the rest of this paper, a value generated from the fault-free value is referred to as an implicit value and a value extracted directly from the fault list is referred to as an explicit value.
There is no distinction between an aliased mutant value and a mutant value corresponding to an error model that has not been activated. We can therefore assume that any mutant value not present in a fault list has been aliased, and it is correct to generate the corresponding mutant value from the implied fault-free value upon demand. This allows us to perform an operation across two fault lists that do not contain mutant values from exactly the same set of error models, and we can describe this operation by the following equation:
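A form of this rule that is consistent with the definitions above and with the worked example that follows can be written as below. The symbol $a_{0 \Delta i}$ for the implicit value generated from the fault-free value $a_0$ under error model $i$ is notation assumed here, not a confirmed transcription of the original equation.

$$
z_0 = a_0 \circ b_0, \qquad
z_i =
\begin{cases}
a_i \circ b_i, & i \in \Pi(A) \cap \Pi(B) \\
a_i \circ b_{0 \Delta i}, & i \in \Pi(A) \setminus \Pi(B) \\
a_{0 \Delta i} \circ b_i, & i \in \Pi(B) \setminus \Pi(A)
\end{cases}
\qquad \text{for all } i \in \Pi(A) \cup \Pi(B)
$$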
To illustrate the above equation, let us consider the example where $A = \{a_0, a_3, a_5\}$ and $B = \{b_0, b_4, b_5\}$. The operation $Z = A \circ B$ is decomposed into the set of sub-operations $\{z_0 = a_0 \circ b_0,\ z_3 = a_3 \circ b_{0 \Delta 3},\ z_4 = a_{0 \Delta 4} \circ b_4,\ z_5 = a_5 \circ b_5\}$ as depicted in Figure 1. Furthermore, if the value generated by the operation $a_5 \circ b_5$ is aliased by the value generated by the operation $a_0 \circ b_0$, then the resulting set of values in signal $Z$ will be $Z = \{z_0, z_3, z_4\}$ after fault collapsing.
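The example above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the class name `Signal`, the helper `apply_op`, the integer values, and the use of addition as the operation are all hypothetical, and a dictionary keyed by error-model ID stands in for the ordered linked list used by the simulator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Signal:
    """A fault-free value plus explicit mutant values keyed by error-model ID."""
    fault_free: int
    mutants: Dict[int, int] = field(default_factory=dict)

    def value(self, model_id: int) -> int:
        # Explicit value if the list holds one; otherwise the implicit value,
        # which is generated from the fault-free value.
        return self.mutants.get(model_id, self.fault_free)

    def insert_mutant(self, model_id: int, value: int) -> None:
        # Fault collapsing: a mutant aliased with the fault-free value
        # is not stored.
        if value == self.fault_free:
            self.mutants.pop(model_id, None)
        else:
            self.mutants[model_id] = value


def apply_op(op: Callable[[int, int], int], a: Signal, b: Signal) -> Signal:
    """Compute Z = A op B one error model at a time (never across models)."""
    z = Signal(op(a.fault_free, b.fault_free))
    for i in sorted(set(a.mutants) | set(b.mutants)):
        z.insert_mutant(i, op(a.value(i), b.value(i)))
    return z


# A = {a0, a3, a5} and B = {b0, b4, b5}, with made-up integer values.
A = Signal(2, {3: 7, 5: 4})
B = Signal(3, {4: 9, 5: 1})
Z = apply_op(lambda x, y: x + y, A, B)

# z5 = 4 + 1 = 5 aliases z0 = 2 + 3 = 5, so only z3 and z4 remain explicit.
print(Z.fault_free, Z.mutants)
```

With these values the result keeps explicit mutants only for IDs 3 and 4, mirroring $Z = \{z_0, z_3, z_4\}$ in the text.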
We next describe the basic operations on fault lists.
INSERT-MUTANT (L, m):
Inserts the mutant m into the fault-list L while preserving fault-collapsing and L's ordering of increasing mutant ID. Each fault list is implemented by a linked list of mutant values, and is referenced 
VHDL Condition Statement
The next important step in the development of our concurrent error simulator for high-level hardware descriptions was to devise a method for implementing conditional execution on signals containing a fault list. The problem of executing a statement based on a fault-list-enabled condition is that the condition will be met by some of the error models and not by others. As a result, the fault list of the signals in the condition statement must be split into two partitions: the set of error models that meet the condition, and the set of error models that do not. When executing a condition statement, the following actions need to be performed by the simulator:
i) The condition needs to be evaluated using comparison operators as described earlier, resulting in the creation of a Boolean fault list.
ii) All the signals used within the condition statement need to be initialized via partitioning such that the target partition for each fault-list item is specified in the condition fault list.
iii) The TRUE partition of each signal is used within the then portion of the condition statement ...

... mutant value $B_i$ from B's FALSE partition during the recombination phase in step (iv). A similar operation occurs when performing the recombination process on the signal Z, such that $Z_i$ is generated from the fault-free value of the FALSE partition. In this situation, however, $Z_i$ is collapsed as it is inserted into the fault list due to redundancy with the fault-free value $Z_0$. It is important to note that the TRUE and FALSE partitions exist as signal instantiations themselves; thus, nested condition statements are handled in a nested fashion.
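The partition, execute, and recombine flow can be sketched as follows. This is a minimal model, not the simulator's implementation: fault lists are plain dictionaries in which key 0 holds the fault-free value, the function names (`value`, `collapse`, `partition`, `run_if`) and the example signals are hypothetical, and the handling of the else branch and of recombination is inferred from the text's mention of the FALSE partition and the recombination phase.

```python
def value(sig, i):
    """Explicit value for error model i, else the implicit (fault-free) value."""
    return sig.get(i, sig[0])


def collapse(sig):
    """Drop mutants aliased with the fault-free value stored under key 0."""
    return {i: v for i, v in sig.items() if i == 0 or v != sig[0]}


def partition(sig, taken, branch):
    """Keep the fault-free value and the explicit mutants that select `branch`."""
    return {i: v for i, v in sig.items() if i == 0 or taken(i) == branch}


def run_if(cond, env, then_body, else_body):
    """Execute if/else on fault lists; `cond` is a Boolean fault list."""
    def taken(i):
        return value(cond, i)

    parts = {}
    for branch, body in ((True, then_body), (False, else_body)):
        part_env = {n: partition(s, taken, branch) for n, s in env.items()}
        body(part_env)              # each body updates its own partition
        parts[branch] = part_env

    # Recombination: every error model reads from the partition it selected.
    ids = set(cond) | {i for p in parts.values() for s in p.values() for i in s}
    return {
        name: collapse({i: value(parts[taken(i)][name], i) for i in sorted(ids)})
        for name in env
    }


# if (A = 1) then Z <= B; else Z <= C; end if;
env = {"A": {0: 1, 3: 0}, "B": {0: 5, 4: 6}, "C": {0: 7}, "Z": {0: 0}}
cond = collapse({i: v == 1 for i, v in env["A"].items()})   # {0: True, 3: False}


def then_body(e):
    e["Z"] = dict(e["B"])


def else_body(e):
    e["Z"] = dict(e["C"])


out = run_if(cond, env, then_body, else_body)
print(out["Z"])
```

Here error model 3 takes the else path, so its value of Z is generated from the fault-free value of the FALSE partition, while error model 4 follows the fault-free path and carries B's explicit mutant.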
ORCHESTRATING THE SIMULATOR
The techniques mentioned in the previous sections were first developed and validated with small code segments and later with a high-level microprocessor implementation. We decided to manually construct an internal representation of a high-level microprocessor implementation using the aforementioned techniques to obtain our simulation results. Our goal was not to produce a complete simulation environment, but to produce the basic tools that allow us to explore the techniques required for performing a correct concurrent simulation of a set of error models.
Taking the concepts from the previous two sections together, it becomes clear that the concurrent error model simulator needs to execute both paths of each condition statement in order to update both the TRUE and FALSE partitions ...

i) VHDL statements are imported to C++ using overloaded operators within the signal class.
ii) Access to sub-vectors in the VHDL syntax is imported using the bitvector-signal class.
iii) VHDL condition statement handlers are imported to execute the TRUE and FALSE partitions.
iv) Each process in the hardware description is placed into a module construct object that handles the signal initialization, process execution, and signal propagation tasks.
v) A netlist is a set of module constructs.
Given that the simulation granularity of this preliminary simulator is the VHDL process level, a significant number of statements in a process are fired unnecessarily because the sensitivity list of a process is used to fire every statement in that process. This limitation is of no concern at this time because the goal of this research is to develop and justify concurrent error simulation techniques. The correctness of the concurrent simulation techniques presented in this paper has been validated by comparing the results of simulating numerous error models under our concurrent error simulator with the corresponding set of sequential error simulations obtained with the Synopsys CAD tools.
MUTANT VALUE GENERATION
It should be clear at this point that the core concurrent error-model simulator does not produce mutant values; its purpose is simply to propagate them. The mutant values are generated by separate engine(s) we call mutant value generator(s).
This results in a simulation environment that is adaptable to any design-based/fault-based error models by creating the appropriate error model generator(s), which are in charge of inserting the appropriate mutant values into the appropriate signal(s) under the appropriate condition(s).
In order to conjecture on the methods of generating mutant values, let us take a feedback circuit into account. When an error model is first activated in the circuit, it generates a mutant value that might feed back to the same activation site to re-activate the error model. At this point, it generates a mutant value from an already mutant signal. As a result, a mutation generator is activated by signals where its corresponding mutant value is given higher preference over the fault-free value. That is to say that the mutation generator uses a signal's fault-free value if and only if a mutant value of corresponding ID tag does not exist. Furthermore, any mutant values that are inserted into a signal will replace the previous corresponding mutant value if it exists.
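The two rules in the preceding paragraph (prefer the existing mutant of the same ID as the seed, and replace any previous mutant on insertion) can be illustrated with a small feedback example. The sketch below is hypothetical throughout: the counter, the error model with ID 7 that adds 2 instead of 1, and the function names `propagate` and `insert_mutant` are assumptions chosen for illustration, with fault lists again modeled as dictionaries whose key 0 is the fault-free value.

```python
def propagate(sig, fn):
    """Core simulator role: apply the correct function to every stored value."""
    out = {i: fn(v) for i, v in sig.items()}
    return {i: v for i, v in out.items() if i == 0 or v != out[0]}


def insert_mutant(target, source, model_id, erroneous):
    """Generator role: build the mutant for `model_id` and insert it into `target`.

    The seed is the source signal's mutant with the same ID when it exists,
    and the fault-free value only otherwise.
    """
    seed = source.get(model_id, source[0])
    mutant = erroneous(seed)
    updated = dict(target)
    if mutant == updated[0]:
        updated.pop(model_id, None)     # aliased with fault-free: collapsed
    else:
        updated[model_id] = mutant      # replaces any previous mutant of this ID
    return updated


# Feedback circuit: a counter whose next value depends on its current value.
count = {0: 0}
history = []
for _ in range(2):
    nxt = propagate(count, lambda v: v + 1)                  # correct design
    count = insert_mutant(nxt, count, 7, lambda v: v + 2)    # error model 7
    history.append(count)

print(history)
```

In the second cycle the mutant is seeded from the previous mutant value 2 rather than from the fault-free value 1, which matches what a sequential simulation of the erroneous design (0, 2, 4) would produce.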
USING DESIGN ERROR MODELS FOR SYSTEM VALIDATION
Based on previous work in [a, a mutation control error (MCE) has been defined as the quintuplet (i, c, s, vc, ve) such that i is the current instruction, c is the cycle in the processor pipeline, s is the control signal that will experience the mutant signal, vc is the correct value of the control signal, and ve is the erroneous value that will be inserted into the fault list of the control signal s. The above definition can be applied directly to a structural description of a microprocessor where the instruction is deciphered by the hardware description and the processor cycle is obtainable from the implementation. Unfortunately, not all hardware descriptions have implementations with explicit instruction and processor cycles, as is the case for microprocessor implementations based on a finite state machine (FSM). In this situation, the structural MCE design error model needs to be adapted into the quadruplet (s, c, vc, ve) such that s corresponds to the explicit processor state, c is the control signal that will experience the mutant signal, vc is the correct value of the control signal, and ve is the erroneous value that will be inserted into the fault list of the control signal c. This modification is possible because the combination of the instruction i and the processor cycle c of a structural microprocessor represents the processor state.
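The modified quadruplet (s, c, vc, ve) maps naturally onto a small record type. The sketch below is an assumption-laden illustration rather than the generator used in this work: the class `MCE`, the function `enumerate_mces`, the toy control table, and the signal domains are all hypothetical, and "exhaustive" is read here as every erroneous value that differs from the correct one for each (state, control signal) pair.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class MCE:
    """Modified mutation control error (s, c, vc, ve) for an FSM-based design."""
    state: str        # s: explicit processor state
    signal: str       # c: control signal that receives the mutant
    correct: int      # vc: correct value of the control signal in that state
    erroneous: int    # ve: value inserted into the control signal's fault list


def enumerate_mces(
    control_table: Dict[Tuple[str, str], int],
    domains: Dict[str, Iterable[int]],
) -> List[MCE]:
    """List one MCE per (state, signal) pair and per wrong value of that signal."""
    return [
        MCE(state, signal, vc, ve)
        for (state, signal), vc in sorted(control_table.items())
        for ve in domains[signal]
        if ve != vc
    ]


# Toy FSM: correct control values per (state, signal); names are made up.
control_table = {
    ("fetch", "alu_op"): 0,
    ("execute", "alu_op"): 2,
    ("fetch", "pc_inc"): 1,
}
domains = {"alu_op": range(4), "pc_inc": range(2)}

mces = enumerate_mces(control_table, domains)
print(len(mces))
```

Each resulting record would receive its own ID, and its erroneous value would be inserted into the fault list of the named control signal whenever the processor is in the named state.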
We have used the modified MCE model to implement an automatic design error generator for the FSM-based implementation of the Motorola 6800 microprocessor by John E. Kent [opencores.org]. The exhaustive set of MCEs for this implementation consists of 300,092 errors, and the four distinct simulation runs described in Figure 3 were performed for observation purposes.
The first two simulation runs were performed by labeling only the primary outputs (POs) as the observation points, and a second pair of simulations was later performed by adding the accumulator registers A and B (acca, accb), the stack pointer register (SP), and the program counter register (PC) to the set of observation points to determine the effectiveness of increasing this implementation's observability. Figure 4 graphs the total number of design errors detected across each simulation run. It is interesting to notice that increasing the observability did not result in a significantly greater number of design errors being detected. Furthermore, data set 1 demonstrates how the first simulation begins an unproductive simulation path at around test vector number 1100 but reaches a highly productive state sequence at test vector 1928 that allows it to almost reach the performance results of the second simulation. This sudden change in productivity, along with the sudden peak in data set 1 of Figure 6a, demonstrates the possibility of achieving a higher detection rate if a test sequence is generated that maintains a high error model activation count. Figure 5 graphs the number of design errors dropped from the simulation after a percentage of internal signals are affected; this percentage is denoted $T_{AS}$ (affected signal threshold).

[Figure 3: the four simulation runs. Recoverable labels: Observation Points, Primary outputs, Errors.]
From the graph, we can see that the most common threshold occurs at 10%, which tells us that most of the error models were dropped after they affected 10% of the internal signals. The driving concept behind the affected signal threshold relies on the fact that as the number of internal signals experiencing the effect of a specific design error increases, the probability that a primary output is also affected increases as well. Therefore, it is expected that design errors will reach a high probability of being dropped from the simulation after affecting a threshold of internal signals ($T_{AS}$).
Thus, if the hardware description is optimized by observability measures in such a way that the $T_{AS}$ level is substantially low, then a larger number of error models is expected to be detected per test sequence. Furthermore, fault-dropping plays a larger role in a design with a low $T_{AS}$ level, as it confines the number of signals that the average error model affects before that error model is dropped. Figure 6a corresponds to the two simulations labeled data sets 1 and 2. It is clear that the second simulation run affected a lower average number of design errors, but had more distributed peaks that helped it detect a larger number of design errors.
Figure 6b corresponds to the two simulations labeled data sets 3 and 4. The development of the third data set in the form of a step function might lead one to assume that the extra observation points have resulted in the sudden drops in active error counts as error models are removed from the simulation. This anomaly, however, does not correspond to the extra observation points and must be disregarded. To see this, we compare the data in Figures 4 and 6b and notice that each sudden drop in data set 3 of Figure 6b does not have a corresponding steep incline in data set 3 of Figure 4. Instead, we observe that each sudden incline of a curve in Figure 4 corresponds to a sudden peak in the corresponding graph of Figure 6. The previous graphs bring to our attention the difficulty of generating tests to detect design errors, given that an approximate average of 100 out of a collection of 300,092 (less than 0.1%) design errors are active at any point in the simulation, with a peak of 525 active design errors (0.175%).
On a positive note, the low activation rate of design errors serves to encourage the implementation of a concurrent design error simulator for the validation technique introduced earlier because an exhaustive set of error models can be simulated with an acceptable performance cost.
CONCLUSIONS
In this paper we have introduced a method of simulating mutation-based modeled design errors on high-level microprocessor implementations. Furthermore, we have discussed the challenges of concurrent error model simulation in the presence of condition statements, and we have presented an effective way of handling them. Finally, this paper has demonstrated the practicality of our simulation technique and has shown that a modeled error has a high probability of being dropped after affecting a threshold of the internal signals. In addition, we have provided a versatile simulation system capable of concurrently simulating distinct error model types, ranging from design error models geared towards system validation to fault models geared towards controllability and observability analysis or post-silicon system testing.
