Abstract. Research on practical design verification techniques has long been impeded by the lack of published and yet detailed error data. Over the last few years we have systematically collected design error data from a number of academic microprocessor design projects. We present an analysis of this data and report on the lessons learned in the collection effort.
Collection and Analysis of Microprocessor Design Errors
With transistor budgets ever expanding, microprocessor architects are steadily integrating new and more sophisticated mechanisms into their designs to boost performance. To cope with this increase in complexity, successful processor verification efforts must employ a variety of complementary verification technologies to achieve an acceptable level of functional correctness in the final product. Research on practical verification techniques for microprocessors has long been impeded by the lack of published error data, despite the abundance of design errors in large-scale projects. It is common practice in industry to record design errors, but this information is considered proprietary and, perhaps, embarrassing, so it rarely appears in public. Detailed error data is especially valuable to verification approaches that use error models to direct test generation [Abad88, VC98].
Furthermore, sets of designs and corresponding errors can serve as benchmarks to compare different verification methods. Finally, statistical reliability analysis methods rely heavily on this type of data [Malk98].
These considerations led us to turn to the university as a source of error data on microprocessor design. In what follows, we first report on the modest amount of error data published by industry. We then describe our method to systematically collect design error data from academic design projects. After presenting and analyzing the data we collected, we offer some advice on the collection process based on lessons painfully learned.
Industrial Error Data
Although design errors that make their way into final products are common, microprocessor manufacturers have not always been forthcoming about them. This has changed since MIPS began to publish their bug list, beginning with [MIP94]. The notorious Pentium FDIV bug [Beiz95] also influenced this change. To give a feel for these errors, we discuss a few examples of design errors that have appeared in major commercial microprocessors and the data published about them.
The errata list for the MIPS R4000PC and R4000SC microprocessors (revisions prior to revision 3.0) [MIP94] documents 55 bugs. Many of these require a rare combination of events before they become visible. The following is a representative bug:
If an instruction sequence which contains a load causing a data cache miss is followed by a jump, and the jump instruction is the last instruction on the page and, further, the delay slot of the jump is not mapped at the time, then the (VM) exception vector is incorrectly overwritten by the jump address. The R4000 will use the jump address as the exception vector. The workaround suggested in [MIP94] is to ensure that jump instructions can never be stored in the last location of a page.
Early versions of the Intel 8086 were shipped with the following bug [Ham94]:
The architecture specifies that for MOV and POP instructions to a segment register, interrupts are not to be sampled until completion of the following instruction [Int89]. This feature allows a 32-bit pointer to be loaded to the stack pointer registers SS and SP without the danger of an interrupt occurring between the two loads. However, early versions of the 8086 do not disable interrupts following a MOV to a segment register. This causes them to crash when an interrupt uses the stack between MOV SS, reg and MOV SP, op. A workaround is to insert instructions that temporarily disable interrupts while SS is reloaded. The problem cannot be worked around when a non-maskable interrupt occurs while the instruction pair is executing.
These published bug lists are inadequate for error model construction for two reasons: 1) The errata lists typically provide only a programmer's view on the errors. Error models depend on the design implementation. Therefore, more detailed information about the errors is required, namely the concrete modification to the implementation that fixes 
Collection method
The most suitable point to collect design error data is immediately after the design error is discovered and corrected. At that point, all relevant information about the design error should be recorded. This record-keeping requirement conflicts with the interests of the designer, however. Overhead has to be reduced to a minimum in order to overcome designers' natural reluctance to cooperate.
Our error collection method uses the revision management program CVS [Cede93] . This tool supports the archiving of successive revisions to a design as they are created in a hardware design language (HDL) such as Verilog or VHDL. The designers were asked to submit a new revision of their design to CVS whenever a design error was corrected and whenever they interrupted work on the design. Some designers resisted the error collection process because they saw it as a way the quality of their work could be monitored. We defused this potential problem by providing designers with a handout explaining the use of the revision management system, and by explaining our objectives to obtain the designers' cooperation.
Our first tentative design error collection effort took place during the summer of 1996 and involved a few students engaged in the design of the PUMA research microprocessor. Only the bare revision management system was then in place. Experience with that project motivated the introduction of the handout mentioned above. It was clear that a standardized form was needed to accompany each revision so that interesting revisions, i.e., those involving a design error correction, could be separated from other revisions. We therefore augmented the revision management system so that each time a new revision is submitted, the user is prompted to fill out a questionnaire. The questionnaire, a multiple-choice form shown in Figure 2, gathers four key pieces of information: 1) the motivation for revising the design and, in the case of a bug, 2) the method by which the bug was detected, 3) the class to which the bug belongs, and 4) a short description of the bug. Design errors can be detected by reading the HDL code specifying the design (inspection), by syntax checking performed by the HDL simulator (compilation) or a synthesis tool (synthesis), or by logic simulation.
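As an illustration only, the four questionnaire items could be captured in a record like the one sketched below. The detection methods are the ones named above; everything else (the field names, the reduction of "motivation" to a bug-fix flag, the class label, the file name, and the revision numbers) is a hypothetical placeholder, since the actual form is the one shown in Figure 2.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Detection(Enum):
    # The four detection methods named in the text.
    INSPECTION = "inspection"
    COMPILATION = "compilation"
    SYNTHESIS = "synthesis"
    SIMULATION = "simulation"


@dataclass
class RevisionRecord:
    # All field names are assumptions made for this sketch.
    file: str                                  # design file that was revised
    revision: str                              # revision number assigned by the tool
    is_bug_fix: bool                           # item 1, simplified to a yes/no flag
    detection: Optional[Detection] = None      # item 2, only set for bug fixes
    bug_classes: List[str] = field(default_factory=list)  # item 3, several may apply
    description: str = ""                      # item 4, free text


def bug_fix_revisions(records):
    """Keep only the revisions whose motivation was a design error correction."""
    return [r for r in records if r.is_bug_fix]


# Made-up example records, not data from the projects described here.
records = [
    RevisionRecord("alu.v", "1.3", is_bug_fix=False),
    RevisionRecord("alu.v", "1.4", True, Detection.SIMULATION,
                   ["placeholder class"], "carry input left unconnected"),
]
interesting = bug_fix_revisions(records)
```

Storing the bug class as a list reflects the fact that a designer may check more than one category for the same error.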
The operation of our error collection method within the design cycle is illustrated in Figure 3.
From the raw revision management data, we identified the design modifications made to fix each error by computing the differences between successive revisions. The analysis of the design error data led to a preliminary classification of design errors. This classification was used in our first major design error collection effort, which took place in the fall term of 1996 and involved a design project included in a computer architecture course. Analysis of this design error data led us to revise our error classification scheme.
The result is shown in Figure 2. The categories are not completely disjoint, so designers were asked to check all applicable categories.
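The step of computing the differences between successive revisions can be sketched as follows. This is a minimal stand-in that uses Python's difflib on two in-memory strings rather than the revision management system's own diff facility, and the Verilog fragment (module, gate, and signal names) is invented for the example.

```python
import difflib


def revision_diff(old_text, new_text):
    """Return (removed, added) source lines between two revisions of one file."""
    removed, added = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("- "):      # line present only in the old revision
            removed.append(line[2:])
        elif line.startswith("+ "):    # line present only in the new revision
            added.append(line[2:])
    return removed, added


# Hypothetical pair of revisions: the fix adds a missing gate input.
OLD_REV = """module ctl(out, a, b, c);
  nor stall_nor(out, a, b);
endmodule
"""
NEW_REV = """module ctl(out, a, b, c);
  nor stall_nor(out, a, b, c);
endmodule
"""

removed, added = revision_diff(OLD_REV, NEW_REV)
```

A line-based difference of this kind also reports lines that were merely reformatted, so it only approximates the structural change behind an error.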
Data collected
Design error data was collected from both class design projects and research projects at the University of Michigan. All of the designs were described in Verilog.
Sample project. In this section we examine in detail the data obtained from design project X86. This was one of the latest projects from which we collected error data, and hence it benefited the most from past experience. Table 2 lists the design files created in the course of the project. For each file, we list its size, the total number of revisions it underwent, and the number of design bugs recorded, broken down by detection method. Note that no synthesis tools were used in this particular project, hence no errors were detected this way. Errors of interest are those detected by inspection or simulation. The designers were aware that syntax errors are of little value to our work. We can therefore assume that many syntax errors were corrected without recording a new design revision, and hence do not appear in the table under the column "compilation."
Figure 4 shows the difference between a design revision and the previous revision motivated by an error involving the X86's notoriously complex decoder logic. In revision 1.49, NOR gate Controls_NOPsel_nor2 misses input Stallin. Revision 1.50 corrects this error.
of the structural complexity of the error. Although easy to compute, this metric is far from ideal. It does not distinguish between lines of code that have merely been reformatted and is an inversion error on a port connection of a module instance that is repeated for all instances of the module. We define the multiplicity of an actual error as the number of identical and repeated instances of a simpler error that constitute the actual error. Figure 8 plots the frequency of design errors when binned according to size and multiplicity. We observe that design errors of higher multiplicity are rare. Errors of multiplicity 1 and size 1 
Guidelines for error collection
Revision management. A revision management system like CVS has proven to be an invaluable aid in design error collection. Not only did it allow detailed analysis of concrete errors, but it eventually came to be appreciated by the designers. Nevertheless, a few designers see the revision management system as a surreptitious way to monitor their work. Such reservations can usually be overcome by fully explaining the intent of the management system and the benefits accruing from its use.
The bug stigma. A key factor in success is to remove the stigma usually associated with design errors. The participation of students from class projects in the error collection effort was on an entirely voluntary basis. We made an effort to make the participating students feel engaged with our research project, and carefully explained to them the value of collecting error data.
Designer overhead. The need to minimize the overhead of error logging for the designer cannot be overstated. Although the designer is, in principle, in the best position to classify each newly discovered error, this small effort, from which the designer may not see any immediate benefit, may be felt as a burden or a threat. Consequently, the designation of errors often becomes imprecise. We observed that for long periods some designers marked all of their bugs as conceptual errors, even when the actual error was a single inversion error. This led us to reassess the raw revision data, and explains the discrepancies between the data reported here and that in our earlier work [VC98]. The reassessment also corrected the counts assigned to errors that spanned multiple design files. Previously, these errors had been overrepresented. This adjustment primarily affects the bigger designs, where such errors occurred more often. Provided that a new design revision has been systematically recorded for each detected error, the task of classifying the errors with respect to their structural aspects (item 3 of our questionnaire) can be performed by engineers other than the original designers, perhaps with the help of automation.
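The overrepresentation of errors that span several design files can be illustrated with a small sketch. It assumes, purely for illustration, that fix revisions submitted by the same designer with the same description belong to one error; this grouping rule, the field names, and the records are all made up and do not describe how the reassessment was actually carried out.

```python
def naive_error_count(fix_revisions):
    """Count one error per bug-fix revision, which overcounts multi-file errors."""
    return len(fix_revisions)


def distinct_error_count(fix_revisions):
    """Count each (designer, description) pair once, whatever the number of files."""
    return len({(r["designer"], r["description"]) for r in fix_revisions})


# Hypothetical bug-fix revisions: the first two entries fix the same error.
fix_revisions = [
    {"designer": "d1", "file": "decode.v", "description": "wrong stall polarity"},
    {"designer": "d1", "file": "control.v", "description": "wrong stall polarity"},
    {"designer": "d2", "file": "alu.v", "description": "carry not propagated"},
]

naive = naive_error_count(fix_revisions)        # counts file revisions
distinct = distinct_error_count(fix_revisions)  # counts errors
```

The gap between the two counts grows with the number of files touched per fix, which is consistent with the adjustment mattering most for the bigger designs.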
Other considerations. Finally, some additional practical considerations need to be pointed out. Fixing a single design error may require multiple modify/simulate cycles, and hence multiple revisions. The designer should record information to distinguish such revisions.
Fixing a single design error may require modifications to multiple files. Designers should submit new revisions for all of these files together. Otherwise, the revision data can wrongly be interpreted as concerning multiple errors.
Our design error collection effort has several inherent limitations, so care should be taken in interpreting our data. First, student designers have limited experience, even in a university program with a strong design emphasis. Nevertheless, their errors may not be too far removed from those made by professional designers, since a considerable amount of industrial microprocessor design is done by recent graduates (albeit working under the supervision of experienced designers). Second, class projects are short in duration and the verification effort possible in such classes is modest. Consequently, our data may contain a disproportionately small number of hard-to-detect errors, compared to data from industrial design projects. This limitation also applies to the data derived from university research projects, but to a lesser extent.

