The state of the art in hardware design is the use of hardware description languages such as VHDL (Very High Speed Integrated Circuit Hardware Description Language). The designs given by VHDL programs are refined up to the point where the physical realization of the new circuit or board can be synthesized automatically. The designs are tested by simulating them and comparing their output to that prescribed by the specification. A significant part of the design effort is spent on detecting unacceptable deviations from this specification and subsequently localizing the sources of such faults. In this paper, we describe an approach that employs model-based diagnosis for fault detection and localization in very large VHDL programs (several 100,000 lines of code). In order to achieve sufficient performance for practical applicability, we have developed a representation that allows a highly abstracted view of programs and faults, but is sufficiently detailed to yield substantial reductions in fault localization costs when compared to the current manpower-intensive approach. The implementation, in conjunction with the knowledge representation, is designed with openness in mind in order to allow use of the highly optimized simulation tools available. Discrimination between diagnoses is achieved by use of multiple test cases and measurement selection. Test results show sufficient performance for practical use.
Introduction
Recently, a number of authors have proposed the use of model-based diagnosis for localizing faults in the designs of artifacts, e.g., bugs in software systems or communications protocols [4, 9]. In this paper, we describe the use of diagnosis techniques according to the consistency-based approach [8, 5] for identifying faults in hardware design descriptions. Since a substantial part of the overall effort in hardware design is spent trying to discover misbehavior and to identify the design errors resulting in this misbehavior, the use of diagnosis utilities will produce a significant return on investment. In this paper, we examine the hardware design cycle using languages such as VHDL. VHDL is one of the most widely used hardware description languages and an IEEE standard. A specification (written in VHDL) delineates the functional requirements for the hardware to be designed. This is followed by a detailed design on the register-transfer (RT) level, which, once it has passed all tests, is automatically converted ("synthesized") into a gate-level representation. The design is tested by running simulations of the VHDL code for multiple test cases. The simulations produce traces of the values occurring over time on the signal lines in the simulated hardware. These so-called waveform traces, usually comprising several 10,000 signal changes, are checked by the designer against the waveform traces generated by the specification on the same input values. A discrepancy indicates the existence of a fault in the design. The average simulation runtime increases strongly throughout the design cycle as the design becomes more complex and detailed. Large hardware designs (comprising multiple ASICs and microprocessors) can reach dimensions of several 100,000 lines of VHDL code and thousands of components and signals at the top level. For such designs, typically written by a large team of designers, fault detection and localization becomes a very time-consuming activity.
The usual model-based system representation in diagnosis (consisting of a set of components, a system description, and observations) can be adapted to VHDL programs without much trouble. The system description is the VHDL program together with part of the semantics of VHDL, and the observations are either based on the traces produced by simulating the specification for a given test case, or on indications of incorrect signal values entered by the user. Components refer to instances of specific VHDL constructs occurring in VHDL programs. However, given that even an individual simulation run takes from hours to days of real time on a high-end workstation, diagnosing a complete logical representation of the full VHDL program and its semantics is not feasible. A strongly abstracted view of the design must be used for diagnosis. We abstract over values and time points [7], but still have the capability to distinguish between the initialization phase and operating mode of a circuit, a requirement for handling feedback loops. Sufficient discrimination between diagnoses is achieved by applying multiple test cases and measurement selection. Further discrimination can be achieved by requesting the user to evaluate the correctness of particular signals. The diagnosis system as described in this article was implemented to fit into the existing VHDL design environment and is intended for use in all design phases where VHDL is used.

The paper is structured as follows: Section 2 introduces the example we will use throughout the paper and defines the basic terminology required to apply model-based diagnosis to VHDL. In Section 3, we discuss the abstracted knowledge representation used in our tool and its complexity. This is followed by an overview of the implementation and test results of the diagnosis system (Section 4) and finally a comparison with related work.
This formal specification can easily be transformed into a specification written in VHDL, which allows us to simulate the specification and produce a waveform trace. Figure 1 shows a possible design of the D75 consisting of four components. A designer who uses VHDL must now implement a program for every component to get the same behavior as the specification. The behavior can be tested by checking the waveform traces for every test case specified by the designer against the specification traces. If this check leads to discrepancies, we want to find the component that causes the misbehavior. The dotted lines in Figure 1 show the functional dependencies between inputs and outputs of the components.
The following VHDL code fragment describes the internal topology of the D75 as shown in Figure 1.
    entity d75_e is
      port( signal a,b,c,d,e: in integer;
            signal f,g,h: out integer);
    end d75_e;

    architecture a2 of d75_e is
      ...
      signal s1,s2,s3: integer := 0;
      ...
    begin
      i1: c1 port map (a,b,c,s1,s2);
      i2: c2 port map (s1,s3,f)

The after keyword indicates that the new value is assigned with a delay of 5 ns.
$Signals(a2, d75\_e) = \{a, \ldots, h, s_1, s_2, s_3\}$.
For simplicity, we will write down only the label of the concurrent statement (e.g., i1 ... i4) instead of the whole statement in our paper. i1 thus stands for "i1: c1 port map (a,b,c,s1,s2)" and so on.
Until now we have not defined the behavior of the concurrent statements in our D75 example. To do this, we write down signal assignments that are supposed to describe the behavior for every concurrent statement i1 ... i4. We write $in_i$ for the i-th input and $out_i$ for the i-th output of the concurrent statement. We order the signals of the port maps so that input signals come first and outputs last.
        out1 <= in1+in2+in3 after 5 ns;
    i4  port map(in1,in2,out1,out2)
        out1 <= in1;
        out2 <= in2;
The VHDL program can now be tested. We show two test cases which stimulate the D75 with the following initial input vectors.
The VHDL simulator produces waveform traces which can be checked for correctness with regard to our specification or an older (correct) version of the program. In our D75 example we have a formal specification that can be transformed to VHDL. We can thus show the correctness of a VHDL program for a given input pattern and a certain time interval by comparing (automatically) the trace from the specification execution with the trace of the VHDL program. The discrepancies are then used as a starting point for computing diagnoses. Formally, a trace is defined as follows:
Definition 2.2 (Waveform Trace) Let
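The automatic trace comparison described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a trace maps each traced signal to a list of (time, value) change events; this representation, the function name `discrepancies`, and the sample values are hypothetical and are not taken from the definition in the paper.

```python
from typing import Dict, List, Tuple

# Assumed representation: signal name -> list of (time_ns, value) change events.
Trace = Dict[str, List[Tuple[int, int]]]


def discrepancies(spec: Trace, impl: Trace) -> Dict[str, bool]:
    """Map each signal traced in the specification to True if the
    implementation's event list differs from the specification's."""
    return {sig: spec[sig] != impl.get(sig) for sig in spec}


# Hypothetical traces for two output signals.
spec_trace: Trace = {"f": [(0, 0), (5, 6)], "g": [(0, 0), (5, 2)]}
impl_trace: Trace = {"f": [(0, 0), (5, 6)], "g": [(0, 0), (5, 3)]}

result = discrepancies(spec_trace, impl_trace)
```

Under the abstraction used later in the paper, only the Boolean outcome per signal would be kept; the time and value of the first deviation are deliberately discarded.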
Computing Diagnoses
In this section we formalize the diagnosis knowledge used in our VHDL diagnosis system. The complexity of the VHDL programs to be diagnosed precludes analysis down to the last detail. The goal, rather, must be to compute results that will enable the designer to pinpoint the actual fault and find it faster. Because of the complexity of the problem, we have to be content with a significantly abstracted representation, governed by these criteria:
- No diagnoses may be excluded due to abstractions.
- The system must integrate with available commercial simulation packages.
- Computational costs must be minimized by requiring very few additional simulation runs.
The principal idea is to abstract as much as possible over time and values on the one hand, while preserving the capability to discriminate between substantial parts of the VHDL-code on the other hand.
Components
VHDL code is partitioned into hierarchies of concurrent statements which usually bottom out in a sequence of sequential statements. Such a sequence contained in one concurrent statement typically comprises only a small fraction of the complete VHDL program, and as a first approximation it is not worth the effort needed to pinpoint individual sequential statements as the source of a fault. Therefore, identifying faulty concurrent statements is our main goal.

Example (cont.): $in(i1) = \{a, b, c\}$, $out(i1) = \{s_1, s_2\}$. From here on we represent each inout signal as a combination of one input and one output signal.
System description
The functional dependencies (encoded in rules) are then used to find the components which are responsible for the discrepancies.
Definition 3.3 (Functional Dependency) Let $C$ be a component of entity $E$ and architecture $A$. An output $out_i \in out(C)$ depends functionally on a set $I \subseteq in(C)$ if a change of an input signal $in_j \in I$ at time $t$ changes the output signal $out_i$ at some time $t' \geq t$. Example (cont.): $s_1$ depends functionally on signals $a$ and $b$.
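To make the use of functional dependencies concrete, the following Python sketch stores them as a table and collects every component that can influence a given signal. Only the dependency of s1 on a and b is stated in the text; all other table entries, the connections of i3 and i4, and the function name `influencing_components` are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# (component, output signal) -> input signals the output depends on.
Deps = Dict[Tuple[str, str], Set[str]]

DEPS: Deps = {
    ("i1", "s1"): {"a", "b"},        # stated in the running example
    ("i1", "s2"): {"b", "c"},        # assumed
    ("i2", "f"): {"s1", "s3"},       # assumed
    ("i3", "s3"): {"c", "d", "e"},   # assumed
    ("i4", "g"): {"s2"},             # assumed
    ("i4", "h"): {"s3"},             # assumed
}


def influencing_components(signal: str, deps: Deps) -> Set[str]:
    """Return all components that drive the signal directly or through
    a chain of functional dependencies."""
    comps: Set[str] = set()
    seen: Set[str] = set()
    stack = [signal]
    while stack:
        current = stack.pop()
        if current in seen:      # guards against feedback loops
            continue
        seen.add(current)
        for (comp, out), inputs in deps.items():
            if out == current:
                comps.add(comp)
                stack.extend(inputs)
    return comps


g_suspects = influencing_components("g", DEPS)
f_suspects = influencing_components("f", DEPS)
```

With this assumed table, a discrepancy on g can only be explained by i1 or i4, which is the kind of restriction on the candidate set that the dependencies are meant to provide.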
The main idea of our abstraction is to ignore time and values of the signals. In our system description we only talk about correct or incorrect signals, which means that we do not care when the discrepancy occurred and what the expected value would be. In particular, the output of a component is correct if the component is correct and the input signals (on which the output depends) are correct. On the one hand, this abstraction has the advantage that we do not need to reason about thousands of time points for hundreds of signals. On the other hand, it has the obvious drawback that we can exclude fewer diagnoses than with a detailed model. However, programmers use this abstraction very effectively during debugging to identify the faulty concurrent statement.

So far we have completely abstracted over time and signal values in order to gain an efficient diagnosis system. However, in order to handle feedback cycles and the initialization phase of the circuit, at least a simple representation of time must be included in the model. We first discuss the formulas in the system description that describe the behavior of diagnosis components (i.e., concurrent statements) and later those for connections. The correct behavior of a single VHDL concurrent statement is described by multiple rules. Our running example results in a separate rule for each signal assignment:
The same signal can be declared as output in different concurrent statements. In this case a simulator uses a so-called resolution function to deduce the final value of a signal at a given time point. This value depends on all the components which use the signal as output.
Definition 3.5 (SD part 2: connections) Let $A$ be an architecture, $E$ an entity, and $Driver(s) = \{c \mid c \in COMPS(A, E) \wedge s \in out(c)\}$, where $s$ is a signal.
The system description modeling the connections of the VHDL program given by $E$ and $A$ is the set of sentences which includes, for each signal $s \in Signals(A, E) \wedge s \in out(c) \wedge c \in COMPS$, the sentence $\bigwedge_{c_j \in Driver(s)} ok(c_j, s)_T \rightarrow ok(s)_T$. Note that this definition implies that signals without drivers (i.e., system inputs) are always assumed to be correct.
Example: Given component i3 from our example, the rule $ok(i3, s_3)_T \rightarrow ok(s_3)_T$ must be an element of the system description.
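The connection rule can be sketched in a few lines of Python: a signal is derived as correct when every one of its drivers delivers a correct contribution. The output sets for i2 and i4 and the helper names `drivers` and `signal_ok` are assumptions for illustration; the time index is omitted here to keep the sketch short.

```python
from typing import Dict, Set, Tuple

# Component -> output signals. Entries for i2 and i4 are assumed.
OUT: Dict[str, Set[str]] = {
    "i1": {"s1", "s2"},
    "i2": {"f"},
    "i3": {"s3"},
    "i4": {"g", "h"},
}


def drivers(signal: str, out: Dict[str, Set[str]]) -> Set[str]:
    """All components that declare the signal as an output."""
    return {comp for comp, outs in out.items() if signal in outs}


def signal_ok(signal: str, out: Dict[str, Set[str]],
              ok_pair: Dict[Tuple[str, str], bool]) -> bool:
    """Conjunction over all drivers; an empty conjunction is True, so a
    signal without drivers (a system input) is taken to be correct."""
    return all(ok_pair.get((comp, signal), False)
               for comp in drivers(signal, out))


s3_drivers = drivers("s3", OUT)
input_ok = signal_ok("a", OUT, {})
s3_ok = signal_ok("s3", OUT, {("i3", "s3"): True})
```

The empty-conjunction case mirrors the remark that system inputs are always assumed to be correct.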
Observations
After the programming phase, tests, i.e., simulation runs, are conducted, possibly leading to the discovery of a misbehavior of the system. Misbehavior is detected by comparing selected outputs of the program with the outputs of the specification. We call these selected outputs, together with the stimulated inputs, traced signals.

Test Case 1: $OBS = \{ok(a), \ldots, ok(f), \neg ok(g), ok(h)\}$
Test Case 2: $OBS = \{ok(a), \ldots, ok(e), \neg ok(f), ok(g), ok(h)\}$

Next, we need to assign observations to particular points in time, since we have associated time points with the ok/1 predicate declaring a signal to be correct. Since we ignore the exact time point when a signal becomes incorrect, we have chosen an assignment of the observation to a given time point such that no diagnoses are excluded. Therefore, every conflict in our abstraction has to be a superset of (or equal to) a conflict in a "hypothetical most accurate model". In other words, we have to make sure that conflicts are not too small.

Lemma 3.1 Let $A$ be an architecture and $E$ an entity. Because of the structure of our system description, we are able to conclude that for $n = |COMPS|$, each signal $s \in Signals(A, E)$, and $t \geq n$: $ok(s)_n \leftrightarrow ok(s)_t$.
Therefore, it is sufficient to consider $n$ time points, where $n$ is the number of components, and to assert $\neg ok(s) \leftrightarrow \neg ok(s)_n$ for each $s \in TSIG \wedge Discrepancy(s)$.
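The bound of n time points can be illustrated with a small Python sketch on a hypothetical circuit with a feedback loop. The rule form used here (an output is derivable as correct at step t if its component is assumed healthy and the inputs it depends on were derivable as correct at step t-1, with all signals taken as correct at step 0) is an assumption made for this sketch, as are the circuit and all names.

```python
from typing import Dict, List, Set, Tuple

Deps = Dict[Tuple[str, str], Set[str]]


def ok_over_time(deps: Deps, signals: List[str], abnormal: Set[str],
                 steps: int) -> List[Dict[str, bool]]:
    """Derive, step by step, which signals can still be shown correct."""
    ok = {s: True for s in signals}      # assumed initialization at step 0
    history = [dict(ok)]
    for _ in range(steps):
        nxt = {}
        for s in signals:
            rules = [(c, ins) for (c, out), ins in deps.items() if out == s]
            # A signal without drivers has no rules and stays correct.
            nxt[s] = all(c not in abnormal and all(ok[i] for i in ins)
                         for c, ins in rules)
        ok = nxt
        history.append(dict(ok))
    return history


# Hypothetical circuit: c1 and c2 form a feedback loop, c3 reads from it.
LOOP_DEPS: Deps = {
    ("c1", "p"): {"x", "q"},
    ("c2", "q"): {"p"},
    ("c3", "r"): {"q"},
}
history = ok_over_time(LOOP_DEPS, ["x", "p", "q", "r"], {"c1"}, steps=5)
```

In this example the fault assumed in c1 needs three steps to reach r, and nothing changes afterwards, which matches n = 3 components.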
Diagnosis
For the definition of diagnoses we will apply the consistency-based view as defined by Reiter [8].
Definition 3.8 (Diagnosis) A diagnosis for a VHDL program $\langle A, E \rangle$ is a subset $\Delta \subseteq COMPS$ such that $SD \cup OBS \cup \{ok(c) \mid c \notin \Delta\} \cup \{\neg ok(c) \mid c \in \Delta\} \not\models \bot$.
Since in our domain predefined tests are applied to a VHDL program in almost all cases, we incorporate tests in the definition of diagnosis. Note that because of our abstraction we do not need to transfer knowledge across tests.
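A brute-force reading of this definition can be sketched in Python for a single test case: a candidate set of components is a diagnosis if, with all other components assumed correct, no signal observed as incorrect can be derived as correct. The dependency table is largely assumed (only the entry for s1 is given in the text), every signal has a single driver here, and the time index is dropped for brevity; this is an illustrative sketch, not the implementation used in the tool.

```python
from itertools import combinations
from typing import List, Set

DEPS = {
    ("i1", "s1"): {"a", "b"},        # stated in the running example
    ("i1", "s2"): {"b", "c"},        # assumed
    ("i2", "f"): {"s1", "s3"},       # assumed
    ("i3", "s3"): {"c", "d", "e"},   # assumed
    ("i4", "g"): {"s2"},             # assumed
    ("i4", "h"): {"s3"},             # assumed
}
COMPS = sorted({comp for comp, _ in DEPS})
INPUTS = {"a", "b", "c", "d", "e"}   # no drivers, hence assumed correct


def derived_ok(delta: Set[str]) -> Set[str]:
    """Signals derivable as correct when exactly delta is assumed faulty."""
    ok = set(INPUTS)
    changed = True
    while changed:
        changed = False
        for (comp, out), ins in DEPS.items():
            if comp not in delta and out not in ok and ins <= ok:
                ok.add(out)
                changed = True
    return ok


def minimal_diagnoses(not_ok_obs: Set[str]) -> List[Set[str]]:
    """Enumerate candidates by size and keep the subset-minimal ones."""
    found: List[Set[str]] = []
    for size in range(len(COMPS) + 1):
        for cand in combinations(COMPS, size):
            delta = set(cand)
            if any(d <= delta for d in found):
                continue                       # superset of a diagnosis
            if not (derived_ok(delta) & not_ok_obs):
                found.append(delta)            # consistent with OBS
    return found


diagnoses = minimal_diagnoses({"g"})           # test case 1: g is incorrect
```

Observations of correct signals exclude nothing in this model, because the rules only ever derive correctness; this is the price of the abstraction noted earlier.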
Definition 3.9 (Diagnosis wrt Tests) Let

Therefore, there is one minimal diagnosis [i1]. Inspecting the definition of i1, we note that the if statement concerning the value of signal b is missing. As a result, we achieved a fault localization with two test cases.

As initially noted, we designed our knowledge representation according to the safety principle that no diagnosis may be excluded. On the one hand, this may result in too many diagnoses. However, this is not a serious problem if multiple test cases are applied, and in addition the user can be asked to judge the correctness of a signal. On the other hand, the calculation of diagnoses is simple and fast enough for computing diagnoses even for large programs. This is exactly the case where assistance for fault localization has a very high return on investment.
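How two test cases narrow the candidates can be sketched in Python. The sketch restricts attention to single-component candidates, a simplification chosen here only for brevity, and uses one conflict per test case derived from an assumed topology (g influenced by i1 and i4, f influenced by i1, i2, and i3); neither the conflicts nor the function name come from the paper.

```python
from typing import List, Optional, Set


def single_fault_candidates(
        conflicts_per_test: List[List[Set[str]]]) -> Set[str]:
    """A single component explains all tests only if it occurs in every
    conflict of every test, so the candidates are the intersection."""
    candidates: Optional[Set[str]] = None
    for conflicts in conflicts_per_test:
        for conflict in conflicts:
            if candidates is None:
                candidates = set(conflict)
            else:
                candidates &= conflict
    return candidates if candidates is not None else set()


# Assumed conflicts for the two test cases of the running example.
test1 = [{"i1", "i4"}]            # g observed as incorrect
test2 = [{"i1", "i2", "i3"}]      # f observed as incorrect

after_first_test = single_fault_candidates([test1])
after_both_tests = single_fault_candidates([test1, test2])
```

Because each test is evaluated on its own abstract model, the conflicts can simply be pooled; no signal values have to be carried from one test to the next.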
Complexity issues
Unfortunately, the need to introduce discrete time points implies that the size of the system description does not increase linearly with the size of the system.
For programs with hundreds of components, the nonlinear size of the system description implies very long runtimes for the consistency checks required during the search for diagnoses. Since multiple consistency checks are usually required, a straightforward implementation of this system description results in completely unsatisfactory performance. Therefore, for implementation purposes this representation is transformed into a system description that employs a simplified approach for subsystems which are not involved in the kind of cyclic dependencies that led us to introduce distinct time values in the first place. The modified approach produces system descriptions that grow linearly in terms of system size. For space reasons we do not present the transformation in this paper. For a detailed description, see [11].
Implementation and Results
VHDLDIAG parses VHDL source files to generate a set of diagnosis components and their interconnections. Diagnoses are computed with standard techniques described in [8, 6]. Additional measurements for discrimination are selected using the minimum entropy approach introduced in [5]. In our context, making additional measurements means specifying additional signals to be traced in the next simulation cycle.
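The idea behind entropy-based measurement selection can be sketched in Python as a simplified one-step scoring; it is not the procedure of VHDLDIAG. The sketch assumes equally likely diagnoses, assumes that a diagnosis either predicts a signal to be correct or predicts nothing (in which case its weight is split evenly between the two outcomes), and uses a hypothetical prediction table for three candidate diagnoses [i1], [i2], [i3].

```python
import math
from typing import Dict, List


def entropy(weights: List[float]) -> float:
    """Shannon entropy (in bits) of the normalized positive weights."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total)
                for w in weights if w > 0)


def expected_entropy(predicts_ok: List[bool]) -> float:
    """Expected entropy of the diagnosis distribution after tracing a
    signal; predicts_ok[i] tells whether diagnosis i predicts it correct."""
    prior = 1.0 / len(predicts_ok)
    # Outcome "correct": predictors keep full weight, the others half.
    w_ok = [prior if p else prior / 2 for p in predicts_ok]
    # Outcome "incorrect": predictors are refuted, the others keep half.
    w_bad = [0.0 if p else prior / 2 for p in predicts_ok]
    p_ok, p_bad = sum(w_ok), sum(w_bad)
    result = p_ok * entropy(w_ok)
    if p_bad > 0:
        result += p_bad * entropy(w_bad)
    return result


# Hypothetical predictions of the diagnoses [i1], [i2], [i3] per signal.
PREDICTIONS: Dict[str, List[bool]] = {
    "a": [True, True, True],     # a system input: tracing it tells nothing
    "s1": [False, True, True],   # only [i1] leaves s1 unexplained
}
best = min(PREDICTIONS, key=lambda sig: expected_entropy(PREDICTIONS[sig]))
```

Tracing the signal with the lowest expected entropy is the one expected to leave the least uncertainty about which diagnosis is the right one.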
Our diagnosis approach has been tested on a number of examples, including actual full-size ASIC designs. The VHDLDIAG system has been in use in various design projects. Although developers are only just learning how to use the system effectively, we can report the following impacts on the design cycle:
- The total effort spent on the simulation-design-debug cycle has been reduced by 25%. Higher savings are expected as users become more familiar with the tool.
- Computation costs for simulation runs were reduced by 15%.

The following data points describe one of the ASICs used for testing the VHDLDIAG system (this ASIC has been manufactured and used in real-world applications). Because all signal values over time are known, VHDLDIAG computes exactly one minimal diagnosis per level. This diagnosis was always the expected one. A complete regression test lasts about one hour per test case, whereas manual examination of test results lasts several hours for only the top level of an ASIC. VHDLDIAG can diagnose faults at all levels of the ASIC design due to its ability to incorporate all traced signals in the solution (which is far beyond the ability of a human studying a trace).

In the second case, we used only one ASIC implementation and user input about signal correctness to find the faults. Every question VHDLDIAG presents to the user reduces the number of diagnoses but increases the overall diagnosis time, due to the need to recompute diagnoses. The effect of VHDLDIAG in this domain is to focus the designer's attention on the relevant signals by using automatic measurement selection, which again saves debugging time. All other ASIC examples required at most 3 minutes to compute all minimal diagnoses at the top level. The number of components and signals decreases for lower levels in the ASIC design. At the lowest level we have measured diagnosis times below one second.
Diagnosis results vary strongly depending on the number of signals which could be classified as $ok$ or $\neg ok$. As this number increases, the expected diagnosis can be computed with more accuracy.
VHDLDIAG has shown that a simple model used for model-based diagnosis can be good enough for use in the VHDL design process. Especially in the domain of regression tests, fault-finding time can be reduced and quality can be increased. Using VHDLDIAG while implementing a new program helps the designer to focus on relevant signals and components. The diagnosis runtimes achieved by the VHDLDIAG prototype are short enough to allow interactive diagnosis even for very large VHDL programs.
Related Work and Conclusion
An abstraction over quantitative signal values (distinguishing between correct and faulty values) in the domain of electronic devices is also proposed in [2], but there is no analysis of functional dependencies. In contrast, we can generate these dependencies automatically and use them to construct the model without user interaction. The use of functional dependencies reduces the set of inputs influencing a given output, resulting in stronger restrictions on the set of diagnoses. A different approach to diagnosis and repair of hardware designs is introduced by [3], but is restricted to single faults and gate-level designs (nowadays normally synthesized automatically from high-level designs) and is thus of limited applicability. Another approach, with a long research history in the hardware design community, is formal design verification [1]. Implementations of these algorithms have been used with success on small circuits with limited functionality, but usually show unacceptable performance when applied to large designs. A number of authors have presented methods and algorithms for error diagnosis in logic programs [10]. Their approach is based on exploiting proof properties of Prolog programs and is therefore not directly applicable to imperative programming systems. Relative advantages of the model-based approach are pointed out in [4].

In this paper, we have presented a computationally feasible approach to diagnosing very large hardware design descriptions. We used the VHDL language as the basis for our implementation, but the results can also be used with other hardware description languages. The model is based on an abstract representation of signal dependencies which are computed automatically by analyzing the VHDL source code. The representation is simple but effective to reason with and allows the user to focus the search for failure locations on a small fraction of the total VHDL code, resulting in significant savings in terms of overall design effort.
