Consider the cost that may be incurred when a logical bug is committed to silicon. For the Pentium, some have put it at US $500 million. But behind every such bug that makes the news, an uncounted number of logical errors go undetected throughout the design, implementation, and marketing phases of many products. Either these errors will finally be detected and corrected in hardware, software, and subsequent releases, or they will simply remain as bugs, known to a few experienced users.
For a complex design, the set of possible behaviors so greatly outstrips the set that can be simulated that even billions of tests, run continuously for a year or more, can never examine most combinations of behaviors. Among those unexamined behaviors may lurk critical untested combinations or, worse yet, unknown system failure modes. Thus, random testing, once considered an important tool for uncovering faults in unanticipated behaviors, is now widely viewed as inadequate. On the other hand, tests designed for specific scenarios leave unexplored the combinations of behavior that fall outside the anticipated patterns.
Such difficulties have spurred research into methods that attempt to prove a system correct, in the same sense that a mathematical theorem is proved correct. The ideal algorithm would automatically analyze a system model and either conclude it was correct or reveal a bug within a reasonable number of steps. From a mathematical perspective, though, only very simple systems are suited to such decision procedures. Although such automated theorem provers had some successes, they were generally unable to pinpoint errors in incorrect designs and were difficult for a nonexpert to operate effectively.
One common observation about hardware and software designs holds that, in the life cycle of a development project, the design is mostly incorrect and approaches correctness only (hopefully!) at the end of the design and testing process, when the "last" errors have been detected and eliminated. Thus, a formal method that sometimes can prove a design correct, but cannot easily locate flaws when the design is incorrect, is useful mainly at the very end of the design and test process. Furthermore, exigencies of the marketplace demand that design verification fit unobtrusively within the development process and, at any rate, not delay that process any more than the current practice of simulation testing does.
For these reasons, researchers sought design verification methodologies that were more limited in scope than theorem provers. They also wanted techniques that could be highly automated and could locate bugs in faulty designs as readily as they could prove error-free designs correct.
Attention at first focused on finite-state models, which can assume only a finite number of distinct configurations during any arbitrarily long or even endless execution. Although limited in a mathematical sense, finite-state models necessarily encompass every digital circuit and every software system implemented on a digital computer. Their computational feasibility for systems with a finite but staggeringly large number of states, however, is a key concern. Today, the first computer-aided verification tools are becoming commercially available. They are based on methods that in many cases can reduce the complexity of verification (without sacrificing guaranteed correctness) to such a degree that it becomes computationally feasible. Among the most powerful of these methods are symbolic model-checking and homomorphic reduction, both of which represent a complex system in terms of a compact and computationally more tractable structure.
Moreover, the two can be used together with a multiplicative reduction effect, since they work independently of one another. Of special importance is the fact that each can be implemented automatically, so the task of reduction is programmed into the computer rather than left as a burden on the design engineer.
EDMUND M. CLARKE
Bell Laboratories
The verification process begins with a program (or a description of a circuit design) and a list of properties to be checked. The verification tool determines whether the properties are true or, if not, provides a counterexample.
As early as 1976, Amir Pnueli, a researcher at the Weizmann Institute, Rehovot, Israel, and other computer scientists began proposing deductive systems, notably temporal logics, that could verify finite models. These logics support a syntax in which it is simple to express such notions as "If a process requests a bus, then eventually it gets access to it."
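As an illustration only, the bus requirement can be written in linear temporal logic, reading G as "always" and F as "eventually"; the second formula is the same requirement in the branching-time logic CTL, where A means "on all execution paths." The proposition names req and grant are our own illustrative choices, not taken from any particular design.

```latex
\mathbf{G}\,(\mathit{req} \rightarrow \mathbf{F}\,\mathit{grant})
\qquad
\mathbf{AG}\,(\mathit{req} \rightarrow \mathbf{AF}\,\mathit{grant})
```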
This approach suffered several practical drawbacks, however. Since temporal logics are most useful for describing requirements placed on simple causal relationships, designers really could not be expected to define an entire system model in temporal logic. (Even logicians are sometimes unsure of the effective meaning of complex temporal formulas.) Moreover, although the decision procedures were guaranteed to give an answer, the computational complexity of the required checks grew exponentially with the size of the formulas. For these reasons, interest in this approach was largely academic.
Dubbed model-checking by the Harvard pair, the approach had the great advantage of producing counterexamples through which programmers could locate errors. Computationally, model-checking appeared promising, especially since under reasonable restrictions its complexity grows linearly with model and formula size. This promise was at best elusive, though, since model size itself generally grows exponentially with the number of variables that define the system (n Boolean variables can take up to 2^n combinations of values), a phenomenon often termed state-space explosion.
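To make the idea concrete, here is a minimal explicit-state sketch in Python, not any tool's actual algorithm: a breadth-first search that visits every reachable state, checks an invariant, and, on failure, returns the shortest counterexample trace. The two-process protocol, its state names, and the helper `check_invariant` are hypothetical, chosen only because its non-atomic "check, then enter" step lets both processes reach their critical sections.

```python
from collections import deque

def check_invariant(initial, successors, invariant):
    """Explicit-state breadth-first search of all reachable states.

    Returns None if `invariant` holds in every reachable state; otherwise
    returns a shortest trace (list of states) from an initial state to a
    violating state, i.e. a counterexample.
    """
    parent = {}
    queue = deque()
    for s in initial:
        if s not in parent:
            parent[s] = None
            queue.append(s)
    while queue:
        s = queue.popleft()
        if not invariant(s):
            trace = []
            while s is not None:          # walk parent links back to the start
                trace.append(s)
                s = parent[s]
            return trace[::-1]
        for t in successors(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None

# Hypothetical two-process mutual-exclusion protocol with a non-atomic
# "check, then enter" step. Each process is idle, trying, checked, or critical.
IDLE, TRYING, CHECKED, CRITICAL = "idle", "trying", "checked", "critical"

def successors(state):
    """Interleaving semantics: exactly one process moves per step."""
    for i in (0, 1):
        me, other = state[i], state[1 - i]
        if me == IDLE:
            nxt = TRYING
        elif me == TRYING and other != CRITICAL:
            nxt = CHECKED      # saw the other process outside its critical section
        elif me == CHECKED:
            nxt = CRITICAL     # ...but enters later without re-checking
        elif me == CRITICAL:
            nxt = IDLE
        else:
            continue           # trying while the other is critical: blocked
        new = list(state)
        new[i] = nxt
        yield tuple(new)

def mutual_exclusion(state):
    return state != (CRITICAL, CRITICAL)

trace = check_invariant([(IDLE, IDLE)], successors, mutual_exclusion)
for step in trace:
    print(step)
```

Each additional process of this kind multiplies the number of possible states by four, so k processes give up to 4**k states; an explicit search like this one must visit every reachable state, which is exactly where state-space explosion bites.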
Since the problem of state-space explosion is intractable in the worst case, research on model-checking has focused on heuristics, or methods that circumvent the complexity barrier in special cases. Some heuristics have simply formalized techniques already used in simulation: abstracting inessential parts of the model and exploiting the model's hierarchical structure and symmetries.
In 1987, one of us (Kurshan) unified and formalized these techniques into a single paradigm called homomorphic reduction. (It was named for a mathematical structure-reducing homomorphism, a "shape-preserving" function, on the Boolean algebra of formulas in program variables.) The paradigm defines associations among system events that need not be distinguished for a particular verification task. For example, to verify some property, it may be unnecessary to distinguish the non-zero values of some variable V. So the homomorphism may replace V with a new variable whose only values are zero and non-zero.
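The zero/non-zero example can be sketched in Python as follows. This is a hypothetical toy controller with an 8-bit counter V, and the abstract transition relation is built here by enumerating concrete states, purely for illustration (a real tool would avoid that enumeration). Each abstract state keeps the controller phase but records only whether V is zero; every concrete step maps to an abstract step, so an invariant stated in those abstract terms that holds on the small model also holds on the large one.

```python
from collections import deque

N = 256  # hypothetical: V is an 8-bit counter, 0..255

def concrete_successors(state):
    """Toy controller: when idle, load any non-zero count into V,
    then count down while busy and return to idle at zero."""
    phase, v = state
    if phase == "idle":
        for k in range(1, N):          # nondeterministic non-zero load
            yield ("busy", k)
    elif v > 1:
        yield ("busy", v - 1)
    else:
        yield ("idle", 0)

def h(state):
    """The reduction: keep the phase, but only whether V is zero or non-zero."""
    phase, v = state
    return (phase, "zero" if v == 0 else "nonzero")

def reachable(initial, successors):
    seen, queue = set(initial), deque(initial)
    while queue:
        s = queue.popleft()
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

# Abstract transition relation: h(s) -> h(t) for every concrete edge s -> t.
ABSTRACT = {}
for s in [(p, v) for p in ("idle", "busy") for v in range(N)]:
    for t in concrete_successors(s):
        ABSTRACT.setdefault(h(s), set()).add(h(t))

def idle_means_zero(abstract_state):
    phase, value_class = abstract_state
    return phase != "idle" or value_class == "zero"

concrete_reach = reachable([("idle", 0)], concrete_successors)
abstract_reach = reachable([h(("idle", 0))], lambda a: ABSTRACT.get(a, set()))
print(len(concrete_reach), len(abstract_reach))         # 256 2
print(all(idle_means_zero(a) for a in abstract_reach))  # True
```

The converse does not hold: an abstract model may violate a property that the concrete system satisfies, because the abstraction merges states. That is why the guess about which values need not be distinguished has to be checked and, if necessary, adjusted.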
The verification program, possibly with initial aid from the user, makes a guess about system values that need not be distinguished. The verification algorithm then checks that this guess (or one of its own) is correct. If so, the abstracted version is used for model-checking; if not, this "localization reduction" algorithm automatically adjusts the reduced model and verification is tried again.
Replacing the given verification problem with a simpler one in such a correctness-preserving fashion was shown to be formally equivalent to finding a Boolean algebra homomorphism that preserves some (but not all, or there would be no reduction) of the structure of the corresponding program. Such homomorphisms include mapping complex data structures and control sequences into simpler [...]
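The guess-check-adjust loop can be sketched in Python, in spirit only; this is not Kurshan's actual localization reduction algorithm. Several simplifying assumptions are ours: the toy design, its variable names, and the helpers `abstract_check` and `localize_and_verify` are hypothetical; variables outside the kept set are "freed" (treated as inputs that may take any value at each step); the design is deterministic with one initial state, so a counterexample can be confirmed by simple replay; and a spurious counterexample is handled by adding the variables that the kept ones read.

```python
from collections import deque
from itertools import product

# Hypothetical toy design: next-state function of each Boolean variable,
# plus the variables that function reads.
NEXT = {
    "a": (lambda s: s["b"], ("b",)),
    "b": (lambda s: s["c"], ("c",)),
    "c": (lambda s: s["c"], ("c",)),
}
for i in range(4):  # registers irrelevant to the property
    name = f"x{i}"
    NEXT[name] = (lambda s, n=name: not s[n], (name,))
VARS = sorted(NEXT)
INIT = {v: False for v in VARS}   # single, deterministic initial state

def invariant(s):
    return not s["a"]              # property: "a" is never asserted

def abstract_check(kept):
    """BFS on the localized model: variables outside `kept` are freed.
    Returns an abstract counterexample (list of tuples) or None."""
    free = [v for v in VARS if v not in kept]
    start = tuple(INIT[v] for v in kept)
    parent, queue = {start: None}, deque([start])
    while queue:
        a = queue.popleft()
        if not invariant(dict(zip(kept, a))):
            trace = []
            while a is not None:
                trace.append(a)
                a = parent[a]
            return trace[::-1]
        for bits in product((False, True), repeat=len(free)):
            full = dict(zip(kept, a))
            full.update(zip(free, bits))   # freed variables take any value
            nxt = tuple(NEXT[v][0](full) for v in kept)
            if nxt not in parent:
                parent[nxt] = a
                queue.append(nxt)
    return None

def concrete_fails_within(steps):
    """Replay the deterministic concrete design for `steps` steps."""
    s = dict(INIT)
    for _ in range(steps):
        if not invariant(s):
            return True
        s = {v: NEXT[v][0](s) for v in VARS}
    return not invariant(s)

def localize_and_verify(seed):
    kept = sorted(seed)
    while True:
        trace = abstract_check(kept)
        if trace is None:
            return "verified", kept        # sound: abstraction over-approximates
        if concrete_fails_within(len(trace) - 1):
            return "bug", kept             # counterexample is real
        # Spurious counterexample: stop freeing what the kept variables read.
        kept = sorted(set(kept) | {d for v in kept for d in NEXT[v][1]})

print(localize_and_verify({"a"}))   # ('verified', ['a', 'b', 'c'])
```

Starting from the seed {"a"}, two spurious counterexamples lead the loop to keep "b" and then "c"; the four irrelevant registers are never brought into the model. This mirrors, on a tiny scale, the saving the text describes.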
Debugging a Communications Chip
Since obtaining an error-free version of the IC had high priority, the author, along with Ben Chen, a Fujitsu computer-aided design engineer familiar with formal verification, and their colleagues turned to the Symbolic Model Verifier (SMV) model-checker to debug the design.
Information about the chip design was available only in gate-level circuit descriptions. Since the circuit had more than 100 000 gates, SMV could not be directly applied to the gate-level circuit description. Chen decided to exploit the modular structure of the circuit to reduce the state explosion problem. He started with a first-in, first-out buffer (FIFO) that was believed to be the most likely source of the error. Designers were not confident of the behavior of specially designed cache registers used in the FIFO to speed up the operation of the chip.
After checking many properties with SMV, Chen was unable to find any errors in the FIFO. He concluded that the error must be in some other part of the circuit. Eventually, Chen determined that all the major components of the circuit were correct, meaning that the error must be caused by the way in which they were connected. He built a concise model of the chip in the SMV language. The model omitted the hardware in the chip that was clearly unrelated to the cause of the error, and the width of the data path was reduced as much as possible to cut the number of states.
Chen's specifications were written in temporal logic that described the abnormal behavior. Running on a workstation, the SMV model-checker took less than half an hour to determine that the error arose from a condition in which the same address appeared twice in the FIFO. This condition, the result of incorrect resetting of address-recycling circuits, caused the data to be sent twice, the second time overwriting the first, leading to duplication or disappearance of the data, depending upon the instant at which the data was read.
The abnormal execution trace found by the model-checker was more than 50 clock cycles long. This meant that, starting in the initial state, it took at least 50 clock cycles for the error to occur. Since the width of the data path in the actual chip was larger and the input data tended to be more random, the error occurred only after a considerable period of time, which explained why it was so hard to find.
In symbolic model-checking, a Boolean function gives the next values of all the system variables in terms of their current values and the system inputs. This function is put into a unique form (usually the binary decision diagram form) to make it easier to manipulate during analysis. When carrying out a breadth-first search, the set of marked states is also represented symbolically through a Boolean function. The ability to perform the operations of Boolean algebra on these expressions allows the search to be carried out entirely using the symbolic forms. As a result, checking formulas depends not on the number of states of the model, but on the compactness of the symbolic forms.
Symbolic techniques are not the ultimate solution to the state explosion problem, since there is no guarantee that the symbolic representation will be any smaller than the explicitly constructed model. Nonetheless, a representation can be chosen to exploit the structure inherent in the state space of the program. As a result, we can verify models many orders of magnitude larger than any that can be constructed explicitly. Just how to do this has been the subject of much recent research.
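The symbolic search can be sketched in Python as follows. It uses a minimal, hand-rolled reduced ordered binary decision diagram (no external BDD package is assumed) and a breadth-first reachability fixpoint carried out entirely with Boolean operations on those diagrams. The 3-bit shift register that shifts in ones, the interleaved variable order (current bit x_i is BDD variable 2i, its next value is 2i + 1), and all names are illustrative assumptions. Production tools such as SMV use considerably more efficient algorithms.

```python
class BDD:
    """Minimal reduced ordered BDD; variable i is tested before variable i+1."""

    def __init__(self, num_vars):
        # Node ids 0 and 1 are the constants FALSE and TRUE.
        self.nodes = [(num_vars, 0, 0), (num_vars, 1, 1)]
        self.unique = {}

    def mk(self, var, lo, hi):
        if lo == hi:                    # redundant test: no node needed
            return lo
        key = (var, lo, hi)
        if key not in self.unique:      # share identical nodes (canonical form)
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def var(self, i):
        return self.mk(i, 0, 1)

    def apply(self, op, u, v, memo=None):
        """Combine two BDDs with a binary Boolean operator `op`."""
        memo = {} if memo is None else memo
        if u <= 1 and v <= 1:
            return int(op(bool(u), bool(v)))
        if (u, v) not in memo:
            vu, lu, hu = self.nodes[u]
            vv, lv, hv = self.nodes[v]
            top = min(vu, vv)
            u0, u1 = (lu, hu) if vu == top else (u, u)
            v0, v1 = (lv, hv) if vv == top else (v, v)
            memo[(u, v)] = self.mk(top, self.apply(op, u0, v0, memo),
                                   self.apply(op, u1, v1, memo))
        return memo[(u, v)]

    def exists(self, u, qvars, memo=None):
        """Existentially quantify away the variables in `qvars`."""
        memo = {} if memo is None else memo
        if u <= 1:
            return u
        if u not in memo:
            var, lo, hi = self.nodes[u]
            lo_q, hi_q = self.exists(lo, qvars, memo), self.exists(hi, qvars, memo)
            memo[u] = (self.apply(lambda a, b: a or b, lo_q, hi_q)
                       if var in qvars else self.mk(var, lo_q, hi_q))
        return memo[u]

    def rename(self, u, mapping, memo=None):
        """Relabel variables; `mapping` must preserve the variable order."""
        memo = {} if memo is None else memo
        if u <= 1:
            return u
        if u not in memo:
            var, lo, hi = self.nodes[u]
            memo[u] = self.mk(mapping.get(var, var), self.rename(lo, mapping, memo),
                              self.rename(hi, mapping, memo))
        return memo[u]

    def evaluate(self, u, assignment):
        while u > 1:
            var, lo, hi = self.nodes[u]
            u = hi if assignment.get(var, False) else lo
        return bool(u)

# Hypothetical 3-bit shift register shifting in ones: x0' = 1, xi' = x(i-1).
n = 3
bdd = BDD(2 * n)
AND, OR, IFF = (lambda a, b: a and b), (lambda a, b: a or b), (lambda a, b: a == b)
x = [bdd.var(2 * i) for i in range(n)]
x_next = [bdd.var(2 * i + 1) for i in range(n)]

T, init = 1, 1               # transition relation T(x, x'); initial states: all zero
for i in range(n):
    T = bdd.apply(AND, T, bdd.apply(IFF, x_next[i], 1 if i == 0 else x[i - 1]))
    init = bdd.apply(AND, init, bdd.apply(IFF, x[i], 0))

def image(states):
    """Successors of `states`: rename x' to x in (exists x. states AND T)."""
    step = bdd.exists(bdd.apply(AND, states, T), {2 * i for i in range(n)})
    return bdd.rename(step, {2 * i + 1: 2 * i for i in range(n)})

reached = init
while True:                  # breadth-first fixpoint on Boolean functions
    new = bdd.apply(OR, reached, image(reached))
    if new == reached:       # canonical form: same function, same node
        break
    reached = new

def contains(bits):
    return bdd.evaluate(reached, {2 * i: b for i, b in enumerate(bits)})

print(contains([1, 1, 0]), contains([0, 1, 0]))   # True False
```

Because the diagrams are canonical, the fixpoint test is a single comparison of node identifiers. The cost of the search depends on the size of the diagrams rather than on the number of states they describe.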
Hurdling the complexity barrier
Currently, the most powerful finite-state verification techniques integrate symbolic model-checking and homomorphic reduction. But computational complexity remains a barrier in some cases. One tactic would focus verification efforts on models that are susceptible to the known heuristics. The challenge then becomes how to determine in advance which models have this property.
There have been some successes in this direction. Modular programming techniques, which limit the amount of information allowed past module boundaries, have played an important role in advancing symbolic model-checking and homomorphic reduction. Even apart from verification, modular techniques have already been identified as useful in managing large programs. So there is reason to hope that the best current programming practice and the verifiability of programs may converge on common ground.
Another tactic to promote verifiability, perhaps of more immediate use in an industrial setting, lets the model and available resources guide the verification effort. This technique rests upon an increasingly common view that verification is more valuable in proving a model incorrect (and providing a counterexample to assist in debugging) than in proving it correct. After all, systems can fail in many ways, some entirely beyond the reach of verification. All verification efforts proceed from assumptions about the environment of the model to be verified. If these assumptions are incorrect, then a faulty system model may be "verified." Moreover, if the synthesis procedure is not itself verified, then a correct design may yield a faulty implementation.
Intel explained that the floating-point division bug occurred because designers used a faulty script to implement the Pentium's division table. While the table's design was probably correct, the script produced a hardware version that omitted a few essential values. If a verification procedure failed to examine this script or its result, then even if the table and all associated parts of the algorithm for floating-point division were correct, the result still would have been faulty silicon.
Assuming that verification will never embrace all the ways in which a system can fail, perhaps verification should be applied to a project so as to extend the most benefit possible for the given resources of time and staff. This is especially true in a [...]
CLARKE & KURSHAN - COMPUTER-AIDED VERIFICATION
ARTHUR B. GLASER
The ISO standard MPEG-2 main profile decoder chip is a microprocessor-controlled decoder for the program bit stream in a digital television receiver. A component within the chip is the compressed data interface controller (CDIC), which the microprocessor can program to accept either bit-serial or byte-parallel MPEG-2-compliant input data. It detects the start and end of frames, performs start-code alignment and synchronization between the incoming data rate and the internal processor rate, and buffers data, which it ultimately stores in external dynamic RAM. The control logic for the CDIC contains about 2500 gates, and an on-chip register file contributes much of the computational complexity of verification. The circuit is described by about 2000 lines of VHDL code and contains 1500 latches, typical of the class of circuits that can be expected to be verified fully and automatically.
At Bell Laboratories, engineers used the verification system FormalCheck to determine whether the internal first-in, first-out buffers (FIFOs) of the CDIC could overflow under the control protocol. In fact, the analysis showed that there was a condition under which the request to write data to the external DRAM was inhibited, permitting the internal FIFO to overflow.
Because of the complexity of the CDIC model, this analysis relied on the localization reduction algorithm, starting from a simple user-provided "seed": a specification of the major components of the CDIC that probably would be necessary for the analysis, and of other components, mainly some of the memory in the internal FIFOs, that probably could be excluded. The reduction algorithm iterates from this seed, attempting to perform its analysis in a logically conservative (correctness-preserving) fashion on a much smaller model than the given one. (The algorithm automatically adjusts the model used for analysis as necessary, and ensures that any reported error is an error in the original, unreduced model.)
After only about 90 seconds on a Sparc workstation from Sun Microsystems, Mountain View, Calif., the verification tool detected the error in the CDIC algorithm. After the error was corrected, the Bell Labs team repeated the analysis, and the verification tool showed that the FIFO now could never overflow. This proof, fully automatic but for the user-provided seed, required the tool to search only 2.5 million states and analyze about 9.8 million transition conditions in the reduced model (a significant reduction over the unreduced model, which has an estimated 10^30 states). Once the tool found the bug, it produced an error track, which had to be interpreted. In this case, the error track spanned 2000 clock cycles because the buffer had to be filled before the error could be detected.
As confidence in and reliance upon finite-state verification grow, designers will slowly learn to use the process in more focused and advantageous ways. Sometimes the fear is that merely learning how to integrate formal verification techniques into the design process may slow development unacceptably, or that the process itself may result in less efficient circuits. In fact, verification may detect errors earlier in the design cycle, thereby actually speeding up the overall project. In many modes of use, formal verification has no effect upon the form and efficiency of the ultimate design. But even when formal verification leads to design compromises, some performance degradation may be worth the price, especially in view of ever faster circuitry, since a more reliable design is being brought faster to market.
Still, some data-intensive designs, such as arithmetic and logic units, may remain beyond the scope of purely automatic finite-state methods. Although automated theorem-provers may not face the same limitations, their enormous user overhead probably would make them impractical on a Pentium-sized project.
But a ray of hope is emerging on the research front: recent hybrid methods of verification integrate finite-state model-checking with automated theorem-proving. Using the two approaches in concert, engineers currently are working on techniques that they hope will one day be used to develop an entire complex microprocessor in considerably less time, and more reliably, than is currently possible.
