Introduction
A telephone switch is built to some unusually strict reliability requirements. The average downtime, for instance, is not allowed to exceed 3 minutes per year. As we all know from experience, few software products today can satisfy such strict requirements. Rigorously testing a nontrivial piece of software is perhaps even harder than designing and writing that software in the first place. This is not a new phenomenon. The first time a "software crisis" was declared, in the late 1960s, the typical size of a program was significantly smaller than it is today. In 1973 the source code written for one of the early versions of the UNIX* operating system, for instance, consisted of only 6,600 lines of code in the C programming language. Today, the source code size for the most widely used PC operating systems exceeds this by a good three orders of magnitude.
But it is not just the average code size that has increased. Purely sequential applications that have no significant interaction with concurrently executing applications elsewhere have become rare. Typical applications today can interact with other applications to exchange data, and they generally rely on each other's well-being to function correctly. Computer scientist Leslie Lamport once defined a distributed system succinctly as "one that stops you from getting any work done when a machine you've never even heard of crashes." 1 If a medium-size sequential application can challenge our reasoning skills, a large concurrent application can easily defy them. Executions in these systems are no longer deterministic: The outcome of an execution can depend on an irreproducible interleaving of asynchronously executing processes. As a result, conventional software testing loses much of its value. A successful test gives no guarantee that the same test could not fail when repeated, and any defect uncovered by a test once may be impossible to reproduce later. In this paper we therefore focus on one of the hardest problems in software design: verifying the logical correctness of systems of interacting concurrent processes. The most prominent examples of these applications are telephone switching systems, distributed operating systems, and data communications protocols.
During the last two decades, researchers at Bell Labs have made fundamental contributions to the development of verification tools for this class of applications. As an example, we will discuss one main line of work at Bell Labs that has led to the development of one of the most widely used verification systems for distributed systems software today: the Spin model checker. 2 In this paper we describe how a software model checker works and what makes it efficient. We also review the main theoretical advances that have been made, as well as improvements achieved in integrating these techniques into a mainstream software development process.
Verification has already been applied to some fairly significant applications. The Spin tool, for instance, was used at NASA to verify the control software for two of its spacecraft. 3, 4 In The Netherlands, the tool was used to verify the correctness of control software for a large new movable flood barrier system near Rotterdam; 5, 6 in Italy it was used to verify new railway interlocking software; 7 and at Bell Labs it was used to verify the call processing software for a new switch. 8, 9 In all these applications, the verification system could identify significant software defects that were not detectable in conventional system testing. Later in this paper, in the section "Verification and Testing," we discuss the cases where, within this realm of applications, software verification techniques can indeed systematically outperform conventional testing techniques, both in thoroughness and overall efficiency.
The First Tools
Before we can build a tool that can mechanically intercept software defects, we should be able to define precisely what a software defect is. Some types of defects will be clear: It should be impossible for a program to crash, hang, or corrupt data when it is executed. But it is not enough just to list some of the things that could prevent a program from performing its function. We need to state precisely what a program is meant to accomplish before we can thoroughly verify its operation. This requires a level of rigor in the formal treatment of properties and specifications that was not available until fairly recently.
Among the first software applications that could be specified in a form amenable to automated verification were data communications protocols. By the late 1970s, it had become standard to specify protocols formally as finite state automata. One of the early examples of this is the definition of the simple alternating bit protocol, which was proposed to improve the reliability of computer-to-computer communications over noisy channels. 10 Table I shows a sample state machine specification for one party of the alternating bit protocol. The specification for the other half of the protocol is obtained by replacing A0 with B0, A1 with B1, and vice versa. Figure 1 contains a graphical representation of both automata. Labels in shaded boxes indicate message transmissions; the remaining labels indicate message receptions. The double arrows indicate that the current input has been accepted and a new message must be fetched for output. The initial state is S0 in both automata.
The column entries in Table I specify the next state reached when the event specified in the column heading occurs. In state S0 the only option is to transmit a message of type A with sequence number 0 and then move to state S1. In S1 the protocol machine waits for a response of type B. If the message carries the sequence number 0, the protocol machine advances to state S3. If the sequence number is 1 or if the message is corrupted, the machine moves to state S2 instead and repeats the last transmission. In the absence of errors, the protocol machine will cycle through the states S0, S1, S3, S4, S0 forever. To achieve useful data transmission, a payload of data can be attached to each A and B message. New data is obtained only after the immediately preceding message is correctly acknowledged. In Table I, this occurs at states S0 and S3. Note that every execution of this protocol defines a path in the two graphs shown in Figure 1. The system execution as a whole can be seen as an interleaving in time of local steps in the two automata. In this case, the number of interleavings that can occur is relatively small. For larger systems with many asynchronously executing processes, this number can be extremely large, making it very hard to reason about the behavior of these systems.
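As a reading aid, the sender's automaton can be written down as a transition table and replayed. Table I itself is not reproduced in this text, so the entries below are reconstructed from the prose and should be read as assumptions; in particular, the event names (`send A0`, `recv B0`, `recv err`, and so on) and the retransmission state S5, which mirrors S2 for the second half of the cycle, are hypothetical.

```python
# Hypothetical transition table for the sender of the alternating bit
# protocol, reconstructed from the prose description of Table I.
# Keys are (state, event) pairs; values are the next state.
# S5 is an assumed retransmission state that mirrors S2.
TRANSITIONS = {
    ("S0", "send A0"): "S1",
    ("S1", "recv B0"): "S3",   # correct acknowledgment: advance
    ("S1", "recv B1"): "S2",   # wrong sequence number: retransmit
    ("S1", "recv err"): "S2",  # corrupted message: retransmit
    ("S2", "send A0"): "S1",   # repeat the last transmission
    ("S3", "send A1"): "S4",
    ("S4", "recv B1"): "S0",
    ("S4", "recv B0"): "S5",
    ("S4", "recv err"): "S5",
    ("S5", "send A1"): "S4",
}


def run(events, start="S0"):
    """Replay a sequence of events and return the list of visited states."""
    state = start
    path = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not enabled in state {state}")
        state = TRANSITIONS[key]
        path.append(state)
    return path


# Error-free exchange.
print(run(["send A0", "recv B0", "send A1", "recv B1"]))
# ['S0', 'S1', 'S3', 'S4', 'S0']

# A corrupted acknowledgment forces a retransmission through S2.
print(run(["send A0", "recv err", "send A0", "recv B0"]))
# ['S0', 'S1', 'S2', 'S1', 'S3']
```

Each call replays a single path through one automaton; the tools discussed below instead explore all paths, and all interleavings with the other party, systematically.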
The specification of a protocol in the form of an automaton, with simple and well-known semantics, made it possible to devise automated means of analysis. In a 1969 commentary on the paper that first introduced the alternating bit protocol, Lynch wrote, with considerable foresight: "I feel it should be possible to develop proof procedures for the reliability of automata … ." 11 Initially, however, the analytical techniques tried were mainly restricted to manual types of analysis. It took another ten years before work on automated tool support started in earnest.
Upon joining Bell Labs in 1980, the author of this paper was challenged by Doug McIlroy, then head of the Computing Sciences Research Center, to find a way to prove the correctness of the software for a small phone system called TPC, or The Phone Company, written mainly by Lee McMahon, a fellow Bell Labs researcher. To solve that problem, we wrote a reachability analyzer for a small formal model of the code and were able to mechanically intercept some subtle problems that had escaped detection in conventional testing. This analyzer, called PAN, or protocol analyzer, used a depth-first search algorithm to recursively explore the reachable states of the modeled system 12 (illustrated in Figure 5, later in this paper). The algorithm easily detects simple types of defects, such as system deadlocks, cases of incompleteness, buffer overflow, and the presence of unexecutable code fragments (dead code). The decision to perform the analysis during the search process, rather than after it, later proved to be a critical advantage in the analysis of large-scale systems. Today, this on-the-fly verification is the de facto standard for software model checking systems.
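The idea of analyzing states during the search, rather than after it, can be illustrated with a short Python sketch: a recursive depth-first search that checks every state for a deadlock (no enabled successor, yet not a valid end state) the moment it is first reached, and stops at the first violation. This is not PAN's code. The successor function, the `valid_end` predicate, and the two-process locking model are hypothetical, chosen only because they contain a classic deadlock.

```python
from typing import Callable, Hashable, Iterable, List, Optional


def find_deadlock(initial: Hashable,
                  successors: Callable[[Hashable], Iterable[Hashable]],
                  valid_end: Callable[[Hashable], bool]) -> Optional[List[Hashable]]:
    """Recursive depth-first search that checks each state when first reached.

    Returns the path from `initial` to the first deadlock found, or None.
    """
    visited = set()
    path: List[Hashable] = []

    def dfs(state) -> bool:
        visited.add(state)
        path.append(state)
        succ = list(successors(state))
        if not succ and not valid_end(state):   # on-the-fly check
            return True
        for nxt in succ:
            if nxt not in visited and dfs(nxt):
                return True
        path.pop()                              # backtrack
        return False

    return list(path) if dfs(initial) else None


# Hypothetical model: two processes each take two locks, then release both.
# State = (pc1, pc2, owner_of_a, owner_of_b); owner 0 means the lock is free.
LOCK_SLOT = {"a": 2, "b": 3}
ORDER = {1: ("a", "b"), 2: ("b", "a")}   # opposite acquisition orders


def successors(state):
    result = []
    for proc in (1, 2):
        pc = state[proc - 1]
        nxt = list(state)
        if pc < 2:                       # acquire the next lock if it is free
            slot = LOCK_SLOT[ORDER[proc][pc]]
            if state[slot] != 0:
                continue
            nxt[slot] = proc
        elif pc == 2:                    # release both locks
            nxt[2] = nxt[3] = 0
        else:                            # pc == 3: process has terminated
            continue
        nxt[proc - 1] = pc + 1
        result.append(tuple(nxt))
    return result


def valid_end(state):
    return state[0] == 3 and state[1] == 3


print(find_deadlock((0, 0, 0, 0), successors, valid_end))
# [(0, 0, 0, 0), (1, 0, 1, 0), (1, 1, 1, 2)]
```

The returned path is the counterexample: each process holds one lock and waits for the other. Giving both processes the same acquisition order removes the deadlock, and the same search then returns `None`.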
Automata and Logic
Around 1983, Pierre Wolper, another Bell Labs researcher at that time, joined with Moshe Vardi in a study of a theory that could support the automatic verification of a broader range of system properties. 13, 14 The properties that Vardi and Wolper considered were specified in a logic that had been proposed by Amir Pnueli in the late 1970s. At that time Pnueli's logic for reasoning about concurrent systems 15 was quickly gaining popularity. This temporal logic allowed one to reason about system executions by formally relating the occurrence of events in time. A simple example of a temporal logic formula that states a system requirement is
$\Box\,(\mathit{request} \rightarrow \Diamond\,\mathit{response})$
The validity of a temporal formula is defined over possibly infinite system executions. The formula above uses two Boolean operands, request and response; two temporal operators, □ and ◇; and one logical operator, →. The box operator, □, is pronounced always. The formula □p, for instance, is true for a system execution σ if and only if the operand p remains invariantly true for all steps of σ. The diamond operator, ◇, is pronounced eventually. The formula ◇p is true for execution σ if operand p becomes true at least once in σ. The operator → stands for logical implication. For instance, using the logical or operator ∨, we can rewrite (p → q) as (¬p ∨ q) to express that if p is true, then q must be true.
The formula used in the example expresses the property that, in an execution, it should always be the case that there is either no request at all, or if a request occurs, there will eventually also be a response. If we want to exclude the possibility that no requests are issued at all, we can formulate an additional property ◇request, which states that there will eventually be a request. Alternatively, we can combine the two properties into a single one, using the logical and operator, ∧, as
$\Box\,(\mathit{request} \rightarrow \Diamond\,\mathit{response}) \wedge \Diamond\,\mathit{request}$
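Because temporal formulas are evaluated over possibly infinite executions, one convenient way to experiment with them is to represent an execution as a lasso: a finite prefix followed by a loop that repeats forever. The Python sketch below assumes that representation, encodes each step as the set of propositions true at that step, and evaluates the two formulas above directly from the definitions of □ and ◇. It illustrates the semantics only and is not how Spin checks a formula. The comments use the ASCII spellings [] for □ and <> for ◇, as in Spin's LTL syntax.

```python
from typing import Callable, List, Set

Trace = List[Set[str]]


def positions_from(i: int, trace: Trace, loop_start: int) -> range:
    """Positions occurring at or after step i in the infinite execution
    trace[:loop_start] followed by trace[loop_start:] repeated forever."""
    return range(min(i, loop_start), len(trace))


def eventually(p: Callable[[int], bool], trace: Trace, loop_start: int, i: int = 0) -> bool:
    return any(p(j) for j in positions_from(i, trace, loop_start))


def always(p: Callable[[int], bool], trace: Trace, loop_start: int, i: int = 0) -> bool:
    return all(p(j) for j in positions_from(i, trace, loop_start))


def check_response(trace: Trace, loop_start: int) -> bool:
    """[](request -> <>response), evaluated at the first step."""
    def ok(j: int) -> bool:
        return ("request" not in trace[j]
                or eventually(lambda k: "response" in trace[k], trace, loop_start, j))
    return always(ok, trace, loop_start)


def check_combined(trace: Trace, loop_start: int) -> bool:
    """[](request -> <>response) && <>request."""
    return check_response(trace, loop_start) and eventually(
        lambda k: "request" in trace[k], trace, loop_start)


quiet = [set()]                                  # nothing ever happens
served = [{"request"}, set(), {"response"}]      # whole trace repeats
stalled = [{"request"}, set()]                   # loop starts at step 1

print(check_response(quiet, 0), check_combined(quiet, 0))       # True False
print(check_response(served, 0), check_combined(served, 0))     # True True
print(check_response(stalled, 1))                               # False
```

The `quiet` execution shows why the conjunct ◇request is needed: it satisfies the first formula only vacuously.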
Examples of these types of requirements are readily found in practical applications. In the alternating bit protocol, for instance, we could formulate the requirement that the transmission of an A0 message (the request) is always eventually followed by the reception of a B1 message (the response). Similarly, in a telephone system, if a subscriber has call forwarding, the arrival of an incoming call (the request) should always eventually be followed by a forwarding event (the response).
Automata Theoretic Verification
Vardi and Wolper 13, 14 showed that any property expressed in the formalism of temporal logic can be converted into an automaton that can then be used in an algorithmic verification process, provided that the system for which we want to check the property is also specified as an automaton. Assume we are given a system automaton S and a temporal logic formula f. We can check if S satisfies f with the following procedure.
We first negate property formula f, using standard Boolean negation, and then derive the corresponding automaton A(¬f) using Vardi and Wolper's procedure. Note that while f makes a positive statement about the correct executions of our system, ¬f captures the executions that are not correct; that is, ¬f expresses the potential violations of f.
Next, we search for any execution allowed by both S and A(¬f), as illustrated in Figure 2. Formally, this search amounts to the construction of a "synchronous product" G of S and A(¬f), written
$G = S \times A(\neg f)$.
If the language accepted by this synchronous product is nonempty, executions of S exist that violate f. The existence of one such execution suffices to disprove the validity of f; conversely, proving that there can be no such executions suffices to prove that f cannot be violated.
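The product construction can be sketched in a few lines of Python. The sketch assumes that S is given by a successor function and a labeling of its states with the propositions true there, and that A(¬f) is a Büchi automaton whose transitions carry guards over those labels. It uses one common convention: the property automaton takes one step for every system state it reads, including the initial one. The system, the automaton, and all state names are hypothetical; the automaton is hand-built for ¬□(request → ◇response), which is equivalent to ◇(request ∧ □¬response).

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Label = FrozenSet[str]
Guard = Callable[[Label], bool]


def synchronous_product(
    s_init: Hashable,
    s_succ: Callable[[Hashable], Iterable[Hashable]],
    s_label: Callable[[Hashable], Label],
    a_init: Hashable,
    a_trans: Dict[Hashable, List[Tuple[Guard, Hashable]]],
):
    """Build the reachable part of G = S x A.

    A takes one step for every system state it reads, following each
    transition whose guard accepts that state's label.
    Returns (initial product states, reachable product states, edges).
    """
    def a_next(q, label):
        return [q2 for guard, q2 in a_trans[q] if guard(label)]

    init = {(s_init, q) for q in a_next(a_init, s_label(s_init))}
    states: Set[Tuple[Hashable, Hashable]] = set(init)
    edges = set()
    work = list(init)
    while work:
        s, q = work.pop()
        for s2 in s_succ(s):
            for q2 in a_next(q, s_label(s2)):
                edges.add(((s, q), (s2, q2)))
                if (s2, q2) not in states:
                    states.add((s2, q2))
                    work.append((s2, q2))
    return init, states, edges


# Hypothetical system S: a server that may stall while a request is pending.
S_SUCC = {"idle": ["req"], "req": ["req", "resp"], "resp": ["idle"]}
S_LABEL = {"idle": frozenset(), "req": frozenset({"request"}),
           "resp": frozenset({"response"})}

# Hand-built Buchi automaton for <>(request && []!response); q1 is accepting.
A_TRANS = {
    "q0": [(lambda L: True, "q0"),
           (lambda L: "request" in L and "response" not in L, "q1")],
    "q1": [(lambda L: "response" not in L, "q1")],
}
ACCEPTING = {"q1"}

init, states, edges = synchronous_product(
    "idle", S_SUCC.__getitem__, S_LABEL.__getitem__, "q0", A_TRANS)
print(sorted(states))
# [('idle', 'q0'), ('req', 'q0'), ('req', 'q1'), ('resp', 'q0')]
```

The self-loop on ('req', 'q1') is a cycle through an accepting state, so the product accepts an infinite execution: the hypothetical server can stall forever after a request, which violates f. Detecting such accepting cycles is the job of the nested depth-first search mentioned later in this paper.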
Before this method can be used in earnest, however, we must consider a number of practical issues:
• When we verify a property of a system of asynchronously executing concurrent processes, the system automaton S is itself defined as an interleaving product of automata. In the alternating bit protocol, for instance, S would express the combined behavior of the two communicating parties. An efficient way to compute S from smaller components is therefore essential to enable this method to work.
• For any serious application, such as the call processing software of a telephone switch, 9 the system automaton S can be very large. The cost of verification can grow in proportion to the product of the sizes of S and A(¬f). Therefore, we need very good algorithms for reducing these sizes before verification is attempted.
• Finally, if we want to apply this method to software verification, we need good ways to relate program source code to automata specifications, covering not just control flow but also the use of data.
Since the automata theoretic method was first proposed, significant progress has been made on each of these issues. Together, these improvements make it possible today to apply the automata theoretic verification method to software verification problems of impressive size. We will briefly consider the algorithms that have been developed and mention some of the recent applications of this technology.
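The first of the issues listed above, building S from its components, can be sketched as the reachable part of an asynchronous (interleaving) product in which exactly one component takes a local step at a time. The Python below assumes components are plain successor maps with no shared variables or message channels, a simplification that real models do not share. Even so, it shows how quickly the global state count grows: with no interaction, every combination of local states is reachable, so k copies of a 3-state process give 3^k global states.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Component = Dict[Hashable, List[Hashable]]   # local state -> local successors


def interleaving_product(components: Sequence[Component],
                         initial: Sequence[Hashable]) -> Set[Tuple[Hashable, ...]]:
    """Reachable global states when exactly one component moves per step."""
    start = tuple(initial)
    seen = {start}
    frontier = [start]
    while frontier:
        g = frontier.pop()
        for i, comp in enumerate(components):
            for local_next in comp[g[i]]:
                nxt = g[:i] + (local_next,) + g[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen


ring = {0: [1], 1: [2], 2: [0]}   # hypothetical 3-state cyclic process
for k in range(1, 6):
    print(k, len(interleaving_product([ring] * k, [0] * k)))
# prints 1 3, 2 9, 3 27, 4 81, 5 243 on separate lines
```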
Computing System Behavior
The verification procedure would be useless if it could not be applied to large problems. A typical application today, such as the verification of significant portions of the call processing software for a telephone switch, 9 can involve the concurrent execution of large numbers of potentially interacting processes, each with access to considerable amounts of user data. If the individual behavior of each process can be captured in the form of an automaton, the joint behavior of all these automata defines system automaton S. The automaton S can be large and expensive to compute. If S is known to contain defects, however, it is wasteful to spend significant resources to compute it exhaustively if instead a small initial portion can suffice to expose them. It is also redundant to compute all of S if only the portion that intersects with A(¬f) is needed (as shown earlier, in Figure 2). The verification system Spin uses an on-the-fly procedure to compute the intersection product directly, that is, without first computing S separately. The computation can stop as soon as the first counterexample to a logic property has been generated, which often happens quickly after the search begins. Even if no defects are to be found in S, it is only necessary to compute the fragment of S relevant to the verification of property f. The method used to compute G in Spin is an optimized nested depth-first search algorithm. 16
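The nested depth-first search mentioned above can be sketched in Python in its basic textbook form, not Spin's optimized variant. An outer search explores the product graph; each time it backtracks from an accepting state, it starts an inner search for a path back to that state, which would close an accepting cycle and so yield a counterexample. The inner searches share one visited set, which is what keeps the total work linear. The graph `G`, its state names, and the `accepting` test are hypothetical, and for brevity the graph is an explicit dictionary, whereas an on-the-fly checker would compute successors as the search proceeds.

```python
from typing import Callable, Hashable, Iterable, Optional


def nested_dfs(initial: Iterable[Hashable],
               successors: Callable[[Hashable], Iterable[Hashable]],
               accepting: Callable[[Hashable], bool]) -> Optional[Hashable]:
    """Return an accepting state that lies on a reachable cycle, or None."""
    outer, inner = set(), set()   # `inner` is shared by all inner searches

    def inner_dfs(state, seed) -> bool:
        inner.add(state)
        for nxt in successors(state):
            if nxt == seed:
                return True                    # cycle through the seed found
            if nxt not in inner and inner_dfs(nxt, seed):
                return True
        return False

    def outer_dfs(state) -> Optional[Hashable]:
        outer.add(state)
        for nxt in successors(state):
            if nxt not in outer:
                found = outer_dfs(nxt)
                if found is not None:
                    return found
        # Post-order: the inner search starts only after all successors are done.
        if accepting(state) and inner_dfs(state, state):
            return state
        return None

    for s in initial:
        if s not in outer:
            found = outer_dfs(s)
            if found is not None:
                return found
    return None


# Hypothetical product graph; states ending in "q1" are accepting.
G = {
    "idle,q0": ["req,q0", "req,q1"],
    "req,q0": ["req,q0", "req,q1", "resp,q0"],
    "req,q1": ["req,q1"],
    "resp,q0": ["idle,q0"],
}
print(nested_dfs(["idle,q0"], G.__getitem__, lambda s: s.endswith("q1")))
# req,q1
```

The search stops at the first accepting cycle, mirroring the early termination described above. The recursive form is kept for readability; for large graphs an explicit stack would avoid Python's recursion limit.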
Reduction Methods
In some cases it can be too expensive to compute even the fraction of G needed to decisively prove or disprove f. If S contains defects, they are not necessarily present in the fraction of G that has been computed when available resources, such as time or memory, run out. Figure 3 illustrates some of the improvements in performance achieved in this area. As an example we used the 1980 model of the TPC phone switch software (A in Figure 3) that triggered the first work in this area at Bell Labs. We will review the main reduction methods below.
The first method is based on using dedicated data structures that can frugally store the reachable states from the product automaton G. For a small runtime penalty, one can use standard memory compression techniques. Clearly, the more aggressive the compression method is, the larger the fraction of G that can be computed, and the more thorough the search for defects can be. The supertrace algorithm (B in Figure 3), introduced in 1987, pushes this to a limit by using lossy compression to record each reachable state from G in just two bits of memory. 17 The supertrace algorithm takes advantage of the fact that bit addresses also carry information, making it possible, effectively, for n bits to record 32 × n bits of information (assuming an address length of 32 bits). Because only minimal information is recorded, the supertrace algorithm also speeds up the verification process. Even in cases where the full size of G far exceeds available resources, the supertrace algorithm can still approximate the results of an exhaustive verification in a very short time. The algorithm is used today for solving large problems in almost all software verification tools that have been developed.
Figure 2. Computing the intersection G of the languages defined by automata S and A(¬f) to find executions in S that violate property f.
Another method of reducing the cost of verification is based on the observation that the full product G contains considerable redundancy. A system of concurrently executing processes allows an overwhelmingly large number of possible executions. Many of these executions, however, differ in unimportant ways, for example, in the way that logically independent actions are interleaved in time. In the late 1980s and early 1990s several people, including Bell Labs researchers Patrice Godefroid and Doron Peled, developed theories that could be used to capture the minimal fraction of system executions that should be considered to verify a concurrent system. In typical examples, the application of these partial-order-reduction strategies can result in a reduction in the fragment of G that must be computed by an order of magnitude or more, without loss of generality. In 1995 we adopted in the verifier Spin an efficient variant of partial-order reduction, named static reduction (C in Figure 3). 18 A small extension of this method, used to optimize automata models, was added in 1999 (D in Figure 3).
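The supertrace idea, often called bitstate hashing, can be sketched as follows: instead of storing states, the search sets k bits in a large bit array, one bit per hash function, and treats a state as already visited when all of its bits are set. The bit-array size, the choice k = 2, and the use of SHA-256 slices as hash functions are assumptions of this Python sketch, not Spin's implementation. A collision can make the search skip a state it has never seen, which is why the method approximates, rather than guarantees, an exhaustive search.

```python
import hashlib
from typing import Hashable


class BitstateSet:
    """Lossy set of visited states: each state costs k bits of memory."""

    def __init__(self, log2_bits: int = 20, k: int = 2):
        assert log2_bits >= 3 and 1 <= k <= 8   # 8 four-byte slices per digest
        self.mask = (1 << log2_bits) - 1
        self.k = k
        self.bits = bytearray((1 << log2_bits) // 8)

    def _positions(self, state: Hashable):
        digest = hashlib.sha256(repr(state).encode()).digest()
        # Derive k hash values from disjoint 4-byte slices of one digest.
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") & self.mask
                for i in range(self.k)]

    def add_if_new(self, state: Hashable) -> bool:
        """Set the state's k bits; report it as new if any bit was still 0."""
        new = False
        for pos in self._positions(state):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                self.bits[byte] |= 1 << bit
                new = True
        return new


def bitstate_search(initial, successors, store: BitstateSet) -> int:
    """Stack-based search that remembers states only through `store`.
    Returns the number of states the search treated as new."""
    count, stack = 0, []
    if store.add_if_new(initial):
        count, stack = 1, [initial]
    while stack:
        state = stack.pop()
        for nxt in successors(state):
            if store.add_if_new(nxt):
                count += 1
                stack.append(nxt)
    return count


# Hypothetical model: two independent counters modulo 10 (100 reachable states).
def grid(state):
    x, y = state
    return [((x + 1) % 10, y), (x, (y + 1) % 10)]


print(bitstate_search((0, 0), grid, BitstateSet(log2_bits=20)))  # 100 unless a collision hides a state
print(bitstate_search((0, 0), grid, BitstateSet(log2_bits=3)))   # at most 8: array far too small
```

With 2^20 bits (128 KiB) and 100 states, collisions are very unlikely; with an 8-bit array, each counted state must set at least one fresh bit, so the search can report at most 8 states, illustrating coverage loss when memory is too small.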
A third method to reduce the memory requirements of a verification run is the use of alternative, highly compact, representations of the reachable state space. One such method consists of the computation of a minimized graph structure for recording all reachable states with as little redundancy as possible. 19 Such a storage method can trade increases in verification time for sometimes dramatic reductions in the memory requirements of a verification. The benefits, however, are often problem dependent.
Reducing the Size of A (¬f)
The size of G depends not just on S, but also on the size of the automaton generated from a temporal logic formula f. The initial construction procedure, which proved that for any formula f one can construct an equivalent automaton A(f), generated prohibitively large automata. In 1995 Gerth et al. developed a new algorithm that could produce much smaller automata. 20 This algorithm was added to Spin that same year. More recently, Kousha Etessami, also from Bell Labs, showed that the automata could be reduced even further in size, often by a substantial margin. 21 Together, these improvements translate into significant reductions in the complexity of verification, making what were once inconceivably large problems eligible targets for formal verification.
Deriving Automata from Programs
We have discussed how one can efficiently compute the intersection product of a system, specified as a collection of automata, and a property, expressed as a logic formula. We mentioned the algorithms that can be used for extracting automata from formulae and for reducing the amount of work needed for product computation. We now discuss how one can derive an automaton specification from a substantial piece of software.
The full implementation of a software product can contain a significant amount of detail that falls outside the scope of the verification, either because it was checked before or because the detail is unrelated to the property being checked now. This means that an automaton description can be much more compact than a full system implementation. Today the method most commonly used to create a high-level description of a software product is to produce one manually. Such a hand-coded model focuses on those aspects of the software that are central to the properties to be checked. There are two basic problems with this method. First, just as the original software can contain defects, so can a manually constructed model. Second, the manual construction of a model for a substantial piece of software can be intellectually demanding and very time consuming. For safety-critical systems, this investment can often be justified. Not surprisingly, therefore, the use of formal verification techniques has so far been restricted mainly to these types of applications. For routine software development, the manual construction of an automaton model from systems code is often too cumbersome. Several solutions to this problem have been tried over the years. In one of the first large-scale verification efforts at Bell Labs, we defined a subset of the programming language used for switch development in such a way that programs written in the resulting language corresponded in a natural way to the high-level automata description required for verification. 22 Although this approach was successful from a verifier's point of view, in the long run the restrictions on the programming language proved intolerable. We also developed methods to extract verifiable automata models from high-level system requirements, to secure correctness for the starting point of a system's development. 23 Nonetheless, the correctness of high-level system requirements still cannot guarantee the correctness of a system's implementation.
Mechanical Extraction of Automata
We recently developed a method that makes it possible to mechanically extract automata models from implementation-level code, as illustrated in Figure 4. The automata extraction covers both control flow and data use for each asynchronous process in the system. It is guided by a user-defined lookup table that restricts the model to those parts of the application that are of primary interest in the verification. 8 With this method, model extraction takes a fraction of a second and is guaranteed to match the product implementation. Verification results for an update of the program sources can be obtained in minutes, rather than in weeks or months, as was the case with previous methods. A description of the first application of this method to the verification of the call processing software for one of Lucent's new switching systems will appear in the next issue of the Bell Labs Technical Journal.
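The role of the lookup table can be illustrated with a deliberately small, hypothetical Python sketch. Everything in it is an assumption made for illustration: the C-like statement strings, the table entries, the use of `None` to hide a statement, and the policy that statements absent from the table become a placeholder `skip`. It does not reproduce the table format of the tool described in this paper, which works on parsed source code rather than raw strings.

```python
from typing import Dict, List, Optional

# Hypothetical lookup table: source statement -> model statement.
# A value of None hides the statement from the model entirely.
LOOKUP: Dict[str, Optional[str]] = {
    "state = DIALING;": "state = DIALING",
    "send(peer, OFFHOOK);": "send offhook",
    "timer_start(&t, 30);": None,      # timing detail not needed here
    'debug_log("dial");': None,        # logging is irrelevant to the property
}


def extract(statements: List[str], table: Dict[str, Optional[str]],
            default: str = "skip") -> List[str]:
    """Map each source statement to its model counterpart via the table.

    Statements missing from the table become `default`, keeping their
    position in the control flow while abstracting away their effect
    (an assumed policy for this sketch).
    """
    model = []
    for stmt in statements:
        stmt = stmt.strip()
        if stmt in table:
            mapped = table[stmt]
            if mapped is not None:
                model.append(mapped)
        else:
            model.append(default)
    return model


source = [
    "state = DIALING;",
    "timer_start(&t, 30);",
    "send(peer, OFFHOOK);",
    'debug_log("dial");',
    "retries++;",
]
print(extract(source, LOOKUP))
# ['state = DIALING', 'send offhook', 'skip']
```

Because the mapping is mechanical, it can be rerun unchanged after every source update; only the table needs attention when new kinds of statements appear.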
Verification and Testing
The capabilities of a verification tool like Spin are narrowly focused on issues of logical correctness in a distributed system. The verifier, for instance, is not designed to check purely computational aspects of a sequential program or to compute performance metrics. Its power is in checking that the interactions of components in a distributed system satisfy functional properties. Within that domain of interest, a logic verifier like Spin can significantly outperform standard software testing methods, both in thoroughness and in speed. We have already mentioned the problem of reproducibility in a distributed system. We can use a verification system to formulate queries about the feasibility of system behavior that cannot be answered at all in on-line tests. Surprisingly, a verification system can also explore the possible executions faster than a system based on a real-time execution of the program. The reasons for this efficiency are in the nature of the search process and in the use of abstractions through the automata models.
To see how the search process helps to make verification more efficient than a straight set of system executions, consider the tree structure shown in Figure 5. It could represent a simple piece of software with a fixed initial state, shown at the root of the tree. Each execution passes three successive conditional branches, for a total of eight possible system executions. Since each of these eight executions consists of three steps, an exhaustive test of this system, that is, executing each of the eight possible scenarios, would require a total of 24 steps.
The execution steps in Figure 5 are numbered in the order in which a depth-first search algorithm would explore them. Note that there are only 14 such steps. The exploration starts at the initial state of the automaton, shown here as the root of the tree, and recursively explores all successor states. At each newly visited state we select the left-most unexplored successor, move there, and repeat. When there are no more unexplored successor states, the search moves back one step to a previous state. If there are no more previous states (that is, the search has returned to the root state, with all successor states explored), the search terminates and the full extent of the automaton is known. The gain in efficiency comes from the fact that if we want to check the path 1-2-4, after having checked the path 1-2-3, we do not need to repeat steps 1 and 2. In general, if we are n steps deep in the tree, and a new execution path to be tested differs only in the last m steps from the previously tested paths, we can avoid having to repeat the first n − m steps and check only the steps that differ.
For the tree in Figure 5, a depth-first search explores 14 steps instead of 24. More generally, checking a full binary tree of n levels requires $n \times 2^n$ steps with a conventional test and $2^{n+1} - 2$ steps with a depth-first search. The gain, $n \times 2^n / (2^{n+1} - 2)$, is slightly above n/2, so for n > 1 it increases linearly with n. For an average execution sequence of 20 steps, for instance, the gain would be a factor of 10.
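The two step counts can be checked with a short Python sketch that walks a full binary tree depth-first and counts only forward moves, as in the step numbering described above, since backtracking does not execute anything new. The function names are illustrative.

```python
def conventional_test_steps(n: int) -> int:
    """Run every root-to-leaf path of a full binary tree of n levels separately."""
    return n * 2 ** n


def dfs_steps(n: int) -> int:
    """Count the forward moves of a depth-first search over the same tree.

    Backtracking moves are not counted.
    """
    steps = 0

    def visit(depth: int) -> None:
        nonlocal steps
        if depth == n:
            return
        for _branch in (0, 1):
            steps += 1          # move down to a child not yet explored
            visit(depth + 1)    # explore its subtree, then return here

    visit(0)
    return steps


print(conventional_test_steps(3), dfs_steps(3))   # 24 14
for n in range(1, 11):
    assert dfs_steps(n) == 2 ** (n + 1) - 2       # closed form from the text
print(round(conventional_test_steps(20) / (2 ** 21 - 2), 2))   # 10.0
```

The last line uses the closed form rather than walking the roughly two million edges of a 20-level tree.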
The remaining issues that distinguish a conventional system test from a run with a logic verifier have to do with controllability and observability. In a concurrent system the executions of individual processes can be arbitrarily interleaved in time, unless otherwise constrained by process synchronization mechanisms. The eight full paths in Figure 5, for instance, could represent the possible interleavings of two choices (and subsequent execution steps) in one process, with one choice (and execution step) in a second. Precisely how these steps would be interleaved in time during a real system execution is typically beyond the control of a tester. No matter how many times the test would be run, there is no guarantee that all interleavings would actually occur. By virtue of the extracted automata model, a verifier can indeed check each relevant interleaving. The partial-order-reduction strategy reduces the executions that are inspected to only those that differ in relevant ways, for instance, because of access to shared data. The same argument applies to observability. Not all aspects of the concurrent execution of asynchronous processes are observable in a conventional system test. The details of process interleavings are difficult, if not impossible, to record; similarly, large classes of data may not be visible at all from the testing interface. In a logic verification system, the way in which automata are extracted from program sources can guarantee the observability of all events and conditions.
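A tester cannot choose which interleaving a real run follows, but a verifier can enumerate them. The Python sketch below lists every interleaving of two processes' local step sequences, assuming the processes share nothing, so that every order-preserving merge is feasible. For sequences of length m and n the count is the binomial coefficient C(m + n, m), which grows rapidly; this is the growth that partial-order reduction is designed to avoid when the interleaved steps are independent.

```python
from math import comb
from typing import List


def interleavings(p: List[str], q: List[str]) -> List[List[str]]:
    """Every merge of two local step sequences that keeps each one's order."""
    if not p:
        return [list(q)]
    if not q:
        return [list(p)]
    return ([[p[0]] + rest for rest in interleavings(p[1:], q)] +
            [[q[0]] + rest for rest in interleavings(p, q[1:])])


print(interleavings(["a1", "a2"], ["b1"]))
# [['a1', 'a2', 'b1'], ['a1', 'b1', 'a2'], ['b1', 'a1', 'a2']]

p = [f"a{i}" for i in range(5)]
q = [f"b{i}" for i in range(5)]
print(len(interleavings(p, q)), comb(10, 5))   # 252 252
print(comb(40, 20))   # two processes of 20 steps each; too many to test by hand
```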
Conclusion
Software verification techniques have long remained relatively obscure, seemingly incomprehensible, and largely irrelevant to industrial practice. The cumulative effect of both algorithmic improvements and improved tool design is changing this. Today the use of software verification tools is already considered non-negotiable on safety-critical systems, and there is a growing number of successful applications to industrial-size software. 4-7, 24, 25 At Bell Labs we recently completed the construction of a software verification system for ANSI Standard C code that uses the tool Spin as a hidden verification engine. This system, called FeaVer, has already been used in the verification of the call processing software for Lucent's new PathStar † Access Server. 8, 9, 26 In this application we tracked the design of the call processing software over a period of about 18 months, performing routine verifications of virtually every version of the code. The mechanized model extraction capability made it possible to provide immediate feedback about newly appearing defects, often inadvertently introduced as a side effect of routine software updates.
The initial reaction from programmers to the new level of precision in program verification is often one of surprise. The types of concurrent executions generated by the verifier to demonstrate the existence of a software defect typically defy one's imagination. A recent report on the application of Spin at NASA, for instance, quotes one programmer as saying: "You guys have discovered [bugs] that we almost certainly would never have caught any other way." 4 The Spin verification system is today one of the most widely used verification tools for distributed systems, with annual workshops and thousands of users in both academia and industry. 27 Even in the unlikely case that no further algorithmic improvements are made in this area, the steady increase in available memory sizes and in the general computing power of machines virtually guarantees continued growth in the size of problems to which software verification tools can be applied. The ideal of a seemingly omniscient box that can magically pinpoint problem areas in a piece of concurrent code, virtually as quickly as the code is written, may not be far away.
