The aim of this chapter is to give an overview of the theoretical foundation and the practical application of logic model checking techniques for the verification of multi-threaded software (rather than hardware) systems. The treatment is focused on the logic model checker SPIN, which was designed for this specific domain of application. SPIN implements an automata-theoretic method of verification. Although the tool has been available for over 15 years, it continues to evolve, adopting new optimization strategies from time to time to help it tackle larger verification problems. This chapter explains how the tool works, and which types of software verification problems it is designed to handle.
Introduction
This chapter is concerned with the development of automated procedures for the verification of software systems, with particular emphasis on the verification of process interactions in either logically or physically distributed software systems. Several verification tools are available today that can prove interesting facts for a significant class of such systems. An up-to-date overview can be found on the web. In this description we will focus on SPIN [36] as one of the leading tools in this class. SPIN is distributed freely in source form.
Two notable trends have contributed to the recent successes of logic model checkers in the verification of distributed software systems. The first trend is the continuing improvement in algorithms and tool design in this area, which makes it possible to handle ever larger and more complex verification problems. We will review the main improvements of this type in this chapter. A second significant trend is the steady increase in basic compute power, which continues to follow the curve that was tentatively suggested by Gordon Moore nearly forty years ago [42].
The trends that have turned software verification from a theoretical curiosity into a practical reality are paralleled by similar trends in hardware verification. Because hardware and software differ in nature, though, there is surprisingly little overlap in the algorithms, data structures, and specific logics that are used in these two fields. We will discuss some of the main reasons for these differences towards the end of this chapter.
The most commonly used method to validate software systems today remains testing. In a unit test, a single process or module of the system is placed in isolation and probed on its functional correctness. Once successful, a series of unit tests is followed by a system integration test. In an integration test, multiple units are linked together to form part or all of the envisioned system.
The limitations of this method of system validation are as well known as its benefits. For sequential software systems, where one is primarily interested in verifying the computational aspects of a system, the classical testing techniques still have few competitors, even though much could be done to improve precision and coverage by a more aggressive use of approaches based on formal methods. In distributed software systems, the verification task is larger, since now we do not just need to worry about computational correctness but also about a range of concurrency-related problems that can prevent proper execution. Concurrency does not just increase the obligations of the tester or verifier, it also significantly complicates the already existing obligations for demonstrating the correctness of sequential computations. Concurrency can introduce race conditions, data corruption, delay, process or thread starvation, or even system-wide deadlock. Because the interleaving of process executions in distributed systems is unpredictable, test executions are not always reproducible. Each single execution is typically only one of a virtually unimaginably large set of possible interleaved executions. What is needed to address these problems is an effective method for probing the system for conveniently defined classes of behavior, rather than isolated instances of behavior. Logic model checkers promise to provide such a technique, but they too come with some limitations.
Figure 1 - Alternating Bit Protocol
Two state machines are defined in Figure 1, formalizing a sender process and a receiver process. The edge labels specify message exchanges. Each label consists of two characters: the first specifies the origin of the message being sent or received and the second specifies a sequence number for the message. This sequence number, termed the alternation bit in [3], is either zero or one, and is toggled between the two values on each successful transmission. Underlined names represent send actions; the other names represent receives. The double arrows, finally, indicate the states where new data is to be fetched for transmission by the sender, or received data is to be stored by the receiver.
The protocol starts with sender and receiver in the states labeled s0. The sender will then transmit a message with sequence number zero (A0) to the receiver. If all is well, the receiver will receive the message (A0) and both processes will move to the states labeled s1. The receiver will now acknowledge receipt by transmitting B0, and both processes move to the states labeled s2. The same sequence now repeats with the sequence number toggled from zero to one. If for some reason, e.g., the loss or duplication of a message, the receiver process sees a data message with the wrong sequence number, it will reply with the matching acknowledgment but not proceed. If the sender receives an acknowledgment with a wrong sequence number it will repeat the last transmission and hope for the best.
If we abstract from the data being transmitted, we can see that each process in this system can be in no more than six distinct states. The combination of sender and receiver, therefore, can be in no more than 6x6 or 36 distinct system states. In this simple case, a brute force exhaustive enumeration of all reachable states of the system will suffice to establish most of its logical properties.
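The brute force enumeration mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name explore, the successor-function interface, and the four-state toy graph are hypothetical and are not the machines of Figure 1. The sketch collects every state reachable from the initial state and reports deadlocks, taken here to be reachable states without successors.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

State = Hashable


def explore(initial: State,
            successors: Callable[[State], Iterable[State]]
            ) -> Tuple[Set[State], List[State]]:
    """Depth-first enumeration of all states reachable from `initial`.

    Returns the set of reachable states and the list of deadlock states,
    i.e. reachable states that have no successors at all.
    """
    visited: Set[State] = {initial}
    deadlocks: List[State] = []
    stack: List[State] = [initial]
    while stack:
        state = stack.pop()
        nexts = list(successors(state))
        if not nexts:
            deadlocks.append(state)
        for nxt in nexts:
            if nxt not in visited:   # each state is expanded only once
                visited.add(nxt)
                stack.append(nxt)
    return visited, deadlocks


# Hypothetical toy graph (an assumption for illustration only):
# "d" has no successors, and "e" cannot be reached from "a".
TOY = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
reachable, deadlocks = explore("a", lambda s: TOY[s])
```

Every state is expanded once and every edge is followed once, so the cost is linear in the size of the reachable graph.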
The combined behavior of the system defines a new finite state automaton and, like the machines in Figure 1, can be formalized as a graph. This graph can be constructed and analyzed with a standard depth-first search procedure [48] at a cost that is linear in the size of the graph.
Manual techniques for the analysis of state machine models of protocols were pioneered in the early seventies, e.g. [5], but not surprisingly these methods quickly proved too cumbersome and too error-prone, as demonstrated in [21]. Although the first automated systems had greater potential, they were mostly restricted to proving only a small set of mostly predefined properties, and would quickly run into seemingly insurmountable barriers of computational complexity. The types of properties that could be demonstrated for small protocol models included absence of deadlock (i.e., the absence of reachable states in the global execution graph without successors) and the preservation of system invariants on system states (i.e., the absence of reachable states in which one or more of the required invariants would evaluate to false).
In the eighties a more general framework for proving logic properties of finite state models took shape and found general acceptance. The development of automated verification systems has taken a somewhat different path for hardware and for software applications, leading to two different sets of verification tools that are based on different logics and that exploit different types of search and optimization algorithms. The dominant techniques in formal hardware verification, e.g. [13, 41], are founded on the early work of Clarke and Emerson in the U.S. [11], and on work by Queille and Sifakis in France [45]. In software model checking, the development can be traced through the early work of Pnueli [44] on temporal logic, to the development of the automata theoretic verification method by Vardi and Wolper in the mid eighties [52, 55].
It should be noted, though, that the new theories were not immediately of practical use. It took a while for algorithms to be developed that could be implemented efficiently, and for desktop machines to provide the required compute power to execute them. It now appears to be generally agreed that this turning point was reached in the mid to late nineties. With improved algorithms and ever increasing compute power, the attention in recent years has shifted from the development of the basic capability to perform logic model checking on hand-built system models towards the automated extraction of verification models from implementation level source code. Before discussing these methods, though, we will first cover the basic theoretical framework that specifically underlies the SPIN model checking system.
Finite Automata
In this section we introduce the notion of an automaton and of ω-acceptance, which we use to develop the automata theoretic verification method in subsequent sections. We begin with the definition of an extended finite state automaton. We will be brief here about the definition of 'data objects' and 'actions.' Model checking languages such as PROMELA [27, 30, 36] give precise semantics to these notions, which guides the operation of the model checker. For our purposes here, it will suffice to assume that each data object has a unique name and finitely many possible 'values' of arbitrary type. One value in the domain of each object is always tagged as the initial value of an object of that type. Each data object also has a 'current value' that can only be changed through the application (or 'execution') of 'actions' from set L. The intuition is that (the effect part of) an action can only be applied when the guard condition is true. Every transition in the automaton is labeled with an action, which blocks the transition until the guard condition is satisfied and applies the effect when the guard condition is true and the transition is executed.
Definition 3.3.
A transition is said to be executable if and only if the guard expression from the corresponding action evaluates to true, otherwise it is said to be 'unexecutable' or 'blocking.' We will use this notion of 'executable' and 'unexecutable' actions below in the definition of the 'runs' of a system. As examples of useful data objects, consider the following PROMELA message channel structures. According to PROMELA semantics, these channels are initially empty and can each store one message consisting of two typed fields [36]. Some actions from PROMELA on the channel s2r are:
The first action has a guard that returns true only when the channel currently stores one message and is thereby filled to capacity. The effect part of the first action is skip, a null-operation that has no effect. That is, the action acts as a condition without side-effects. The second action has a guard that returns true only when the channel is non-full. Its effect changes the value of the data object s2r by appending the message A,0, with A of type mtype, and 0 of type bit. The third action has a guard that returns true only when the channel holds a message with the fields B,1; its effect part deletes that message from the channel.
An extended finite state automaton, as defined, can be represented conveniently by a directed graph with the nodes representing states and the edges representing the transition relation T. The edges are labeled with actions from L. A system like this is therefore also known as a 'labeled transition system.' Our aim is to use extended finite state automata to represent process behavior in a distributed system. Set F can be used to mark the normal termination points of a process, or it can be used to mark special acceptance nodes in the graph that can serve to define and check the liveness properties of a system, as we shall describe shortly.
The two state machines in Figure 1 can also be defined as extended finite state automata. The automaton for the sender (on the left side in Figure 1), for instance, can be defined as follows, using the two data objects that we introduced in the example above and PROMELA syntax for the actions.
In general, the finite state automata that we will consider can be non-deterministic, e.g., we allow transitions such that (v, a, w) ∈ T and (v, a, w′) ∈ T, with w ≠ w′. Non-determinism is an important mechanism for building an abstract model of a distributed system. It can be used to generalize a model and to remove implementation level detail [32, 36].
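The split of an action into a guard and an effect, and the three kinds of channel actions described above, can be mimicked in Python as a rough sketch. Everything named here is an assumption made for illustration: the Action class, the helpers send, receive, is_full and execute, and the encoding of a channel as a tuple of queued messages are hypothetical and are not PROMELA or SPIN interfaces. The sketch models two channels of capacity one and shows when an action is executable and when it blocks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Message = Tuple[str, int]                 # (mtype-like tag, bit)
Valuation = Dict[str, Tuple[Message, ...]]  # data object name -> queued messages
CAPACITY = 1                              # each channel stores at most one message


@dataclass(frozen=True)
class Action:
    name: str
    guard: Callable[[Valuation], bool]        # must hold for the action to run
    effect: Callable[[Valuation], Valuation]  # applied only when the guard holds


def skip(v: Valuation) -> Valuation:
    """Null effect: leaves every data object unchanged."""
    return v


def send(chan: str, msg: Message) -> Action:
    # Executable only when the channel is non-full; appends the message.
    return Action(f"{chan}!{msg[0]},{msg[1]}",
                  lambda v: len(v[chan]) < CAPACITY,
                  lambda v: {**v, chan: v[chan] + (msg,)})


def receive(chan: str, msg: Message) -> Action:
    # Executable only when the head of the channel matches; removes it.
    return Action(f"{chan}?{msg[0]},{msg[1]}",
                  lambda v: len(v[chan]) > 0 and v[chan][0] == msg,
                  lambda v: {**v, chan: v[chan][1:]})


def is_full(chan: str) -> Action:
    # A pure condition: guard without side effects (effect is skip).
    return Action(f"full({chan})", lambda v: len(v[chan]) == CAPACITY, skip)


def execute(action: Action, v: Valuation) -> Optional[Valuation]:
    """Return the new valuation, or None when the action blocks."""
    return action.effect(v) if action.guard(v) else None


initial: Valuation = {"s2r": (), "r2s": ()}
after_send = execute(send("s2r", ("A", 0)), initial)
```

With the channel filled, a second send on s2r and a receive of B,1 both return None, which corresponds to an unexecutable (blocking) transition, while receiving A,0 restores the initial valuation.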
Automaton Runs
A run σ = t0, t1, t2, ..., tk of automaton A is a sequence of transitions that satisfies the following conditions: the source state for t0, the first transition in σ, is always s0, i.e., the initial state of the automaton,
that is, the run defines a path in the graph of A. Note that a 'run' only defines uninterpreted potential executions of a system; it does not take the manipulation of data objects through actions into account just yet. We will distinguish between 'valid' and 'invalid' runs in an expanded finite state automaton shortly. According to the classic definition of acceptance, a finite run is said to be accepted by A if its final state is in set F, i.e., run σ with final transition tk ≡ {a, b, c} is accepted if c ∈ F. If set F is used to mark the normal termination points of a process, then a run will not be accepted by the automaton unless it terminates at such a marked state.
Omega Acceptance
The classic notion of acceptance given above applies only to finite runs, i.e., to terminating executions. Looking at the automata in Figure 1, though, it is unclear if termination should be considered proper behavior or an error. As long as data is available from the unspecified source, the sender process should continue to transmit it to the receiver. If the protocol terminates, we would like it to terminate in either state s0 or s2, with the last data message properly acknowledged, but it need not terminate at all. We will therefore define a notion of ω-acceptance that can be applied to both the infinite and the finite runs of an automaton. An infinite run of an automaton is called an ω-run.
Definition 3.4.
An ω-run σ is accepted by extended finite state automaton A if it contains at least one state from set F infinitely often.
The above notion of acceptance is known as Büchi acceptance [8, 50]. For the automata definition we gave for the processes in Figure 1 it would suffice to limit the set of accepting states to one of the two states s0 and s2, since clearly neither can be visited infinitely often unless the other is too. It is also clear that any ω-run for a finite state automaton will have to repeat states, i.e., it will necessarily be cyclic. We now define the 'stutter-extension' of a finite run to make sure that the rules of ω-acceptance can be applied equally to infinite and finite runs of an automaton.
Definition 3.5.
The stutter-extension of a finite run σ of finite state automaton A is the ω-run that is derived from σ by appending an infinite number of nil-actions {sk, nil, sk} to it, where sk is the final state that is reached in σ, and nil is an action with guard true and effect skip.
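Because every ω-run of a finite state automaton is eventually cyclic, it can be written as a finite prefix followed by a cycle that repeats forever. The Python sketch below uses that 'lasso' encoding to illustrate Büchi acceptance and the stutter-extension. The function names, the encoding of a run by the states it visits (rather than by its transitions), and the choice of accepting set are assumptions made for this illustration.

```python
from typing import List, Sequence, Set, Tuple


def stutter_extension(states: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Turn the states visited by a finite run into a lasso (prefix, cycle).

    The final state is repeated forever, which models the infinite sequence
    of nil-actions appended to the run.
    """
    if not states:
        raise ValueError("a run visits at least its initial state")
    return list(states[:-1]), [states[-1]]


def omega_accepted(cycle: Sequence[str], accepting: Set[str]) -> bool:
    """Buchi acceptance for a lasso-shaped run.

    Only states on the cycle occur infinitely often, so the run is accepted
    exactly when the cycle contains at least one accepting state.
    """
    return any(state in accepting for state in cycle)


# Hypothetical accepting set: terminating in s0 or s2 is considered proper.
ACCEPTING = {"s0", "s2"}

prefix_ok, cycle_ok = stutter_extension(["s0", "s1", "s2"])   # stops in s2
prefix_bad, cycle_bad = stutter_extension(["s0", "s1"])       # stops in s1
```

A run that stops in s2 is accepted after stutter-extension, a run that stops in s1 is not, and a run that cycles through all six states forever is accepted without any extension.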
Asynchronous Product
The combined behavior of asynchronously executing processes in a distributed system can be formalized as a simple product of automata.
Definition 3.6. The asynchronous product of the extended finite automata
That is, the states of the asynchronous product define combinations of states in the individual automata, but the edges correspond to the individual transitions of the two automata: the transitions are interleaved. As an example, the asynchronous product of the two automata from Figure 1 has 6x6 or 36 states. The initial state of that automaton is (s0, s0). Set F has four states. Set D contains two data objects {s2r, r2s}, and set L contains the eight actions {r2s!B,0, r2s!B,1, r2s?B,0, r2s?B,1, s2r!A,0, s2r!A,1, s2r?A,0, s2r?A,1}.
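The interleaving character of the asynchronous product can be made concrete with a small Python sketch. It covers only the state set and the transition relation; data objects, initial states and accepting states are left out. The function async_product and the six-state ring automaton used as input are hypothetical stand-ins, not the sender and receiver of Figure 1.

```python
from itertools import product
from typing import List, Tuple

Transition = Tuple[str, str, str]   # (source state, action label, target state)


def async_product(states_a: List[str], trans_a: List[Transition],
                  states_b: List[str], trans_b: List[Transition]):
    """Interleaving product of two automata.

    Product states are pairs of component states. In every product
    transition exactly one component moves while the other stays put.
    """
    states = list(product(states_a, states_b))
    transitions = []
    for (src, act, dst) in trans_a:          # A moves, B is unchanged
        for b in states_b:
            transitions.append(((src, b), act, (dst, b)))
    for (src, act, dst) in trans_b:          # B moves, A is unchanged
        for a in states_a:
            transitions.append(((a, src), act, (a, dst)))
    return states, transitions


# Hypothetical input: two identical rings of six states each.
ring_states = [f"s{i}" for i in range(6)]
ring_trans = [(f"s{i}", f"t{i}", f"s{(i + 1) % 6}") for i in range(6)]

prod_states, prod_trans = async_product(ring_states, ring_trans,
                                        ring_states, ring_trans)
```

For two six-state components the product has 36 states, matching the count given above; how many of those are reachable is only decided once guards and data values are taken into account.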
Automata Expansion
Clearly the data objects in an extended finite state automaton also carry state information. We can map an extended finite state automaton to a pure finite state automaton by moving the state information from set D into set S. In effect, this expansion multiplies set S with the set of values of all data objects. To construct a pure automaton we can replicate each state in S, except the initial state, as many times as there are distinct combinations of values for all data objects in D. For the initial state, the initial value for each data object is used. Each copy of s ∈ S has a copy of all incoming and outgoing transitions of s in the original automaton. Next, we can mark the transitions in this new automaton as either valid or invalid, depending on whether the corresponding action from L is executable in that state. Since data values are now explicit, the validity of each transition can be determined unambiguously. Let µ(n) be the valuation of all data objects in state n, i.e., a finite and ordered set of values, and let γ(l, n) be the valuation of all data objects in state n after the effect part of action l is applied.
Definition 3.7.
A transition {n, l, m} from the set of transitions of an expanded finite state automaton is valid if the guard of action l is true for µ(n), and µ(m) = γ(l, n).
The expansion process of an automaton is completed by first omitting all transitions that are not valid, and next omitting all states that are no longer reachable from the initial state. For the automata in Figure 1 the full set of actions is, assuming ideal full-duplex communication between sender and receiver:
The complete expansion of the asynchronous product of the two automata, after deleting invalid transitions and unreachable states, has eight states, and permits just one ω-run, as illustrated in Figure 2. In the product automaton in Figure 2 we can now easily annotate each state n with its valuation µ(n), giving the explicit value of each data object.
Figure 2 - Expanded asynchronous product of automata in Figure 1.
Accepting states have double circles.
Temporal Logic
The correctness properties of a distributed system can be formalized in linear temporal logic (LTL), as first proposed by Pnueli in [44]. Any boolean expression over the state of a system and its associated data values will be called a state formula. Every guard from an action in an automaton definition, for instance, is defined by a state formula. In the following, the lower case symbols p, q, r represent state formulae and f, g, h represent temporal formulae, which are defined as follows.
Definition 4.1. Every state formula p is also a temporal formula.
If f is a temporal formula, then so are ¬f, (f), and X f. If f and g are temporal formulae, then so are f /\ g, f \/ g, and f U g. The temporal operator X is pronounced 'next,' and the temporal operator U is pronounced 'until.' We write ν(f, si) ≡ true to express that temporal formula f holds at state si. We can then define the standard Boolean operators as follows:
The semantics of X and U are defined over an ω-run σ. Let s0, s1, s2, ..., si, si+1, ... be the sequence of states that is traversed in σ. We can then define:
The definition of U requires that either g is true now or that f remains true until g becomes true. If, however, f remains true invariantly then g is not required to become true. The operator U is therefore called a 'weak until' operator. There is also a 'strong until' operator U, which can be defined as follows.
Two other frequently used temporal operators can be defined in terms of the operators we have defined so far. They are the □, or 'always' operator, and the ◊, or 'eventually' operator:
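As a compact reference, the operators introduced in this section are commonly given the semantics below. This is a sketch that follows the usual textbook conventions and the prose descriptions above; it should not be read as a reproduction of the chapter's own displayed definitions. Writing the strong until as a bold $\mathbf{U}$, to keep it apart from the weak until $U$, is a notational assumption made here.

```latex
\begin{align*}
\nu(\neg f, s_i) &\equiv \neg\,\nu(f, s_i) \\
\nu(f \wedge g, s_i) &\equiv \nu(f, s_i) \wedge \nu(g, s_i) \\
\nu(f \vee g, s_i) &\equiv \nu(f, s_i) \vee \nu(g, s_i) \\
\nu(X f, s_i) &\equiv \nu(f, s_{i+1}) \\
\nu(f\,U\,g, s_i) &\equiv
   \bigl(\exists j \ge i:\ \nu(g, s_j) \wedge \forall k,\ i \le k < j:\ \nu(f, s_k)\bigr)
   \ \vee\ \bigl(\forall k \ge i:\ \nu(f, s_k)\bigr) \\
\nu(f\,\mathbf{U}\,g, s_i) &\equiv \nu(f\,U\,g, s_i) \wedge \exists j \ge i:\ \nu(g, s_j) \\
\Box f &\equiv f\,U\,\mathit{false} \\
\Diamond f &\equiv \mathit{true}\,\mathbf{U}\,f
\end{align*}
```

Under these definitions the weak until is satisfied when f holds forever, which is why □f can be written as a weak until with an unreachable right-hand side, while ◊f needs the strong until to force f to occur.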
Standard Formulae
Many standard types of correctness requirements can be expressed with the temporal operators we have defined here. We give two examples of commonly used patterns. A progress property is a temporal formula that can be written in the form □◊p. This formula states that at any point in an execution the state formula p is either true or it will become true at some point in the future. A guarantee property is a temporal formula that can be written in the form ◊□p. This formula states that the state formula p is guaranteed to become invariantly true at some point in the future. Progress and guarantee are in many ways dual properties. It is, for instance, not hard to show that ¬□◊f ⇔ ◊□¬f. Some other equivalences [40] are:
So far we have defined the evaluation of temporal formulae for specific ω-runs. We will be interested in proving properties of a system for all possible executions starting from its initial system state. When we say that f holds for finite state automaton A we mean that it holds for all ω-runs that start from A's initial state. We may, for instance, want to prove that □(p → ◊q) for the automaton in Figure 2, with p and q defined as state properties:
Equivalently, we may want to prove that the negation of this formula is not satisfied.
Using the equivalences and the definition of logical implication (p → q ⇔ ¬p \/ q) we can write the negation as:
This formula is satisfied if at some point in an execution the state formula p becomes true while q is false and remains false forever thereafter. Note that this indeed captures the violation of the formula □(p → ◊q) that we started with. Now consider the automaton in Figure 3.
with initial state s0 and accepting state s1.
This automaton has two states s0 and s1, with s0 the initial state. Set D is identical to that of the automata in Figures 1 and 2, D = {s2r, r2s}. Set L has three elements:
true, which is an action with guard (true) and effect skip; p /\ ¬q, which is an action with guard (p /\ ¬q) and effect skip; and ¬q, which is an action with guard (¬q) and effect skip.
Set F, finally, has one element: s1. The accepting runs of this automaton have the following form, written as a sequence of transitions:
(true)^+ ; (p /\ ¬q) ; (¬q)^ω
where ; indicates concatenation, + indicates finitely many repetitions, and ω indicates infinitely many repetitions. Note that this matches the semantics of ◊(p /\ □¬q), and could be useful in automating the verification process. The automaton in Figure 3 need not be discovered by trial and error: there are efficient algorithms for constructing it mechanically from the LTL formula [19, 17, 18, 52].
Synchronous Product
How can we use the automaton from Figure 3 to prove our sample property for the alternating bit protocol, i.e., for the automaton from Figure 2? Somehow we must 'match' the runs of the automaton in Figure 3 with the runs of the automaton in Figure 2. We can do precisely this by computing the synchronous product of these two automata.
That is, the edges of the synchronous product of the automata correspond to joint transitions of the automata. Since every transition now carries two actions, the guards of both actions must evaluate to true for the transition to be valid. For a property automaton that is derived from a temporal formula (like the one in Figure 3) the effect part of each action will always be skip, so the order in which the effects are executed is unimportant. (In general this order will matter, so that the synchronous product AxB may be different from BxA.)
The synchronous product of the automata in Figures 2 and 3 is shown in Figure 4. Valid transitions are drawn solid and invalid transitions are dashed. The only valid ω-run in the automaton from Figure 4 contains no accepting states. We can conclude that formula ◊(p /\ □¬q) cannot be satisfied in the automaton from Figure 2, and therefore that formula □(p → ◊q) cannot be violated.
LTL Model Checking
The set of all ω-runs that an automaton accepts is often referred to as the language that is recognized by the automaton. Let L(M) be the language recognized by the automaton M that represents the system behavior we are studying, and let f be a temporal logic formula that is required to be satisfied by the system. The verification proceeds in four steps:
1. Mark all states in M as accepting states, to make sure that all ω-runs of M are considered.
2. Compute a Büchi automaton B for ¬f (the negation of f, capturing all possible ways in which f might be violated).
3. Compute the language intersection of L(B) and L(M), by computing the synchronous product of B and M.
4. If the intersection is empty, i.e., if the product automaton accepts no ω-runs at all, M cannot violate f and therefore property f is satisfied. If the intersection is non-empty, there is at least one ω-run that is accepted by both M and B. Because it is accepted by B it constitutes a violation of property f. The run can be used as concrete evidence that f is not satisfied by M.
In this section we will consider how this automata theoretic method for the verification of LTL formulae can be implemented efficiently.
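The four steps above can be sketched in Python for small, explicitly given automata. The sketch rests on several assumptions that are not taken from the text: the system M is given as a successor map with a set of true state properties per state, every state of M has at least one successor (so no stutter-extension is needed), guards of the property automaton B are evaluated in the source state of each joint transition, and the emptiness check is a deliberately naive one that restarts a search from every accepting product state. The names violates, m_succ, m_label and b_trans are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

Props = FrozenSet[str]
Guard = Callable[[Props], bool]


def violates(m_init: str, m_succ: Dict[str, List[str]],
             m_label: Dict[str, Props],
             b_init: str, b_trans: Dict[str, List[Tuple[Guard, str]]],
             b_accepting: Set[str]) -> bool:
    """True if the synchronous product of M and B has an accepting cycle.

    B is assumed to be built for the NEGATED requirement, so a True
    result means that the original requirement can be violated.
    """
    def successors(node):
        m, b = node
        for guard, b_next in b_trans.get(b, []):
            if guard(m_label[m]):             # joint step: both must agree
                for m_next in m_succ[m]:
                    yield (m_next, b_next)

    def reachable_from(starts: Iterable) -> Set:
        seen, stack = set(), list(starts)
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(successors(n))
        return seen

    for node in reachable_from([(m_init, b_init)]):
        # accepting product state that can reach itself => accepted omega-run
        if node[1] in b_accepting and node in reachable_from(successors(node)):
            return True
    return False


# Property automaton for <>(p /\ []!q), in the style of Figure 3.
B_TRANS = {
    "b0": [(lambda v: True, "b0"),
           (lambda v: "p" in v and "q" not in v, "b1")],
    "b1": [(lambda v: "q" not in v, "b1")],
}

# Hypothetical system 1: p is always followed by q.
SUCC_1 = {"m0": ["m1"], "m1": ["m0"]}
LABEL_1 = {"m0": frozenset({"p"}), "m1": frozenset({"q"})}
# Hypothetical system 2: after p, the system loops without ever reaching q.
SUCC_2 = {"m0": ["m1"], "m1": ["m1"]}
LABEL_2 = {"m0": frozenset({"p"}), "m1": frozenset()}

bad_1 = violates("m0", SUCC_1, LABEL_1, "b0", B_TRANS, {"b1"})
bad_2 = violates("m0", SUCC_2, LABEL_2, "b0", B_TRANS, {"b1"})
```

The repeated searches make this check quadratic in the worst case; it is meant only to show the structure of the method, not an efficient implementation.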
Depth-First Search
For an accepting ω-run to exist, there must be at least one execution of the product automaton defined above that traverses an accepting state infinitely often. This means that there must exist at least one accepting state in the product automaton that is both reachable from the initial state of that automaton and reachable from itself. For this to be true the reachability graph for the product automaton must have at least one strongly connected component with an accepting state. The strongly connected components in a graph can be computed in linear time with Tarjan's algorithm [48]. The size of the product of M and B also depends linearly on the numbers of states in the automata M and B. More problematic, though, is that the sizes of M and B can depend exponentially on the problem size. M is generally given as an asynchronous interleaving product of automata. This means that the size of M can increase exponentially with the number of automata (asynchronous processes) that we consider. B is extracted from an LTL formula, and in the worst case the size of B can also be exponentially larger than the size of the formula, measured by the number of state subformulae in the formula [52].
Fortunately, in practice things are not quite this bad. LTL formulae of practical interest rarely contain more than two or three temporal operators, and the automata generated from them rarely have more than five or six states [18]. The reason is simple: the precise meaning of formulae with more than three temporal operators can be hard to determine. The chains of reasoning required to interpret such a requirement quickly become too long to be meaningful in systems verification. In almost all cases of interest, a more complex system requirement can be broken down into smaller steps of just a few basic types: invariance (expressed as □p), inevitability (◊p), progress (□◊p), and conditional response (□(p -> (q U r))) [39, 43].
The size of the synchronous product of M and B is almost completely determined by the size of M, which can indeed be large. Contributing factors to the size of M can be the number of asynchronous processes, and the number and the value ranges of data objects used. A number of techniques have been developed to reduce the size of M, and the cost (in time and memory) of analyzing it. We will review the most important of these here. The most frequently used techniques include model reduction and abstraction, partial order reduction, symmetry reduction, on-the-fly verification, state compression, machine minimization, and proof approximation.
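The formulation in terms of strongly connected components discussed in this section can be sketched as follows. This is a plain recursive rendering of Tarjan's algorithm, suitable only for small graphs; the function name and the adjacency-map input are assumptions for illustration. A component counts only if it contains at least one edge, so that its states can actually be revisited.

```python
from typing import Dict, List, Set


def has_accepting_scc(graph: Dict[str, List[str]], init: str,
                      accepting: Set[str]) -> bool:
    """Tarjan's algorithm restricted to the states reachable from `init`.

    Returns True if some strongly connected component that contains a
    cycle also contains an accepting state.
    """
    index: Dict[str, int] = {}     # depth-first number of each node
    lowlink: Dict[str, int] = {}   # smallest index reachable via the stack
    stack: List[str] = []
    on_stack: Set[str] = set()
    found = False

    def visit(v: str) -> None:
        nonlocal found
        index[v] = lowlink[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:          # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            cyclic = len(component) > 1 or v in graph.get(v, [])
            if cyclic and any(s in accepting for s in component):
                found = True

    visit(init)
    return found


# Hypothetical graph: b and c form a cycle, d is unreachable from a.
G = {"a": ["b"], "b": ["c"], "c": ["b"], "d": ["d"]}
```

Note the two integers kept per node (index and lowlink); this per-node bookkeeping is the overhead that a two-bit search scheme would avoid.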
Nested Depth-First Search
First let us briefly revisit the central problem in LTL model checking: detecting the existence of at least one cycle through an accepting state in a finite graph. In the worst case the algorithm must visit every node in the graph, therefore the complexity cannot be less than linear in the size of the graph. But if the construction of the strongly connected components can be avoided, this problem may be solved with lower overhead than Tarjan's algorithm. Tarjan's algorithm stores the nodes of a graph in a single depth-first traversal. Each node is typically annotated with two integer numbers, a lowlink and a depth-first number, e.g. [2]. This requires storing with each node 2xlog(R) additional bits of information, to represent the lowlink and the depth-first number of a node, if R is the number of nodes in the graph. In practice, with R unknown, one typically uses two 32-bit integers to store this information. We will explore an alternative method that allows us to solve the cycle detection problem while adding just two bits of information to each node.
We begin by discussing a simple algorithm for a restricted class of ω-properties, i.e., proving the absence or existence of non-progress cycles in a finite graph [26, 27]. The algorithm works by splitting the depth-first search into two phases with the help of a two-state demon automaton. We then continue with a discussion of a similar but stronger two-phase search algorithm that can be used to prove the absence or existence of acceptance cycles (accepted ω-runs), so that it can be used to perform LTL model checking [14, 29].
Figure 5 - Two-state non-deterministic demon automaton for detecting non-progress cycles.
This is done by the addition of a two-state demon D, as illustrated in Figure 5. The demon can non-deterministically decide to move from its initial state s0 into an alternate state s1, where it will then stay forever. The label on this transition has guard true and effect skip. We assume that some of the states in the automaton M to be analyzed are marked 'progress' states. We will be interested in finding any ω-run that contains only finitely many such progress states. This corresponds to solving the model checking problem for LTL properties of the type ◊□np, with np a predefined state property that is true if and only if the system is not in a progress state. We compute the asynchronous product of M and D, and perform a slightly modified depth-first search in the reachability graph for that product. The product machine will be at most twice the size of the original M, containing one copy of each state with the demon in state s0, and possibly one more copy with the demon in state s1.
Note that we do not consider any successors of progress states when the demon is in state s1. Every cycle in the second state space (with the demon in state s1) is therefore necessarily a non-progress cycle.
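A search of this kind can be sketched in Python as shown below. This is our own reading of the description above and not a transcription of any published listing of dfs_A: the function name, the adjacency-map input, and the decision to explore the demon's move after the system's own successors are all assumptions. Each state of M is stored once, together with two bits that record whether it was seen with the demon in s0 (bit 0) and in s1 (bit 1).

```python
from typing import Dict, List, Optional, Set, Tuple

Node = Tuple[str, int]     # (state of M, state of the demon: 0 or 1)


def find_non_progress_cycle(graph: Dict[str, List[str]], init: str,
                            progress: Set[str]) -> Optional[List[Node]]:
    """Depth-first search in the product of M with the two-state demon.

    Returns the search stack, extended with the revisited node, at the
    moment a cycle is closed in the second state space, or None if no
    non-progress cycle is reachable.
    """
    seen: Dict[str, int] = {}      # state -> two bits (0b01, 0b10 or 0b11)
    stack: List[Node] = []
    on_stack: Set[Node] = set()

    def successors(s: str, d: int):
        if d == 0:
            for t in graph.get(s, []):
                yield (t, 0)
            yield (s, 1)           # the demon moves, the system stays put
        elif s not in progress:    # progress states get no successors here
            for t in graph.get(s, []):
                yield (t, 1)

    def dfs(s: str, d: int) -> Optional[List[Node]]:
        seen[s] = seen.get(s, 0) | (1 << d)
        stack.append((s, d))
        on_stack.add((s, d))
        for (t, e) in successors(s, d):
            if e == 1 and (t, e) in on_stack:
                return stack + [(t, e)]          # cycle in second space
            if not (seen.get(t, 0) & (1 << e)):
                result = dfs(t, e)
                if result is not None:
                    return result
        stack.pop()
        on_stack.discard((s, d))
        return None

    return dfs(init, 0)


# Hypothetical graph with a single cycle between b and c.
G = {"a": ["b"], "b": ["c"], "c": ["b"]}
blocked = find_non_progress_cycle(G, "a", {"c"})   # cycle passes progress
cycle = find_non_progress_cycle(G, "a", {"a"})     # cycle avoids progress
```

When c is a progress state the only cycle passes through it and nothing is reported; when only a is a progress state the cycle between b and c is found, and the returned stack ends in the node that was revisited.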
Property 5.1. If non-progress cycles exist, dfs_A() will report at least one of these.
Proof.
Suppose there exists a reachable state that is part of a non-progress cycle, i.e., it can be reached from itself without passing through progress states. Consider the first such state that is entered into the second state space (upon the transition of the demon automaton into its alternate state), and call it r. State r is reachable from itself in the second state space and must find itself in the depth-first search below r unless that search truncates at a previously visited state outside the current search stack. Call that state v. We know that r is reachable from v (or else it would not block r from reaching itself) and that v is reachable from r. This means that v is reachable from itself in the second state space via r. This, however, contradicts the assumption that r was the first such state entered into the second state space. This means that r either revisits itself or a successor of r revisits itself before that happens. In both cases the existence of a non-progress cycle is reported. q.e.d.
Whenever a cycle is detected, the corresponding ω-run can be reproduced exactly from the contents of the stack: it will contain a finite prefix of non-repeated states, and a finite suffix, starting at the state within the stack that was revisited, with only non-progress states. To implement the algorithm it is not necessary to store two full copies of each reachable state. It suffices to store the states once with the addition of two bits [20]. The first of the two bits records if the state was encountered in the first state space, and the second bit records if the state was encountered in the second state space. Initially both bits are off. For a stored state we can encounter only the bit combinations 01, 10, and 11, but not 00. (Note that the state is neither present in the first nor the second state space when the bit combination is 00.) Note that states may be either encountered first in the second state space, and later in the first state space, or vice versa. One bit, e.g. to record only the state of the demon automaton, therefore would not suffice. The second of the two bits is always equal to the state of the demon automaton, which therefore need not be stored separately.
This non-progress cycle detection algorithm was first implemented in 1988 in the tool sdlvalid, the immediate predecessor of the SPIN model checker [25], and later incorporated also in SPIN [26, 27]. A stronger version of this type of two-phase search algorithm was introduced in [14], and can be used to solve the general LTL model checking problem. This algorithm is known as the nested depth-first search. The search tries to locate at least one accepting state that is reachable from itself. The demon machine moves only from accepting states and the move is explored only after all successors of the accepting state have been explored (i.e., in postorder). It is now no longer sufficient for the second search to find any state within the depth-first search stack; we must require that the seed state from which the second search was initiated is itself revisited. The proof of correctness for this version of the algorithm is as follows [14].
Property 5.2.
If acceptance cycles exist, dfs_B() will report at least one of these.
Proof.
Let r be the first accepting state reachable from itself for which the second search is initiated. State r cannot be reachable from any state that was previously entered into the second state space. Suppose there was such a state w. To be in the second state space w either is an accepting state, or it is reachable from an accepting state. Call that accepting state v. If r is reachable from w in the second state space it is also reachable from v. But, if r is reachable from v in the second state space, it is also reachable from v in the first state space. There are now two cases to consider. Either (a) r is reachable from v in the first state space without visiting states on the depth-first search stack, or (b) it is reachable only by traversing at least one state x (cf. Figure 6) that is on the depth-first search stack. In case (a), r would have been entered into the second state space before v, due to the postorder discipline, contradicting the assumption that v is entered before r. In case (b), v is necessarily an accepting state that is reachable from itself, which contradicts the assumption that r is the first such state entered into the second state space. State r is reachable from all states on the path from r back to itself, and therefore none of those states can already be in the second state space when this search begins. The path therefore cannot be truncated and r is guaranteed to find itself in the successor tree. q.e.d.
Like dfs_A, this algorithm requires no more than two bits to be added to every reachable state in M, so the overhead remains minimal. A significant advantage of this method of model checking is also that the entire verification procedure can be performed on-the-fly: errors are detected during the exploration of the search space, and the search process can be cut short as soon as the first error is found. It is not necessary to first construct an annotated search space before the analysis itself can begin.
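The nested depth-first search described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not SPIN's implementation: the names nested_dfs, successors, and accepting are hypothetical, the sets visited1 and visited2 stand in for the two bits stored per state, and recursion is used for brevity, so the sketch only suits small graphs.

```python
from typing import Callable, Hashable, Iterable, List, Optional

State = Hashable


def nested_dfs(
    initial: State,
    successors: Callable[[State], Iterable[State]],
    accepting: Callable[[State], bool],
) -> Optional[List[State]]:
    """Return a prefix followed by a cycle through an accepting state, or None."""
    visited1 = set()          # "bit 1": state seen in the first search
    visited2 = set()          # "bit 2": state seen in some second search
    stack: List[State] = []   # depth-first search stack of the first search

    def second(s: State, seed: State, path: List[State]) -> Optional[List[State]]:
        # Second search: succeed only if the seed state itself is revisited.
        visited2.add(s)
        for t in successors(s):
            if t == seed:
                return path + [t]
            if t not in visited2:
                found = second(t, seed, path + [t])
                if found is not None:
                    return found
        return None

    def first(s: State) -> Optional[List[State]]:
        visited1.add(s)
        stack.append(s)
        for t in successors(s):
            if t not in visited1:
                found = first(t)
                if found is not None:
                    return found
        # Postorder: start the second search only after all successors are done.
        if accepting(s):
            cycle = second(s, s, [])
            if cycle is not None:
                return list(stack) + cycle
        stack.pop()
        return None

    return first(initial)
```

For a graph with edges 0→1, 1→2, 2→1, 2→3 and accepting state 2, the call returns the run [0, 1, 2, 1, 2]: the stack contents as prefix, followed by the cycle back to the seed. The search stops at the first cycle found, which mirrors the on-the-fly character of the procedure.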
We can check non-progress properties with algorithm dfs_B by defining the temporal logic formula ◊□ np, with np equal to true if and only if the system is in a non-progress state. The automaton that corresponds to this formula is a two-state automaton shown in Figure 7. To perform model checking we can now take the synchronous product of the automaton in Figure 7 with a system M, and use algorithm dfs_B to detect the accepting ω-runs. We thus potentially incur two doublings of the search space: one due to the nested search inherent in dfs_B and one due to the product with the property automaton from Figure 7. The earlier algorithm dfs_A solves this specific problem more efficiently by incurring only the doubling from the demon automaton. The advantage of dfs_B is that it can handle any type of LTL property, not just non-progress properties.
Adding Fairness
LTL is rich enough to express many fairness constraints directly, e.g., in properties of the form ( trigger) → (◊ response) or ( guard(t) ≡ true) → (◊ effect(t)), where t is a transition. More specific types of process fairness can also be predefined and incorporated into a model checking algorithm. Recall that the asynchronous product of finite automata that is the ultimate subject of LTL model checking is built as an interleaving of transitions from smaller automata. We can include weak fairness into the nested depth-first search algorithm by using Choueka's flag construction method [10]. The following informally describes the method that is implemented in the SPIN system [30, 36].
We multiply the state space k + 1 times, with k the number of component automata in the asynchronous product. Only the first copy of the state space retains its acceptance states; the corresponding states in the k additional copies are made non-accepting. Next we change every transition contributed by component i in the original product into a transition from the source state of that transition in copy i of the new product to the destination state of the transition in copy i + 1. For the last copy, k + 1, all transitions move back to the first copy of the state space. Any accepted ω-run in the new unfolded state space now necessarily includes transitions from each of the k component automata.
We have to make one further adjustment to this procedure to account for the fact that a component automaton that is permanently not enabled after some point in the run (strong fairness) or a component automaton that is repeatedly not enabled (weak fairness) need not participate in the run. To implement weak fairness, for instance, we can add a null transition from every state s in copy i to state s in copy i + 1 if component i is not enabled at s.
Unfolding the state space k times can be costly, but we can reduce the memory cost to a minimum by storing each copy of a state just once, and annotating it with k + 1 bits to record in which copy of the state space the state has been encountered. If we use the cycle detection method from algorithm dfs_A or dfs_B the memory overhead per reachable state remains limited to 2(k + 1) bits.
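The unfolding for weak fairness can be made concrete in a small sketch. Because the description above is informal, the code below fixes one possible reading, and every detail of that reading is an assumption rather than SPIN's actual implementation: copies are numbered 0..k, copy 0 is the only copy with accepting states and is left only from an accepting state, a transition of component i advances from copy i to the next copy (copy k wraps around to copy 0), and transitions of other components stay within their copy. The names unfold_weak_fairness and on_cycle are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

State = Hashable
Edge = Tuple[State, int, State]   # (source, component in 1..k, destination)
Node = Tuple[State, int]          # (state, copy number in 0..k)


def unfold_weak_fairness(
    edges: Iterable[Edge],
    enabled: Callable[[State], Set[int]],
    accepting: Set[State],
    k: int,
) -> Tuple[List[Tuple[Node, Node]], Set[Node]]:
    """Return the edges and accepting nodes of the unfolded state space."""
    edges = list(edges)
    states = {s for s, _, _ in edges} | {d for _, _, d in edges}

    def nxt(c: int) -> int:
        return (c + 1) % (k + 1)      # copy k wraps around to copy 0

    out: List[Tuple[Node, Node]] = []
    for s, i, d in edges:
        # Copy 0: only an accepting state starts a new round through the copies.
        out.append(((s, 0), (d, 1 if s in accepting else 0)))
        for c in range(1, k + 1):
            # Copy c advances only when component c itself takes a step.
            out.append(((s, c), (d, nxt(c) if i == c else c)))
    for s in states:
        for c in range(1, k + 1):
            if c not in enabled(s):
                # Null transition: component c is not enabled at s.
                out.append(((s, c), (s, nxt(c))))
    return out, {(s, 0) for s in accepting}


def on_cycle(node: Node, edges: List[Tuple[Node, Node]]) -> bool:
    """True if node can reach itself through at least one edge."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    seen, todo = set(), list(succ.get(node, []))
    while todo:
        n = todo.pop()
        if n == node:
            return True
        if n not in seen:
            seen.add(n)
            todo.extend(succ.get(n, []))
    return False
```

With k = 2, a self-loop of component 2 on an accepting state a forms an accepting cycle in the unfolded space only if component 1 is not enabled at a. If component 1 is enabled at a but never runs, the cycle never leaves copy 1 and is not accepted. In a real implementation the copies would not be built explicitly; the copy number would be tracked with the per-state bits mentioned above.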
SPIN's On-the-Fly Implementation
The model checker SPIN performs the LTL model checking procedure on-the-fly, applying the nested depth-first search algorithm dfs_B during the construction, in a single pass, of the product
B ⊗ (A_1 × ... × A_k)
where B is the property automaton for the negation of an LTL formula that should be satisfied, A_1, ..., A_k are the component automata of the system, ⊗ indicates synchronous product, and × asynchronous product. The construction is optionally modified for Choueka's flag construction to enforce weak fairness. SPIN derives the automaton B from an LTL formula using the algorithm from [19] with some optimizations from [18]. Optionally, the user can also specify B manually, and thus gain an increase in expressive power to the full range of ω-regular properties. Alternatively, SPIN also allows the use of the conversion procedure from [18], which adds existential quantification to LTL and thereby also extends the expressive power to the ω-regular properties.
The advantage of the on-the-fly procedure is that the construction of the product automaton can stop as soon as an accepting ω-run is found, having delivered proof that the system can violate the requirement. If a system contains an error it usually suffices to construct only a small portion of the product automaton. If the system satisfies the requirement the complete product must be computed. In many cases, though, the property automaton B acts as a constraint on the system, limiting the synchronous product to the executions that are relevant to the property being proven. Therefore, in those cases computing the product B ⊗ (A_1 × ... × A_k) can be less expensive than computing the asynchronous product A_1 × ... × A_k alone.
Partial Order Reduction
The validity of an LTL formula is insensitive to the precise order in which independent transitions from different component automata are interleaved in any given ω-run of the global automaton. SPIN uses partial order reduction to exploit this fact and to reduce the cost of a typical verification. Instead of generating a full asynchronous product that captures all possible interleavings of transitions, the model checker generates a reduced product, with only a few representatives from each class of ω-runs that are indistinguishable for a given LTL formula [28, 29]. This reduction can, in the best case, reduce the cost of a verification by a factor that grows exponentially with the number of component automata that are used to construct the asynchronous product. In effect, by applying partial order reduction rules one can ensure that every ω-run that is inspected by the model checker represents a large class of equivalent runs. If at least one run from each equivalence class is considered, all other runs can be ignored. The correctness of the reduction algorithm used in SPIN was verified independently with a theorem prover [9]. Partial order reduction can also be combined with other types of reduction to increase the benefits in some cases, for instance by exploiting possible symmetries in a model, e.g. [16].
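The best-case effect of the reduction can be illustrated with a toy example. The sketch below assumes k components that each take a single, purely local step, so that all transitions are independent by construction. The function name explore and the state encoding are hypothetical. A real reduction algorithm must establish independence and invisibility of transitions and guard against ignoring transitions on cycles; none of that is modelled here.

```python
from typing import Set, Tuple

State = Tuple[int, ...]


def explore(k: int, reduced: bool) -> Set[State]:
    """Explore k components that each take one local step (0 -> 1)."""
    initial: State = (0,) * k
    seen = {initial}
    frontier = [initial]
    while frontier:
        s = frontier.pop()
        enabled = [i for i in range(k) if s[i] == 0]
        if reduced and enabled:
            # All enabled transitions are independent, so a single
            # representative interleaving is explored.
            enabled = enabled[:1]
        for i in enabled:
            t = s[:i] + (1,) + s[i + 1:]
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen
```

The full exploration visits 2^k states, while the reduced one visits k + 1, and both reach the same final state in which every component has taken its step.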
Compression Techniques
Memory and time are bounded resources. The challenge in the construction of practical model checking tools is to economize the memory requirements without incurring unrealistic increases in runtime requirements. The model checker must be able to determine at each newly generated state from the global product automaton whether or not the state already appears in the state space (named the set visited in algorithms dfs_A and dfs_B above). To do so one typically stores the states in a hash table.
The most effective compression method SPIN supports avoids storing the set visited altogether, and instead computes a minimized finite automaton that can recognize (optionally compressed) memory images of states as finite words over a predefined alphabet. To add a state, the automaton is updated in a way that secures its continued minimality [35]. The technique is comparable to techniques based on the use of binary decision diagrams that have proven effective in applications of model checking in hardware circuit verification, e.g. [7, 13].
Bitstate Hashing
Model checking can be computationally expensive, even with aggressive use of compression and reduction techniques. Large problem sizes can easily defeat the available bounds on memory use and compute time. In cases like these it can be of great value to be able to approximate the answer to a verification problem with an accuracy that depends on the ratio by which the problem size exceeds the available resources. We can use lossy compression methods to address this problem. A good example of such a method is the bitstate hashing, or supertrace, algorithm [24, 31, 36, 56]. This algorithm uses a fixed number of bits of memory per reachable state. The addresses for each of these bits are computed as hash values from the full memory image of a state, with statistically independent hash functions. In the current versions of SPIN the number of bits used can be chosen arbitrarily by the user, but it defaults to two.
An elegant theoretical explanation of the working of bitstate hashing can be based on the theory of Bloom filters [4]. A short description will suffice for the purposes of this chapter. A more detailed account can be found in [15, 36]. Suppose M bytes of main memory are available to store the set of visited states. On average workstations M is typically 2^30 bytes, and likely to increase in coming years. We now compute N independent hash values of log_2(8M) bits each for each state (assuming 8 bits per byte). Instead of storing N log_2(8M) bits, though, we interpret each of the N hash values as a bit address in M, and store only one single bit at each address (by changing that bit from 0 to 1). If the (compressed) memory image of a state is longer than N log_2(8M) bits, this method will lose information. It is now possible that two different states generate the same N bit addresses. The model checker will then assume that a newly generated state matches a previously visited state and fail to generate the successors of the new state.
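The storage scheme just described can be sketched in a few lines. This is a minimal illustration, not SPIN's code: the class name BitstateSet and its method are hypothetical, and the N hash functions are obtained here by salting BLAKE2b from Python's hashlib, which is an assumption made for convenience only.

```python
import hashlib
from typing import Iterator


class BitstateSet:
    """Approximate visited set: N bits per state in an array of M bytes."""

    def __init__(self, m_bytes: int, n_hashes: int = 2) -> None:
        self.nbits = 8 * m_bytes          # number of addressable bits (8M)
        self.n = n_hashes                 # N, two by default as in the text
        self.bits = bytearray(m_bytes)    # all bits initially 0

    def _addresses(self, state: bytes) -> Iterator[int]:
        # One bit address per hash function; the salt makes them differ.
        for i in range(self.n):
            digest = hashlib.blake2b(
                state, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.nbits

    def add_if_new(self, state: bytes) -> bool:
        """Set the state's bits; return True if any of them was still 0."""
        new = False
        for address in self._addresses(state):
            byte, bit = divmod(address, 8)
            if not self.bits[byte] & (1 << bit):
                new = True
                self.bits[byte] |= 1 << bit
        return new
```

A state is reported as new only if at least one of its N bits was not yet set. A state that was stored before is therefore never reported as new, but a genuinely new state can be mistaken for a visited one when all of its bits were set by other states, which is the loss of information described above.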
Because only visited states are stored in this way, and not state information from the depth-first search stack (sometimes called open states or stack states), the omissions due to hash collisions can cause the model checker to miss error states, but they cannot cause it to generate false error reports. By virtue of the accuracy of all information saved on the depth-first search stack, the depth-first search is guaranteed to proceed correctly, generating only accurate complete execution sequences, though perhaps not all of them in the presence of bitstate hash collisions. The coverage of the search is truncated randomly, by a factor that depends on the amount of information that is lost in the hashing. Because it is impossible to predict systematically which execution sequences of a model might lead to error, an unbiased random truncation of the search space turns out to be a states, then the problem coverage can be no more than 0.1, meaning that no more than 10% of the reachable states are visited. Under the same system constraints, the bitstate hashing algorithm, storing 2 bits per state, can record up to 4 states per byte and could still achieve close to 100% coverage, given that M/R >> 4. In general, when M < R × S, a bitstate hashing technique almost always realizes greater problem coverage than a standard model checking run [31].
Since its first introduction in 1987 the bitstate hashing method has become a trusted technique that was adopted in almost all academic and commercial verification tools to deal with problems that exceed the normal bounds for exhaustive verification. The bitstate hashing technique allows the user to set the range of bit addresses that can be used, typically matching the maximum amount of memory that is available for a verification run. Clearly, the larger this hash array, the more states can be stored in it, the larger the coverage will be, and the longer it may take to complete the verification.
This gives the user additional control over the search process: by artificially limiting the available hash array, the user can obtain a very fast and very coarse approximation. By slowly increasing the size of the hash array, the coverage and the runtime expense can be increased in a controlled manner. Each increase in coverage that fails to locate errors also increases our confidence in the likely correctness of the system. When the system contains errors, it usually takes only a small number of approximations to locate representative samples. Only when the system is correct, in the last phase of a design, do more significant resources need to be invested to prove it. This iterative search refinement technique was used in the verification of the call processing software for a telephone switch [34].
Model Extraction and Abstraction
The construction of models of real-world applications for the purposes of verification is hardly novel and assuredly not restricted to the field of distributed software. There is a long tradition of the use of physical and mathematical models in civil engineering, and scientific disciplines like physics or chemistry would be almost unthinkable without theoretical models that attempt to capture aspects of nature. A model is always an abstraction: by abstracting from detail deemed immaterial to properties of interest we lose scope but gain analytical power. It is well known that even simple properties of arbitrary software are undecidable or only semi-decidable [51]. This means that model construction for the verification of distributed software is not just an option: it is a necessary step. By defining an abstraction we can reduce a given software application to a finite model, consisting of finite state automata, that can be analyzed with the procedures outlined in this chapter. This reduction will bring a loss of information, so it has to be chosen in such a way that relevant information is preserved and irrelevant detail removed. What is relevant and what is not depends on the properties that we are interested in proving.
We can remove the detail in such a way that the soundness of the model checking procedure itself is not endangered [1, 6, 12, 38]. This means that if the model checker indicates that a model satisfies a property, the original software necessarily also satisfies the property. Conversely, if the model checker generates an error, the error sequence can be checked against the original software to determine its validity. If it is valid, an error in the application has been exposed. If it is not valid, we have obtained proof that the abstraction was chosen incorrectly, and the error sequence itself can be used to determine unambiguously how the abstraction should be revised.
A simple abstraction method of this type was used in the application of the SPIN model checker to complex call processing software for a commercial telephone switch, e.g., [33, 34]. With this method, a parser automatically extracts an annotated control flow skeleton from the source code of the application. A lookup table defines precisely which statements from the program should be omitted from the model (i.e., replaced with skip statements), which should be abstracted with user-defined functions, and which should be preserved within the model. Within the model, every statement from the original source code of the application is mapped into a finite domain and represented as a transition in an extended finite state automaton, expressed in guards and effects that operate on finite data objects, often with a reduced range of possible values.
Function calls have to be treated with some care, in the interest of controlling the complexity of a model. The very presence of a function call, however, can be taken as a hint from the programmer that an abstraction can be made. In the call processing application function calls were treated like statements: they were either omitted if the functionality provided was outside the scope of the verification, or they were abstracted, either with a small inline routine or with a non-deterministic choice of the possible return values. As an example, a routine that determines the availability of a resource, like a tone circuit, is best abstracted with a non-deterministic choice between the two possible result values: available or not-available. No useful gain is made if we were to include more detail than this. As another example, the call of a routine that issues billing records can be omitted from the model if billing is not the focus of the verification.
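The role of the lookup table can be sketched as follows. The code is a hypothetical illustration only: the statement strings, the table entries, and the function names extract_model and nondet_availability are invented for the example, and the actual extraction tool works on a parsed control flow skeleton rather than on raw strings. Each statement is kept, replaced with skip, or rewritten by a user-defined abstraction function; statements without a table entry, and table entries without a matching statement, are reported so that the table can be updated.

```python
from typing import Callable, Dict, List, Tuple, Union

KEEP, SKIP = "keep", "skip"
Rule = Union[str, Callable[[str], str]]


def extract_model(
    statements: List[str], table: Dict[str, Rule]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (model statements, statements missing from the table, redundant entries)."""
    model: List[str] = []
    missing: List[str] = []
    for stmt in statements:
        rule = table.get(stmt)
        if rule is None:
            missing.append(stmt)          # the user must decide how to map it
        elif rule == SKIP:
            model.append("skip")          # omitted from the model
        elif rule == KEEP:
            model.append(stmt)            # preserved as is
        else:
            model.append(rule(stmt))      # abstracted by a user-defined function
    redundant = [key for key in table if key not in statements]
    return model, missing, redundant


def nondet_availability(_stmt: str) -> str:
    # Abstract a resource check as a non-deterministic choice between
    # the two possible results, written here in Promela-like notation.
    return "if :: avail = true :: avail = false fi"
```

A table that maps a counter update to KEEP, a circuit availability check to nondet_availability, and a billing call to SKIP yields a three-statement model in which only the control-relevant behavior remains.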
In the call processing application the focus was on the verification of correct feature behavior for the telephone switch. Requirements for over twenty different feature packages, such as call waiting, call forwarding, call screening, conference calling, etc., are specified in Telcordia standards for call processing [49]. Each relevant property was formalized in linear temporal logic. Aspects of the system that were outside the scope of our verification effort (e.g., billing, process management, memory management, device driver code) were mechanically omitted from the model, which helped to reduce the cost of the verification of the remaining aspects of the code.
The advantage of this method is that it can be almost completely automated. When a new version of the source code is prepared, the model extraction program can prompt the user to provide missing and redundant entries in the lookup table. Once the lookup table has been updated, the model checking process can be repeated with a new accurate model being extracted from the source code of the application mechanically, typically in a fraction of a second. The verification method allowed us to track the evolution of the call processing code for this application over a period of 18 months. The source code for the application grew fivefold in size in this period, and went through approximately 300 different versions, often changing daily. Approximately 75 critical errors were intercepted with the model checking technique we have outlined, at an early stage of the design, giving a clear indication of the considerable power and value of software model checking techniques. Many of the errors found involved subtle race conditions in the code that could disturb required functionality. Such errors are virtually impossible to find with conventional testing techniques.
Other Uses of Abstraction
The model extraction method sketched above was greatly facilitated by a relatively recent extension of the SPIN model checker that allows for the inclusion of embedded C code inside higher level verification models [36]. This capability to use embedded C code fragments can be used in a number of other ways to increase the power of the model checking approach. In [37] a method is described that allows SPIN to verify a code module at implementation level, by compiling it with, and linking it to, the model checker. The model checker now generates the non-deterministic input sequences for the code module and keeps track of the code's state. To achieve this, the user identifies the concrete data objects inside the code module that contain state information. The user can at this point also define an abstraction function, in C code, that takes the concrete representation of the state information and abstracts it for use by the model checker. In this way we can use SPIN to apply model abstraction without model extraction, which may prove to be a very effective technique for handling large verification problems in years to come.
Perspective
The software model checking techniques that we have reviewed in this chapter are based on finite automata, linear temporal logic, depth-first search, partial order reduction, and explicit state representation combined with powerful memory management techniques. They have been successfully applied in many domains, but typically to verification problems that involve asynchronous threads of computation from software systems, e.g., [46]. An overview of applications can also be found in [30].
It is interesting to compare the general framework used in SPIN with the one that has been developed for hardware circuit verification. The most commonly used logic in hardware verification is the branching time logic CTL [11], the search strategy is often breadth-first, instead of partial order reduction techniques one uses BDD-based algorithms [7], and instead of explicit state representation one uses symbolic model checking [41]. These differences in approach to the verification problem can be understood better if we look at some of the differences between the two domains of application. Hardware is typically clock-driven, operating in a synchronous fashion, while the processes in a distributed system are necessarily asynchronous. At the hardware level information travels as signals, whereas in software applications the information is represented, manipulated, and moved in composite data structures. A bit level representation is clearly not helpful for these types of objects. The structure of a hardware system, finally, can often be defined statically, while in a software system one must deal with dynamically growing and shrinking numbers of asynchronous processes and data objects. These differences mean that few of the reduction techniques that work well in software model checkers show benefit when used in hardware model checkers, and vice versa.
