Abstract. The state explosion problem is one of the core bottlenecks in the model checking of concurrent software. We show how to ameliorate the problem by combining the ability of partial order techniques to reduce the state space of the concurrent program with the power of symbolic model checking to explore large state spaces. Our new verification methodology involves translating the given concurrent program into a circuit-based model, which gives us the flexibility to then employ any model checking technique of choice, either SAT- or BDD-based, for verifying a broad range of linear time properties, not just safety. The reduction in the explored state space is obtained by statically augmenting the symbolic encoding of the program with additional constraints. These constraints restrict the scheduler to choose from a minimal conditional stubborn set of transitions at each state. Another key contribution of the paper is a new method for detecting transactions on-the-fly, which takes into account patterns of lock acquisition and yields better reductions than existing methods that rely on a lockset-based analysis. Moreover, unlike existing techniques, identifying on-the-fly transactions does not require the program to follow a lock discipline in accessing shared variables. We have applied our techniques to the Daisy test bench and shown the existence of several bugs.
Introduction
The widespread use of concurrent software in modern day computing systems necessitates the development of effective verification methodologies for multi-threaded programs. However, subtle interactions between threads make multi-threaded software behaviorally complex and hard to analyze, necessitating the use of formal methodologies for their debugging. It is not surprising then that the use of model checking, both symbolic and explicit state, for the verification of concurrent software has recently been an active area of research.
Explicit state model checkers, such as Verisoft [God97], rely on exploring an enumeration of the states and transitions of the concurrent program at hand. Additional techniques, such as state hashing for compaction of state representations and partial order methods, are typically used to avoid exploring all interleavings of transitions of the constituent threads. While these techniques are powerful tools for state space reduction, they still do not fully address the scalability issues that arise due to state explosion when model checking large-scale concurrent programs.
Symbolic model checkers, on the other hand, avoid an explicit enumeration of the state space by using symbolic representations of sets of states and transitions. One of the first successful approaches in this regard was the use of BDDs to succinctly represent large state spaces for the purpose of model checking [McM93]. More recently, SAT-based techniques [BCCY99] have become popular both for finding bugs using SAT-based Bounded Model Checking (BMC) and for generating proofs via SAT-based Unbounded Model Checking (UMC).
One contribution of this paper is a new methodology that combines the ability of partial order techniques to reduce the state space to be explored with the power of symbolic model checking techniques to explore large state spaces, and that has several advantages over existing techniques pursuing the same goal. Methods different from ours that combine partial order reductions with the use of BDDs were given in [ABH+01, LST03]. However, the use of BDDs requires one to first symbolically encode the entire state space of the given concurrent program, thereby running into the state explosion problem. Our technique gives us the freedom to use any technique of choice, either SAT- or BDD-based. This is crucial as SAT-based BMC techniques tend to be much more scalable on larger programs than the ones based on the use of BDDs.
We start by translating a given concurrent program into a circuit-based (finite-state) model. Building upon the F-Soft framework [ISGG05] for translating sequential programs with bounded data and bounded recursion into circuits, we first obtain a finite model for each individual thread wherein each variable of the thread is represented in terms of a vector of binary-valued latches and a boolean next-state function (or relation) for each latch. Then using a scheduler, we compose the circuits for the individual threads into one single circuit for the entire concurrent program. Verification is then carried out on this circuit. Partial order techniques are incorporated into the framework by statically augmenting the circuit-based boolean encoding of the given concurrent program with additional constraints. These constraints restrict the transitions explored from each global state to a minimal conditional stubborn set of that state.
Another contribution of this paper is a new, provably better method for identifying transactions on-the-fly that is based on analyzing patterns of lock acquisition, as opposed to existing techniques [Sto02, FQ03], which rely on a lockset-based analysis. Lockset-based methods for state space reduction essentially exploit the ability of locks to enforce mutually exclusive access to regions of code enclosed between the locking and unlocking operations on the same lock. They rely on the assumption that the given concurrent program follows a lock discipline in accessing shared variables, i.e., all accesses to a shared variable $s$ are protected by the same lock $l_s$ [Sto02, FQ03]. One can then cut down on the number of interleavings that need to be explored by allowing context switches only before the acquire and after the release operations on $l_s$ and prohibiting them before accesses to $s$. Disallowing context switches increases the granularity of transitions and reduces the number of possible interleavings, resulting in a smaller state space to be explored.
On the other hand, by analyzing concurrent programs for patterns of lock acquisition rather than for locksets, we can identify not only the transactions that lockset-based methods identify but also some that they miss. This makes our new technique provably better. In fact, the lockset-based technique for identifying transactions turns out to be a special case of the one based on lock acquisition patterns that we propose here. Moreover, our technique does not rely on the given concurrent program following a locking discipline in accessing shared variables. This matters because one of the main causes of data races is an unprotected or wrongly protected access to a shared variable, and the requirement of lock discipline precludes the application of these powerful reductions to exactly those programs in which such commonly occurring bugs are present. Thus our method enables the use of lock-based reductions for a broader class of concurrent programs, viz., those that need not follow lock discipline, to catch a frequently occurring class of bugs.
Another important feature of lock-pattern based transactions is that they can be transparently incorporated into partial order reduction through improved conditional dependency detection: extra constraints are added to the transition relation not a priori but dynamically, while unrolling the executions of the threads. We show that the increased granularity of transitions due to transactions can be captured as a reduction in the sizes of the conditional stubborn sets of states.
We believe that our decision to build circuit-based models for concurrent programs gives us many unique advantages. In this sense, the work most closely resembling ours is that presented in [RG05, CKS05], which involves translating a program directly into a SAT formula for model checking using SAT-based BMC. However, [RG05] does not incorporate partial order reductions, and neither technique leverages on-the-fly transactions. Circuit-based models make it easy to incorporate static state-space reduction techniques like partial order reductions and on-the-fly transactions, as well as lightweight static analysis techniques like range analysis, to reduce model sizes. Another advantage of our approach lies in the separation of the model building and verification phases. Once we have built a circuit for the concurrent program at hand, it affords us the flexibility to tackle the verification problem using any model checking technique of choice for a broad range of linear time temporal properties, not just safety. Unlike [RG05, CKS05], we can employ a suite of model checking tools for a rich class of linear-time temporal properties, which can be used both for finding bugs and generating proofs. These include SAT-based BMC and UMC as well as BDD-based model checking. We believe this flexibility is important as software-generated circuits are not as well structured as hardware circuits and hence no one strategy can be expected to be universally effective. Thus we have presented a new approach for model checking concurrent programs that combines the power of symbolic techniques with partial order reduction and on-the-fly transactions while retaining the flexibility to employ a broad arsenal of model checking techniques, both SAT- and BDD-based, for checking not just reachability but a richer class of linear-time temporal properties.
In the rest of the paper, Section 2 introduces the system model while on-the-fly transactions are defined in section 3. The details for modeling concurrent programs as circuits are provided in section 4 and the Daisy case study in section 5. Finally, we conclude with some remarks in section 6 along with a comparison with related work.
System Model
We consider concurrent systems comprised of a finite number of processes or threads where each thread is a deterministic sequential program written in a language such as C.
We start by using some examples to motivate our technique. Consider the concurrent program $P$ shown in figure 1. Here x, which is the only variable shared among the threads, is unprotected at control location 5b and protected by lock lk at all other locations. Since x is not protected at all locations where it is accessed, it does not satisfy lock discipline in the sense of [Sto02, FQ03], which will therefore force a context switch before locations 3a and 3b. Consider, however, a global state $s$ of $P$ with threads $T_1$ and $T_2$ at control locations 3a and 1b, respectively. The key observation is that starting at global state $s$ of $P$, 3a does not interfere with 3b and 5b even though 5b is unprotected. This is because for $T_2$ to execute 3b it has to acquire lk, which is currently held by $T_1$. But in order for $T_1$ to release lk, it has to first execute 3a. Thus starting at $s$, $P$ is forced to execute 3a before 3b. As a result no context switch is required before 3a. However, in the global state $s'$ with $T_1$ and $T_2$ at control locations 3a and 5b, respectively, the transitions 3a and 5b do interfere with each other, thus forcing a context switch before 3a. The bottom line is that even when shared variables do not follow locking discipline globally, we can still identify local portions of the state space where locking discipline is followed. Thus a context-driven analysis allows us to define transactions locally on-the-fly where existing methods [Sto02, FQ03], because of their reliance on a global analysis, fail to do so.
Observe that starting at $s$, the transitions at control locations 6a and 6b cannot interfere with each other even though they access the same shared variable x. This is because in order for thread $T_2$ to reach location 6b from 1b it has to traverse the local path 1b, 2b, 3b, 4b, 5b, along which it has to acquire (and release) lock lk1, which is currently held by $T_1$. In order for that to happen, $T_1$ must release lk1, for which it must execute transition 6a. This forces transition 6a to be executed before 6b. Thus no context switch is required before location 6a. The key observation is that even though disjoint sets of locks were held at locations 6a and 6b, it was the set of locks that needed to be acquired by $T_2$ in order to transit from 1b to 6b (even though some of these locks were released before reaching 6b) that prevented 6a and 6b from interfering with each other. A traditional lockset-based analysis as given in [Sto02, FQ03] would treat 6a and 6b as conflicting transitions (as x does not follow locking discipline) and force a context switch before these locations. Thus a conflict analysis based on lock acquisition patterns is more refined than one based on locksets. Indeed, a lockset-based analysis is a special case of lock-pattern based analysis, since the set of locks held at a location would have to be acquired and thus would be tracked in the lock acquisition pattern.
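The difference between the two analyses can be illustrated with a small Python sketch. It is only an illustration under simplifying assumptions, not the paper's algorithm: each thread is a straight-line list of hypothetical steps (`acq`, `rel`, `access`), the path `t2_path` is made up to loosely resemble the second example, and the other thread is assumed to hold its locks until after its next step.

```python
from typing import List, Set, Tuple

Step = Tuple[str, str]  # (kind, name) with kind in {"acq", "rel", "access"}


def lockset_at(path: List[Step], idx: int) -> Set[str]:
    """Locks held when path[idx] executes (what a lockset analysis sees)."""
    held: Set[str] = set()
    for kind, name in path[:idx]:
        if kind == "acq":
            held.add(name)
        elif kind == "rel":
            held.discard(name)
    return held


def acquired_before(path: List[Step], idx: int) -> Set[str]:
    """Locks acquired anywhere on the way to path[idx], even if released again."""
    return {name for kind, name in path[:idx] if kind == "acq"}


def may_run_first(held_by_other: Set[str], path: List[Step], idx: int) -> bool:
    """True if path[idx] could execute before the other thread's next step.

    If the path must acquire a lock the other thread currently holds, the
    other thread has to move first, so the two steps cannot interfere.
    """
    return not (held_by_other & acquired_before(path, idx))


# Hypothetical remaining path of T2: lk1 is acquired and released, then x is
# accessed while holding only lk2.
t2_path = [("acq", "lk1"), ("rel", "lk1"), ("acq", "lk2"), ("access", "x")]
t1_holds = {"lk1"}

print(lockset_at(t2_path, 3))               # locks held at the access
print(may_run_first(t1_holds, t2_path, 3))  # can the access precede T1's step?
```

In this sketch the lockset at the access is disjoint from the locks held by the first thread, so a lockset analysis would report a conflict, whereas the acquisition pattern shows that the access cannot be scheduled first.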
Transactions via Persistent Sets. We now show how to integrate lock-pattern based on-the-fly transactions with partial order reduction in a transparent fashion by capturing the increased granularity of transitions due to transactions as a reduction in the sizes of the conditional stubborn sets of states. This is accomplished by ensuring that if, in a global state $s$, a thread $T$ is in the process of executing a transaction, then the persistent set of $s$ includes only one transition, viz., the transition of $T$ that fires next along the transaction being executed. This ensures that once the first transition of a transaction has been executed by a thread $T$, no other thread can be scheduled until all transitions of the transaction have fired. State space reduction using partial order techniques is obtained by exploring from each state only those transitions that belong to a persistent set of that state instead of all the enabled transitions. Although there are many ways to compute persistent sets, the method of computing conditional stubborn sets usually generates those with small cardinality. In this paper, we use standard terminology from the theory of partial order reductions and the algorithm for computing conditional stubborn sets from [God96], which we denote by $\mathit{Algo}_1$. We recall the following definition from [God96].
Thus, using $\mathit{lp}_s$ instead of $\mathit{st}_s$ to compute conditional stubborn sets removes transition 1b from the conditional stubborn set of $s$, thus preventing a context switch before 6a.
Might-be-first-to-interfere
Formally, $\mathit{lp}_s$ is defined as follows.
Let $w = s_1 \xrightarrow{t_1} s_2 \xrightarrow{t_2} \cdots \xrightarrow{t_n} s_{n+1}$, with $s_1 = s$, be a sequence of transitions such that $t$ is dependent with $t_n$ in $s_n$. We need to show that at least one of $t_1, \ldots, t_n$ is in $T_s$. Without loss of generality, we may assume that for $1 \le i < n$, $t$ is independent with $t_i$ in $s_i$ and $t_n$ is dependent with $t$ in $s_n$, else we can pick an appropriate prefix of $w$.
Definition (might-be-the-first-to-interfere-modulo-lock-acquisition)
First assume that $t$ is disabled in $s$. Since $t$ is disabled in $s$ and $s_n$ is the first state along $w$ in which $t$ is dependent (with $t_n$), we have that $t$ is enabled in $s_{n+1}$. Since $t$ is disabled in $s$, either the thread executing $t$ is not at the source location of $t$ in $s$, or a condition in the guard of $t$ evaluates to false in $s$. In the first case, since $t$ is enabled in $s_{n+1}$, there exists a transition $t_j$ fired along $w$ (labeled with some guard) that brings the thread to the source location of $t$. But then executing step 2.(a).i of $\mathit{Algo}_3$ would cause $t_j$ to be included in $T_s$. In the second case, there exists a transition $t_j$ that changes the value of this condition from false to true by changing the output of an operation $op$ used to evaluate it, i.e., by performing an operation $op'$ dependent with $op$ in $s_j$. Let $t_j$ be the first such transition occurring along $w$. Clearly $op'$ is statically dependent with $op$. By definition of $\mathit{st}_s$, we have $op \;\mathit{st}_s\; op'$, and so $t_j \in T_s$ by step 2.(a).ii.
Consider now the case when $t$ is enabled in $s$. From the facts that (i) for $1 \le i \le n-1$, $t$ is independent with $t_i$ in $s_i$, and (ii) $t$ is enabled in $s$, we have that for $1 \le i \le n-1$, $t$ is enabled in $s_i$. This implies that the thread $T$ executing $t$ does not execute any transition along $w$: otherwise, since $T$ is deterministic, $t$ would be the first transition that $T$ executes along $w$, which would force $T$ out of its current local state, thereby disabling $t$ and contradicting the above observation. Note that here we assumed that executing a transition takes a process out of its current local state, i.e., there are no self loops in a program thread, a reasonable assumption for software programs.
Now, since $t$ and $t_n$ are dependent in $s_n$, there exist $op \in \mathit{used}(t)$ and $op' \in \mathit{used}(t_n)$ such that $op$ and $op'$ are dependent in $s_n$ and hence are also statically dependent. Let $t_j$ be the first transition along $w$ that uses an operation $op''$ dependent with $op$. Note also that there does not exist a lock $l$ held by $T$ at $s$ such that $l$ has to be acquired before $t_j$ is executed along $w$. For otherwise, $l$ must first be released by $T$, thus forcing $T$ to execute a transition, contradicting our observation above that $T$ does not execute any transition along $w$. $\Box$
Note that since $\mathit{Algo}_3$ computes smaller persistent sets than existing lockset-based techniques, it is guaranteed to improve the performance of explicit state model checkers. Even for symbolic model checkers, since the reduction in the number of scheduled transitions results in a pruning of the state space, it leads to a performance boost which, however, may not be directly proportional to the decrease in the size of the state space being explored.
Software Modeling for Concurrent C Programs

Translating Individual Threads into Circuits
In this section we briefly describe how, using the F-Soft machinery, we first obtain a circuit-based model of each thread, under the assumption of bounded data and bounded control (recursion) (see [ISGG05] for more details). We begin with full-fledged C and apply a series of source-to-source transformations that rewrite complex C expressions into a smaller but equivalent subset of C. We flatten all arrays and structs by replacing them with collections of simple scalar variables, and build an internal memory representation of the program by assigning to each scalar variable a unique number representing its memory address. Variables that are adjacent in C program memory are given consecutive memory addresses in our model; this facilitates modeling of pointer arithmetic. We model the heap as a finite array, adding a simple implementation of malloc() that returns pointers into this array. For handling pointer accesses, we first perform a points-to analysis to determine the set of variables that a pointer variable can point to. Then, we convert each indirect memory access, through a pointer or an array reference, to a direct memory access. For example, if we determine that pointer p can point to variables a,b,...,z at a given program location, we rewrite a pointer read *(p+i) as a conditional expression of the form ((p+i)==&a ? a : ((p+i)==&b ? b : ...) ), where &a,&b,... are the numeric memory addresses we assigned to the variables a,b,..., respectively. Non-recursive function calls are handled by inlining exactly once, and replacing the function return by a set of gotos conditioned upon the unique call site id stored on function entry. Bounded recursive functions are modeled by introducing a bounded call stack. While we aim for accurate modeling of all of C, practical modeling requires making approximations. We truncate large arrays: writes to elements above a certain index are ignored, and reads from these elements yield non-deterministic values. 
We currently approximate floating-point values by modeling their integral parts only.
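The pointer-read rewriting described above can be sketched as a small expression generator. This is a minimal Python illustration, not F-Soft code: the function name `expand_pointer_read` is hypothetical, and the final fallback value `default` is an assumption, since the text elides what the innermost branch of the conditional contains.

```python
from typing import List


def expand_pointer_read(ptr_expr: str, targets: List[str], default: str = "0") -> str:
    """Rewrite the read *(ptr_expr) as a nested C conditional over points-to targets.

    `targets` is the points-to set computed for the pointer at this location.
    `default` is an assumed placeholder for the case in which no target matches.
    """
    expr = default
    # Build from the innermost branch outwards so the first target is tested first.
    for name in reversed(targets):
        expr = f"(({ptr_expr})==&{name} ? {name} : {expr})"
    return expr


print(expand_pointer_read("p+i", ["a", "b"]))
```

Each additional points-to target adds one comparison against a numeric address, which is why a precise points-to analysis keeps the generated expressions (and hence the circuit) small.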
The simplified program consists of scalar variables of simple types (Boolean, enumerated, integer). This is compiled using standard techniques into its control flow graph (CFG). The CFG representation can be viewed as a finite state machine with state vector (pc,V), where pc denotes an encoding of the basic blocks, and V is a vector of integer-valued program variables. We then construct symbolic transition relations for pc, and for each data variable appearing in the program. For pc, the transition relation reflects the guarded transitions between basic blocks in the CFG. For a data variable, the transition relation is built from expressions assigned to the variable in various blocks. Finally, we construct a symbolic representation of these transition relations resembling a hardware circuit. For the pc variable, we allocate $\log N$ latches, where $N$ is the total number of basic blocks. For each C program variable, we allocate a vector of $n$ latches, where $n$ is the bit width of the variable. At the end, we obtain a circuit-based model of each thread of the given concurrent program, where each variable of the thread is represented in terms of a vector of binary-valued latches and a Boolean next-state function (or relation) for each latch.
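As a small worked instance of the latch allocation, the following Python sketch counts the latches needed for one thread. It assumes the logarithm is taken base 2 and rounded up, which the text does not state explicitly; the function names and the example variables are hypothetical.

```python
from typing import Dict


def pc_latches(num_blocks: int) -> int:
    """Latches needed to encode the program counter over num_blocks basic blocks.

    Assumes a binary encoding, i.e. ceil(log2(num_blocks)), with at least one latch.
    """
    if num_blocks < 1:
        raise ValueError("a CFG has at least one basic block")
    return max(1, (num_blocks - 1).bit_length())


def total_latches(num_blocks: int, bit_widths: Dict[str, int]) -> int:
    """Program-counter latches plus one latch per bit of every program variable."""
    return pc_latches(num_blocks) + sum(bit_widths.values())


# Hypothetical thread with 5 basic blocks, a 32-bit integer and a Boolean flag.
print(total_latches(5, {"x": 32, "flag": 1}))
```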
Building the Circuit for the Concurrent Program
Given the circuit for each individual thread $T_i$, we now show how to obtain the circuit for the concurrent program $P$ comprised of these threads. In the case where local variables with the same name occur in multiple threads, to ensure consistency we prefix the name of each local variable of thread $T_i$ with thread_i. Next, for each thread $T_i$ we introduce a gate execute_i indicating whether $T_i$ has been scheduled to execute in the next step of $P$ or not.
For each latch l, let next-state$_i$(l) denote the next-state function of l in the circuit for thread $T_i$. Then in the circuit for $P$, the next-state value of the latch thread_i_l corresponding to a local variable of thread $T_i$ is defined to be next-state$_i$(thread_i_l) if execute_i is true, and the current value of thread_i_l otherwise. If, on the other hand, latch l corresponds to a shared variable, then next-state(l) is defined to be next-state$_i$(l), where $i$ is the index for which execute_i is true. Note that we need to ensure that execute_i is true for exactly one thread $T_i$. Towards that end, we implement a scheduler which determines, in each global state of $P$, which one of the signals execute_i is set to true, and thus determines the semantics of thread composition.
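The composition rule can be mimicked at the level of variable valuations rather than latches. The sketch below is an illustration only: threads are modeled as hypothetical Python functions from (local, shared) valuations to updated valuations, and `compose_step` and `incrementer` are names introduced here, not part of F-Soft.

```python
from typing import Callable, Dict, List, Tuple

Vals = Dict[str, int]
# A thread's next-state function: (local latches, shared latches) -> updated copies.
NextState = Callable[[Vals, Vals], Tuple[Vals, Vals]]


def compose_step(local: List[Vals], shared: Vals,
                 next_state: List[NextState],
                 execute: List[bool]) -> Tuple[List[Vals], Vals]:
    """One step of the composed system under a one-hot execute vector."""
    if sum(execute) != 1:
        raise ValueError("exactly one execute_i must be true")
    i = execute.index(True)
    # Only the scheduled thread's next-state function is applied.
    new_local_i, new_shared = next_state[i](dict(local[i]), dict(shared))
    # Local latches of unscheduled threads hold their current values.
    new_local = [dict(v) for v in local]
    new_local[i] = new_local_i
    return new_local, new_shared


def incrementer(local: Vals, shared: Vals) -> Tuple[Vals, Vals]:
    """Hypothetical thread body: advance pc and bump the shared counter x."""
    local["pc"] += 1
    shared["x"] += 1
    return local, shared


local0 = [{"pc": 0}, {"pc": 0}]
shared0 = {"x": 0}
local1, shared1 = compose_step(local0, shared0,
                               [incrementer, incrementer], [False, True])
print(local1, shared1)
```

A scheduler in this picture is simply whatever chooses the one-hot `execute` vector in each global state; the partial order constraints discussed next restrict that choice.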
Conditional Stubborn Sets based Persistent Sets
To incorporate partial order reduction, we need to ensure that from each global state $s$, only transitions belonging to a conditional stubborn set of $s$ are explored. Let $R$ and $R_i$ denote the transition relations of $P$ and $T_i$, respectively. If $P$ has $n$ threads, we introduce the $n$-bit vector $\mathit{stub}$ which identifies a conditional stubborn set for each global state $s$, i.e., in $s$, $\mathit{stub}_i$ is true for exactly those threads $T_i$ such that the (unique) transition of $T_i$ enabled at $s$ belongs to the same minimal conditional stubborn set of $s$.
The $\mathit{stub}$ vector can be computed in the following way:
1. For each shared variable $x$ and thread $T_i$, we introduce a latch touch-now$(T_i, x)$ which is true at control location $p_i$ of $T_i$ iff $T_i$ accesses $x$ at $p_i$. This can be done via a static analysis of the CFG of $T_i$ by determining at which control locations $x$ is accessed and taking a disjunction over those values of $p_i$.
2. For each shared variable $x$ and thread $T_i$, we introduce the latch touch-now-later$(T_i, x)$, which is true at control location $p_i$ of $T_i$ if $T_i$ accesses $x$ at some location $p'_i$ reachable from $p_i$. Computing touch-now-later$(T_i, x)$ thus involves deciding the reachability of $p'_i$, and since we cannot compute it exactly without exploring the entire state space of $P$, we over-approximate it by doing a context-sensitive analysis of the control flow graph of $T_i$: we set touch-now-later$(T_i, x)$ to true at control location $p_i$ if, for some control location $p'_i$ reachable from $p_i$ in the control flow graph of $T_i$, $x$ is accessed at $p'_i$.
3. For distinct threads $T_i$ and $T_j$, the relation $\mathit{conflict}(i, j)$ is then defined as $\bigvee_{x \in V_s} \big(\text{touch-now}(T_i, x)(p_i) \wedge \text{touch-now-later}(T_j, x)(p_j)\big)$, where $p_i$ and $p_j$ are the control locations of $T_i$ and $T_j$, respectively, in the current global state and $V_s$ is the set of shared variables of $P$.
4. Using a circuit to compute transitive closures, for each $i$, starting with $J_i = \{i\}$, we compute the closure of $J_i$ under the conflict relation defined above.
5. We build a circuit to compute the index $\mathit{min}$ such that the cardinality of $J_{\mathit{min}}$ is the least among the sets $J_1, \ldots, J_n$. Finally, for $1 \le i \le n$, we set $\mathit{stub}_i = 1$ iff $i \in J_{\mathit{min}}$. Note that in the implementation we need to pick only one set with the least cardinality.
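The five steps above can be sketched in software as follows. This Python sketch is an explicit-state illustration of what the circuit computes symbolically, under several assumptions that are not from the paper: the touch sets are supplied already evaluated at each thread's current control location, touch-now-later is taken to include the current location, enabledness of threads is ignored, and ties between smallest sets are broken by lowest index.

```python
from typing import List, Set


def conflict(i: int, j: int,
             touch_now: List[Set[str]],
             touch_now_later: List[Set[str]]) -> bool:
    """Thread i accesses now some shared variable that thread j accesses now or later."""
    return bool(touch_now[i] & touch_now_later[j])


def stubborn_vector(touch_now: List[Set[str]],
                    touch_now_later: List[Set[str]]) -> List[bool]:
    """Return stub[i] for each thread, following steps 3 to 5."""
    n = len(touch_now)
    closures: List[Set[int]] = []
    for i in range(n):
        closed = {i}                      # step 4: start from the singleton {i}
        changed = True
        while changed:                    # iterate to a fixed point
            changed = False
            for k in list(closed):
                for j in range(n):
                    if j not in closed and conflict(k, j, touch_now, touch_now_later):
                        closed.add(j)
                        changed = True
        closures.append(closed)
    smallest = min(closures, key=len)     # step 5: one set of least cardinality
    return [i in smallest for i in range(n)]


# Hypothetical global state: T0 accesses x now, T1 accesses x only later,
# T2 accesses only y.
now = [{"x"}, set(), {"y"}]
later = [{"x"}, {"x"}, {"y"}]
print(stubborn_vector(now, later))
```

In this example the closure for thread 0 pulls in thread 1, while the closures for threads 1 and 2 stay singletons, so a single thread is scheduled from this state.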
Cycle detection. We first identify sticky transitions [KLM+98] for all potential global cycles. We then force a conflict between the process containing the sticky locations and all other processes via the encoding below. Let $\mathit{sticky}(p)$ be a predicate evaluating to true iff location $p$ has been marked sticky. Then, for global state $s$, we define $\mathit{conflict}(i, j) = \mathit{sticky}(p_i) \vee \bigvee_{x \in V_s} \big(\text{touch-now}(T_i, x)(p_i) \wedge \text{touch-now-later}(T_j, x)(p_j)\big)$, where $p_m$ is the current control location of $T_m$ in $s$. In other words, if $p_i$ is sticky then thread $T_i$ is said to conflict with all other threads. This implies that either a thread $T_j$ with a smaller conflict set $J_j$ would be chosen for the persistent set computation, or a full expansion is forced.
This reduction is sound, since (as was shown in [KLM+98]) any cycle in the global state space can be projected on to one or more local cycles in the control flow graph of the individual threads. By forcing a full expansion inside each (potential) local cycle with the help of sticky transitions, we ensure that there is no global cycle such that a thread transition is postponed at each state of the cycle. Therefore this encoding allows the model checker to explore a conservative over-approximation of the representative (minimal) set of interleavings of the given threads. Although the reduced model remains sound, the reduction in the number of interleavings considered may decrease dramatically with the number of annotated sticky transitions.
So far, we have implemented sticky transitions only for special cases in which cycles can occur locally in threads. In fact, as was noted in [FG05], our experience has also been that acyclic state spaces are very common in software implementations for the purpose of model checking, and cycle detection becomes more critical when one is using an abstraction-refinement framework, since abstraction introduces cycles. However, since we (i) put a lot of effort into modeling programs concretely, (ii) do not use abstraction refinement, and (iii) introduce sticky transitions to cover common trivial cases, the impact of the existence of cycles is reduced. Nevertheless, we are currently in the process of extending the implementation of sticky transitions to the general case.
Encoding Lock Pattern based Reductions
The Daisy Case Study
We have used our technique to find bugs in the Daisy file system, which is a benchmark for analyzing the efficacy of different methodologies for verifying concurrent programs [dai]. Daisy is a 1 KLOC Java implementation of a toy file system where each file is allocated a unique inode that stores the file parameters and a unique block which stores data. An interesting feature of Daisy is that it has fine-grained locking in that access to each file, inode or block is guarded by a dedicated lock. Moreover, the acquire and release of each of these locks is guarded by a 'token' lock. Thus control locations in the program might have multiple open locks, and furthermore the acquire and release of a given lock can occur in different procedures.
Currently F-Soft only accepts programs written in C, and so we first manually translated the Daisy code, which is written in Java, into C. Furthermore, to reduce the model sizes, we truncated the sizes of the data structures modeling the disk, inodes, blocks, file names, etc., which were not relevant to the race conditions we checked, resulting in a sound and complete small-domain reduction. We have shown the existence of the race conditions described below, which were also noted by other researchers (cf. [dai]). The efficacy of our techniques can be judged from the fact that our model checking methodology has been able to detect these race conditions in Daisy in a fully automatic fashion directly on the source code, without any code restructuring or abstraction beyond redefining the constants as discussed above.
1. Daisy maintains an allocation area where, for each block in the file system, a bit is set to 0 or 1 according to whether the block has been allocated to a file or not. But each disk operation reads/writes an entire byte. Two threads accessing two different files might access two different blocks. However, since bytes are not guarded by locks, in order to set their allocation bits these two threads may access the same byte in the allocation block, namely the byte containing the allocation bits of both blocks, thus setting up a race condition. Note that the race condition occurs for any pair of blocks with numbers $i$ and $j$ where $\lfloor i/8 \rfloor = \lfloor j/8 \rfloor$.
The verification statistics are as follows. We ran our experiments on a machine with an Intel Pentium 4 3.20 GHz processor and 2 GB RAM. Each run was given a timeout of 2 days and a memory limit of 2 GB. Witnesses for the above race condition were found in two cases: $WW_1$, corresponding to blocks 0 and 1, and $WW_2$, due to blocks 1 and 2. Using purely interleaved scheduling, we failed to find either witness because of a memout at depth 15. When only partial order reduction was employed, $WW_1$ was found using SAT-based BMC at unroll depth 122 in 36707 sec and 999 MB, while incorporating on-the-fly transactions drastically reduced the time and memory usage to 1283 sec and 122 MB, respectively. The second witness $WW_2$ was found at depth 151. Using partial order reduction alone took 145176 sec and 1870 MB, while adding transactions reduced it to 5925 sec and 902 MB.
2. In Daisy, reading/writing a particular byte on the disk is broken down into two operations: a seek operation that mimics the positioning of the head and a read/write operation that transfers the actual data. Due to this separation between seeking and data transfer, a race condition may occur. For example, when reading two disk locations, say $n$ and $m$, we must make sure that $\mathit{seek}(n)$ is followed by $\mathit{read}(n)$ without $\mathit{seek}(m)$ or $\mathit{read}(m)$ scheduled in between. In this case a witness was found at depth 48. Using partial order reduction alone took 2.99 sec and 5.7 MB, while adding transactions reduced it to 2.89 sec and 5.5 MB. For this example also, BMC on the completely interleaved model failed to find a witness because of a memout at depth 20.
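The first race condition in the list above is a lost update on a shared allocation byte. The following Python sketch replays it on a toy model; it is not Daisy code. The two-step read-then-write decomposition, the schedule encoding, and the convention that a set bit means "allocated" are assumptions made for illustration.

```python
from typing import Dict, List


def same_allocation_byte(i: int, j: int) -> bool:
    """Blocks i and j share an allocation byte when they fall in the same group of 8."""
    return i // 8 == j // 8


def run_schedule(schedule: List[str], block_of: Dict[str, int],
                 num_bytes: int = 1) -> bytearray:
    """Replay a schedule in which each thread appears twice.

    A thread's first appearance reads the whole allocation byte for its block;
    its second appearance writes the byte back with its own bit set.
    """
    alloc = bytearray(num_bytes)
    cached: Dict[str, int] = {}
    for tid in schedule:
        byte_idx, bit = divmod(block_of[tid], 8)
        if tid not in cached:
            cached[tid] = alloc[byte_idx]            # unguarded byte read
        else:
            alloc[byte_idx] = cached.pop(tid) | (1 << bit)  # unguarded byte write
    return alloc


blocks = {"A": 0, "B": 1}
serial = run_schedule(["A", "A", "B", "B"], blocks)
racy = run_schedule(["A", "B", "A", "B"], blocks)
print(serial[0], racy[0])
```

In the interleaved schedule both threads read the byte before either writes it, so the second write overwrites the first thread's bit; with blocks in different bytes the same interleaving is harmless.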
The bottom line is that, for deep bugs, techniques that combine on-the-fly transactions with partial order reduction greatly outperform those which use only partial order reduction, both in terms of time taken and memory used.
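For a rough sense of scale, the improvement factors implied by the reported figures can be computed directly. The snippet below only restates the numbers given above (time in seconds, memory in MB); the dictionary layout and the name `improvement` are introduced here for illustration.

```python
from typing import Dict, Tuple

# (seconds, megabytes) as reported for each witness.
runs: Dict[str, Dict[str, Tuple[float, float]]] = {
    "WW1": {"por": (36707, 999), "por+txn": (1283, 122)},
    "WW2": {"por": (145176, 1870), "por+txn": (5925, 902)},
    "seek/read": {"por": (2.99, 5.7), "por+txn": (2.89, 5.5)},
}


def improvement(name: str) -> Tuple[float, float]:
    """Ratio of partial order reduction alone to reduction plus transactions."""
    (t0, m0), (t1, m1) = runs[name]["por"], runs[name]["por+txn"]
    return t0 / t1, m0 / m1


for name in runs:
    time_ratio, mem_ratio = improvement(name)
    print(f"{name}: time x{time_ratio:.1f}, memory x{mem_ratio:.1f}")
```

The two deep witnesses (depths 122 and 151) show time ratios of roughly 28.6 and 24.5, whereas the shallow witness at depth 48 improves by only a few percent, which is consistent with the observation that transactions pay off mainly for deep bugs.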
Concluding Remarks and Related Work
A comparison of our work with [RG05, CKS05], to which it is most closely related, was presented in the introduction. Partial order reduction has been used before for symbolic model checking using BDDs [ABH+01, LST03]. On the other hand, by separating the modeling and verification phases, our methodology gives us the ability to combine partial order reductions with any symbolic model checking technique of choice, either SAT- or BDD-based. An interesting approach for the verification of concurrent programs using a proof-guided under-approximation-widening methodology was presented in [GLST05]. Here constraints are added to the BMC model instance so that only a subset of behaviors of the concurrent system are explored. These constraints are iteratively removed during the widening phase as a result of which, in the worst case, one might end up exploring the entire state space of the concurrent program at hand. In contrast, we add constraints so that we explore a conditional stubborn set at each global state, thereby yielding considerable state space reduction. Moreover, [GLST05] does not leverage the use of transactions.
There has also been interesting work ([FQ03, Sto02, SC03, AQR+04, LPQR05]) on the use of lockset-based transactions for verifying software and combining it with partial order reductions. These techniques first compute the valid set of transactions in each of the processes and then perform partial order reduction-based state-space exploration. As noted before, such a two-step combination technique may overlook potential reductions related to shared variables which do not always follow a locking discipline. The key reason is that in these approaches a thread-wise global analysis is done to look for potential dependencies between transitions. In contrast, our approach adds information to the model while exploring the state space by detecting dependencies on-the-fly via an analysis of patterns of lock acquisition. Our more refined method generates fewer dependencies between transitions, resulting in fewer context switches. This gives us better state space reduction than existing lockset-based techniques.
To sum up, we have presented a new approach for verifying concurrent programs that combines the power of symbolic model checking with partial order reduction and on-the-fly transactions while retaining the flexibility to employ a variety of error trace generation and proof techniques, both SAT- and BDD-based, for checking not just safety but a broad class of linear time temporal properties. The use of lock acquisition patterns rather than locksets to identify transactions on-the-fly is not only a powerful technique in its own right but can also be used in a synergistic manner with both explicit state and BDD-based exploration of concurrent programs, as well as with dynamic partial order reduction techniques [FG05].
