Abstract
Introduction
Formal verification and validation methods provide a high degree of confidence in the correctness and reliability of software systems. At the design phase of the software life cycle, automated techniques, e.g., model checking [14, 37], can be applied to verify that the preliminary high-level design specification conforms to the requirements. The desired requirements and the design specification are represented in temporal logic and in high-level modeling languages, respectively; their common aspect is that the semantics of both are expressible as finite state automata (regular languages). Model checking, which essentially defines a satisfaction relation, then involves intersecting the automaton of the design with the automaton for the negated requirement and verifying that the intersection accepts the empty language.
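To make the emptiness check concrete, the sketch below explores the product of two automata over finite words on the fly: the design satisfies the requirement if and only if no word is accepted by both the design automaton and the automaton for the negated requirement. The `NFA` class, the toy alphabet, and both automata are hypothetical. Temporal logics such as LTL are interpreted over infinite words and use Büchi automata, where emptiness additionally requires finding an accepting cycle; this finite-word simplification omits that step.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class NFA:
    """Nondeterministic finite automaton over finite words."""
    initial: set
    accepting: set
    delta: dict  # (state, symbol) -> set of successor states

    def successors(self, state, symbol):
        return self.delta.get((state, symbol), set())


def intersection_is_empty(a, b, alphabet):
    """Explore the product automaton a x b breadth-first; the intersection
    language is empty iff no reachable product state is accepting in both."""
    start = [(p, q) for p in a.initial for q in b.initial]
    seen = set(start)
    queue = deque(start)
    while queue:
        p, q = queue.popleft()
        if p in a.accepting and q in b.accepting:
            return False  # some word is accepted by both automata
        for sym in alphabet:
            for p2 in a.successors(p, sym):
                for q2 in b.successors(q, sym):
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        queue.append((p2, q2))
    return True


ALPHABET = {"req", "grant", "err"}
# Hypothetical design: alternates requests and grants, never emits "err".
design = NFA({"idle"}, {"idle", "busy"},
             {("idle", "req"): {"busy"}, ("busy", "grant"): {"idle"}})
# Automaton for the negated requirement: accepts every word containing "err".
bad = NFA({0}, {1}, {(s, a): {1 if (s == 1 or a == "err") else 0}
                     for s in (0, 1) for a in ALPHABET})

print(intersection_is_empty(design, bad, ALPHABET))  # True: the requirement holds
```

Adding a transition `("busy", "err")` to the design makes the product reach a doubly accepting state, so the check returns `False` and the path explored to that state is the counter-example.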
Once a correct design is obtained, the next phase is to develop the corresponding implementation. However, due to the presence of rich classes of data (e.g., infinite-domain integers) and control structures (e.g., recursion), model checking source code behavior is, in general, undecidable [39]. Over the last decade, model checking techniques have been fine-tuned [15, 17, 29, 33] to reason about infinite state systems. Recent breakthroughs in model checking source code [1, 20, 22] rely either on smart abstraction techniques or on constraint solvers to handle the infinite-domain data present in the implementation.
Formal verification of the design is a useful step, as it increases confidence that the implementation will perform correctly. Similarly, verification of the implementation (if possible) increases confidence. However, these separate verifications do not ensure that the implementation is a correct representation of the design, as the design requirements may not be imposed directly on the implementation and vice versa. This inherent gap between ensuring correctness of the design and ensuring correctness of the implementation is further complicated in practice, since designs tend to change over time, even after implementation of the design has begun. Furthermore, separate verification does not allow a component implementation to be verified against a global high-level design model, as might be desired before the entire system has been implemented. As software systems increasingly consist of multiple processes and threads, designs and implementations become more complex and the problem only worsens.
In this paper, we present a roadmap in which high-level designs with multiple components are verified against implementations of some (not all) of their components. We first discuss prior research as it relates to the current objective (Section 2). Section 3 presents an overview of our approach. The central theme is to verify local properties of the implementation of a component and to identify a set of conditions that must be imposed on the other components (i.e., the environment) for its correct functionality. These conditions are referred to as the obligation of the environment. The global design model is then verified against the requirements and the obligations imposed by the component implementations. Sections 4 and 5 discuss the current work and future avenues of research, respectively, towards this objective.
Background and Related Work

Model Checking a High-level Model
Model checking [14] is the process of verifying that a finite state machine satisfies a desired property, which is usually specified in some temporal logic, e.g., computation tree logic (CTL) [5] or linear temporal logic (LTL). The finite state machines that describe the dynamics of system behavior are not specified by hand; instead, a high-level modeling formalism is used, and the underlying finite state machine and the other structures needed for model checking are constructed from the high-level model automatically, as needed. Theoretically, any high-level description that has a formal set of rules for constructing the underlying finite state machine will work, including source code (with restrictions). To aid in discussion, in Section 2.1.1 we briefly review one such high-level formalism, namely Petri nets, a well-accepted modeling formalism that satisfies the above requirement. High-level models often suffer from the state explosion problem: while a realistic system may be described compactly using a high-level formalism, its underlying finite state machine can be extremely large. This leads to practical difficulties for model checking. Researchers have attacked this problem in various ways; some techniques that have received widespread acceptance are described in Section 2.1.2.
High-level Models of System Design
Petri nets.
Petri nets [36] are a graphical design tool with simple yet powerful underlying rules; these properties have contributed greatly to their success. Formally, a Petri net is a directed, bipartite graph consisting of place nodes and transition nodes. Usually, places are drawn as circles and transitions are drawn as bars. Each place may contain a non-negative number of tokens. The state of the Petri net, known as its marking, is given by the number of tokens in each place. Place p is called an input place of a transition t if there is an arc from p to t, and an output place if there is an arc from t to p. Changes of state are dictated by the transitions: a transition is enabled if all of its input places contain tokens, and an enabled transition may fire by removing a token from each of its input places and adding a token to each of its output places. Weights can be used on the arcs so that transitions can add and remove more than one token from a place; a transition is then enabled only if each input place holds at least as many tokens as the weight of its arc. Since Petri net tokens are indistinguishable, abstraction is usually required when dealing with data (e.g., for software). Alternatively, colored Petri nets can be used, which allow tokens to be distinguished by "colors" that are commonly used to represent data [21, 30]. Theoretically, any colored Petri net with finite color sets can be converted into a (usually much larger) uncolored Petri net, a process known as unfolding.
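The firing rule with weighted arcs can be sketched in a few lines. This is a minimal, hypothetical encoding (markings as tuples, transitions as dictionaries of arc weights), not the input format of any particular Petri net tool.

```python
class PetriNet:
    """Place/transition net with weighted arcs; a marking is a tuple of token counts."""

    def __init__(self, transitions):
        # transitions: name -> (inputs, outputs); each maps place index -> arc weight
        self.transitions = transitions

    def enabled(self, marking, t):
        inputs, _ = self.transitions[t]
        return all(marking[p] >= w for p, w in inputs.items())

    def fire(self, marking, t):
        """Return the marking reached by firing t (which must be enabled)."""
        if not self.enabled(marking, t):
            raise ValueError(f"transition {t!r} is not enabled")
        inputs, outputs = self.transitions[t]
        m = list(marking)
        for p, w in inputs.items():
            m[p] -= w  # remove tokens from input places
        for p, w in outputs.items():
            m[p] += w  # add tokens to output places
        return tuple(m)


# Hypothetical producer net with places 0 = idle, 1 = ready, 2 = buffer.
IDLE, READY, BUFFER = range(3)
net = PetriNet({
    "produce": ({IDLE: 1}, {READY: 1}),
    "deposit": ({READY: 1}, {IDLE: 1, BUFFER: 1}),
    "consume_pair": ({BUFFER: 2}, {}),  # arc weight 2: needs two buffered tokens
})

m = (1, 0, 0)
m = net.fire(m, "produce")   # (0, 1, 0)
m = net.fire(m, "deposit")   # (1, 0, 1)
print(m, net.enabled(m, "consume_pair"))  # (1, 0, 1) False
```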
Symbolic Model Checking
Within the model checking literature, techniques based upon the use of decision diagrams have come to be known as symbolic or implicit methods. To use these methods, states of the finite state machine are assumed to be composed of K state variables, with K > 1. A set of states can be represented using a multi-valued decision diagram (MDD) [31], a directed acyclic graph containing terminal nodes 0 and 1, and non-terminal nodes which are labeled with one of the state variables x_k and contain pointers to other nodes for each possible value of variable x_k. Binary decision diagrams (BDDs) [6] correspond to the special case where each state variable can assume only the values 0 and 1. Rules for the construction and manipulation of BDDs and MDDs can be found in [6] and [31], respectively. While MDDs require an exponential number of nodes in the worst case, they have been used successfully in practice to compactly represent large sets of states.
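A minimal sketch of the two properties that make MDDs compact, node sharing through a unique table and elimination of redundant nodes, follows. The class and method names are illustrative assumptions rather than the API of the library in [31] or of any other package; this variant is fully reduced (redundant nodes are skipped), whereas some MDD libraries keep quasi-reduced forms, and operations such as union or image computation are omitted.

```python
class MDD:
    """Minimal reduced, ordered MDD over variables x_K, ..., x_1 (x_K at the top).
    domain_sizes[k - 1] is the number of values of x_k. Terminals are ids 0 and 1;
    a unique table guarantees each distinct sub-graph is stored only once."""

    def __init__(self, domain_sizes):
        self.domain_sizes = domain_sizes
        self.unique = {}                 # (level, children) -> node id
        self.nodes = {0: None, 1: None}  # node id -> (level, children)

    def make_node(self, level, children):
        children = tuple(children)
        if all(c == children[0] for c in children):
            return children[0]           # redundant test on x_level: skip the node
        key = (level, children)
        if key not in self.unique:
            node_id = len(self.nodes)
            self.unique[key] = node_id
            self.nodes[node_id] = key
        return self.unique[key]

    def from_states(self, states, level=None):
        """Build the MDD for a set of tuples (x_K, ..., x_1)."""
        if level is None:
            level = len(self.domain_sizes)
        if level == 0:
            return 1 if states else 0
        children = [self.from_states({s[1:] for s in states if s[0] == v}, level - 1)
                    for v in range(self.domain_sizes[level - 1])]
        return self.make_node(level, children)

    def contains(self, root, state):
        node, K = root, len(self.domain_sizes)
        for i, value in enumerate(state):
            if node in (0, 1):
                break
            node_level, children = self.nodes[node]
            if node_level == K - i:      # this node tests x_(K - i)
                node = children[value]
            # otherwise x_(K - i) was skipped: its value does not matter
        return node == 1


# Hypothetical example with K = 2: x_2 in {0, 1, 2}, x_1 in {0, 1}.
mdd = MDD([2, 3])
root = mdd.from_states({(v, 1) for v in range(3)})  # x_1 = 1, x_2 arbitrary
print(mdd.contains(root, (2, 1)), mdd.contains(root, (2, 0)), len(mdd.unique))
# True False 1
```

The three states share a single non-terminal node because the test on x_2 is redundant; with many variables such sharing is what keeps large state sets small in practice.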
To use symbolic methods for model checking, it is also necessary to use a compact representation for the finite state machine. Traditionally this has been done using MDDs with 2K state variables x_K, ..., x_1, x'_K, ..., x'_1 to represent the transition relation of the finite state machine: if an assignment of values to (x_K, ..., x_1, x'_K, ..., x'_1) is contained in the MDD representation of the transition relation, then there is a transition in the finite state machine from state (x_K, ..., x_1) to state (x'_K, ..., x'_1). The MDD for the transition relation is constructed directly from the high-level model, thus avoiding the need to explicitly represent the finite state machine. More sophisticated techniques are used in practice (e.g., partitioned transition relations [7]), but they are based on similar concepts.
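Given a transition relation over the 2K variables, the reachable states are computed as a fixed point of repeated image steps. In the sketch below, plain Python sets of (x, x') pairs stand in for the MDD of the transition relation, so it shows the shape of the algorithm rather than its symbolic efficiency; the counter model is hypothetical.

```python
from itertools import product


def image(states, relation):
    """Img(S) = { x' : (x, x') in T for some x in S }."""
    return {dst for (src, dst) in relation if src in states}


def reachable(initial, relation):
    """Least fixed point R = Init | Img(R), computed frontier by frontier."""
    reached = set(initial)
    frontier = set(initial)
    while frontier:
        frontier = image(frontier, relation) - reached
        reached |= frontier
    return reached


def counter_step(state):
    """Hypothetical model: (x_2, x_1) is a counter with x_1 in {0,1}, x_2 in {0,1,2}."""
    x2, x1 = state
    return (x2, 1) if x1 == 0 else ((x2 + 1) % 3, 0)


# The transition relation as a set of (x, x') pairs over 2K = 4 variables.
T = {(s, counter_step(s)) for s in product(range(3), range(2))}
print(sorted(reachable({(0, 0)}, T)))
# [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```

In a symbolic tool, `image` would be a single relational-product operation on MDDs, so each iteration processes the whole frontier at once instead of enumerating pairs.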
Symbolic methods have been used quite successfully for verification of synchronous and asynchronous circuits (e.g., [7, 8, 38]), where BDDs are a natural fit due to the boolean state variables. For systems whose state variables are not necessarily boolean, such as manufacturing systems, software systems, and communications systems, BDDs can still be used by describing each state variable as several boolean variables (e.g., [25]), or MDDs can be used [11, 35].
Model Checking Implementations
Sequential Program Verification
Given a set of correctness assertions, model checking the source code of a sequential program is typically based on the three-step paradigm of abstraction, verification, and refinement [13, 19, 23]. The first step builds an over-approximate abstraction P' of the program P. Abstraction is necessary to handle operations on the infinite-domain variables present in the program. Thus, source code verification is concerned with handling programs with infinite data behavior, but control is still assumed to be finite. In the second step, the abstract model P' is verified against the desired property. If the property is violated, a counter-example is generated. A counter-example in P', however, may not be a valid sequence of steps in the concrete system P owing to the abstractions performed on the variables. The steps used to obtain the counter-example produce a set of constraints, which are fed into a constraint solver to determine whether the constraints can be simultaneously satisfied. If the counter-example is indeed infeasible in the concrete system, P' is refined (the third step), and the three steps are repeated until a feasible counter-example is obtained or the property is shown to be satisfied.
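The three-step loop can be sketched as a higher-order function. The callbacks are hypothetical placeholders for a real abstraction engine, model checker, and constraint solver (such as those in the tools cited above); the toy stubs in the usage lines only exercise the control flow.

```python
def cegar(program, prop, abstract, model_check, check_feasible, refine,
          max_iterations=50):
    """Generic abstraction / verification / refinement loop.
    All four callbacks are hypothetical stand-ins supplied by the caller:
      abstract(program, predicates)   -> abstract model P'
      model_check(model, prop)        -> None if P' satisfies prop, else a counter-example
      check_feasible(program, cex)    -> (True, None) if cex is a real run of P,
                                         else (False, constraints explaining why not)
      refine(predicates, constraints) -> a larger predicate set
    """
    predicates = frozenset()
    for _ in range(max_iterations):
        model = abstract(program, predicates)                 # step 1: abstraction
        cex = model_check(model, prop)                        # step 2: verification
        if cex is None:
            return "property holds", None
        feasible, constraints = check_feasible(program, cex)  # constraint solving
        if feasible:
            return "property violated", cex
        predicates = refine(predicates, constraints)          # step 3: refinement
    return "unknown", None


# Toy stubs: the abstraction becomes precise enough once it tracks "x > 0".
result = cegar(
    program="toy", prop="y > 0",
    abstract=lambda prog, preds: preds,  # the "abstract model" is just the predicate set
    model_check=lambda model, prop: None if "x > 0" in model else ["abstract path"],
    check_feasible=lambda prog, cex: (False, {"x > 0"}),  # the path is always spurious
    refine=lambda preds, constraints: preds | constraints,
)
print(result)  # ('property holds', None)
```

The iteration bound reflects that, for infinite-state programs, refinement is not guaranteed to terminate.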
Multi-threaded Program Verification
Multi-threading results in the concurrent execution of a program segment by two or more threads. The different possible thread interleavings add a new dimension to the complexity of program verification, as the property must be checked for all possible interleavings. Infinite-domain data is typically reduced to a finite domain, and control flow graphs are employed to represent the behavior of each thread in a multi-threaded environment. A finite state model is generated from a multi-threaded program by applying predicate abstraction (Magic [10]) and/or program slicing (Bandera [22]). While [10] uses weak simulation to decide whether the model conforms to its property, [22] feeds the model to well-established finite state model checkers such as Spin [26] or SMV [9].
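A small hypothetical example shows why every interleaving matters: two threads each perform a non-atomic increment, `tmp = counter; counter = tmp + 1`. The number of order-preserving interleavings of two sequences of lengths m and n is C(m + n, m), and here only some of them produce the expected final value.

```python
from math import comb


def interleavings(a, b):
    """All merges of a and b that preserve each thread's own statement order."""
    if not a or not b:
        return [list(a) + list(b)]
    return ([[a[0]] + rest for rest in interleavings(a[1:], b)] +
            [[b[0]] + rest for rest in interleavings(a, b[1:])])


def run(schedule):
    """Execute a schedule of (thread, op) steps of the non-atomic increment
    `tmp = counter; counter = tmp + 1`; return the final counter value."""
    counter, tmp = 0, {}
    for tid, op in schedule:
        if op == "read":
            tmp[tid] = counter
        else:  # "write"
            counter = tmp[tid] + 1
    return counter


t1 = [(1, "read"), (1, "write")]
t2 = [(2, "read"), (2, "write")]
schedules = interleavings(t1, t2)
print(len(schedules), comb(4, 2))           # 6 6
print(sorted({run(s) for s in schedules}))  # [1, 2]
```

The property "counter == 2 after both threads finish" holds in only the two sequential schedules, so checking any single interleaving could miss the lost update.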
Existing Model Checking Tools
The tools available for model checking focus primarily on verification of high-level models or on analysis of source code. In other words, the tools verify the design model or the source code, but not a combination of both. For example, the tools Spin [26] and SMV [9] both take as input a high-level model, described using the Promela and SMV input languages, respectively, and verify the model against desired properties written in temporal logic. On the other hand, source code verifiers like Slam [1] and Blast [24] perform automatic abstraction of programs to generate finite state models and use constraint solvers to check the satisfiability of assertional properties. Bandera [22] verifies Java source code by translating it to finite state models that can be used as inputs to Spin and SMV. The common assumption in all these tools is that either the design model or the implementation is available in its entirety. We are not aware of any model checking tool that can verify implementations against a high-level design model, in particular when implementations of some, but not necessarily all, of the components have been realized. Such a tool is necessary to bridge the gap between ensuring correctness of the design and ensuring correctness of the implementation in a realistic setting (i.e., where design and implementation change over time).
Model Checking Hybrid Models
We start with a high-level design model that is divided into components. Specifically, we consider a high-level design model M, consisting of n component models denoted by M_1, M_2, ..., M_n, and a desired property ϕ. Using the techniques discussed in Section 2.1, a high-level model checker can be used to check the high-level model M against the property ϕ, which results either in a proof that M conforms to the desired property ϕ (typically written as M |= ϕ) or in a counter-example witnessing the violation of ϕ.
As an example, consider a multi-process system containing writer and reader processes that share a resource. Each writer process performs some computation and then modifies the shared resource; reader processes access the shared resource read-only (i.e., without modification) and perform some computation. For the system to function correctly, at most one writer process should modify the resource at a time, and no reader process should access the resource while it is being modified. A Petri net model for this system is shown in Figure 1(b), with components for a single writer and a single reader process shown in dashed rectangles. Multiple reader or writer processes can be represented by duplicating the dashed rectangles and using extra inhibitor arcs. Now suppose that some, but not all, of the components have been implemented, and we wish to verify local properties of the implemented components and validate the result against the global model. We refer to the implementation of component i as I_i. Implementation I_i can have certain local properties, denoted by ψ_i, that it must satisfy for its correct functionality. Source code verification techniques (abstraction-refinement and/or slicing), described in Section 2.2, can then be applied to check whether I_i |= ψ_i. However, the correct behavior of component i is likely to depend on its environment (the remaining components). As such, we must identify the conditions or constraints that the environment must satisfy for I_i |= ψ_i to hold.
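For the reader/writer example, checking M |= ϕ on a Petri net amounts to exploring its reachable markings and testing ϕ in each one. The sketch below uses a hypothetical net in the spirit of Figure 1(b), not a transcription of it: multiple processes are modeled by token counts rather than duplicated rectangles, and inhibitor arcs (a transition is disabled while an inhibiting place is non-empty) enforce the exclusion. When ϕ fails, the returned transition sequence plays the role of the counter-example.

```python
from collections import deque

# Hypothetical places: idle/active writers and readers (not the exact net of Figure 1(b)).
W_IDLE, WRITING, R_IDLE, READING = range(4)

# transition name -> (input arcs, output arcs, inhibitor places);
# arcs map place -> weight, and every inhibitor place must be empty for the transition to fire.
net = {
    "start_write": ({W_IDLE: 1}, {WRITING: 1}, {WRITING, READING}),
    "end_write": ({WRITING: 1}, {W_IDLE: 1}, set()),
    "start_read": ({R_IDLE: 1}, {READING: 1}, {WRITING}),
    "end_read": ({READING: 1}, {R_IDLE: 1}, set()),
}


def enabled(marking, trans):
    inputs, _, inhibitors = trans
    return (all(marking[p] >= w for p, w in inputs.items())
            and all(marking[p] == 0 for p in inhibitors))


def fire(marking, trans):
    inputs, outputs, _ = trans
    m = list(marking)
    for p, w in inputs.items():
        m[p] -= w
    for p, w in outputs.items():
        m[p] += w
    return tuple(m)


def check_invariant(transitions, m0, invariant):
    """Breadth-first search over reachable markings. Returns None if the invariant
    holds everywhere, else the transition sequence reaching a violating marking."""
    parent = {m0: None}
    queue = deque([m0])
    while queue:
        m = queue.popleft()
        if not invariant(m):
            trace = []
            while parent[m] is not None:
                m, t = parent[m]
                trace.append(t)
            return trace[::-1]
        for name, trans in transitions.items():
            if enabled(m, trans):
                m2 = fire(m, trans)
                if m2 not in parent:
                    parent[m2] = (m, name)
                    queue.append(m2)
    return None


def safe(m):
    """phi: at most one writer, and never a reader while a writer is active."""
    return m[WRITING] <= 1 and (m[WRITING] == 0 or m[READING] == 0)


m0 = (2, 0, 2, 0)  # two writers and two readers, all idle
print(check_invariant(net, m0, safe))  # None: M |= phi

# Dropping the inhibitor arc from READING to start_write breaks the exclusion.
broken = dict(net, start_write=({W_IDLE: 1}, {WRITING: 1}, {WRITING}))
print(check_invariant(broken, m0, safe))  # ['start_read', 'start_write']
```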
Example implementations for the writer and reader processes are shown in Figure 1(a) and Figure 1(c), respectively. These implementations use shared integer variables writers and readers, and a semaphore lock used to synchronize modifications to the shared variables. Variable writers counts the number of writer processes modifying the resource, and variable readers counts the number of reader processes accessing the resource. The code is designed to allow for multiple writer or reader processes, each executing its own copy of the code shown in the figure.
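The long-term goal of this work is to develop model checking techniques to verify properties against hybrid models, which integrate high-level component models with the implementations of the components realized so far. Section 4 describes our current work towards this goal, while Section 5 outlines our ongoing efforts to achieve this goal.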
Current Work
Counter-example Analysis
In the domain of program verification, we proposed efficient solution mechanisms for reasoning about recursion [2] and for analyzing multiple counter-examples by ascertaining dependencies of variables at various program points [3, 4, 32]. Counter-example analysis is performed by identifying a slice of the counter-example whose execution, and nothing else, causes the erroneous behavior. The sequence of variable operations present in this slice is sufficient to prove or disprove the feasibility of the corresponding counter-example. Dependency analysis of variables is a program analysis technique widely deployed in software engineering for debugging and testing programs [18, 27, 40]. The essence of the technique is to perform dynamic program slicing [16, 28] to identify (a) constraints on variable valuations and (b) semantically dependent, possibly noncontiguous program segments (for debugging potential errors in programs).
Such a counter-example slice is referred to as a focus statement sequence (FSS). A statement in a counter-example is classified as a focus statement (a member of the FSS) if and only if it directly or indirectly affects the violation of an assertion. An important aspect of this work is that it can automatically identify the constraints over input variables (the variables used before being defined) that are necessary for the feasibility of a counter-example. These constraints are referred to as assumptions in [4]; negating them removes the corresponding counter-example. In short, the pre-conditions over input variables required for correct functionality of the program (the obligation of the environment of the program) can be identified by complementing the assumptions.
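The backward computation of an FSS can be sketched for a straight-line counter-example trace: starting from the variables used by the violated assertion, a statement is kept if it defines a currently relevant variable, and its uses then become relevant. Variables still relevant at the start of the trace are exactly those used before being defined, i.e., the input variables over which the assumptions are expressed. This sketch tracks data dependences only; control dependences and the constraint solving of [4] are omitted, and the statement encoding (`Stmt` with defs/uses sets) is hypothetical.

```python
from typing import FrozenSet, NamedTuple


class Stmt(NamedTuple):
    label: str
    defs: FrozenSet[str]  # variables written by the statement
    uses: FrozenSet[str]  # variables read by the statement


def stmt(label, defs=(), uses=()):
    return Stmt(label, frozenset(defs), frozenset(uses))


def focus_statements(trace, assertion_uses):
    """Backward data-dependence slice of a counter-example trace.
    Returns (FSS labels in trace order, input variables the assertion depends on)."""
    relevant = set(assertion_uses)
    focus = []
    for s in reversed(trace):
        if s.defs & relevant:  # s affects the assertion directly or indirectly
            focus.append(s.label)
            relevant = (relevant - s.defs) | s.uses
    focus.reverse()
    return focus, relevant  # still-relevant variables are used before being defined


# Hypothetical counter-example trace ending in a violated `assert c > b`.
trace = [
    stmt("s1: a = x + 1", defs={"a"}, uses={"x"}),
    stmt("s2: b = y * 2", defs={"b"}, uses={"y"}),
    stmt("s3: c = a - 3", defs={"c"}, uses={"a"}),
    stmt("s4: b = 7", defs={"b"}),
]
fss, inputs = focus_statements(trace, {"c", "b"})
print(fss)     # ['s1: a = x + 1', 's3: c = a - 3', 's4: b = 7']
print(inputs)  # {'x'}
```

Statement s2 is excluded because its value of b is overwritten by s4 before the assertion, so y is not an input on which this counter-example depends.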
The above projects form the starting point for generating the constraints imposed by implementations on their environment. Furthermore, counter-example analysis for multi-threaded/multi-process implementations will follow techniques similar to those described in [4].
MDD Extensions
To cope with potentially enormous state spaces during model checking of a hybrid model, a symbolic hybrid model checker will be developed. This tool will utilize several well-established techniques for model checking with MDDs, and some recent advancements in state space generation (e.g., [11, 34]). However, for a hybrid model, it is anticipated that additional information will need to be associated with states or edges of the finite state machine, in particular constraints that arise during counter-example analysis. This information will need to be incorporated into the MDD structure.
We are currently developing an extended MDD library to handle both traditional MDD structures and extensions of the form necessary to incorporate additional data such as constraints. Such data are typically stored either as values on MDD edges (e.g., [12]) or in additional terminal nodes (e.g., [25]). We are investigating the theoretical implications of each of these approaches. A general-purpose MDD library that includes both edge-value and terminal-value capabilities will ultimately be exploited by our symbolic hybrid model checker, and will be necessary to evaluate large-scale systems.
Ongoing Work
Identification of interaction points.
Given a high-level design model, its interaction points can be identified simply by examining the component models and their connections. For the running example Petri net in Figure 1(b), interaction points are determined by looking at arcs that cross the dashed rectangles. Interaction points for a low-level implementation correspond to accesses of shared variables and semaphores, and may also include execution boundaries (e.g., the entry and exit points of a function), depending on the component interactions.
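Finding interaction points in a component-partitioned Petri net reduces to selecting the arcs whose endpoints belong to different components. In this sketch, the node names, the component assignment, and the treatment of inhibitor arcs as ordinary (source, target) pairs are illustrative assumptions, not the exact structure of Figure 1(b).

```python
def interaction_points(arcs, component_of):
    """Arcs whose endpoints lie in different components; nodes missing from
    component_of are treated as shared (belonging to no component)."""
    return [(src, dst) for src, dst in arcs
            if component_of.get(src) != component_of.get(dst)]


# Hypothetical writer/reader net; inhibitor arcs are listed like ordinary arcs.
component_of = {
    "w_idle": "writer", "start_write": "writer", "writing": "writer", "end_write": "writer",
    "r_idle": "reader", "start_read": "reader", "reading": "reader", "end_read": "reader",
}
arcs = [
    ("w_idle", "start_write"), ("start_write", "writing"),
    ("writing", "end_write"), ("end_write", "w_idle"),
    ("r_idle", "start_read"), ("start_read", "reading"),
    ("reading", "end_read"), ("end_read", "r_idle"),
    ("writing", "start_read"),   # inhibitor: no reading while writing
    ("reading", "start_write"),  # inhibitor: no writing while reading
]
print(interaction_points(arcs, component_of))
# [('writing', 'start_read'), ('reading', 'start_write')]
```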
Model to implementation interface. Before an implementation can be plugged into its corresponding component in the high-level model, the implementation interaction points must be converted into the corresponding model interaction points. In particular, points in the implementation that access semaphores or other shared variables must be synchronized with the corresponding features in the high-level model. Interactive computer tools will be designed to generate a mapping between the interaction points of a component's implementation and the corresponding interaction points of the high-level component model.
Translation of properties.
Once we have obtained the mapping between the high-level component's interaction points and the corresponding points in the implementation, the next step is to identify the properties of the design and incorporate them into the corresponding implementations. In other words, given a property (expressed in temporal logic) in terms of the high-level model, the property must be translated, using the mapping, to include the implementation details. For example, in Figure 1(b), a desired design-level property demands that write operations are performed in a mutually exclusive fashion and that reading is allowed only when there are no writers.
The corresponding assertional requirements imposed on the implementations are present at Lines 10 and 11 in Figure 1(a) and Line 6 in Figure 1(c).
Model Checking a Hybrid Model
As suggested earlier, the integration of a high-level design model and a low-level implementation model produces a hybrid model. Model checking the hybrid model proceeds in a similar fashion to sequential program verification; however, shared variables require special handling while slicing counter-examples, since a shared variable accessed by the implementation may have been modified by another component. As such, the obtained FSS is conditioned on the actions of the other components, which are captured in a constraint graph. A constraint graph for the writer process implementation is shown in Figure 2; the assertion in Line 11 of Figure 1(a) is violated if the double-circled state in the graph is reached. From the graph, it is clear that this can occur only if (a) shared variable readers is initially negative, (b) semaphore lock is not initialized correctly, or (c) shared variable readers is modified without obtaining the lock.
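Reading conditions such as (a) to (c) off a constraint graph can be sketched as path enumeration: each path from the entry to the error state contributes the conjunction of its edge constraints, and the error state is reachable if and only if at least one of these conjunctions is satisfiable. The graph below is a hypothetical stand-in shaped to mirror conditions (a) to (c); it is not Figure 2, and the constraints are plain strings rather than formulas handed to a solver.

```python
def violation_conditions(graph, start, error):
    """Enumerate simple paths from start to error in a constraint graph.
    graph maps node -> list of (successor, constraint or None). Each path yields
    the tuple of constraints that must hold together (a conjunction); the error
    state is reachable iff at least one such conjunction is satisfiable."""
    results = []

    def dfs(node, constraints, on_path):
        if node == error:
            results.append(tuple(constraints))
            return
        for succ, label in graph.get(node, []):
            if succ not in on_path:  # simple paths only, so cycles cannot loop forever
                dfs(succ,
                    constraints + ([label] if label is not None else []),
                    on_path | {succ})

    dfs(start, [], {start})
    return results


# Hypothetical constraint graph for the writer's assertion (not Figure 2).
graph = {
    "entry": [("ERROR", "readers < 0 initially"), ("locked", "readers >= 0 initially")],
    "locked": [("ERROR", "lock not initialized correctly"),
               ("checked", "lock initialized correctly")],
    "checked": [("ERROR", "readers modified without holding lock")],
}
for conjunction in violation_conditions(graph, "entry", "ERROR"):
    print(" and ".join(conjunction))
```

The disjunction of the printed conjunctions describes every way the assertion can fail; its complement is the obligation the writer implementation places on its environment.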
Dealing with state explosion. As discussed in Section 4.2, we will utilize symbolic methods to handle large systems. Constraint data can be incorporated into an MDD using edge-value data and/or terminal node data, allowing for symbolic representations of large constraint graphs.
Counter-example analysis.
The primary objective of verification is to identify potential flaws in the design or implementation so that appropriate corrective measures can be taken. As such, analysis of counter-examples forms an integral part of model checking. As alluded to in Section 4.1, counter-examples in sequential programs are analyzed using slicing so that the user can zoom in on the specific program segments that are likely to be the cause of the error. Similar techniques will be developed for multi-threaded programs, where additional information regarding the interleaving of threads will be extracted to provide error-cause information.
In the domain of high-level design models, smart counter-example analyzers are also needed to reduce the burden of understanding and decoding the cause of a counter-example. We are exploring techniques for analyzing counter-examples for CTL formulas based on identifying the strongly connected components present in the system design.
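Strongly connected components matter for CTL counter-examples because, for instance, a counter-example to AF p is an infinite path along which p never holds, and in a finite state machine such a path must eventually loop inside a nontrivial SCC of the subgraph restricted to states satisfying not-p. The sketch below uses Tarjan's algorithm, a standard choice assumed here for illustration rather than a technique stated in the text; the graph is hypothetical.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps node -> list of successors.
    Recursive, so suitable only for small illustrative graphs."""
    index, lowlink = {}, {}
    stack, on_stack, sccs = [], set(), []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:  # v is the root of an SCC
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(component)

    nodes = dict.fromkeys(list(graph) + [w for succs in graph.values() for w in succs])
    for v in nodes:
        if v not in index:
            visit(v)
    return sccs


def is_nontrivial(component, graph):
    """An SCC can host an infinite path iff it has a cycle: more than one node or a self-loop."""
    return len(component) > 1 or any(v in graph.get(v, []) for v in component)


# Hypothetical subgraph of not-p states; an infinite path staying in it
# (a counter-example to AF p) must end in a nontrivial SCC.
not_p = {0: [1], 1: [2], 2: [1, 3], 3: []}
sccs = strongly_connected_components(not_p)
print(sccs)                                          # [{3}, {1, 2}, {0}]
print([c for c in sccs if is_nontrivial(c, not_p)])  # [{1, 2}]
```

A counter-example can then be presented as a finite prefix leading into the nontrivial SCC followed by a cycle within it, which localizes the looping behavior the designer must inspect.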
