Abstract. Model checking can tell us whether a system is correct; probabilistic model checking can also tell us whether a system is timely and reliable. Moreover, probabilistic model checking allows one to verify properties that may not be true with probability one, but may still hold with an acceptable probability. The challenge in developing a probabilistic model checker able to handle realistic systems is the construction of the state space and the necessity to solve huge systems of linear equations. To address this problem, we have developed ProbVerus, a tool for the formal verification of probabilistic real-time systems. ProbVerus is an implementation of probabilistic computation tree logic (PCTL) model checking using symbolic techniques. We present ProbVerus, demonstrate its use with a simple manufacturing example, and report the current status of the tool. With ProbVerus, we have been able to analyze, within minutes, the safety logic of a railway interlocking controller with $10^{27}$ states.
Introduction
The large size and high complexity of real-world mission-critical systems make the verification of these systems an extremely difficult problem. To study the complete system, one must also include the system's interface with the environment. Physical systems are stochastic in nature, and randomization makes the verification of probabilistic systems even more difficult due to its nonintuitive effects. At the same time, industries such as transportation, pharmaceuticals, chemicals, and nuclear power are required to meet this challenge, being constantly under the scrutiny of process and product specification. There is a great need for industrial-strength formal methods as well as real-world case studies that can demonstrate their feasibility.
Probabilistic model checking is a method for the formal verification of stochastic systems. The state of the art in probabilistic verification research includes numerous theoretical studies that have led to efficient algorithms; for example, there is an LTL model checking algorithm which is exponential in the size of the formula and polynomial in the size of the Markov chain [9]. The bottleneck in developing a probabilistic model checker able to handle realistic systems is the construction of the state space and the necessity to solve huge systems of linear equations. This paper proposes a novel approach; it presents an implementation of probabilistic model checking using multi-terminal binary decision diagrams (MTBDDs) to perform the probability calculations. MTBDDs, introduced in [8], differ from binary decision diagrams (BDDs) in that the leaves may have values other than 0 and 1; in this case the leaves contain transition probabilities. Hachtel et al. have used algebraic decision diagrams (ADDs, which are the same as MTBDDs) in the Markovian steady state analysis of large finite state machines (with up to $10^{27}$ states) [10]. MTBDDs can be integrated with a symbolic model checker and have the potential to outperform other matrix representations because they are very compact: they eliminate redundancy and allow computations on sets of states rather than on individual states. While it is difficult to provide precise time complexity estimates for probabilistic model checking using MTBDDs, the success of BDDs in practice made the MTBDD representation worthwhile to explore.
We have developed ProbVerus, a probabilistic symbolic model checker, which combines Probabilistic Computation Tree Logic (PCTL) model checking [11] and symbolic techniques. PCTL, which allows the expression of time and probability, has been selected for its expressive power and the simplicity of the verification algorithms it involves. Sections 2 and 3 introduce the building blocks of ProbVerus, PCTL and MTBDDs. Section 4 describes ProbVerus with a short run down of the syntax and the semantics of ProbVerus programs.
In Section 5 we demonstrate the use of ProbVerus in the verification of engineering systems by modeling and analyzing the reliability of a simple manufacturing system. By extending model checking to the analysis of probabilistic systems, we have been able to model the stochastic behavior of manufacturing systems: arrival time of successive raw workpieces, processing time on a machine, machine setup time, material handling time, message transmission time, lifetime of a tool, time to failure of a machine or a robot, repair time, and so on. Probabilistic model checking allows us to verify properties of these systems, such as "the probability of the system reaching a deadlock within the first 12 hours of operation is 2%". Such information is extremely useful in the design of capital-intensive systems, since deadlocks and unscheduled downtime of equipment due to failures are major factors in system performance. Section 6 reports the current status of ProbVerus and concludes with a discussion of the feasibility of the approach in realistic applications.
Probabilistic Computation Tree Logic
We use Probabilistic real-time Computation Tree Logic (PCTL), introduced in [11]. PCTL augments Clarke, Emerson, and Sistla's CTL [7] with time and probability. The temporal operators (F, G, U) are extended with time bounds and probabilities ($F^{\le t}_{\ge p}$, $G^{\le t}_{\ge p}$, $U^{\le t}_{\ge p}$). The expressive power of the resulting logic is illustrated by the following examples:
• $req \; U^{\le t}_{\ge p} \; ack$: there is at least a probability p that there is an acknowledgment to the request within t time units and req will be true until ack becomes true.
• $G^{\le t}_{\ge p} \; \neg Sysfail$: there is no system failure Sysfail for t time units with a probability of at least p.
• $F^{\le t}_{\ge p} \; alarm$: there is a probability of at least p that an alarm will be generated within t time units.
PCTL formulas are interpreted over finite state discrete-time Markov chains. This model has been used in the analysis of complex probabilistic systems, such as dependability analysis of fault-tolerant real-time control systems and performance analysis of commercial computer systems and networks. The Markov model has been the standard model used for probabilistic model checking (Hart and Sharir [13], Lehman and Shelah [16], [9], Aziz et al. [2]).
Markov models are constructed from states and state transitions; a state describes the system at one time instant; a state transition describes the change in state at one tick. In discrete-time models, all state transitions occur at fixed intervals and are assigned probabilities. The basic underlying assumption is that the probability of a next state transition depends only on the current state. For reliability analyses, the Markov model fits with the standard assumption of constant failure rates, exponentially distributed interarrival times for failures, and Poisson arrivals of failures. Reliability models usually assume that repair of a failed system restores it so that the failure rate of the repaired system is the same as if no failure had occurred. This assumption is valid during the useful life cycle of a component, but not accurate for components that improve with time (burn-in period) or components subject to wear and aging (wear-out period). This assumption is made nevertheless to allow for analytic solutions.
For a path $\pi = s_0 \to s_1 \to \ldots$, the first state is denoted by first(π) and the (k+1)th state of π is denoted by π(k). A prefix of π of length k is defined by $s_0 \to s_1 \to \ldots \to s_k$. We define the probability measure Prob on the probability space $(Path_\omega(s), \Sigma(s))$, where $Path_\omega(s)$ is the set of infinite paths π with first(π) = s, and $\Sigma(s)$ is the smallest σ-algebra on $Path_\omega(s)$ that contains the sets $\{\pi \in Path_\omega(s) : \tau \text{ is a prefix of } \pi\}$, where τ ranges over all finite execution sequences starting in s. The probability measure Prob on $Path_\omega(s)$ is the unique measure with
$$Prob\{\pi \in Path_\omega(s) : \tau \text{ is a prefix of } \pi\} = T(s_0, s_1) \cdot T(s_1, s_2) \cdots T(s_{k-1}, s_k)$$
for $\tau = s_0 \to s_1 \to \ldots \to s_k$, where T gives the one-step transition probabilities of the Markov chain.
Model of
Syntax and Semantics of PCTL. PCTL formulas are state formulas, i.e., they represent properties of states. Given a probabilistic process P described by a labelled Markov chain $M = (S, s_i, T, L)$, PCTL formulas are defined inductively as follows:
• Each atomic proposition is a state formula.
• If $f_1$ and $f_2$ are state formulas, then so are $\neg f_1$ and $(f_1 \wedge f_2)$.
• If $f_1$ and $f_2$ are state formulas and t is a nonnegative integer, then $f_1 \, U^{\le t} f_2$ (strong until) and $f_1 \, \mathcal{U}^{\le t} f_2$ (weak until) are path formulas.
• If f is a path formula and p is a real number with $0 \le p \le 1$, then $[f]_{\ge p}$ and $[f]_{>p}$ are state formulas. For a given state s, formulas $[f]_{\ge p}$ and $[f]_{>p}$ express that the probability of paths starting in s fulfilling the path formula f is at least p and greater than p, respectively. We discuss only the bounded operators.
The operator $U$ is the strong until operator and $\mathcal{U}$ is the weak until operator. Intuitively, $f_1 \, U^{\le t}_{\ge p} f_2$ means that with a probability of at least p both $f_2$ will become true within t time units and $f_1$ will hold until $f_2$ becomes true.
$f_1 \, \mathcal{U}^{\le t}_{\ge p} f_2$ means that there is at least a probability p that either $f_1$ will remain true for t time units, or that both $f_2$ will become true within t time units and that $f_1$ will hold until $f_2$ becomes true.
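The truth of PCTL formulas for a labelled Markov chain $M = (S, s_i, T, L)$ is defined by the satisfaction relation $s \models_M f$, which intuitively means that the state formula f is true at state s in M. To define the satisfaction relation for states we also use the relation $\sigma \mathrel{|\!\equiv}_M f$, which intuitively means that the path σ satisfies the path formula f in M. The relations are defined inductively as follows:
• $s \models_M a$ if and only if $a \in L(s)$, for an atomic proposition a.
• $s \models_M \neg f$ if and only if not $s \models_M f$.
• $s \models_M f_1 \wedge f_2$ if and only if $s \models_M f_1$ and $s \models_M f_2$.
• $\sigma \mathrel{|\!\equiv}_M f_1 \, U^{\le t} f_2$ if and only if there exists $i \le t$ such that $\sigma(i) \models_M f_2$ and $\sigma(j) \models_M f_1$ for all $j < i$.
• $\sigma \mathrel{|\!\equiv}_M f_1 \, \mathcal{U}^{\le t} f_2$ if and only if $\sigma \mathrel{|\!\equiv}_M f_1 \, U^{\le t} f_2$ or $\sigma(j) \models_M f_1$ for all $j \le t$.
• $s \models_M [f]_{\ge p}$ if and only if the probability measure of the set of paths σ from s for which $\sigma \mathrel{|\!\equiv}_M f$ is at least p.
• $s \models_M [f]_{>p}$ if and only if the probability measure of the set of paths σ from s for which $\sigma \mathrel{|\!\equiv}_M f$ is greater than p.
By definition,
$$P \models \Phi \iff s_i \models_M \Phi,$$
i.e., process P satisfies a PCTL formula Φ if and only if its initial state $s_i$ satisfies Φ. We use the shorthand notation:
$$f_1 \, U^{\le t}_{\ge p} f_2 \equiv [f_1 \, U^{\le t} f_2]_{\ge p}, \qquad f_1 \, \mathcal{U}^{\le t}_{\ge p} f_2 \equiv [f_1 \, \mathcal{U}^{\le t} f_2]_{\ge p}$$
Formulas $f_1 \, U^{\le t}_{>p} f_2$ and $f_1 \, \mathcal{U}^{\le t}_{>p} f_2$ have analogous meanings.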
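The intuitive readings of the strong and weak until operators given earlier in this section can be made concrete on a finite path prefix. The following Python sketch is only an illustration: representing a path as a list of label sets, and the function names `strong_until` and `weak_until`, are assumptions introduced here and are not part of ProbVerus.

```python
def strong_until(path, f1, f2, t):
    """Strong bounded until on a path prefix.

    `path` is a list of sets of atomic propositions (an assumed encoding).
    Returns True if f2 holds at some position i <= t and f1 holds at every
    position before i. A prefix too short to decide yields False.
    """
    for labels in path[: t + 1]:
        if f2 in labels:
            return True
        if f1 not in labels:
            return False
    return False


def weak_until(path, f1, f2, t):
    """Weak bounded until: strong until holds, or f1 holds at positions 0..t."""
    if strong_until(path, f1, f2, t):
        return True
    return len(path) > t and all(f1 in labels for labels in path[: t + 1])


# A request that is acknowledged at the third state of the path.
example_path = [{"req"}, {"req"}, {"ack"}]
```

With `example_path`, the strong until of req and ack holds for the bound t = 2 but not for t = 1, whereas the weak until already holds for t = 1 because req remains true throughout the first two states.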
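Model Checking in PCTL.
Next we describe the model checking algorithm, which labels the states of a labelled Markov chain $M = (S, s_i, T, L)$ with the PCTL formula $f_1 \, U^{\le t}_{\ge p} f_2$.
We introduce the function p(s, t) for $s \in S$ and t a non-negative integer, defined as the measure of the set of paths π starting in s which satisfy $f_1 \, U^{\le t} f_2$:
$$p(s, t) = Prob\{\pi \in Path_\omega(s) : \pi \mathrel{|\!\equiv}_M f_1 \, U^{\le t} f_2\}$$
The probability function p(s, t) satisfies the following recurrence equation. For $t \ge 0$:
$$p(s, t) = \begin{cases} 1 & \text{if } s \models_M f_2 \\ 0 & \text{if } s \not\models_M f_2 \text{ and } (s \not\models_M f_1 \text{ or } t = 0) \\ \sum_{s' \in S} T(s, s') \cdot p(s', t-1) & \text{otherwise} \end{cases}$$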
The proof of the above recurrence equation can be found in [11] .
This recurrence equation leads to the following algorithm, which computes p(s, t) and labels states with $f_1 \, U^{\le t}_{\ge p} f_2$ if p(s, t) is greater than or equal to the given probability p.
1. Build the transition probability matrix P.
2. Create a vector v indexed by the states such that the i-th element of v is set to 1 if $s_i \models_M f_2$ and 0 otherwise.
3. Compute $P^t$, the t-th power of the transition probability matrix.
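As an illustration of the labelling procedure, the sketch below computes p(s, t) for a bounded strong until on an explicit matrix. It is a simplified stand-in under stated assumptions, not the ProbVerus implementation: it uses plain Python lists instead of MTBDDs, and instead of forming the matrix power explicitly it iterates the usual one-step rule t times (p is 1 in states satisfying f2, 0 in states satisfying neither f1 nor f2, and otherwise the probability-weighted sum over successors of the value for t-1). The function names and the numeric values `P_FINISH` and `P_FAIL` are hypothetical.

```python
def bounded_until_probabilities(P, sat_f1, sat_f2, t):
    """Return the list [p(s, t)] for the path formula f1 U<=t f2.

    P is a row-stochastic matrix (list of lists); sat_f1 and sat_f2 are
    lists of booleans saying which states satisfy f1 and f2.
    """
    n = len(P)
    # p(s, 0) is 1 exactly in the states that already satisfy f2.
    p = [1.0 if sat_f2[s] else 0.0 for s in range(n)]
    for _ in range(t):
        nxt = []
        for s in range(n):
            if sat_f2[s]:
                nxt.append(1.0)
            elif not sat_f1[s]:
                nxt.append(0.0)
            else:
                nxt.append(sum(P[s][u] * p[u] for u in range(n)))
        p = nxt
    return p


def label_states(P, sat_f1, sat_f2, t, bound):
    """Mark the states whose probability p(s, t) is at least `bound`."""
    probs = bounded_until_probabilities(P, sat_f1, sat_f2, t)
    return [prob >= bound for prob in probs]


# Hypothetical three-state machine: 0 = setup, 1 = processing, 2 = down.
P_FINISH, P_FAIL = 0.9, 0.1  # assumed values, not taken from the paper
P = [
    [0.0, 1.0, 0.0],
    [P_FINISH, 0.0, P_FAIL],
    [1.0, 0.0, 0.0],
]
ALWAYS = [True, True, True]     # f1 = true
IS_DOWN = [False, False, True]  # f2 = "machine is down"
```

Calling `label_states(P, ALWAYS, IS_DOWN, 4, 0.15)` marks every state from which the machine goes down within four ticks with probability at least 0.15. A symbolic implementation would perform the same iteration on decision diagrams rather than on explicit vectors.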
Multi-terminal Binary Decision Diagrams
BDDs are a canonical representation of boolean functions $f: \{0,1\}^n \to \{0,1\}$ proposed by Bryant [3]. They are often more compact than traditional normal forms, such as conjunctive normal form and disjunctive normal form, and can be manipulated efficiently. For these reasons, BDDs have found wide use in CAD applications, including symbolic simulation, verification of combinational logic, and verification of sequential circuits. In 1993 Clarke et al. [8] showed that BDDs can be generalized to represent functions $f: \{0,1\}^n \to D$, where D is any set of values. Such diagrams are called multi-terminal binary decision diagrams (MTBDDs). MTBDDs can be used to represent D-valued matrices efficiently. With the states encoded as boolean vectors, the transition probability matrix P defines the function $F(x_1, y_1, \ldots, x_n, y_n) = P((x_1, \ldots, x_n), (y_1, \ldots, y_n))$, which allows us to represent the transition probability matrix by an MTBDD over $(x_1, y_1, \ldots, x_n, y_n)$.
MTBDD Representation of Labelled
The following example illustrates the MTBDD representation of the transition probability matrix: consider a single machine which may be in one of the following states: setup, processing, down. Figure 1 shows the state transition diagram, which captures the transitions between the possible states. After the setup operation, the machine starts processing; it makes a transition to the processing state with probability 1. In the processing state, there are two possibilities: the machine finishes processing and goes to the setup state, or the machine fails. We label the transition processing → setup with p and the transition processing → down with f, such that p + f = 1. After the machine is repaired, it returns to the setup state with probability 1.
For simplicity, we use two atomic propositions, $a_1$ and $a_2$, to label the states:
We fix the order of $a_1$ and $a_2$ and encode the states as follows: e(setup) = 00, e(processing) = 01, e(down) = 10. Figures 2 and 3 show the function F and the corresponding MTBDD, respectively.
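The encoding above can be tried out with a small Python sketch that builds a reduced decision diagram for F by Shannon expansion over the interleaved variable order x1, y1, x2, y2. This is a toy construction for illustration only: the nested-tuple node representation, the helper names, and the numeric values `P_PROB` and `F_PROB` are assumptions, and a real MTBDD package would use a unique table and memoized operations instead of enumerating all assignments.

```python
from itertools import product

P_PROB, F_PROB = 0.9, 0.1  # hypothetical values for p and f (p + f = 1)

# State encoding from the text: bits (a1, a2).
ENC = {"setup": (0, 0), "processing": (0, 1), "down": (1, 0)}
TRANS = {
    ("setup", "processing"): 1.0,
    ("processing", "setup"): P_PROB,
    ("processing", "down"): F_PROB,
    ("down", "setup"): 1.0,
}
VARS = ["x1", "y1", "x2", "y2"]  # interleaved current/next-state bits


def F(x1, y1, x2, y2):
    """F(x1, y1, x2, y2) = P((x1, x2), (y1, y2)); 0 for unused encodings."""
    for (src, dst), prob in TRANS.items():
        if ENC[src] == (x1, x2) and ENC[dst] == (y1, y2):
            return prob
    return 0.0


def build(assignment=()):
    """Shannon expansion; a node whose two children are equal is skipped."""
    level = len(assignment)
    if level == len(VARS):
        return ("leaf", F(*assignment))
    low = build(assignment + (0,))
    high = build(assignment + (1,))
    if low == high:
        return low
    return (VARS[level], low, high)


def evaluate(node, bits):
    """Follow one root-to-leaf path for the given variable assignment."""
    env = dict(zip(VARS, bits))
    while node[0] != "leaf":
        var, low, high = node
        node = high if env[var] else low
    return node[1]


def distinct_nodes(node, seen=None):
    """Collect structurally distinct nodes (shared subgraphs count once)."""
    seen = set() if seen is None else seen
    if node not in seen:
        seen.add(node)
        if node[0] != "leaf":
            distinct_nodes(node[1], seen)
            distinct_nodes(node[2], seen)
    return seen


def matches_matrix(root):
    """Check that the diagram agrees with F on all 16 assignments."""
    return all(evaluate(root, bits) == F(*bits)
               for bits in product((0, 1), repeat=len(VARS)))


ROOT = build()
```

Comparing `len(distinct_nodes(ROOT))` with the 31 nodes of the full binary decision tree over four variables gives a rough feel for how merging equal subgraphs and skipping redundant tests compacts the representation.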
ProbVerus
ProbVerus is an implementation of PCTL model checking using symbolic techniques. It is an extension of Verus [6] , a verification tool which combines powerful symbolic model checking techniques and efficient quantitative algorithms for computing minimum and maximum time delays between two events. Verus has been already used to verify several real-time systems: an aircraft controller, the PCI local bus, and a robotics controller [4, 5] . By extending Verus with PCTL model checking and MTBDDs we have created a single verification environment in which we have access to correctness checking, performance analysis, and reliability analysis of a single model of the system.
ProbVerus features a language which is designed especially to simplify writing probabilistic real-time programs. It is based on the core Verus language, an imperative 
ProbVerus only has boolean variables. The compiler introduces an auxiliary variable, the wait counter wc, which indicates the wait statement reached by the program in the current state. The integer wc is encoded as the respective sequence of booleans to suit the format. The length of the bitstring needed to represent the wait counter is fixed and the values of wc in a specific program are determined at compile time of that program.
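A minimal Python sketch of how an integer wait counter might be turned into a fixed-length sequence of booleans is shown below. It assumes a plain binary encoding with the most significant bit first; the text does not state which encoding the ProbVerus compiler uses, and the function names are hypothetical.

```python
def bits_needed(num_wait_statements):
    """Number of booleans needed to distinguish the given number of waits."""
    return max(1, (num_wait_statements - 1).bit_length())


def encode_wc(value, width):
    """Binary encoding of `value` as `width` booleans, most significant first."""
    if value < 0 or value >= 2 ** width:
        raise ValueError("wait counter value does not fit in the given width")
    return [bool((value >> i) & 1) for i in reversed(range(width))]
```

For a program with five wait statements, `bits_needed(5)` gives a width of three, and `encode_wc(2, 3)` represents the third wait statement (counting from zero) as three boolean state variables.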
Semantics of ProbVerus Programs. ProbVerus programs describe Markov chains, the behavior of which is specified by the initial state and the transition probability matrix.
Executing statements atomically, thus collapsing contiguous statements into one transition, leads to models with fewer states, which allows the verification of much larger systems. The transition relation of the program is obtained by taking the disjunction of all relations between wait statements. A path in the transition graph is defined as a sequence of states $s_0, s_1, \ldots$ such that $N(s_i, s_{i+1})$ is true for every $i \ge 0$. All computations are performed on states reachable from the initial state set. Our stochastic model does not consider nondeterminism; we define one initial state by assigning fixed values to all the state variables before the first wait statement. Verus restricts the set of accepted programs to those for which a wait statement is traversed at each loop iteration. Every accepted program ends with the statement while (true) wait(1); which guarantees that the final state is observed.
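The restriction to reachable states can be pictured with the following Python sketch, which computes the set of states reachable from the initial state as a least fixpoint. It uses explicit Python sets purely for illustration; a symbolic tool performs the corresponding image computation on decision diagrams. Representing the transition relation N as a set of (source, destination) pairs, and the small relations `N1` and `N2`, are assumptions of this sketch.

```python
def reachable(initial, transitions):
    """Least fixpoint of the image computation, starting from `initial`.

    `transitions` is a set of (source, destination) pairs standing in for
    the transition relation N.
    """
    reached = set(initial)
    frontier = set(initial)
    while frontier:
        image = {dst for (src, dst) in transitions if src in frontier}
        frontier = image - reached  # only states not seen before
        reached |= frontier
    return reached


# The overall relation is the union (disjunction) of the partial relations.
N1 = {(0, 1), (1, 0)}
N2 = {(1, 2), (2, 0), (3, 0)}
N = N1 | N2
```

Here `reachable({0}, N)` contains states 0, 1, and 2 but not state 3, which has no incoming transition from a reachable state and can therefore be ignored during verification.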
The semantics of ProbVerus programs is computed statement by statement using the following function R:
where M is the set of matrices $m: ST \times ST \to [0,1]$, ST is the state space, N is the set of naturals, and B is the set of booleans.
A Manufacturing Example
In this section we illustrate the use of ProbVerus with a simple example. We model the stochastic behavior and check stochastic properties of a small automated manufacturing system (AMS) composed of two numerically controlled machines, M1 and M2, that are identical in all respects. Each machine is modeled as a Discrete Time Markov Chain (DTMC) that has three states: (0) the machine is being set up, (1) the machine is processing, and (2) the machine is undergoing repair following a breakdown. Figure 4 shows the state transition diagram, which captures the transitions among these three states. Each arc is labeled with the corresponding one-step transition probability; each transition corresponds to advancing the discrete time by one time unit.
After the setup operation the machine starts processing, hence after state 0 the next state is with certainty state 1 ($p_{01} = 1$). If the current state is state 1, the next state is state 0 or state 2 with probabilities
$p_{10}$ = Prob{processing finishes before failure}
$p_{12}$ = Prob{failure occurs before finishing of processing}
After undergoing repair the machine resumes the processing operation, hence if the current state is state 2, then the next state is always state 1 ($p_{21} = 1$).
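Independently of ProbVerus, the single-machine chain just described can be written down as an explicit matrix and stepped forward in time. The Python sketch below does this with hypothetical values for $p_{10}$ and $p_{12}$ (the names `P10` and `P12` and their values are assumptions chosen only so that the row sums to one).

```python
P10, P12 = 0.95, 0.05  # hypothetical one-step probabilities, P10 + P12 = 1

# States: 0 = setup, 1 = processing, 2 = repair.
MACHINE = [
    [0.0, 1.0, 0.0],  # setup -> processing with certainty
    [P10, 0.0, P12],  # processing -> setup or repair
    [0.0, 1.0, 0.0],  # repair -> processing with certainty
]


def step(dist, P):
    """One tick of the chain: multiply the row vector `dist` by P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_after(k, P, start=0):
    """State distribution after k ticks, starting in state `start`."""
    dist = [1.0 if s == start else 0.0 for s in range(len(P))]
    for _ in range(k):
        dist = step(dist, P)
    return dist


def is_stochastic(P, tol=1e-9):
    """Every row of a transition probability matrix must sum to one."""
    return all(abs(sum(row) - 1.0) < tol for row in P)
```

Starting from setup, `distribution_after(2, MACHINE)` puts probability `P12` on the repair state, since the machine must first spend one tick reaching the processing state before it can fail.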
We can describe the behavior of the above machine in ProbVerus as follows. Figure 6 displays the DTMC model of the AMS composed of the two machines M1 and M2. In state s0 the two machines are being set up. In state s1 one of the two machines is processing and in state s2 both machines are processing. While a machine is processing, there are two possibilities: either the machine finishes and returns to the setup state or the machine fails. In state s3 one of the machines fails; it then moves to repair in state s4. The system is down when both machines are under repair (state s5).
Using ProbVerus, we have modeled the AMS of Figure 6 and checked the PCTL formula $F^{\le t}_{\ge p}$ AMS_is_down, which states that the AMS will fail, i.e., that it will reach a state labeled with AMS_is_down (machine M1 is down and machine M2 is down), within t time units after the setup of the system with non-zero probability p. We have used the following one-step transition probabilities:
Current Status
We have been testing the modeling range of ProbVerus with success in a series of application domains: manufacturing, transportation, and fault-tolerant industrial process control systems. The example that we have described demonstrates that ProbVerus provides the language to describe the stochastic behavior of manufacturing systems in a straightforward manner. PCTL allows the expression of stochastic properties, which provide critical information when designing automated manufacturing systems and analyzing their performance, availability, and reliability. Moreover, probabilistic model checking allows us to combine performance analysis (quantitative timing analysis) and reliability analysis (quantitative probability analysis) early in the design phase of manufacturing systems. This is significant because real-world automated manufacturing systems must meet competitive levels of productivity and pay-back ratio where high costs are involved.
An important question is how our techniques scale up to large systems. Our largest case study is based on ACC ("Apparato Centrale a Calcolatore") [14], a complex industrial safety-critical system developed by Ansaldo Trasporti for the control of medium and large-size railway stations. A set of qualitative (e.g. safety, liveness) and quantitative (e.g. response times and probabilities) properties has been automatically analyzed. Despite the complexity of the system (the model has about $10^{27}$ states), specifications were checked in 329 seconds using a Sparc 10 with dual processors and 256 MB of memory. During the verification of the ACC design, we discovered a subtle and anomalous behavior leading to a deadlock of the system. The anomalous behavior was pinpointed by an automatically generated counterexample trace, showing precisely the behavior leading to the violating state. The same behavior blocked the entire operation during a field test of an earlier version of the system. When we modeled possible failures in the communication between the controller and the physical devices, i.e. when the control variables of the safety logic are not in agreement with the values of the corresponding physical level crossing variables, safety specifications were violated. Therefore, a failure in the sensing operation may lead to a CLEAR_SIGNAL although the gate may still be moving, or the actual level crossing is not closed. We have computed the probability of reaching unsafe states within seconds. Our numerical analysis has been performed assuming a serial link between the controller and the physical devices for which reliability information is published. Vital railway controllers, such as the ACC system, implement appropriate error recovery mechanisms for the occurrence of vital/non-vital communication failures, which have not been modeled. Our probabilistic analysis is intended to illustrate the valuable information probabilistic model checking can provide to the engineers who design and validate safety-critical systems.
Conclusions and Future Work
This paper presents a new tool for the formal verification of stochastic systems. We have implemented PCTL model checking using multi-terminal binary decision diagrams (MTBDDs) within the Verus verification tool. The PCTL logic allows the expression of time and probabilities, which are needed for the verification of stochastic systems such as fault-tolerant real-time control systems, networks, and manufacturing systems. BDDs and MTBDDs provide a compact representation of the model by representing sets of states rather than individual states. Moreover, symbolic techniques are amenable to efficient verification algorithms. By extending Verus, we have created an environment which allows the verification and quantitative analysis of a single model using CTL, RTCTL, and PCTL model checking. While model checking can tell us whether a system is correct, probabilistic model checking can also tell us whether the system is timely and reliable. Moreover, an advantage of probabilistic model checkers over non-probabilistic model checkers is that they allow one to verify properties that may not be true with probability equal to one, but may still hold with some acceptable probability. ProbVerus allows the safety analysis and verification to take place concurrently with the system design, so that design errors are detected and corrected prior to the traditional test phase. The use of symbolic techniques (BDDs and MTBDDs), in contrast to an explicit-state implementation of PCTL model checking, makes it possible to analyze larger systems, making probabilistic symbolic model checking a feasible approach worth further development.
