Abstract. This paper makes verifying continuous-time Markov chains (CTMCs) against deterministic timed automata (DTA) objectives practical. We show that verifying 1-clock DTA can be done by analyzing subgraphs of the product of CTMC C and the region graph of DTA A. This improves upon earlier results and allows to only use standard analysis algorithms. Our graph decomposition approach naturally enables bisimulation minimization as well as parallelization. Experiments with various examples confirm that these optimizations lead to significant speed-ups. We also report on experiments with multiple-clock DTA objectives. The objectives and the size of the problem instances that can be checked with our prototypical tool go (far) beyond what could be checked so far.
Introduction
For more than a decade, the verification of continuous-time Markov chains, CTMCs for short, has received considerable attention, cf. the recent survey [7] . Due to unremitting improvements on algorithms, (symbolic) data structures and abstraction techniques, CTMC model checking has emerged into a valuable analysis technique which -supported by powerful software tools-has been adopted by various researchers for systems biology, queueing networks, and dependability.
The focus of CTMC model checking has primarily been on checking stochastic versions of the branching temporal logic CTL, such as CSL [6] . The verification of LTL objectives reduces to applying well-known algorithms [20] to embedded discrete-time Markov chains (DTMCs). Linear objectives equipped with timing constraints, have just recently been considered. This paper treats linear real-time specifications that are given as deterministic timed automata (DTA). These include, e.g., properties of the form: what is the probability to reach a given target state within the deadline, while avoiding "forbidden" states and not staying too long in any of the "dangerous" states on the way. Such properties can neither be expressed in CSL nor in dialects thereof [5, 14] . Model checking DTA objectives amounts to determining the probability of the set of paths of CTMC C that are accepted by the DTA A, i.e., Prob (C |= A). We recently showed in [12] that this equals the reachability probability in a finite piecewise deterministic Markov process (PDP) that is obtained by a region construction on the product C ⊗ A. This paper reports on how to make this approach practical, i.e., how to efficiently realize CTMC model checking against DTA objectives.
As a first step, we show that rather than taking the region graph of the product C ⊗ A, which is a somewhat ad-hoc mixture of CTMCs and DTA, we can apply a standard region construction on DTA A prior to building the product. This enables applying a standard region construction for timed automata. The product of this region graph with CTMC C yields the PDP to be analyzed. Subsequently, we exploit that for 1-clock DTA, the resulting PDP can be decomposed into subgraphs-each of which is a CTMC [12] . In this case, Prob (C |= A) is the solution of a system of linear equations whose coefficients are transient probability distributions of the (slightly amended) subgraph CTMCs. We adapt the algorithm for lumping [13, 19] on CTMCs to our setting and prove that this preserves reachability probabilities, i.e., keeps Prob (C |= A) invariant. As the graph decomposition naturally enables parallelization, our tool implementation also supports the distribution of computing transient probabilities over multiple multi-core computers. Finally, multi-clock DTA objectives -for which the graph decomposition does not apply-are supported by a discretization of the product PDP.
Three case studies from different application fields are used to show the feasibility of this approach. The first case study has been taken from [3] which considers 1-clock DTA as time constraints of until modalities. Although using a quite different approach, our verification results coincide with [3] . The running time of our implementation (without lumping and parallelization) is about three orders of magnitude faster than [3] . Other considered case studies are a randomly moving robot, and a real case study from systems biology [15] . Bisimulation quotienting (i.e., lumping) yields state space reduction of up to one order of magnitude, whereas parallelizing transient analysis yields speedups of up to a factor 13 on 20 cores, depending on the number of subgraphs in the decomposition.
The discretization approach for multi-clock DTA may give rise to large models: checking the robot example (up to 5,000 states) against a two-clock DTA yields a 40-million state DTMC for which simple reachability probabilities are to be determined.
Related work. The logic asCSL [5] extends CSL by (time-bounded) regular expressions over actions and state formulas as path formulas. CSL TA [14] allows 1-clock DTA as time constraints of until modalities; this subsumes acCSL. The joint behavior of C and DTA A is interpreted as a Markov renewal process. A prototypical implementation of this approach has recently been reported in [3] . Our algorithmic approach for 1-clock DTA is different, yields the same results, and is -as shown in this paper-significantly faster. Moreover, lumping is not easily possible (if at all) in [3] .
In addition, it naturally supports bisimulation minimization and parallelization. Bisimulation quotienting in CSL model checking has been addressed in [18] . The works [4, 9] provide a quantitative interpretation to timed automata where delays of unbounded clocks are governed by exponential distributions. Brazdil et al. present an algorithmic approaches towards checking continuous-time stochastic games (supporting more general distributions) against DTA specifications [11] . To our knowledge, tool implementations of [4, 9, 11] do not exist.
Organization of the paper. Section 2 provides the preliminaries and summarizes the main results of [12] . Section 3 describes our bisimulation quotienting algorithm. Section 4 reports on the experiments for 1-clock DTA objectives. Section 5 describes the discretization approach and gives experimental results for multi-clock DTA. Finally, Section 6 concludes. (Proofs are omitted and can be found in the full version available at http://moves.rwth-aachen.de/i2/han.) Here, Distr(S) is the set of probability distributions on set S. The probability to exit state s in t time units is given by t 0 E(s)·e −E(s)τ dτ . The probability to take the transi-
Preliminaries
C be the set of paths in C and ρ[n] := s n be the n-th state of ρ. The definitions of a Borel space on paths through CTMCs,and the probability measure Pr follow [20, 6] . 
Definition 2 (Bisimulation
Let s 1 ∼ s 2 if there exists a strong bisimulation R on C with s 1 Rs 2 .
The quotient CTMC under the coarsest bisimulation ∼ can be obtained by partitionrefinement with time complexity O(m log n) [13] . A simplified version with the same complexity is recently proposed in [19] .
Deterministic timed automata
Clock variables and valuations. Let X = {x 1 , . . ., x n } be a set of clocks in R 0 . An X -valuation (valuation for short) is a function η : X → R 0 ; 0 is the valuation that assigns 0 to all clocks. A clock constraint g on X has the form x c, or g ∧ g , where x ∈ X , ∈ {<, , >, } and c ∈ N. Diagonal clock constraints like x−y c do not extend the expressiveness [8] , and are not considered. Let CC(X ) denote the set of clock constraints over X . An X -valuation η satisfies constraint g, denoted as η |= g if η(x) c for g of the form x c, or η |= g ∧ η |= g for g of the form g ∧ g . The reset of η w.r.t. X ⊆ X , denoted η[X := 0], is the valuation η with ∀x ∈ X. η (x):= 0
Definition 3 (DTA). A deterministic timed automaton (DTA) is a tuple
where Σ is a finite alphabet; X is a finite set of clocks; Q is a nonempty finite set of locations; q 0 ∈ Q is the initial location; Q F ⊆ Q is a set of accepting locations; and
We refer to q a,g,X −−−−→ q as an edge, where a ∈ Σ is the input symbol, the guard g is a clock constraint on the clocks of A, X ⊆ X is a set of clocks and q, q are locations. The intuition is that the DTA A can move from location q to q on reading the input symbol a if the guard g holds, while resetting the clocks in X on entering q . By convention, we assume q ∈ Q F to be a sink, cf. Fig. 1 
(b).
Paths. A finite (timed) path in A is of the form θ = q 0 a0,t0
, where η j is the clock valuation on entering q j and g j the guard of
, which is accepted by A. The set of CTMC paths accepted by a DTA is measurable [12] ; our measure of interest is given by:
Prob(C |= A) = Pr{ ρ ∈ Paths C |ρ is accepted by DTA A }.
Regions. Regions are sets of valuations, often represented as constraints. Let Re(X ) be the set of regions over
Definition 4 (Region graph [2]). Let
Product. Our aim is to determine Prob(C |= A). This can be accomplished by computing reachability probabilities in the region graph of C ⊗ A where ⊗ denotes a product construction between CTMC C and DTA A [12] . To ease the implementation, we consider the product C ⊗ G(A).
Definition 5 (Product). The product of CTMC
w .
is the exit rate function where:
The (reachable fragment of the) product of CTMC C in Fig. 1(a) and the region graph of DTA A in Fig. 1(b) is given in Fig. 2 
. It turns out that C ⊗G(A) is identical to G(C ⊗A)
, the region graph of C ⊗ A as defined in [12] .
Theorem 1. For any CTMC C and any DTA A, C ⊗ G(A) = G(C ⊗ A).
As a corollary [12] , it follows that Prob(C |= A) equals the reachability probability of the accepting states in C ⊗ G(A). In addition, C ⊗ G(A) is a PDP.
Decomposition for 1-clock DTA
Let A be a 1-clock DTA and {c 0 , . . . , c m } ⊆ N with 0 = c 0 < c 1 < · · · < c m the con-
Fig. 2. Example C ⊗ G(A)
stants appearing in its clock constraints. Let Fig. 2 constitutes such subgraph. Subgraph G i thus captures the joint behavior of CTMC C and DTA A in the interval [c i , c i+1 ). All transitions within G i are probabilistic; delay-transitions, i.e., δ-labeled transitions, yield a state in G i+1 , whereas a clock reset (of the only clock) yields a state in G 0 . In fact subgraph G i is a CTMC. To take the effect of "reset" transitions into account, define the CTMC C i with state space V i ∪ V 0 , with all edges from V i to V 0 , all edges between V i -vertices, but no outgoing edges from V 0 -vertices. 
Definition 6 (Augmented CTMC). Let
Here I n denotes the identity matrix of cardinality n. For details, consult [12] . 
Lumping
It is known that bisimulation minimization is quite beneficial for CSL model checking as experimentally showed in [18] . Besides yielding a state space reduction, it may yield substantial speed-ups. The decomposition of the product C ⊗ G(A) into the CTMCs C i naturally facilitates a bisimulation reduction of each C i prior to computing the transient probabilities in C i . In order to do so, an amendment of the standard lumping algorithm [13, 19] is needed as the CTMCs to be lumped are connected by delay and reset transitions. Initial states in CTMC C i might be the target states of edges whose source states are in a different CTMC, C j , say, with i = j. The partitioning of the target states in C i will affect the partitioning of the source states in C j . For delay edges, i=j+1 while for reset transition, i=0. The intra-CTMC edges thus cause a "cyclic affection" between partitions among all sub-CTMCs. From the state labeling (cf. Def. 6), it follows that for any two states v, v ∈ C i , if their respective successor states in C i+1 (or C 0 ) can be lumped, then v and v might be lumped. This implies that any refinement on the lumping blocks in C i+1 might affect the blocks in C i . Similarly, refining C 0 might affect any C i , viz., the CTMCs that have a reset edge to C 0 .
We initiate the lumping algorithm (cf. Alg. 2) for CTMC
This initial partition is successively refined on each C i by the standard approach [13, 19] , see lines 5-6. We then use the blocks in C i+1 to update AP i in C i and use blocks in C 0 to update AP i in all the affected C i 's, cf. lines 7-11. As a result, the new AP i may be coarser than the old AP i : 
Since v 1 and v 2 are labeled differently, they cannot be in one equivalence class. However, if in C i+1 it turns out that v 1 and v 2 are in one equivalence class, then we can update AP i to AP i such that now L(v 1 ) = L(v 2 ) = δ v 1 . In this case, v 1 and v 2 can be lumped together.
Algorithm 2 LumpGroupCTMCs
Require: a set of CTMCs Ci with APi for 0 i m Ensure: a set of lumped CTMCs C i such that Ci ∼ C if i = 0 then 10:
for j = 1 to m do updateResetEdge(AP0, APj ) ; {update APj }
11:
else updateDelayEdge(APi, APi−1) ; {update APi−1 according to APi} 12: return the new set of CTMCs lumped by the newest APi If some states have been updated in C i and AP i has been updated (line 7), there are two cases: if i=0, then we update all AP j , j = 0 that have a reset edge to C 0 (line 9-10); otherwise, we update its directly predecessor AP i−1 which has a delay edge to C i (line 11). This procedure is repeated until all AP i 's are stable.
Theorem 3. The transient probability distribution in C i and its quotient C i , obtained by Alg. 2 are equal.
As a corollary, it follows that the reachability probabilities of the accepting states in C ⊗ G(A), and its quotient obtained by applying Alg. 2 coincide.
Experimental Results
Implementation. We implemented our approach in approximately 4,000 lines of OCaml code. Transient probabilities are computed using uniformization, linear equation systems are solved using the Gauss-Seidel algorithm, and lumping has been realized by adapting [13] with the correction explained in [19] . Unreachable states (both forwards from the initial and backwards from the final states) are removed in C ⊗ G(A) prior to the analysis, and transient probabilities in C i are only determined for its initial states, i.e., its entry points. The tool adopts the input format of the MRMC model checker [17] . Thanks to the output facility of PRISM [1], the verification of PRISM models is possible.
Case studies. We conducted extensive experiments with three case studies. The first case study has been taken from [3] , and facilitates a comparison with the approach of [14] . The second case study, a random robot, is (to our taste) a nice example showing the need for DTA objectives. We use it for 1-clock as well as multi-clock objectives. The specifications of this example cannot be expressed using any other currently available techniques. The final case study originates from systems biology and is a more realistic case study. We first present experimental results using a sequential algorithm, with and without lumping. Section 4.4 presents the results when parallelizing the transient analysis (but not the lumping). All results (one and four cores) have been obtained on a 2 × 2.33GHz Intel Dual-Core computer with 32GB main memory. The experiments on 20 cores have been obtained using a cluster of five such computers with a GigaBit connection. All the results are obtained with precision 10 −8 .
Cyclic polling server
This case study facilitates a comparison with [3] . The cyclic polling server (CPS) system [16] is a queuing system consisting of a server and N stations each equipped with a queue of capacity 1, cf. Fig. 3 for N = 3. Jobs arrive with rate λ and the server polls the stations in a round-robin order with rate γ. When the server is polling a station with a full queue, it can either serve the job in the queue or it can poll the next station (both with rate γ). Once the server decides to serve a job, it can successfully process the job with rate µ or it will fail with rate ρ. The 1-clock DTA objective (adapted from [3] , see Fig. 4 . DTA for the polling server system (T = 1) Fig. 4 ) requires that after consulting all queues for one round, the server should serve each queue one after the other within T time units. The label st i indicates that the system is at station i; srv means that the system is serving the job in the current station and j arr means a new job arrives at some station. The DTA starts from station 1 at q 0 and goes to q 1 when polling the next station (st 2 ). It stays at q 1 for not polling station 1 -implicitly it goes sequentially from station 2 to N -until it sees station 1 again (and goes to state q 2 ). Note that the clock is reset before going to state q 2 . From state q 2 to state q 5 , it specifies serving stations 1, . . . , N one by one within the deadline T . The dashed line indicates the intermediate transitions from station st 2 to station st N −1 . Table 1 summarizes our results, where %transient and %lumping indicate the fraction of time to compute transient probabilities and to lump all CTMCs, respectively. The computed probabilities Prob(C |= A) are identical to [3] (that contains results up to N =7); our verification times are, however, three orders of magnitude faster. If lumping is not applied, then most of the time is spent on the transient analysis. 
Robot navigation
A robot moves on a grid with N × N cells, see Table 2 presents the results. Lumping is attractive as it reduces the state space by a factor two, and speeds-up the verification. As opposed to the polling system case, most time is spent on building the product and solving the linear equation system. The gray rows in Table 1 and 2 refer to similar product size whereas the verification times differ by two orders of magnitude (14.88 vs. 2433.31 ). This is due to the fact that there are two and three subgraphs, respectively. The resulting linear equation system has 2 and 3 variables accordingly and this influences the verification times. The number of blocks is not monotonically increasing, as the robot grid (how the walls and regions C and D are located) is randomly generated. The structure of the grid, e.g., whether it is symmetric or not, has a major influence on the lumping time and quotient size. 
Systems biology
The last case study stems from a real example [15] in systems biology. The goal is to generate activated messengers. M ligands can bind with a number of receptors, cf. Fig. 7 ). The LRM can decompose into three separate components in their initial forms with rate k −1 , or the messenger can separate from LRM into an inactive (resp. active) messenger with rate k −x (resp. k cat ). The 1-clock DTA objective is given in Fig. 8 . Intuitively, it requires a transformation from R to LR B N directly without jumping back to R inbetween and manage to activate a messenger within T time units after reaching the LR B N . In the DTA, fwd means that the last transformation is moving forward, i.e., not jumping back to R; at BN means that the process reaches LR B N and actMsg means the active messenger is generated. In q 0 , the process is on the way to reach LR B N . When it reaches LR B N , the DTA goes from q 0 to q 1 and resets the clock x. In q 1 , there is no active messenger generated yet. It will go to q 2 when an active messenger is generated. Note that the time constraint x < T is checked on the self-loop on q 1 . As Table 3 indicates, lumping works very well on this example: it reduces both time and space by almost one order of magnitude.
Parallelization
Model checking a CTMC against a 1-clock DTA can be parallelized in a natural manner. In this section, we present the results when parallelization is applied on the above three case studies. We experimented on distributing the tasks on 1 machine with 4 cores as well as 5 machines with 4 cores each (20 cores in total). We focused on parallelizing the transient analysis; as lumping is not parallelized, we determine the speedup without lumping. For each CTMC C i , we need to compute for each state its transient probability vector which corresponds to a column in the transient probability matrix. We distribute the computation of different columns to different cores (which might be on different machines). To do so, we launch N different processes and send them the rate matrix and a list of initial states, and each process returns the transient probability vector for each initial state. The speedup factor is computed by time without para.
time with para. . From Table 4 , it follows that parallelization mostly works well for larger models. For small models, it usually does not pay off to distribute the computation tasks, due to overhead. For the polling server system, as the number of stations N increases, the value of speedup for 20 cores speedup for 4 cores approximates 20/4=5. The same applies to the biology example. The parallelization does not work well on the robot example. The performance on 20 cores might even be worse than on 4 cores. This is due to the fact that only the transient analysis is parallelized. From Table 1 and 2, we can see that most of the computation time is spent on transient analysis for the polling and biology examples. This explains why parallelization works well here. In the robot example, however, the transient analysis does not dominate the computation time. This yields moderate speedups. An interesting future work is to apply parallel lumping as in [10] to our setting. Table 6 . Polling example with 1-clock DTA using discretization
Conclusion
We have presented a practical approach to verifying CTMCs against DTA objectives. First, we showed that a standard region construction on DTA suffices. For 1-clock DTA, we showed that the graph decomposition approach of [12] offers rather straightforward possibilities for optimizations, viz. bisimulation minimization and parallelization. Several experiments substantiate this claim. The main result of this paper is that it shows that 1-clock DTA objectives can be handled by completely standard means: region construction of timed automata, transient analysis of CTMCs, graph analysis, and solving linear equation systems. Our approach clearly outperforms alternative techniques for CSL TA [3, 14] , and allows for the verification of objectives that cannot be treated with other CTMC model checkers. Our prototype is available at http://moves.rwthaachen.de/CoDeMoC. Finally, we remark that although we only considered finite acceptance conditions in this paper, our approach can easily be extended to DTA with Rabin acceptance conditions.
