In Section 4, we compare GSPNs and stochastic reward nets. In Section 5, we discuss in detail some practical issues in solving dependability and performability models: largeness, stiffness, and non-exponentiality.
Approaches to Modeling

Dependability Modeling
Reliability block diagrams, fault trees, and reliability graphs are commonly used to study the dependability of systems [59]. Although these models are concise and have efficient solution methods, they cannot represent dependencies among components [56] as easily as CTMC models can [21, 23].
We begin by considering a fault-tolerant, multi-processor computer with multiple, shared memory modules. The system is able to detect a processor or memory module failure and reconfigure itself to continue operation without the failed component. The system can operate with just one processor and one memory module.
Our first model of this system is the reliability block diagram in Figure 1. We could attach to each component the probability of having failed by a particular time. In a more general parameterization, a failure time distribution function, rather than a probability value, can be attached to each component. For example, one can assign the exponential distribution $F_p(t) = 1 - e^{-\lambda_p t}$ to processors and $F_m(t) = 1 - e^{-\lambda_m t}$ to memories. We can request the system failure time distribution as a function of the time variable $t$. For a system with two processors and three memory modules,
$$F_{sys}(t) = 1 - \left[1 - \left(1 - e^{-\lambda_p t}\right)^2\right]\left[1 - \left(1 - e^{-\lambda_m t}\right)^3\right].$$
We can also ask for the mean time to system failure,
$$MTTF = \frac{6}{\lambda_p + \lambda_m} - \frac{3}{2\lambda_p + \lambda_m} - \frac{6}{\lambda_p + 2\lambda_m} + \frac{3}{2\lambda_p + 2\lambda_m} + \frac{2}{\lambda_p + 3\lambda_m} - \frac{1}{2\lambda_p + 3\lambda_m}.$$
Now suppose we want to investigate a different computer design where the two processors have fast private memory modules and the system has slower, shared memory modules. We assume that the system operates as long as there is at least one operational processor with access to either a private or shared memory. We cannot model this system with a block diagram, because there is no way to model how the shared memories are connected to all processors while private memories are connected to particular processors. So, we turn to a fault tree model, shown for two processors and three memory modules in Figure 2.
We could also use a reliability graph, where time-to-failure distributions are assigned to the edges. The system is operational as long as there is a path from source (src) to sink. In this particular model (Figure 3), processor failures [...] The edges I1 and I2 do not represent system components; they represent the structure of the system (the sharing of M3). We assign the "infinite" distribution, defined by $I(t) = 0$, to them. There is a path from source to sink if P1 and M1 are up or if P1 and M3 are up, and similarly for paths involving P2. Analysis of the reliability graph results in the same failure time distribution as the fault tree analysis.
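As a rough illustration of the two structures discussed above, the following Python sketch evaluates the reliability of the all-shared-memory system from its series-parallel block diagram, and the reliability of the private/shared-memory system by enumerating component states against a structure function. The function names, the failure rates used in the usage lines, and the assumption that all components fail independently with exponential lifetimes are illustrative choices, not part of the original analysis.

```python
import math
from itertools import product


def shared_reliability(lam_p, lam_m, t):
    """Block diagram: 2 parallel processors in series with 3 parallel memories."""
    f_p = 1.0 - math.exp(-lam_p * t)  # processor failure distribution F_p(t)
    f_m = 1.0 - math.exp(-lam_m * t)  # memory failure distribution F_m(t)
    return (1.0 - f_p ** 2) * (1.0 - f_m ** 3)


def private_up(p1, p2, m1, m2, m3):
    """Structure function: M1 is private to P1, M2 to P2, and M3 is shared."""
    return (p1 and (m1 or m3)) or (p2 and (m2 or m3))


def private_reliability(lam_p, lam_m, t):
    """Sum the probabilities of all up/down combinations in which the system is up."""
    up = [math.exp(-lam_p * t)] * 2 + [math.exp(-lam_m * t)] * 3
    total = 0.0
    for state in product((True, False), repeat=5):
        if private_up(*state):
            weight = 1.0
            for is_up, q in zip(state, up):
                weight *= q if is_up else 1.0 - q
            total += weight
    return total


# Hypothetical failure rates (per hour) and mission time.
print(shared_reliability(0.001, 0.002, 100.0))
print(private_reliability(0.001, 0.002, 100.0))
```

Enumeration is exponential in the number of components, so it is only meant to make the structure function concrete; it is not a substitute for the fault tree or reliability graph algorithms.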
Now we extend our models to take into account repair or replacement of parts. We calculate the "availability" of the system, the (transient or steady-state) probability that the system is functioning. We examine the all-shared-memory system and look at three repair strategies:
1. There are enough repair resources to repair all components at the same time, if necessary. If we analyze the reliability block diagram of Figure 1 with the assignment of distribution functions of Equation 1 to the components, the resulting function is the system unavailability at time $t$, $U_{sys}(t)$, and the "mass at infinity" ($1 - \lim_{t \to \infty} U_{sys}(t)$) is the steady-state system availability.
To deal with the second and third repair strategies, we can no longer use the block diagram model. The block diagram assumes that all components are statistically independent, but, if components share repair facilities, the failure and repair behavior of one component is dependent on the state of all components.
If the failure and repair distributions are exponential, we can use a CTMC model. Consider the CTMC in Figure 4. State mp represents the system when m memory units and p processors are functional. The model with all of the solid and dashed-line transitions is for the second repair strategy (one repair facility for processors and one for memories). The model for the third strategy (only one repair facility giving priority to the processors) is obtained by excluding the dashed lines, since no memory is repaired while there are failed processors.
We note that we could have used a CTMC for the first repair strategy as well. We would have assigned different transition rates to the repair transitions to reflect the fact that more than one component can be repaired at a time. As an example, the rate for the transition from 02 to 12 would be $3\mu_m$ rather than $\mu_m$. The block diagram model, though, is both easier to construct and more efficient to analyze.
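The three repair strategies can be compared numerically with a small CTMC solver. The sketch below is a minimal illustration in plain Python, not the solution method used in the paper: it builds the generator over states (m, p), solves the steady-state balance equations by Gauss-Jordan elimination, and sums the probabilities of the states with at least one memory and one processor up. The strategy labels, the parameter names, and the assumption that components keep failing while the system is down are all illustrative.

```python
def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def availability(strategy, lam_p, lam_m, mu_p, mu_m, n_mem=3, n_proc=2):
    """Steady-state availability; state (m, p) = functional memories, processors."""
    states = [(m, p) for m in range(n_mem + 1) for p in range(n_proc + 1)]
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    q = [[0.0] * n for _ in range(n)]

    def arc(src, dst, rate):
        if rate > 0.0:  # zero-rate arcs (including those leaving the grid) are skipped
            q[idx[src]][idx[dst]] += rate
            q[idx[src]][idx[src]] -= rate

    for m, p in states:
        # Assumption: components keep failing even when the system is down.
        arc((m, p), (m - 1, p), m * lam_m)
        arc((m, p), (m, p - 1), p * lam_p)
        if strategy == "independent":        # strategy 1: unlimited repair
            rp, rm = (n_proc - p) * mu_p, (n_mem - m) * mu_m
        elif strategy == "two_facilities":   # strategy 2: one facility per type
            rp = mu_p if p < n_proc else 0.0
            rm = mu_m if m < n_mem else 0.0
        elif strategy == "one_facility":     # strategy 3: processors have priority
            rp = mu_p if p < n_proc else 0.0
            rm = mu_m if (p == n_proc and m < n_mem) else 0.0
        else:
            raise ValueError(strategy)
        arc((m, p), (m, p + 1), rp)
        arc((m, p), (m + 1, p), rm)

    # Solve pi * Q = 0 with the last balance equation replaced by sum(pi) = 1.
    a = [[q[i][j] for i in range(n)] for j in range(n)]
    a[-1] = [1.0] * n
    pi = solve(a, [0.0] * (n - 1) + [1.0])
    return sum(pi[idx[s]] for s in states if s[0] >= 1 and s[1] >= 1)


# Hypothetical rates (per hour).
for name in ("independent", "two_facilities", "one_facility"):
    print(name, availability(name, 0.01, 0.02, 1.0, 0.5))
```

Under the independence assumption of the first strategy, the CTMC result should agree with the product-form availability obtained from the block diagram, which gives a simple sanity check on the generator.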
given the initial state probability vector $P(0)$.
The reward rate at time $t$ for the MRM is given by $T(t) = r_{\Theta(t)}$. The accumulated reward over the interval $[0, t)$ is $\int_0^t T(\tau)\,d\tau$.
The expected reward rate at time t of the MRM is:
The expected reward rate in steady-state for the MRM is: 
The expected time-averaged reward rate over the interval $[0, t)$ is given by $\sum_i r_i L_i(t)/t$. In an availability model with 0-1 reward assignment, the total uptime of the system over the [...]
The distribution of the reward rate at time $t$, $T(t)$, is computed as:
For instance, the distribution of time to complete a job that requires r units of processing time on a system which is modeled by an MRM can be computed in this fashion. We explore these topics in the following subsections.
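To make the expected-reward measures above concrete, the following sketch evaluates them for the simplest availability model: a single component with failure rate `lam` and repair rate `mu`, started in the up state, with reward rate 1 when up and 0 when down. The two-state model, the function names, and the parameter values are assumptions introduced for illustration; the closed-form state probability is the standard solution of the two-state CTMC.

```python
import math


def p_up(lam, mu, t):
    """Probability of being up at time t, starting in the up state."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)


def expected_uptime(lam, mu, t):
    """Expected time spent up during [0, t): the integral of p_up over the interval."""
    s = lam + mu
    return (mu / s) * t + (lam / s ** 2) * (1.0 - math.exp(-s * t))


def reward_measures(lam, mu, t, r_up=1.0, r_down=0.0):
    """Expected reward measures of the two-state MRM at time t > 0."""
    s = lam + mu
    pu = p_up(lam, mu, t)
    lu = expected_uptime(lam, mu, t)
    accumulated = r_up * lu + r_down * (t - lu)
    return {
        "rate_at_t": r_up * pu + r_down * (1.0 - pu),
        "steady_state_rate": r_up * mu / s + r_down * lam / s,
        "accumulated": accumulated,
        "time_averaged": accumulated / t,
    }


# Hypothetical rates (per hour) and observation interval.
print(reward_measures(0.01, 1.0, 24.0))
```

With the 0-1 reward assignment, the accumulated reward is the expected uptime over the interval, and the time-averaged reward rate approaches the steady-state availability as the interval grows.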
Largeness
The problem of model largeness can be handled in two ways: it can be avoided or it can be tolerated.
Largeness Tolerance
For the sake of simplicity we assume that the underlying model is a CTMC or an MRM. [...] $(S', A')$.
If $(S', A')$ is a subgraph of $(S, A)$, the exact state-space exploration algorithm, or the model, is simply modified to ignore certain arcs which lead to states in $S \setminus S'$. In our example, we can prevent a $(k+1)$-th failure in a state which already has $k$ failed components. We call this case "strict truncation" (Figure 12).
Alternatively, $(S', A')$ might be composed of a subgraph of $(S, A)$, augmented with one or more states and arcs. In our example, we might add a new state u (for unknown), and an arc from each state with $k$ failed components to u, corresponding to further failures of the non-failed components. Strictly speaking, this is more an "aggregation", so we call this approach an "aggregation truncation" (Figure 13).
The two approaches often allow us to obtain upper and lower bounds on the measure of interest.
If we are interested in the expected instantaneous computational capacity in steady state, c, that is, the expected number of non-failed processors in the long run, the CTMC in Figure 12 still offers an upper bound, but the one in Figure 13 is of no use, since state u has probability one in steady state, which would simply result in the trivial lower bound 0 for c. In any case, our ability to obtain useful bounds is normally tied to our a priori knowledge of aspects of the CTMC structure and values of the reward rates. 
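The upper bound offered by strict truncation can be illustrated on a deliberately simple, hypothetical model, which is not the CTMC of Figure 12: K identical components, each failing at rate `lam`, with a single repair facility of rate `mu`. The state is the number of failed components, so the chain is a birth-death process whose steady-state probabilities have a product form. Truncating at `k_max` failures simply drops the arc from `k_max` to `k_max + 1`.

```python
def expected_up(K, lam, mu, k_max):
    """Expected number of non-failed components in steady state,
    with the state space strictly truncated at k_max failed components."""
    weights = [1.0]  # unnormalized probability of 0 failed components
    for j in range(k_max):
        # Balance between j and j + 1 failed: (K - j) * lam * pi_j = mu * pi_{j+1}
        weights.append(weights[-1] * (K - j) * lam / mu)
    total = sum(weights)
    return sum((K - j) * w for j, w in enumerate(weights)) / total


# Hypothetical parameters: 8 components, truncation after 0..8 failures.
K, lam, mu = 8, 0.05, 1.0
exact = expected_up(K, lam, mu, K)
for k in range(K + 1):
    print(k, expected_up(K, lam, mu, k), exact)
```

In this sketch every truncated value is at least the exact one, and the bound tightens as more failures are admitted, because the truncated chain ignores exactly the states with the fewest working components.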
Each of the $K$ terms in the second case is smaller than $N$, with the exception of the last one, which is $N$, so this approach is always guaranteed to reduce the size of the state space. The reduction is particularly sizable when $N$ is small and $K$ is large: for example, when $N = 2$ we have $2^K$ vs. $K + 1$.
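The size comparison can be tabulated with a few lines of Python. The formulas here are assumptions, since the expressions being compared are not shown in this passage: the sketch takes the first case to be $N^K$ states (K components with N local states each) and the second to be the number of multisets, $\binom{N+K-1}{K}$, which is a product of K terms $(N+j-1)/j$ and reproduces the $2^K$ vs. $K+1$ figures for $N = 2$.

```python
import math


def full_states(N, K):
    """Assumed size of the unreduced state space: K components, N local states each."""
    return N ** K


def reduced_states(N, K):
    """Assumed size of the reduced state space: multisets of K items over N local states."""
    return math.comb(N + K - 1, K)


for N, K in [(2, 10), (3, 10), (5, 4)]:
    print(N, K, full_states(N, K), reduced_states(N, K))
```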
Stiffness Avoidance
According to this approach, stiffness is eliminated from a model by applying some approximation scheme.
This results in a set of non-stiff models which are then solved to obtain the overall solution. Bobbio and 
