Aging phenomena in VLSI are exacerbated by the shrinking of transistor dimensions, with a consequent increase in operating temperature and current density, and are now recognized as a major cause of reduced chip lifetime and reliability.
Introduction
Aging phenomena in semiconductor materials and devices have been recognized and studied since the early stages of Integrated Circuit manufacturing, as documented, for instance, by the IEEE Annual International Reliability Physics Symposium (IRPS) [20], whose first edition was held in 1962.
In the past decades, aggressive advances in the chip manufacturing process have shrunk transistor dimensions to tens of nanometers, worsening several aging mechanisms that are now a major challenge for VLSI useful life [25, 36]. At the same time, advances in technology scaling have enabled the integration of several processing elements within the same chip, with a considerable increase in current densities and, in turn, in operating temperatures. These effects cause detrimental interactions among adjacent elements in the chip, with considerable drawbacks in terms of device lifetime and reliability. As reported in the International Technology Roadmap for Semiconductors (ITRS) in 2011 [22], the temperature increase is the primary cause of the acceleration of device aging and wear-out phenomena, including electromigration (EM), time dependent dielectric breakdown (TDDB), negative bias temperature instability (NBTI) and thermal cycling (TC). Multi-Core Systems-on-Chips (MCSoCs) are heavily utilized in modern digital equipment at any level of complexity, and the analysis of MCSoC reliability has been the object of several studies in recent years [40, 18, 43, 4]. The dramatic decrease in the lifetime of digital computing systems based on MCSoCs has pushed the focus onto the performance and reliability of such devices. These studies have shown that system-level design choices (e.g., mapping and scheduling, workload distribution, utilization levels) have a macroscopic impact on the chip lifetime, higher than that of circuit-level reliability-enhancing techniques [43, 35].
In the area of permanent computer storage devices, non-volatile storage technology based on NAND gates, coupled with manufacturing scaling, has made it possible to increase the storage density and to reduce the storage cost per bit, allowing flash-memory Solid State Drives (SSDs) to compete with traditional magnetic HD technology. However, flash cells degrade with every Program/Erase (PE) cycle, until the cell is no longer readable, with detrimental effects on memory reliability and endurance [11, 34, 33, 38]. To mitigate the effect of cell degradation on the SSD lifetime, various wear-leveling techniques have been implemented to balance the number of erasures across the memory blocks [12, 17], thus extending the life span of the drive, although introducing mutual dependencies among the memory blocks [42, 37].

This paper defines a fairly general analytical model for the quantitative evaluation of performance- and reliability-related measures in large VLSI systems affected by aging phenomena, and applies it to two case studies. The proposed analytical model is based on the formalism of the Markovian Agent Model (MAM) [5, 6, 2]. A MAM is a collection of Markovian Agents (MAs), where an MA is a finite-state stochastic model governed by an infinitesimal generator kernel that contains local transition rates and transition rates induced by the interaction with the other MAs. Furthermore, MAs are located in a defined geographical space, so that their local properties and the interaction-induced mechanisms may depend on their relative positions. The overall MAM may combine MAs representing the VLSI elements (cores or memory blocks) and their mutual interactions with MAs representing the aging mechanisms and their effect on the VLSI elements. Each MA can capture the physical characteristics of a VLSI element and how these characteristics are influenced by the interaction with the other elements and with the aging mechanisms. Further, the MAM can account for system-level design choices that have a considerable impact on the system endurance. A preliminary study of a MAM for an MCSoC has been presented in [1], showing how the reliability of such a system is influenced by the variations of temperature and workload caused by the redistribution of applications when a core failure occurs and by the interaction among the cores.
In this paper, we illustrate the capabilities of the MAM on two case studies with two different flavors: a Multi-Core System-on-Chip and an SSD, the former being a typical study case addressed in the literature with different strategies, the latter serving as a proof-of-concept for the proposed approach in a field where only little data is available. Section 2 introduces the formalism of MAMs and Section 3 shows how a general aging model based on MAMs can be built. Two case studies follow, namely the MCSoC in Section 4 and the SSD in Section 5. Section 6 discusses the computational complexity of the solution algorithms, Section 7 gives an overview of related works, and conclusions follow in Section 8.
Markovian Agent Model
A Markovian Agent Model (MAM) is an agent-based spatio-temporal analytical formalism suitable for describing large-scale systems of interacting objects. Each object is represented by an MA, a finite-state automaton over which a transition kernel is defined. Agents are located in a space, and both their local properties and their mutual interactions may depend on their positions.
In a Markovian Agent Model we are interested in computing the temporal evolution of the average number of agents in a given state at every location. The solution is obtained with a mean-field approximation, which leads to a set of ordinary differential equations. In this work, we extend the conventional Markovian Agent Model by adding reward states: continuous variables that are not used to count the average number of agents in a given state, but to hold the value of a control parameter.
Conventional states evolve according to a transition kernel defined by two components: a local transition matrix and an induced transition matrix. The former describes the local behavior through a homogeneous Continuous Time Markov Chain (CTMC) whose rates may depend on the position of the agent. The latter models the interactions of the MA with the other MAs. The evolution of the reward states is based on a drift function, which in turn depends on the average count of agents in a given state (either in the same or in neighbouring locations). MAs may belong to different classes, and agents in a specific class c are characterized by the same transition kernel; more details can be found in [2]. In the extension used in this work, each agent class also determines the rewards and the corresponding drifts associated with the MAs. From a graphical point of view, we represent conventional states with circles and local transitions with arcs (as in standard Markov Chain models). Reward states are represented with double circles, and drifts with double arrows, following a notation similar to the one used in Fluid Stochastic Petri Nets to denote continuous places and transitions [13]. The graphical notation is summarised in Figure 1, where two conventional states i and j, a reward state k, a transition with rate q_ij and a drift at rate λ_k are shown.
(Figure 1: Graphical representation of a Markovian Agent with reward states.)

Let us call π_c(t; v) the state vector of a single MA of class c at time t. It contains one component for each state: π^D_c(t; v) holds all the components corresponding to conventional states, and π^R_c(t; v) those corresponding to reward states. The elements of π^D_c(t; v) correspond to the average count of agents in that state in the considered position at the given time. If we focus on a location that contains a single agent, the components of π^D_c(t; v) can be seen as an approximation of the state probability distribution of the agent. For reward states, the corresponding component in π^R_c(t; v) holds the average accumulated reward in the considered location. We also denote with [Π_V] the ensemble of the state vectors of all the MAs in the space at time t: in other words, a single vector that includes all the states, rewards, classes and locations of the model.
The transition kernel is a square matrix, with as many rows and columns as conventional states, that governs the stochastic behavior of an MA of class c. Its formal definition is:
where the kernel combines a local component, describing the autonomous CTMC behavior of the agent, and an induced component, accounting for the transition rates induced by the interactions with the other MAs. The presence of a separate interaction component makes it possible to avoid the construction of the whole state space and to solve the overall model by building one sub-model for each MA, and then solving the sub-models separately using the standard equation:
with a given initial condition π_c(0; v). Thus, the solution of the overall model at any time t is obtained by solving, for every MA of every class c in the space, an equation of type (2). Several examples of the derivation of the structure of the kernel K_c(t; v; [Π_V]) without rewards are described in [6, 2].
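To make the solution procedure concrete, the following sketch integrates one equation of type (2) per agent with an explicit Euler scheme, recomputing the kernel from the current ensemble at every step. It is a minimal illustration in Python under assumed interfaces: the callback make_kernel and the array shapes are placeholders of ours, not part of the MAM definition.

import numpy as np

def solve_mam(pi0, make_kernel, t_end, dt):
    # pi0         : (n_agents, n_states) array with the initial state vectors pi_c(0; v)
    # make_kernel : callable (t, v, ensemble) -> (n_states, n_states) generator matrix
    #               K_c(t; v; [Pi_V]) combining local and induced rates (assumed interface)
    pi = pi0.astype(float).copy()
    n_agents = pi.shape[0]
    t = 0.0
    while t < t_end:
        ensemble = pi.copy()              # snapshot of [Pi_V] at time t
        for v in range(n_agents):         # each MA is solved separately (mean-field)
            K = make_kernel(t, v, ensemble)
            pi[v] = pi[v] + dt * (pi[v] @ K)   # explicit Euler step of Equation (2)
        t += dt
    return pi

In practice an adaptive Runge-Kutta integrator would replace the Euler step (see Section 6), but the structure, one small ODE per agent coupled only through the ensemble, is the same.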
Aging model in MAM
Here we describe the general methodology used in a MAM to account for the effect of aging on the stochastic behavior of the agents.
Given an agent of class c in position v, we enrich the model with a reward state A that keeps track of the accumulated aging level Ψ. This value increases at a rate λ(v, Π_V) that may depend on the agent location v and on the ensemble Π_V at time t, in order to take into account the impact of the other MAs on the aging of the agent in position v. To simplify the notation, we assume that this dependency can be captured by a single parameter Ω = ω(Π_V). Moreover, we assume that the agent in position v has at least one transition tr_f that depends on the aging process; typically, tr_f represents a failure event. Let us call Y the random variable representing the time to failure associated with tr_f. Due to the dependency on the ensemble, the probability distribution of Y(t|Ω) is non-homogeneous and non-exponential.
To support this type of distribution, we apply the supplementary variables approach by Cox [10]. The random variable Y is characterized by a probability density function (pdf) f_Y(t) and a cumulative distribution function (Cdf) F_Y(t); its hazard rate h_Y(t) and accumulated hazard rate H_Y(t) are defined, respectively, as:

h_Y(t) = f_Y(t) / (1 − F_Y(t)),    H_Y(t) = ∫_0^t h_Y(u) du = −ln(1 − F_Y(t)).
Assuming that the process starts at time t = 0, we can model its non-exponential behavior by a continuous-variable Markov process where, at each point in time t, the non-exponential event modeled by Y occurs at rate λ_Y = h_Y(t). The accumulated age Ψ of the process is:
Since h_Y(t) is a strictly positive function, H_Y(t) is invertible. This allows us to express the failure rate as a function of the accumulated age Ψ:
Moreover, the evolution of the accumulated age can be expressed in a differential way:
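Collecting the three relations above in one place (a reconstruction of ours, which may differ in notation from the original equations):

Ψ(t) = H_Y(t) = ∫_0^t h_Y(u) du,
λ_Y(Ψ) = h_Y( H_Y^{-1}(Ψ) ),
dΨ/dt = h_Y(t) = λ_Y(Ψ),

so that the reliability can be written directly in terms of the accumulated age as R = e^{−Ψ}, which is the property exploited below when the TTF distribution changes.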
Whenever the value of Ω varies, the Time To Failure (TTF) distribution of Y(t|Ω) changes; however, it has been proved that reliability is conserved [26], that is, if at time t′ the TTF distribution changes from Y to Y′, we must have that:
This is in general obtained by "time shifting" the hazard rate of the new distribution Y′ to match the previously accumulated aging. However, in our model we have:
since Ψ does not depend on t or t′. In other words, the change of variable from t to Ψ that allows us to express the hazard rate as a function of the aging also allows us to change the TTF distribution directly from Y to Y′, while conserving the reliability.
In particular for a Weibull distribution, defined by scale parameter α and shape parameter β, we have:
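Using the standard Weibull hazard rate (a worked instance of the construction above; the original equation may be written differently):

h_Y(t) = (β/α)·(t/α)^{β−1},    H_Y(t) = (t/α)^β,

and, substituting t = α·Ψ^{1/β} from the inverted accumulated hazard, the failure rate as a function of the accumulated age becomes

λ_Y(Ψ) = (β/α)·Ψ^{(β−1)/β}.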
4. Case study 1: Reliability model of a Multi-Core System-on-Chip
System description
The first case study we propose in this paper shows the applicability of Markovian Agents to the analysis of aging and wear-out effects in a Multi-Core System-on-Chip architecture. The considered MCSoC architecture, shown in Figure 2, is composed of a set of homogeneous processing cores organized in a mesh structure. The system runs a multi-programmed workload, composed of various applications that are distributed on the available processing cores by the operating system according to a specific mapping policy.
Based on the workload distribution, each core may either perform some elaboration or be in an idle state, and consequently has a specific power consumption. Moreover, based on the spatial power profile of the overall architecture, the temperature of each core is a function of its internal power consumption and of the heat conveyed by the neighboring cores. Finally, the time-dependent wear-out rate of each core, caused by the progressive aging, is primarily related to its temperature, with an exponential dependence.
In conclusion, the aging process of each unit in an MCSoC architecture is strictly related to its activity, and therefore to how the workload is distributed. The complexity of this picture is further exacerbated by the fact that the failure of a core may not represent a catastrophic failure of the overall system. Indeed, the workload can be redistributed on the remaining cores, thus allowing a sort of graceful degradation of the architecture. Such a strategy can be adopted as long as the system is able to provide the minimum required Quality-of-Service, i.e., as long as there is a minimum number of healthy cores capable of providing the computational power necessary to complete the running workload with the required performance level.
In this scenario, the construction of the analytical model is particularly challenging, because the lifetime of any single core is influenced by the operating conditions of all the other cores through a sort of "multi-step" relationship involving core utilization, power consumption, temperature and heating interference, aging rates, and, finally, the possibility of the core failing.
Here we show that MAMs are well suited to model the reliability of MCSoC architectures by taking into account the complex cause-effect relationships affecting core aging over time. We will demonstrate that the dependence of MAs on their position in space allows the model to investigate the effect of various workload distribution strategies on the aging of the individual cores and, consequently, on the lifetime of the overall architecture.
Thermal and aging models
The temperature model for the various cores within a complex architecture is a rather elaborate formulation based on differential equations involving several aspects, such as the geometrical organization of the architecture, the power characteristics of the considered technological node and the actual activity profile of each core. For this reason, we have used a set of state-of-the-art tools to characterize a simple linear regression model capable of predicting the steady-state temperature of each core for a given activity profile of the overall architecture. The details of this characterization can be found in [1].
The obtained model takes as input W, a binary matrix whose element w_{i,j} is 1 if the core in position (i, j) is in the W state, and 0 otherwise. The model returns the temperature T_c(i, j) of the core in position (i, j) as:
where c_0, …, c_3 are four thermal constants. In this work, we have considered c_0 = 329 K, c_1 = 16 K, c_2 = 3 K, and c_3 = 1.5 K.
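The exact regression formula is characterized in [1]; a linear form consistent with the four constants above and with the side/corner neighbor decomposition used later in Section 4.3 (an assumption of ours) is:

T_c(i, j) ≈ c_0 + c_1·w_{i,j} + c_2·Σ_{(k,l)∈side(i,j)} w_{k,l} + c_3·Σ_{(k,l)∈corner(i,j)} w_{k,l},

where side(i, j) and corner(i, j) denote the side-adjacent and corner-adjacent positions of core (i, j), respectively.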
As recognised by both academia and industry, modeling aging mechanisms of integrated circuits with the Weibull distribution is validated by several sources, as documented by the JEDEC standard [23]. A discussion of academic works based on such results is given in Section 7. In our case study, as a proof of concept, the electromigration aging mechanism has been considered and characterized as in [4]. The shape parameter is β = 2, and the scale parameter α_EM(T) is computed with Black's equation:
where A_EM is a material-dependent constant, J and J_crit are the current density and its critical value activating the phenomenon, respectively, n is an empirically determined constant, E_aEM is the activation energy, K is Boltzmann's constant, Γ is the Gamma function, and T is the constant worst-case temperature in Kelvin. In particular, as in [4], E_aEM = 0.48 eV, K = 8.61673324 · 10^−5 eV/K and n = 1.1. Finally, since the other parameters are usually not disclosed by industry, they have been set according to the common academic practice [18, 43, 15, 4] of fitting the lifetime of a single core to a given value. In our work we have considered a Mean Time To Failure (MTTF) equal to 10 years for a single core working at the constant steady-state temperature of 60 °C; thus, J = 1.5 · 10^6 A/cm^2, J_crit ≅ 0 A/cm^2, and A_EM = 3 · 10^5 (h/cm^2)·(A/cm^2)^n.
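For reference, a commonly used form of Black's equation for the Weibull scale parameter, consistent with the symbols listed above (we report it as an assumption of ours, since the exact expression follows [4]), is:

α_EM(T) = A_EM · (J − J_crit)^{−n} · e^{E_aEM/(K·T)} / Γ(1 + 1/β),

where the division by Γ(1 + 1/β) converts the MTTF predicted by Black's law into the scale parameter of the Weibull distribution.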
System model
To evaluate the reliability of the overall system, we represent each core with the MA shown in Figure 3. It has three conventional states, namely active (W), idle (I) and failed (F), used to model the activity of the core, and a reward state Aging (A), used to track the aging level of the core. The core may alternate between the W and I modes before reaching the absorbing state F. The transition rates between W and I depend on the location of the core and on the ensemble state (Equations (2) to (5)). The TTF, either from state W or from state I to F, is a function of the aging level of the core, which primarily depends on the core temperature and utilization level. We assume that the total workload to be processed by the system is known and constant in time. Indeed, this type of workload is commonly considered in embedded systems, where the same group of applications has to be repeated continuously regardless of the inputs. Moreover, this assumption might easily be removed to consider workload changes in time by adding a modulation process; we leave further studies in this direction to future work. The workload distribution policy aims to always elaborate this total workload, without exceeding a given power budget, by rotating the activation of the cores so as to stress all of them, on average, at the same level, thus with the same aging effects. We identify the average stress level of a core by means of a utilization parameter 0 ≤ U ≤ 1. When a core fails, its workload is uniformly redistributed among the remaining cores, until the cores reach a utilization U = 1. Subsequent core failures will cause the MCSoC to operate in a degraded condition.
The alternation between the W and I states in each core is tuned to the rotation cycle-time at which the different cores are activated in the chip. Since the transition rates δ(v, Π V ) and γ(v, Π V ) that determine the alternation cycle between W and I are, in any case, several orders of magnitude faster than the failure rates µ W (v, Π V ) and µ I (v, Π V ) from W and I to F , respectively, we can assume that the states W and I reach their steady state value at the time scale of the core failure [3] . Hence:
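Under this time-scale separation, the conventional states can be approximated by their quasi-steady-state values; one formulation consistent with the utilization parameter U defined above (a reconstruction of ours, which may differ from the original expression) is:

π_W(t; v) ≈ U(t)·(1 − π_F(t; v)),    π_I(t; v) ≈ (1 − U(t))·(1 − π_F(t; v)),

i.e., conditioned on the core being still alive, the probability of being active equals its utilization.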
As time proceeds, the average number of cores still alive at time t, n_al(t), can be calculated from:

n_al(t) = n_0 − Σ_{i=1}^{n_0} π_F(t; v_i)
where n_0 is the initial number of cores in the MCSoC, v_i is the position of the core of index i, and π_F(t; v_i) is the failure probability of the core in position v_i at time t. To preserve the global workload of the chip, the utilization of the cores increases in time according to the following law:

U(t) = min{ 1, n_0·U_0 / n_al(t) }
where U_0 is the initial utilization of the system with n_0 cores. We assume a fixed mean active-idle cycle length L, with:
Equation (16), combined with Equations (15) and (14), allows us to evaluate how the transition rates from W to I change in time:
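Spelling this out (a sketch of ours; the exact form of Equations (16)-(17) and the roles of δ and γ may differ in the original): with a fixed mean cycle length L and utilization U(t), the mean sojourn times in W and I are U(t)·L and (1 − U(t))·L, so the corresponding exit rates are

rate out of W = 1 / (U(t)·L),    rate out of I = 1 / ((1 − U(t))·L),

which shift the active-idle cycle towards the active state as cores fail and U(t) grows, while keeping the cycle length constant.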
In order to characterize the aging rates in a variable working scenario, it is necessary to consider that the temperature changes in time, whereas the basic formulation in Equation (13) receives as input a single, constant temperature value. Therefore, as shown in [1], the λ(A) formula has been extended to support such temperature changes. As a first step, if we consider a single core switching between two different temperatures T_I and T_W with the same probability, λ(A) can asymptotically be modeled as:
We also had to consider that, based on the thermal model, the temperature of each core depends on all its neighbors, which significantly increases the complexity of Equation (18), since we would have to consider the probability of each neighbor being in each single state. Therefore, we simplified the thermal model to compute the average temperature of each core by considering the average utilization of each neighbor (which in turn asymptotically represents the probability of each core being in a specific state). Let us then define the sums of the utilizations of the side (U_S(i, j)) and corner (U_C(i, j)) neighbors as:
We approximate the temperatures T_W(i, j) and T_I(i, j) in the W and I states as:
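The following sketch shows how the simplified thermal model can be evaluated on the mesh: for each core it sums the utilizations of the side and corner neighbors and applies the linear temperature model in the two states. It is illustrative Python of ours; the function names and the linear form are assumptions, as discussed for Equation (13).

import numpy as np

def neighbor_sums(U):
    # U: (R, C) matrix of per-core utilizations. Returns U_S and U_C, the sums of the
    # utilizations of side and corner neighbors of each core (borders padded with zeros).
    P = np.pad(U, 1)
    U_S = P[:-2, 1:-1] + P[2:, 1:-1] + P[1:-1, :-2] + P[1:-1, 2:]
    U_C = P[:-2, :-2] + P[:-2, 2:] + P[2:, :-2] + P[2:, 2:]
    return U_S, U_C

def core_temperatures(U, c0=329.0, c1=16.0, c2=3.0, c3=1.5):
    # Approximate steady-state temperatures (K) in the W and I states, assuming the
    # linear model T = c0 + c1*(own activity) + c2*U_S + c3*U_C (assumed form).
    U_S, U_C = neighbor_sums(U)
    T_W = c0 + c1 * 1.0 + c2 * U_S + c3 * U_C   # core itself active
    T_I = c0 + c1 * 0.0 + c2 * U_S + c3 * U_C   # core itself idle
    return T_W, T_I

For example, under these assumptions an interior core of a mesh with uniform utilization 0.42 would see T_I ≈ 329 + 3·1.68 + 1.5·1.68 ≈ 336.6 K.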
Basically, the age accumulates following Equation (18), where the temperature accounts for the effect of the local W and I states plus the average contribution of the neighbors. The failure rates in the W and I states, however, consider only the failure time distribution specific to the corresponding operating temperature.
Results
Let us consider a system with n_0 cores, each with a utilization U_0, as introduced in Section 4.3. When a core fails, the utilization of the remaining ones is increased by the workload distribution policy to maintain a fully operational state, thus preserving the total system workload n_0·U_0.
However, when all cores reach full utilization, further failures lead the system to a degraded operating state. To analyze this behavior, let us define the workload Wl(t) at time instant t as:
where the sum of the utilizations of the cores corresponds to the current processing level of the CPU. Then, the mean time to degradation MTTD of the system, i.e., the first time instant at which the global workload goes below n_0·U_0, is computed as:
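In formulas (a reconstruction of ours from the description above; the original notation may differ):

Wl(t) = Σ_{i=1}^{n_0} U(t)·(1 − π_F(t; v_i)) = U(t)·n_al(t),
MTTD = min{ t ≥ 0 : Wl(t) < n_0·U_0 }.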
Such performance index allows us to evaluate the effectiveness of the workload distribution policy in extending the fully operational phase of the MCSoC .
To evaluate the proposed MAM model, in the following paragraphs we present two experimental sessions using two different workload distribution policies. All the experiments were performed on a personal computer with an Intel Core i5-2450M CPU at 2.50 GHz and 6 GB of RAM, with completion times of a few minutes. Scenario 1. In the first scenario we assume a uniform distribution of the workload over a system with 16 cores and an initial per-core utilization of U_0 = 0.95, for a total workload Θ = 16 · 0.95 ≃ 15. In this case, due to the high utilization level of each core, we expect a short fully operational phase.
An equal amount of workload can be elaborated by systems with a larger number of cores, reducing their stress level. Indeed, we can recompute the per-core utilization required by an architecture with n cores to elaborate the workload Θ as U_0^n = Θ/n. For instance, the per-core utilization required by a system with 36 cores is reduced to U_0^36 = 0.416. Figure 4 shows in detail the behavior of one core of such a system. After a short initial transient phase, the core reaches its fully operational condition, working at the target utilization U_0^36. As time goes on, the failure probability increases; thus, to preserve the desired total workload, the distribution policy must increase the utilization of the cores until the value 1 is reached. After that time, further failures lead the system to work in a degraded condition. Figure 5 shows the behavior of the workload, normalized with respect to Θ, for the 16-, 36- and 64-core architectures. As expected, we can observe how performing the constant workload of the 16-core chip on architectures with a greater number of cores increases the system MTTD. Indeed, the 16-core chip on average lasts in fully operational condition for about one year, whereas the 36-core chip lasts slightly more than six years and the 64-core chip more than ten years. Scenario 2. In this scenario the cores are partitioned into two sub-classes: initially the primary cores are active, whereas the spare cores are inactive. We consider a 12×12 CPU with 144 cores and we study two different workload redistribution policies:
• P 1 : the workload lost due to the failures of the primary cores is evenly redistributed to the spare ones only, increasing their utilization level.
• P 2 : the workload lost is evenly redistributed to the spare ones until the workloads of the primary and spare cores are balanced. Then, the lost workload will be redistributed evenly on all the remaining cores.
Moreover, we consider different spatial distributions of the initially active/spare cores, according to the patterns shown in Figure 6. Figure 7 shows the results for two different target utilization levels (of the initially active cores), U_0 = 0.6 and U_0 = 0.9, for the considered policies P_1 and P_2. The two policies obtain very similar results, even if P_2 performs slightly better. The pattern distribution has instead a more noticeable impact: the best results are obtained when the initially working cores are concentrated in a corner of the CPU. It is interesting to note that its dual configuration, in which the initially active cores are distributed over an L-shaped pattern, has the worst performance.
To better show the evolution of the system, Figure 8 presents the distribution of the temperature over the chip at three different time instants for policy P_1, target U = 0.9 and initial core distribution B3 of Figure 6. At the beginning of the life of the system, the active cores, which are concentrated in the middle band, work at a high temperature, while the outer bands are colder. When the MTTD is reached at around five years (Figure 8b), the CPU works at a lower temperature, and the pattern is inverted: the cores that were initially spare now have to take the place of the failed ones. After ten years, due to the distribution of the failures over the CPU, the differences between the temperatures of the hot and cold areas are reduced, and the surviving cores work at a much lower average temperature.
Case study 2: Reliability model of an SSD

System description
The second example is given to support the validity of the MAM-based approach in coping with aging phenomena in VLSI. The diffusion of SSDs is rapidly increasing, even if their major critical points remain endurance and reliability due to wear-out. Although the model is not formally validated and its results are not compared with measurements, all the parameters and effects have been defined to match the known behaviour and physical characteristics described in the available literature. SSDs are the current alternative to traditional hard disk drives (HDDs), where rotating platters are replaced by NAND-based memories. They are non-volatile storage devices, generally composed of a controller and one or more solid state memories [33]. SSDs use the same interfaces and form factors as traditional HDDs, so that they can easily fit in commodity infrastructures. Data is stored in arrays of NAND cells that are organised into pages and blocks. Pages are the smallest unit that can be read or written with a single command. They are usually from 4KB to 32KB in size, and are grouped into blocks, generally consisting of N_{P×B} = 64 to N_{P×B} = 512 pages each, for a total of 256KB to 16MB. For technical reasons, only empty pages can be written, and a page must be erased before it can be updated. However, erasing can only occur at the block level (i.e., on groups of 64 to 512 pages), destroying all of their content. A consequence of this particular read/write/erase structure is that pages can be in three states: Empty, when they do not contain data; Used, when they store useful data; Dirty, when they contain invalid data.
As introduced, only empty pages can be written; when the OS deletes a file, the corresponding pages are marked as dirty with a special command called TRIM. A page cannot be overwritten without erasing the entire block first. Only dirty (and empty) pages can be erased to free space, but this must be done at the block level: as a consequence, all the pages in the block must be either dirty or empty. If this is not the case, the controller must either move the in-use pages, or read them into memory and write them back, when erasing a block. The extra writes needed to move valid data lead to a problem known as the Write Amplification Factor (WAF): when no more empty pages are available, the controller must read and write a larger number of pages to perform a single write operation. This effect has a serious impact on SSD performance and endurance. In order to limit the WAF, SSD controllers usually perform some sort of Garbage Collection (GC) [7, 17, 32]: they search for dirty pages, consolidate the ones in use, and erase blocks to create new empty pages.
When focusing on reliability [33], SSDs have a smaller error rate compared to traditional HDDs, since they do not involve mechanical parts. However, they suffer from severe aging problems. In particular, each block can sustain a limited number of program/erase (PE) cycles, after which it becomes unusable. For this reason, SSDs are characterised by a maximum amount of data that can be written on a disk before all its pages reach the maximum number of PE cycles. Not all pages are erased and rewritten at the same rate, so it is quite common that some pages wear out faster than others. To mitigate this effect, SSDs usually apply a wear levelling algorithm to balance the utilisation of all the pages.
System model
The SSD has N_B blocks of N_P pages each. We model the time behaviour of a single block with an MA composed of 6 conventional states that model the evolution of the pages inside the block. A reward state models the aging of the pages in the block. The complete MA is reported in Figure 9. In this case, to simplify the notation, the dependencies of the rates are not shown in the figure.
The conventional states of the MA have the following meaning: state E denotes empty pages, state U_x is split into 3 states (U_App, U_Dat and U_Tmp) that denote pages in use, state D denotes dirty pages and state F failed pages.
The number of pages of block v (v = 1, …, N_B) in state X (X = E, U_App, U_Dat, U_Tmp, D, F) at time t is denoted by π_X(t, v), with Σ_{X∈Ω} π_X(t, v) = N_P, where Ω is the state space of the conventional components of the MA; the total number of pages in state X in the whole SSD at time t is denoted by π̄_X(t) = Σ_{v=1}^{N_B} π_X(t, v).
Initially, each block starts with all pages in the empty state (π_E(0, v) = N_P). A writing request moves a page from the empty state E to one of the in-use states U_x. Since writing requests may have very different characteristics, we model a workload composed of three components, to make the model flexible enough to be adapted to different realistic situations. The first component models the access to pages containing the operating system code and the main applications, which are seldom written and seldom modified, and are represented by state U_App; the second component models pages that are written and deleted at an almost constant rate (i.e., temporary files) during the lifetime of the SSD, and are represented by state U_Tmp; the third component models pages that are basically written only after the initial setup has been completed, and represents the main data a user saves on her disk. Such data is deleted only when available disk space starts becoming an issue, and is represented by state U_Dat. Pages of type x (with x ∈ {App, Dat, Tmp}) arrive at the SSD at rate λ_x(t) and are assigned to a block v according to an assignment function l_x(v; [Π_V]), so that:
Writing rates λ_x(t; v; [Π_V]) depend on the type of page, on the elapsing of time (to reflect that the writing rate may change during the life of the memory), and on the entire state of the MAM (to support aging and wear-leveling algorithms). Pages are deleted at rates φ_x(t). Deletion simply marks pages as dirty and is represented in the MA of the block by the transitions from U_x to D. A deleted page becomes invalid and cannot be overwritten. By combining the relative weights of the U_x states with the write and delete rates, we can model a very wide variety of workload conditions. Dirty pages cannot be overwritten, but can be erased, since they contain invalid data. When a block contains only empty or dirty pages, it can be erased. Erasing can also occur when there are no longer empty pages on the disk but there are still dirty pages. In this case, the SSD dumps the used pages of the block into main memory, erases the entire block, and finally rewrites the old and the new data. Erasing events are modeled by the transition that connects state D to state E, occurring at rate ψ(t; v; [Π_V]). Garbage collection erases dirty pages in a block to increase the available free space [7].
Blocks age at each PE cycle, and can be erased only a limited number of times: reward state A counts the number of erasures in a block. State A is characterized by a drift ψ(t; v; [Π_V]) that matches the transition from state D to state E. In this way, π_A(t; v) measures the cumulative number of erased pages in block v up to time t. A page may fail from any state X (X ≠ F) to F at a rate γ(t; v; [Π_V]) that depends on the accumulated age of block v.
In particular, in this scenario γ(t; v; [Π_V]) has been chosen to match the hazard rate of a Weibull distribution according to Equation (10), where the time variable has been replaced by the value of the reward state A, π_A(t; v):
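A form consistent with this description (a reconstruction of ours; the original Equation (23) may be written differently) is:

γ(t; v; [Π_V]) = (β/α)·(π_A(t; v)/α)^{β−1},

so that, for β > 1, the failure rate of a block grows quickly as its cumulative erasure count approaches α.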
In Equation (23), β and α are the shape and scale parameters, respectively.
Results
We derive a set of numerical experiments on an illustrative example of an SSD composed of N_B = 48 blocks, each containing N_P = 64 pages. The aim of the numerical example on a small-scale case is to show the capabilities and potential of the model in a wide range of application cases; the scalability of the model is discussed in Section 6.
Page occupation is determined by the arrival rates λ_x(t) and by the assignment function l_x(v; [Π_V]) that assigns incoming requests to blocks and pages. To gain flexibility in the input conditions and make them time dependent, the rates λ_App(t), λ_Dat(t) and φ_Dat(t) are determined by the following function ρ(t, y_0, y_1, t_0, σ):
where y_0 = lim_{t→−∞} ρ(t, y_0, y_1, t_0, σ), y_1 = lim_{t→+∞} ρ(t, y_0, y_1, t_0, σ), t_0 is the time at which the function reaches the mean value ρ(t_0, y_0, y_1, t_0, σ) = (y_0 + y_1)/2, and σ is the slope. In this way, we can modulate the time interval in which the rates are active, their absolute value and the variation of the absolute value in time.
In particular, we assign the following values to the parameters in Equation (24): λ_App(t) = ρ(t, 40, 0.001, 10, 2), λ_Dat(t) = ρ(t, 1, 25, 15, 5) and φ_Dat(t) = ρ(t, 0.0156, 0.1875, 400, 10) (see Figure 10). The other arrival and deletion rates are considered constant and set to: λ_Tmp = 8 req./day, φ_Tmp = 1/8 req./day and φ_App = 1.56·10^−5 req./day. With a shape parameter β = 40 and a scale parameter α = 800 in the Weibull distribution of Equation (23), the failure rate becomes an increasing function of the age. Blocks are assigned in order of increasing address, with incoming requests occupying the first available block, that is:
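As an illustration, the workload functions and the block-assignment policy can be coded as follows. This is a sketch of ours: the logistic form of ρ matches the stated limits and midpoint, but the exact slope parameterization is an assumption, and the function names are illustrative.

import numpy as np

def rho(t, y0, y1, t0, sigma):
    # Assumed logistic form: tends to y0 for t -> -inf, to y1 for t -> +inf,
    # and equals (y0 + y1)/2 at t = t0; sigma controls how steep the transition is.
    return y0 + (y1 - y0) / (1.0 + np.exp(-(t - t0) / sigma))

# Time-dependent arrival/deletion rates used in the experiments (requests/day)
lam_app = lambda t: rho(t, 40.0, 0.001, 10.0, 2.0)
lam_dat = lambda t: rho(t, 1.0, 25.0, 15.0, 5.0)
phi_dat = lambda t: rho(t, 0.0156, 0.1875, 400.0, 10.0)

def first_available_block(pi_E):
    # Assignment policy l_x(v; [Pi_V]): incoming requests occupy the lowest-index
    # block that still has empty pages (our reading of "first available block").
    for v, empty in enumerate(pi_E):
        if empty > 0:
            return v
    return None   # no empty pages left anywhere: garbage collection must intervene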
To simplify the solution of the model, we suppose that garbage collection is triggered every ∆_GC and operates when the number of dirty pages in a block exceeds a threshold χ. All blocks with π_D(t; v) > χ are erased, generating free space but, at the same time, increasing the age of the block. The GC algorithm is formulated in Equation (27).
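A procedural sketch of this garbage-collection rule (illustrative Python of ours, mirroring the threshold behaviour described above; array names are assumptions):

def garbage_collect(pi_E, pi_D, pi_A, chi):
    # Invoked every Delta_GC: every block whose dirty-page count exceeds chi is erased.
    # Erasure turns dirty pages into empty pages and adds their number to the block
    # age pi_A, which counts the cumulative number of erased pages (reward state A).
    for v in range(len(pi_D)):
        if pi_D[v] > chi:
            pi_A[v] += pi_D[v]
            pi_E[v] += pi_D[v]
            pi_D[v] = 0.0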
The procedure defined in Equation (26) assumes that pages are erased in negligible time with respect to the other actions included in the model; mathematically, this corresponds to setting the function ψ to a Dirac delta whenever the conditions are verified, and to zero otherwise. The numerical solution procedure implements the proposed technique by solving the differential equation (Equation (2)) for time steps of size ∆_GC, and then applying Equation (26) at the end of each step. Figure 11 shows the evolution of the total number of pages in each state over the whole disk (i.e., π̄_X = Σ_v π_X(t−; v) with X ∈ {E, D, F, U_App, U_Dat, U_Tmp}). Initially, all pages are empty, and then the three considered types of pages start filling the disk. The first time the condition π_D(t; v) > χ of Equation (27) is satisfied is at t = 80 days: the selected time interval ∆_GC gives the depicted curves of empty and dirty pages a saw-tooth shape. The number of failed pages remains negligible until around t = 800 days, when the wear-out effect becomes dominant and pages rapidly start failing, making the disk unusable.
The distribution of the number of pages per block in the different states with respect to time is shown in Figure 12. Several insights can be highlighted. Figures 12a) and 12b) show the number of pages per block in states E and D, respectively. The considered assignment policy leaves some of the blocks almost unused (i.e., those with index v > 40), and exploits them only when the disk is close to its failure (Figure 12a). Application pages occupy the initial blocks of the SSD (Figure 12c) (i.e., those with index v < 10), while the other types of pages occupy the remaining usable space of the disk in an almost uniform way (Figures 12d-e). Failures then occur mostly in the area where user data and temporary files are stored (Figure 12f).
The system aging is shown in Figure 13, which reports for each block the cumulative number of erased pages in state A, π_A(t; v). As expected, blocks with a low number of PE cycles, such as those that are almost unused (v > 40) or those that mostly contain application data (v < 10), show a low level of wear. Data blocks and temporary blocks present an almost linear aging with time. The increasing failure rate of the considered Weibull distribution makes them fail when their age approaches the scale parameter α.
Wear leveling
Wear leveling algorithms transfer pages from one block to another to obtain a more uniform distribution of the age among blocks, thus increasing the endurance of the disk.
In this example, we introduce a simple wear leveling procedure inspired by the age convergence mechanism proposed in [30]. Every ∆_WL, the system swaps the K_WL most aged blocks with the K_WL least aged ones. In more detail, the wear leveling procedure searches for the subset V_M ⊂ V of K_WL blocks for which the age π_A(t; v) is maximum, and the subset V_m ⊂ V of K_WL blocks for which the age π_A(t; v′) is minimum, and then swaps each block v ∈ V_M with a block v′ ∈ V_m. Formally, the algorithm can be described as follows:
if t = n · ∆_WL, n ∈ ℕ:
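A procedural sketch of this swap rule (illustrative Python of ours; the array layouts are assumptions):

import numpy as np

def wear_level(state, age, K_WL):
    # state : (N_B, n_states) per-block state vectors (pages in E, U_x, D, F)
    # age   : (N_B,) accumulated age pi_A per block
    # Every Delta_WL: swap the K_WL most aged blocks with the K_WL least aged ones.
    order = np.argsort(age)
    youngest = order[:K_WL]            # subset V_m: least aged blocks
    oldest = order[-K_WL:]             # subset V_M: most aged blocks
    for v_young, v_old in zip(youngest, oldest):
        # Swap the page contents of the two blocks; as stated in the text, the swap is
        # assumed to take negligible time and not to age the blocks involved.
        state[[v_young, v_old]] = state[[v_old, v_young]]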
This procedure assumes that block swapping takes a negligible amount of time with respect to the other time scales of the model. As a further simplifying assumption, we consider that wear leveling does not have a relevant impact on the aging of the blocks being swapped. Figure 14 shows the evolution of the total number of pages in a given state when wear leveling is applied. As can be seen, the model behaves as in Figure 11, but the disk starts deteriorating at t = 1000 days, showing a 25% improvement in the expected lifetime of the SSD. Figure 15 shows the distribution of aging among the blocks. Wear leveling has spread aging more uniformly over the disk, leading to a better utilization of the blocks. In this case, when failures occur, they are evenly distributed over the disk, as shown in Figure 16.
Complexity of the algorithms
Multi-Core System-on-Chip. The state space of the case study in Sec. 4 has the following size. Each of the n_0 = 144 cores is represented by the MA in Figure 3, composed of a total of N_S = 4 states. The state space of the whole system is obtained as the Cartesian product of the individual state spaces, i.e., N_S^{n_0} = 4^144 ≈ 5·10^86 states,
which is a number that is hardly explored even by simulation. Our analytical technique is feasible because we separately solve 144 equations of the form (2), thus reducing an exponential complexity to a linear one. However, to achieve this result we need to compute, for each MA in position v, the induced kernel I(t; v; Π_V) that accounts for the influence of the other agents on the behavior of the MA in position v. When all MAs influence each other according to their current states, the computation of the induced kernel requires O(n_0^2 N_S^2) operations. This complexity is greatly reduced when the interactions of each MA are limited to a restricted region around its position and to specific states of the agents.
In the simplified thermal model of the MCSoC example, the temperatures of the cores in the W and I states are influenced only by the utilizations of the side and corner neighbors, as computed in Eqs. (19) and (20). In this way the computation of the induced kernel has complexity O(n_0 n_I), where n_I = 8 is the maximum number of neighbors of an MA. The integration interval is discretized into k_0 = T_h/∆t time points, where ∆t is the size of the integration step. The complexity of the algorithm is thus O(k_0 (n_0 n_I + n_0 N_S)), because for each time point and each MA we must compute the influence of its n_I neighbors and solve an equation of form (2) with a state vector π(t; v) of size N_S. We can observe that limiting and confining the interactions between MAs is the principal way to reduce the complexity of the proposed technique.
Solid State Drive. In the example of Sec. 5, the time horizon [0, T_h] of the analysis is uniformly discretized with steps of size ∆_GC, yielding k_1 = T_h/∆_GC time points. For each point and for each block v in isolation: i) the evolution of the block during an interval of size ∆_GC is computed; then ii) if the resulting number of dirty pages in the block is greater than the threshold χ, the pages are updated according to Equation (26). Phase i) requires the computation of Equation (2) with a state probability vector π_c(t; v) of size N_S = 7. The equation is solved over a time interval of size ∆_GC, discretized into k_2 = ∆_GC/∆t points. The complexity of phase i) for a single block is O(k_2 N_S) and the complexity of phase ii) is O(1), for a total of O(N_B N_S k_1 k_2) = O(N_B N_S T_h/∆t) operations for the analysis of the whole system. In presence of the wear leveling algorithm, k_3 = T_h/∆_WL block-swapping operations are further required, which consist in finding the K_WL youngest and the K_WL oldest blocks and exchanging their pages as in Equation (27). Searching for such blocks can be done using a heap data structure with O(K_WL log N_B) operations, whereas the page exchanges take O(N_S); thus a total of O(k_3 (K_WL log N_B + N_S)) operations are additionally required. Even with the addition of the wear leveling operations, the resulting complexity is linear with respect to the number of blocks and the number of states; moreover, increasing the number of pages per block has no impact on the complexity, with a clear benefit for the scalability of the approach. However, care is needed during the solution of the differential equations due to the arising of stiff K_c(t; v; [Π_V]) matrices. In such a case, with basic resolution techniques a very small ∆t is required to keep the analysis stable, thus largely increasing the solution time. To overcome this problem, advanced multi-step and adaptive step-size techniques, such as Runge-Kutta methods, are required to achieve a satisfying trade-off between accuracy and efficiency of the solution.
Related works
Aging phenomena in integrated circuits have been widely investigated in the literature; here we discuss system-level models and approaches for MCSoCs. The industrial standard procedure for the estimation of the lifetime of a class of devices has been proposed in [23] and is based on post-production accelerated aging experiments. Such models generally consider the chip as a single unit, which is thus considered damaged after the first failure, and assume a fixed worst-case operating temperature. These two aspects represent a considerable limitation for the early estimation of the lifetime of a modern computing system. Our proposal allows considering the temporal evolution of the system, provides a more detailed analysis than the worst case, and models subsequent core failures. The first dynamic reliability management strategy acting at system level has been proposed in [39]. That work combines the aging models defined in [23] with the sum-of-failure-rates (SOFR) approach to take into account several aging mechanisms. However, it considers a single-core device and adopts an exponential failure distribution, which is not very realistic. Various subsequent works [9, 24] suffer from the same limitations. Our work instead aims at considering a large number of cores, to tackle possible future scenarios where the parallelism in a many-core system will grow dramatically.
Weibull or lognormal distributions have later been considered in [18, 43] to track the aging history. For computability reasons, the numerical solution of such models is simplified: a subset of representative workload traces is extracted for a reduced period of time with a fine granularity; from them, the average failure rate [18, 43] is extrapolated and used in the MTTF computation. Here we have presented a novel way to include aging, factorized when possible to avoid the explosion of the computational complexity of the model, but kept separate for the single cores.
When considering the occurrence of multiple subsequent core failures, MCSoCs are actually K-out-of-N:F systems. Such a model is based on multiple integrals [28], whose dimension is given by the number of failures the system can tolerate; in practice, its analytical solution is unfeasible. Thus, many approaches [18, 24] consider the entire system not to survive beyond the first failure, while others [19, 43, 15, 4] adopt Monte Carlo simulation or the multi-armed bandit approach [29] to quickly estimate the multiple integrals required to study the time to failure analytically. Such techniques become computationally expensive when considering a temporal redistribution of the workload to maintain a target service level; the MAM approach instead provides reasonably accurate solutions in reduced computation times.
A major concern in the widespread diffusion of the NAND flash memory technology for permanent storage devices regards performance, reliability, PE endurance and data retention time [14]. Various design techniques have been investigated to achieve an acceptable bit error rate, to enhance reliability and endurance, and to trade off reliability and performance. Error correcting codes are systematically used at the controller level to achieve an acceptable raw bit error rate. Publications proposing ECC-based solutions for flash memories include [8, 44]. The page is the smallest storage unit, and write/read operations are performed at the page level; however, erase operations are performed at the block level. GC is a process required to create erased blocks [41], at the cost of incurring write amplification. A GC algorithm that reduces the WAF by randomly selecting a number of blocks and acting on the block with the fewest valid pages is described in [16, 17]. Results from analytical models are restricted to uniform random writes. RAID architectures are essential for SSDs [21, 27] to increase the reliability and endurance of these storage devices. However, the traditional RAID approach may have a negative effect, and specific RAID techniques must be studied in conjunction with garbage collection [31, 32].
Analytical and simulation results are obtained from synthetic workloads, but consistent field studies from large samples of devices in real operating conditions are now available, as documented in [38].
Conclusions
In this work, we have presented a MAM-based approach for studying the reliability of VLSI systems affected by aging, and we have applied it to an MCSoC CPU and to an SSD. With the proposed technique, we are able to study systems composed of a large number of elements, and to consider complex non-homogeneous, non-exponential time-to-failure distributions for their expected lifetime under aging phenomena. Future work will focus on different application scenarios, also trying to remove the simplifying assumptions used both in the thermal and in the aging models.
