Runtime reconfiguration is a promising way to mitigate the increased failure rate of modern electronics and thereby fulfil the safety requirements of future safety-critical avionics systems. In case of a hardware fault, the system can automatically detect the fault at runtime and redirect the functionality from the defective module to a newly reconfigured, safe module, thus minimizing the effects of hardware faults. This paper introduces a high-level abstraction architecture for safety-critical systems with runtime reconfiguration, using triple modular redundancy and the synchronous model of computation. A modeling strategy for the design phase, supported by formal models of computation, is also addressed. Triple modular redundancy is used to detect faults: when a fault causes an inconsistency in one of the three processors, a new processor is configured through software or hardware reconfiguration and takes over the tasks of the faulty processor. The strategy assumes that no other fault occurs during the reconfiguration of a new processor.
Introduction
The safety and reliability of modern avionics may be threatened by trends that are largely driven by high-volume commercial applications, e.g. environmental concerns such as the Restriction of Hazardous Substances (RoHS) directive, which forced the removal of lead from commercial electronics and solders.
Another trend arises from technological innovation in commercial electronics. The drive to place more functionality and performance in smaller packages at lower power has led to ever-shrinking device geometries, down to deep-submicron dimensions, which introduce new physical failure mechanisms that affect the wear-out of semiconductor devices. Additionally, small geometries increase the susceptibility of semiconductor devices to atmospheric radiation.
One of the next big challenges for the avionics industry is to address these trends, which increase failure rates and thereby affect safety and reliability. Bieber et al. [1] point out runtime reconfiguration as one of the big challenges for the future generation of integrated modular avionics (IMA) systems. In the event of a hardware failure, the system is able to reallocate the functionalities of the faulty module to a safe module, thus limiting the effects of a hardware failure on the aircraft.
Perhaps the most important component of a runtime reconfigurable safety-critical system is the fault detection mechanism. One such mechanism is triple modular redundancy (TMR), which can detect and mask faults in a system, improving its reliability [2]. In this architecture, depicted in Figure 1, three processes execute the same functionality, and a majority voting mechanism selects the most frequent output. If one of the processors fails to produce the correct output, possibly due to a single event upset (SEU), the voting mechanism masks the fault with the output of the other two processes. Modular redundancy can use any number of processors; however, at least three redundant processors are needed to both detect and mask a fault.
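The majority vote described above can be sketched as a small pure Haskell function (Haskell being the language later used with ForSyDe). This is a minimal illustration, not the paper's Voter: the name `majority3` is hypothetical, and returning `Nothing` when all three outputs disagree is an assumption about how an unmaskable disagreement might be reported.

```haskell
-- Hypothetical three-input majority voter (illustration only, not the paper's Voter).
-- Returns the value produced by at least two of the three processes, or
-- Nothing when all three disagree and the fault can no longer be masked.
majority3 :: Eq a => a -> a -> a -> Maybe a
majority3 p1 p2 p3
  | p1 == p2 || p1 == p3 = Just p1 -- p1 agrees with at least one other process
  | p2 == p3             = Just p2 -- p1 is the odd one out
  | otherwise            = Nothing -- no majority exists
```

For example, `majority3 5 7 5` evaluates to `Just 5`: a single upset in P2 is masked by P1 and P3.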
Systems with triple modular redundancy tolerate both transient faults, i.e. faults that appear for a very short period of time and then disappear, and a single permanent fault, i.e. a fault that remains active for a long or possibly indefinite amount of time. However, a fault in the voting mechanism leads to an error, making the voter a single point of failure. To improve reliability, three voters can be used instead of one; in that case, a fault in one of the voters can also be masked, eliminating the single point of failure.
Figure 1: Triple modular redundancy architecture. V represents the voting mechanism and $P_1$, $P_2$ and $P_3$ represent three processes with the same functionality.
In view of this, this paper proposes a high-level abstraction architecture for safety-critical systems with runtime reconfiguration (RTR) using triple modular redundancy and the synchronous (SY) model of computation (MoC). The architecture is composed of a fault detection mechanism, several runtime reconfigurable processes, and a control device that manages the reconfiguration process. Unlike traditional triple modular redundancy architectures, the proposed architecture can mask multiple permanent faults, provided that no two faults occur within a short time interval defined by the reconfiguration time of a new module.
Models of Computation
Models of computation are collections of rules that dictate the semantics of execution and concurrency in computational systems. A common framework for classifying and comparing MoCs is the tagged signal model [3]. In this framework, a MoC is described by a set of processes acting on signals, according to the following definitions.
Definition 1 (Signal). In the tagged signal model, a signal $s \in S$ is a set of events $e_i = (t_i, v_i)$, each composed of a tag $t_i \in T$ and a value $v_i \in V$. Each signal is therefore a subset of $T \times V$, and $S$ is the set of all such subsets.
Definition 2 (Process). In the tagged signal model, a process $P$ is a set of possible behaviors that defines relations between input signals $s_i \in S_I$ and output signals $s_o \in S_O$. The set of output signals is given by the intersection between the set of input signals and the process, $S_O = S_I \cap P$. A functional process is a process described by a single-valued mapping $f : S_I \to S_O$; it describes either exactly one behavior or no behavior at all.
The tagged signal model classifies MoCs as either timed or untimed. In a timed MoC, all events in all signals can be ordered by their tags, i.e. the set of tags $T$ is totally ordered. In an untimed MoC, $T$ is only partially ordered, i.e. events can only be ordered locally.
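To make Definitions 1 and 2 concrete, the following minimal sketch encodes a signal as a Haskell list of (tag, value) events. The names `Event`, `Signal`, `isTagOrdered` and `tagPreserving` are hypothetical; `tagPreserving` is only one simple kind of functional process (it keeps every tag and maps values), not the general relation of Definition 2.

```haskell
-- Hypothetical encoding of Definition 1: a signal as a list of events (tag, value).
type Event t v = (t, v)
type Signal t v = [Event t v]

-- In a timed MoC the tags are totally ordered, so the events of a signal
-- can be listed in strictly increasing tag order; this checks that property.
isTagOrdered :: Ord t => Signal t v -> Bool
isTagOrdered s = and (zipWith (<) tags (drop 1 tags))
  where
    tags = map fst s

-- A simple functional process: a single mapping from the input signal to the
-- output signal that keeps every tag and transforms each value with f.
tagPreserving :: (v -> w) -> Signal t v -> Signal t w
tagPreserving f = map (\(t, v) -> (t, f v))
```

For instance, `tagPreserving (* 2) [(0, 1), (1, 2)]` yields `[(0, 2), (1, 4)]`: the tags are untouched and only the values change.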
Synchronous (SY) MoC
The synchronous MoC belongs to the class of timed MoCs and is based on the perfect synchrony hypothesis, which states that neither computation nor communication takes time. As a consequence, all signals are synchronized: for any event in any signal, there is an event with the same tag in every other signal. This allows signals to be represented as lists of values in which the position of each value represents its tag, i.e. s[k] = v with k ∈ T and v ∈ V. Another important property of the synchronous MoC is that the absence of an event is well defined: it is represented as an event with some tag t ∈ T whose value is the absent value ⊥ ∈ V, i.e. e = (t, ⊥).
Although the perfect synchrony hypothesis is not physically realizable, the synchronous MoC works well for modeling clock-based systems, provided that both computation and communication are fast enough to fit within one evaluation cycle.
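Under perfect synchrony, a signal reduces to a list indexed by tag, and the absent value can be made explicit with a two-constructor type. The sketch below assumes a type `AbstExt` with constructors `Abst` and `Prst`, mirroring the names that appear in the paper's listings; `liftAbst2` and `addSY` are hypothetical helpers written for illustration, not ForSyDe functions.

```haskell
-- Hypothetical list encoding of SY signals: the position k of a value is its
-- tag, so s !! k plays the role of s[k]. Abst stands for the absent value.
data AbstExt a = Abst | Prst a deriving (Eq, Show)

-- Lift a binary function to absent-extended values. Under perfect synchrony the
-- result at tag k depends only on the inputs at tag k; an absent operand gives
-- an absent result.
liftAbst2 :: (a -> b -> c) -> AbstExt a -> AbstExt b -> AbstExt c
liftAbst2 f (Prst a) (Prst b) = Prst (f a b)
liftAbst2 _ _        _        = Abst

-- A combinational SY process: combine the events that share the same tag.
addSY :: [AbstExt Int] -> [AbstExt Int] -> [AbstExt Int]
addSY = zipWith (liftAbst2 (+))
```

For example, `addSY [Prst 1, Abst, Prst 3] [Prst 10, Prst 20, Prst 30]` yields `[Prst 11, Abst, Prst 33]`: the absent event at tag 1 propagates instead of being guessed.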
Modeling TMR with RTR
The proposed triple modular redundancy architecture using runtime reconfiguration is illustrated in Figure 2. It works as follows: three runtime reconfigurable processors, RTRP$_1$ to RTRP$_3$, are configured with the same functionality and, given the same input signal, should provide the same output signal. Relying on this property, the Voter compares the results, and possibly the states, output by each processor. If the output of one processor differs from the other two, the Voter assumes that this processor is faulty and that a new processor must take its place. The Voter therefore sends a signal to the Control Device indicating which processor is malfunctioning, so that the Control Device can allocate a new RTRP$_x$ to take over the failed processor's task.
The newly allocated processor must then synchronize its states with the two remaining RTRPs that are still executing, in order to mask the fault. To do that, the first time the new processor executes, it loads the current states from a current state shared memory (CSSM), which is represented as a delay in the SY MoC and can be physically implemented as a set of processor registers.
R. Bonna Triple modular redundency by reconfiguration DOI 10.3384/ecp19162016
Proceedings of the 10th Aerospace Technology Congress October 8-9, 2019, Stockholm, Sweden
As in N-modular redundancy (NMR) with spares, when one of the processors becomes unreliable, i.e. starts to produce inconsistent results, it is replaced by a spare processor. Here, however, the spare processors, represented by RTRP$_n$ with $n > 3$, can initially be loaded with less critical applications that are overwritten when needed, which makes better use of resources.
For this architecture to work properly, it is assumed that neither the Voter nor the Control Device fails; both devices are therefore single points of failure of this system. Although the Voter's single point of failure can be eliminated by using three voters, there can be only one reconfiguration manager, represented by the Control Device.
When two RTRPs fail at the same time, or within a time window shorter than the time needed to reconfigure a new processor, we say that the triple redundancy system fails. To estimate how likely such a failure is, assume that all RTRPs have the same failure rate and that, in every cycle, the probability of failure of RTRP$_j$ is $p(F_j) = \rho$. Assume also that it takes $m$ clock cycles to reconfigure a new RTRP after a failure. Then the probability $p(F_a \mid F_b)$ that some RTRP$_a$ fails within a time window of $m+1$ cycles (including the cycle in which the fault was detected), given that some RTRP$_b$ has already failed, is given by
Therefore, the probability of failure of our triple redundancy architecture with RTR is given by
We define the ratio of improvement $R_I$ as the probability of failure of a single RTRP divided by the probability of failure of our triple redundancy architecture, given by (2). The larger the ratio of improvement, the more fail-safe the triple redundancy with RTR is compared to an architecture with a single processor. This ratio is given by
The ratio of improvement $R_I$ shows the importance of the reconfiguration time $m$ for the robustness of the triple redundancy architecture with RTR. In case of a fault in one of the processors, the system can still operate correctly with two processors while another RTRP is being reconfigured; however, it is vulnerable to a second fault during this time window. Therefore, the faster the reconfiguration, the less vulnerable the system. Traditional triple redundancy architectures (without RTR) tolerate a single permanent processor fault, but they are vulnerable to multiple permanent faults.
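As a rough numerical illustration of why the reconfiguration time $m$ matters, the sketch below evaluates one simple model under an explicit assumption: every RTRP fails independently in each cycle with probability $\rho$. It does not claim to reproduce Eqs. (1) to (3), and the function names are hypothetical. In this model a first fault among the three active RTRPs occurs with probability $1-(1-\rho)^3$ per cycle, a second fault among the two survivors within $m+1$ cycles occurs with probability $1-(1-\rho)^{2(m+1)}$, and for small $\rho$ the ratio of improvement behaves like $1/(6(m+1)\rho)$.

```haskell
-- Hypothetical back-of-the-envelope model (not the paper's Eqs. (1)-(3)).
-- Assumption: every RTRP fails independently in each cycle with probability rho.

-- Probability that at least one of the two surviving RTRPs fails during the
-- m + 1 cycles (detection cycle plus m reconfiguration cycles) after a fault.
pSecondFault :: Double -> Int -> Double
pSecondFault rho m = 1 - (1 - rho) ^ (2 * (m + 1))

-- Probability that, in a given cycle, one of the three active RTRPs fails
-- and a second fault follows before the reconfiguration completes.
pTmrFailure :: Double -> Int -> Double
pTmrFailure rho m = (1 - (1 - rho) ^ (3 :: Int)) * pSecondFault rho m

-- Ratio of improvement: single-RTRP failure probability divided by the
-- TMR-with-RTR failure probability; about 1 / (6 (m + 1) rho) for small rho.
ratioOfImprovement :: Double -> Int -> Double
ratioOfImprovement rho m = rho / pTmrFailure rho m
```

Under this model the ratio is, for small $\rho$, inversely proportional to $m + 1$, which matches the qualitative argument above: a shorter reconfiguration window directly improves robustness.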
Runtime Reconfigurable Process (RTRP)
We start modeling the triple redundancy architecture with what we call a runtime reconfigurable process (RTRP), similar to the architecture presented in [4]. In its most general form, such a process must have some internal memory to store its states, and it must take reconfiguration time into account. When a new RTRP is being reconfigured, it takes a number $m \in \mathbb{N}$ of clock cycles, proportional to the size of the functionality bitstreams, to complete the reconfiguration before it can execute for the first time.
When the RTRP executes for the first time, it must synchronize its internal memory with the internal memory of the other two RTRPs executing the same functionality. To achieve that, we add an extra input, the synchronization input $\tilde{x}$, so that when the RTRP executes for the first time, it takes its initial state from this input.
We 
Such functionality is stored in a configuration memory and can be changed through a control input signal $c_t$, which is responsible for the reconfiguration: $c_t$ changes both $f$ and $g$ when needed, and this change takes $m$ clock cycles to complete. Only then can the processor execute for the first time with the new configuration.
A representation of an RTRP is shown in Figure 3. Feedback loops, together with delay blocks (represented by $z^{-1}$), are used to represent memories according to the following pattern: the blue rounded delay represents the configuration memory, the black squared delay represents the RTRP's internal memory, and the dashed delay represents a virtual countdown that simulates the reconfiguration time. The pair $(f, g)$ is stored in the configuration memory until a reconfiguration request is received via $c_t$. To represent this behavior, the functionality transition function is given by
To represent the time spent reconfiguring a process, the countdown variable $m[k] \in \mathbb{N}$ stores how many cycles are left until the reconfiguration finishes. $m[k] > 0$ indicates that the process is reconfiguring at instant $k$ and therefore cannot execute. Equation (5) describes the behavior of the countdown signal $m$.
While a reconfiguration is being performed, i.e. when $m > 0$, the RTRP outputs the absent value $\bot$ for both the next state $x[k+1]$ and the output $y_j[k]$. The first time the RTRP executes after reconfiguration, it uses the state input $\tilde{x}$ as its initial state. Afterwards, it keeps executing with its internal state $x$. The state transition function at any instant $k$ is given by
with $\tilde{x}[k]$ being the value of the states stored in the CSSM at instant $k$. The output of a runtime reconfigurable process at any instant $k$ is given by
Initial values f [0], g[0], m[0] and x[0], indicating the initial configuration and states of each RTRP, must be provided.
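The RTRP behavior described in this section can be sketched as a recursive function over Haskell lists, one list element per clock cycle. This is a minimal sketch under stated assumptions, not the paper's `rtrpFunc` or Eqs. (4) to (7): the names `AbstExt`, `Config`, `Request` and `rtrp` are hypothetical, and it assumes that an absent internal state marks the first execution after reconfiguration, so that the state is then read from the CSSM input.

```haskell
-- Hypothetical list-based sketch of one RTRP, written from the prose description
-- (it does not reproduce the paper's Eqs. (4)-(7) or Listing 1).
data AbstExt a = Abst | Prst a deriving (Eq, Show)

-- Configuration memory content: next-state function f and output function g.
data Config s i o = Config { fNext :: s -> i -> s, gOut :: s -> i -> o }

-- Event on the control input c_t: a new (f, g) pair and its reconfiguration time m.
type Request s i o = AbstExt (Config s i o, Int)

-- rtrp cfg m x cts xts us: current configuration, countdown m[k], internal
-- state x[k]; then the signals c_t, CSSM state and input u.
-- The result is the output signal y.
rtrp :: Config s i o -> Int -> AbstExt s
     -> [Request s i o] -> [s] -> [i] -> [AbstExt o]
rtrp cfg m x (c : cs) (xt : xts) (u : us) = y : rtrp cfg' m' x' cs xts us
  where
    -- Assumption: an absent internal state marks the first execution after
    -- reconfiguration, so the state is taken from the CSSM input instead.
    st = case x of
           Prst v -> v
           Abst   -> xt
    -- While the countdown is running, both next state and output are absent.
    (x', y)
      | m > 0     = (Abst, Abst)
      | otherwise = (Prst (fNext cfg st u), Prst (gOut cfg st u))
    -- A request loads the new pair and restarts the countdown at its m;
    -- otherwise the countdown decreases until it reaches 0.
    (cfg', m') = case c of
      Prst (newCfg, mNew) -> (newCfg, mNew)
      Abst                -> (cfg, max 0 (m - 1))
rtrp _ _ _ _ _ _ = []
```

For instance, with an accumulator configuration (both functions adding the input to the state), an initial countdown of 2, an absent initial state and a CSSM state of 10, the inputs `[1, 1, 1, 1]` produce `[Abst, Abst, Prst 11, Prst 12]`: two absent cycles while reconfiguring, then execution starting from the synchronized state.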
Voter
The Voter's task is to compare the outputs of the three RTRPs that are currently active, and alert the Control Device when one of the outputs differs from the other two. Figure 4 shows the voter inputs and outputs.
Figure 4: Voter with inputs and outputs.
The input $c_v$ selects the three currently active RTRPs, so that the Voter can compare their results and, in case of any inconsistency, inform the Control Device about the failed RTRP through the signal $r$. The Voter outputs the most frequent of the RTRP results through $s_{out}$ and sends the current RTRP state to the CSSM via $x_v$. Let $s_j$ be the signal that carries the output and the states of RTRP$_j$. Events from $c_v$ and $s_j$ are defined as follows.
The outputs $s_{out}$ and $x_v$ are modeled as follows.
The signal $r$ informs the Control Device, in case of a failure, which RTRP has failed. If the results from the three active RTRPs are consistent at instant $k$, $r[k]$ is absent; otherwise it carries the index of the faulty RTRP. Thus, the output $r$ is modeled as follows.
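One instant of the Voter can be sketched as a pure function of the active triple carried by $c_v$ and the outputs of all RTRPs. This hypothetical `voterStep` is not the paper's `voterFunc`: it assumes 1-based RTRP indices and that the three selected RTRPs produce comparable values, and it returns the majority value (standing in for what is sent on $s_{out}$ and $x_v$) together with $r[k]$.

```haskell
data AbstExt a = Abst | Prst a deriving (Eq, Show)

-- Hypothetical voter step for one instant k. (i, j, l) are the 1-based
-- indices of the three active RTRPs (the event on c_v); outs holds s_j[k],
-- i.e. output and state, for all RTRPs. Returns (majority value, r[k]).
voterStep :: Eq a => (Int, Int, Int) -> [a] -> (AbstExt a, AbstExt Int)
voterStep (i, j, l) outs
  | a == b && b == c = (Prst a, Abst)   -- consistent: r[k] is absent
  | a == b           = (Prst a, Prst l) -- RTRP l disagrees
  | a == c           = (Prst a, Prst j) -- RTRP j disagrees
  | b == c           = (Prst b, Prst i) -- RTRP i disagrees
  | otherwise        = (Abst, Abst)     -- no majority: outside the single-fault assumption
  where
    pick n = outs !! (n - 1)
    (a, b, c) = (pick i, pick j, pick l)
```

For example, with active triple (1, 2, 3) and outputs `[5, 7, 5, 0, 0]`, the result is `(Prst 5, Prst 2)`: the majority value is forwarded and RTRP 2 is reported as faulty.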
Figure 5: Control Device internal schematics.
follows.
The output $c_v$ carries the three currently active RTRPs as a tuple, as in (8). The behavior of the Control Device regarding the output $c_v$ is modeled as follows.
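The Control Device's reaction to a report on $r$ can be sketched in the same style. The function `ctrlStep`, the type `CtrlState` and the policy of taking the first free spare in a list are assumptions for illustration; this is not the paper's `ctrlDev`.

```haskell
data AbstExt a = Abst | Prst a deriving (Eq, Show)

-- Hypothetical Control Device state: the active triple (what c_v carries)
-- and the indices of the spare RTRPs that are still free.
type CtrlState = ((Int, Int, Int), [Int])

-- ctrlStep n req st r: n is the number of RTRPs, req the reconfiguration
-- request to send (e.g. the new (f, g, m)), st the current state and r the
-- Voter's report. Returns the next state and the c_t event for RTRP 1..n.
ctrlStep :: Int -> req -> CtrlState -> AbstExt Int -> (CtrlState, [AbstExt req])
ctrlStep n _ st Abst = (st, replicate n Abst) -- no fault reported
ctrlStep n req ((a, b, c), spare : spares) (Prst failed) =
  (((sub a, sub b, sub c), spares), map ct [1 .. n])
  where
    sub k = if k == failed then spare else k      -- the spare takes the failed slot
    ct k  = if k == spare then Prst req else Abst -- only the spare is reconfigured
ctrlStep n _ st (Prst _) = (st, replicate n Abst) -- no spare left (outside the model's assumptions)
```

For example, `ctrlStep 5 'F' ((1, 2, 3), [4, 5]) (Prst 2)` replaces RTRP 2 by RTRP 4, giving the active triple (1, 4, 3) and a reconfiguration event only on the fourth control output.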
RTR Modeling with SY
A strategy for, and a comparison of, frameworks supporting formal-based development and models of computation are presented by Horita et al. [5]. Based on their results, we opt for ForSyDe [6] to model our TMR system. As ForSyDe is implemented in Haskell, a functional language, implementing (4) to (15) is relatively straightforward, and one does not need to worry about side effects. Another advantage of functional languages is that functions are first-class values, which allows control events such as (f, g, m) to be exchanged like ordinary data.
The ForSyDe SY library provides a collection of process constructors, as well as delays, with which all the processes presented so far can be implemented. We use the process constructors combnSY, with n indicating the number of inputs, unzipmSY, with m indicating the number of outputs, and delaySY.
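The memories in Figure 3 are feedback loops through a delay. In Haskell, laziness lets a signal be defined in terms of its own delayed value, which is one way such loops can be written. The helpers `delayL` and `comb2L` below are hypothetical list-based stand-ins for delaySY and combnSY, assuming that a delay prepends an initial value and that a combinational process combines events with the same tag; they are not the ForSyDe implementations.

```haskell
-- Hypothetical list-based stand-ins for delaySY and a two-input combnSY
-- (assumed semantics; not the ForSyDe library itself).
delayL :: a -> [a] -> [a]
delayL x0 s = x0 : s

comb2L :: (a -> b -> c) -> [a] -> [b] -> [c]
comb2L = zipWith

-- A memory as a feedback loop through a delay (the z^-1 block):
-- x[0] = x0 and x[k+1] = f(x[k], u[k]). Laziness lets xs refer to itself.
stateLoop :: (s -> i -> s) -> s -> [i] -> [s]
stateLoop f x0 u = xs
  where
    xs = delayL x0 (comb2L f xs u)
```

For example, `stateLoop (+) 0 [1, 2, 3]` yields `[0, 1, 3, 6]`: the initial state followed by the running sum, one element longer than the input because of the delay.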
Listing 1 shows the ForSyDe implementation of an RTRP process, where rtrpFunc is a Haskell function that implements (4) to (7). For this implementation, we consider an architecture with five RTRPs (three initially operating RTRPs and two spares). As mentioned in Section 3. In a similar way, Listing 2 shows the ForSyDe implementation of the Voter process, where voterFunc is a Haskell function that implements (10) to (11). An excerpt of the top-level process network reads:

```haskell
-- Voter: compares the outputs of the RTRPs selected by the delayed c_v
(s_out, x, r) = voter cv' out1 out2 out3 out4 out5
-- CSSM: current state shared memory, modeled as a delay
x' = delaySY (Prst 0) x
-- Control Device: active triple c_v and reconfiguration commands c_t1..c_t5
(cv, (ct1, ct2, ct3, ct4, ct5)) = ctrlDev r
-- Feedback delays: RTRPs 1, 2 and 3 start active, no initial reconfiguration
cv' = delaySY (1, 2, 3) cv
ct1' = delaySY Abst ct1
ct2' = delaySY Abst ct2
ct3' = delaySY Abst ct3
ct4' = delaySY Abst ct4
ct5' = delaySY Abst ct5
```
Simulation Results
To simulate the TMR architecture, we first need to define the functionality of each RTRP. The first three RTRPs are implemented as accumulators, i.e. each input is added to the result of the previous execution. Functions $f$ and $g$, from (16) and (17), implement this accumulator. To simulate a failure in one of these three RTRPs (here, RTRP$_2$), we implemented a faulty accumulator, replacing $f$ by $\tilde{f}$ given by (18): when the result of the previous execution is 3, it subtracts the input instead of adding it. RTRPs 4 and 5 are implemented using $\hat{f}$ given by (19). We assume that it takes 2 clock cycles to reconfigure a new RTRP, i.e. $m = 2$.
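The accumulators used in the simulation can be sketched directly from the description above. `accF` follows the prose for (16), `accFaulty` follows the prose for (18), and the output is assumed to be the updated accumulator value, which is an assumption about (17); the function of (19) used by RTRPs 4 and 5 is not modeled. The names are hypothetical, and this is not the paper's simulation code.

```haskell
-- Hypothetical re-creation of the simulation functions (not the paper's code).
-- Correct accumulator: the input is added to the previous result.
accF :: Int -> Int -> Int
accF x u = x + u

-- Faulty accumulator: when the previous result is 3, it subtracts instead.
accFaulty :: Int -> Int -> Int
accFaulty x u = if x == 3 then x - u else x + u

-- Run an accumulator from state 0; the output after each input is assumed
-- to be the updated accumulator value.
runAcc :: (Int -> Int -> Int) -> [Int] -> [Int]
runAcc f = drop 1 . scanl f 0

-- First cycle (counted from 0) at which the faulty RTRP 2 disagrees with
-- the correct RTRPs 1 and 3, i.e. when the Voter would report a fault.
firstMismatch :: [Int] -> Maybe Int
firstMismatch u =
  case [k | (k, good, bad) <- zip3 [0 ..] (runAcc accF u) (runAcc accFaulty u), good /= bad] of
    (k : _) -> Just k
    []      -> Nothing
```

With inputs `[1, 2, 1, 1]`, the correct accumulator produces `[1, 3, 4, 5]` while the faulty one produces `[1, 3, 2, 3]`, so the mismatch that the Voter would report appears at cycle 2.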
Related Work
The idea of using triple modular redundancy with runtime reconfiguration is not new. SRAM-based field-programmable gate arrays (FPGAs) must protect their configuration memory from SEUs, and TMR techniques are applied to such devices. However, when a majority voter is fed with two wrong answers, possibly caused by multiple independent SEUs, it produces a wrong result. One way to solve this issue is to periodically write back the whole bitstream of each module, which is time consuming and leaves the modules inactive during this period. The work in [7] proposes an optimization of the reconfiguration time to cope with this problem.
Another application of TMR using RTR is presented in [8]: an adaptive reconfigurable voting mechanism whose main goal is to extend dynamic and partial reconfiguration SEU mitigation to the voter, which is usually the single point of failure in TMR architectures.
A novel technique for synchronizing the states of a newly reconfigured module is presented in [9]. The technique consists of predicting a future state to which the system will soon converge (the checkpoint state) and presetting the reconfigured module to it. Therefore, only the reconfigured module is put on hold until the checkpoint state is reached.
The research introduced in [10] claims an improvement in fault resilience of up to 80% by combining space and time redundancy, i.e. multiprocessors and scheduling, with task migration among processors in hard real-time system design. That architecture follows the multiple instruction, multiple data (MIMD) class of the taxonomy proposed in [11].
Conclusion
This paper introduced a high level abstraction architecture for safety-critical systems with runtime reconfiguration (RTR) using the triple modular redundancy and the synchronous (SY) model of computation (MoC).
Triple modular redundancy was chosen as the mechanism for detecting and masking faults. Triple modular redundancy is a classic way to implement fault mitigation in safety-critical systems: in the event of a permanent fault, the system can mask it, but it then becomes vulnerable to a second fault.
Triple modular redundancy using RTR allows the system to tolerate multiple permanent faults, provided that no two faults occur within a time interval defined by the reconfiguration time of a new module.
We implemented the proposed high-level architecture model in the ForSyDe framework and verified that a new RTRP can be correctly reconfigured in $m$ cycles and can have its states synchronized with the other two RTRPs.
