Robust Duplication With Comparison Methods in Microcontrollers
Heather Quinn, Zachary Baker, Tom Fairbanks, Justin L. Tripp, and George Duran
Abstract-Commercial microprocessors could be useful computational platforms in space systems, as long as the risk is bounded. Many spacecraft are computationally constrained because all of the computation is done on a single radiation-hardened microprocessor. It is possible that a commercial microprocessor could be used for configuration, monitoring and background tasks that are not mission critical. Most commercial microprocessors are affected by radiation, including single-event effects (SEEs) that could be destructive to the component or corrupt the data. Part screening can help designers avoid components with destructive failure modes, and mitigation can suppress data corruption. We have been experimenting with a method for masking radiation-induced faults through the software executing on the microprocessor. While triple-modular redundancy (TMR) techniques are very effective at masking faults in software, the increased execution time needed to complete the computation is not desirable. In this paper we present a technique for combining duplication with compare (DWC) with TMR that decreases observable errors by as much as 145 times with only a 2.35 times decrease in performance.
Index Terms-Soft errors, software, software fault diagnosis, software fault tolerance.
I. INTRODUCTION

COMMERCIALLY available microprocessors could be useful to spacecraft, as these components are cheaper, smaller, faster and take less power than traditional radiation-hardened microprocessors. Los Alamos National Laboratory (LANL) is particularly interested in microcontrollers that are capable of self-hosting, so that it is unnecessary to command the microprocessor to start execution. Most microcontrollers are sensitive to SEEs. Single-event upsets (SEUs) and single-event transients (SETs) in the hardware can cause software malfunctions, including both incorrect outputs and crashes [1]-[3]. While not appropriate for many mission-critical computation tasks, these components can be useful for non-mission-critical computation tasks, such as configuration, monitoring and background tasks.
In recent years, we have been studying whether SEUs and SETs can be masked through software resilience methods. The LANL method and automated tool, called Trikaya, originally looked at triple-modular redundancy (TMR) methods [2], which are based on von Neumann's methodology [4]. The application of TMR to software executes each sub-routine three times with independent input and output variables. The output of the triplicated sub-routine is determined by majority vote, which means that one faulty output can be removed per vote. Previous test results show that TMR-mitigated sub-routines are effective at masking faults when compared to unmitigated sub-routines. The cost of increasing the resilience of the software is a decrease in performance: the time to complete the sub-routine increases by 1.34-983.22 times [2]. During our previous experiments we found that, even in the accelerated radiation environment, the time between SEUs was much longer than the execution time of the algorithms. This means that the average use case for the mitigated algorithms was to execute without any faults. Therefore, we would like to optimize the mitigation technique so that it does not lose as much performance in the average case.
One option is to look at application-based fault tolerance (ABFT) codes, which are commonly applied to matrix operations. One ABFT technique places checksums on the rows and columns of matrices [5]. Others have created ABFT methods for sparse [6] or dense [7] matrices. It is possible to use ABFT techniques on other types of algorithms. In [8], ABFT is applied to dynamic programming algorithms. A novel system by [9] takes the standard self-adapting software system framework and adapts it for resilience so the system monitors itself for changes. When the system finds non-operational or faulty modules, a plan for working around the faulty portions can be developed and executed by the system. All of these options are reasonable, but expert knowledge of the algorithm is often necessary to achieve good results with ABFT.
There are a number of methods for mitigating data integrity or control flow errors through low-level, rule-based transformations. Rule-based transformations are used to prevent control flow errors [10]. In [11], test assertions are used to validate transitions between blocks. Other transformations can insert low-level duplication with compare (DWC) instructions to protect data variables from SEUs [12]. In [13], [14], a hybrid Software-Implemented Hardware Fault Tolerance (SIHFT) technique is implemented using rule-based transformations and boundary scan. This method is designed to reduce the overhead of software mitigation by increasing fault coverage through the boundary scan. In [15], the authors warn that these types of transformations can have limitations. These authors determined that transformations based on DWC and inverted branch techniques are very effective, but that software signatures for control flow errors are not.
DWC or duplex techniques are attractive because they execute faster than TMR, as the algorithm is executed only twice [16] . Like TMR, DWC can be applied to a variety of algorithms without expert knowledge of the algorithm or the data. We should be able to automate the application of the technique in the Trikaya tool. Unfortunately, DWC can only detect errors and has no capability for error correction. We are interested in a more robust variation of DWC that reverts to TMR when a fault is detected. For simplicity, we are calling this method DWCF to distinguish it from traditional DWC methods. With the ability to execute a lazy TMR, the DWCF algorithm should provide the robustness of TMR without the performance loss of TMR.
This paper is organized as follows. The DWCF technique is presented in Section II. Limitations to the technique are discussed in Section III. The experimental setup for the performance and radiation tests is presented in Section IV. The results of the tests are presented in Sections V and VI.
II. DWCF TECHNIQUE
The Trikaya technique is based on spatial and temporal redundancy. The calculation is made spatially redundant by replicating the sub-routine's variables and is made temporally redundant by replicating the execution of the mitigated sub-routine. Executing the sub-routine multiple times with independent data can mask SEUs and SETs. An overview of how the DWCF technique transforms the unmitigated code is shown in Fig. 1 . If there are no faults, the replicated sub-routine is executed twice with two independent data replicas. A third replica of the data is stored in case of an error. If the two outputs do not match, then the sub-routine is executed a third time with the third replica of the data variables and a software voter is used to correct the error.
The mitigation technique makes a number of changes to the code:
• Insertion of the replicated input and output variables;
• Insertion of the comparison code;
• Insertion of the majority voter code; and
• Insertion of the code to trigger the DWCF algorithm.
Issues with code structure are not addressed when mitigating full sub-routines, so issues with incorrect data causing problems with looping, branching or jumping are not addressed. One of the key program changes is the insertion of the comparison. The comparison provides a high-performance method for detecting errors. A C-language version of the software comparison is shown in Fig. 2 . The comparison algorithm is not designed to find all of the faults, but to quickly return on the detection of any faults. The comparison code is more efficient than the software voter, because two-way comparison can be done in a single instruction, if the values are loaded in registers. Some microcontrollers also have hardware comparison units that can help speed up the comparison code. The return codes are used to flag error states.
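The early-return behavior of the comparison described above can be illustrated with a minimal C sketch. It is not the code from Fig. 2: the function name `dwc_compare`, the return codes `DWC_OK` and `DWC_MISMATCH`, and the use of 32-bit words are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes; the values used in Fig. 2 are not known. */
enum { DWC_OK = 0, DWC_MISMATCH = 1 };

/* Compare two output replicas word by word.
 * The loop returns on the first mismatch: it only detects that a fault
 * exists and does not try to locate every faulty word. */
int dwc_compare(const uint32_t *a, const uint32_t *b, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        if (a[i] != b[i]) {
            return DWC_MISMATCH;
        }
    }
    return DWC_OK;
}
```

A caller would check the return code after the second execution and only invoke the slower voter path when `DWC_MISMATCH` is returned.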
The return codes on the comparison code are monitored in the DWCF algorithm to determine when it is necessary to fail over to TMR. Fig. 3 shows a C-language version of the voter. The voter determines where the fault is in the output value, determines what the majority value is, and corrects the faulty value. It takes a minimum of eight instructions to detect and correct a fault, assuming that the values are in registers. Finally, like the comparison code, the voter uses return codes to flag error states. Fig. 4 shows the DWCF algorithm for a generic sub-routine in C. This figure shows how the input and output variables are triplicated, how the compare code is executed, and how execution of the third sub-routine is triggered on error. It shows how the error states for the comparison and voter code are monitored to trigger the appropriate action. The comparison error flag triggers a third execution of the sub-routine and the voter.
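The voter and the fail-over logic described for Figs. 3 and 4 can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the names `tmr_vote`, `dwcf_run`, `subroutine_fn` and `example_sum`, the status codes, and the 32-bit word size are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; the values used in the paper's figures are not known. */
enum { DWCF_OK = 0, DWCF_CORRECTED = 1, DWCF_FAILED = 2 };

/* Generic sub-routine signature assumed for this sketch. */
typedef void (*subroutine_fn)(const uint32_t *in, size_t n_in,
                              uint32_t *out, size_t n_out);

/* Word-by-word majority voter: repairs the minority replica in place.
 * Returns 0 on success, 1 if any word differs in all three replicas. */
int tmr_vote(uint32_t *a, uint32_t *b, uint32_t *c, size_t n_words)
{
    int failed = 0;
    for (size_t i = 0; i < n_words; i++) {
        if (a[i] == b[i]) {
            c[i] = a[i];
        } else if (a[i] == c[i]) {
            b[i] = a[i];
        } else if (b[i] == c[i]) {
            a[i] = b[i];
        } else {
            failed = 1; /* no majority for this word */
        }
    }
    return failed;
}

/* Duplex execution with lazy fail-over to TMR. */
int dwcf_run(subroutine_fn f, uint32_t *const in[3], size_t n_in,
             uint32_t *const out[3], size_t n_out)
{
    size_t i;

    f(in[0], n_in, out[0], n_out);
    f(in[1], n_in, out[1], n_out);

    /* Two-way comparison: stop at the first mismatching word. */
    for (i = 0; i < n_out; i++) {
        if (out[0][i] != out[1][i]) {
            break;
        }
    }
    if (i == n_out) {
        return DWCF_OK; /* common, fault-free case: only two executions */
    }

    /* Mismatch: run the third replica and vote. */
    f(in[2], n_in, out[2], n_out);
    return tmr_vote(out[0], out[1], out[2], n_out) ? DWCF_FAILED : DWCF_CORRECTED;
}

/* Example sub-routine (hypothetical): sums the input words into out[0]. */
void example_sum(const uint32_t *in, size_t n_in, uint32_t *out, size_t n_out)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n_in; i++) {
        acc += in[i];
    }
    if (n_out > 0) {
        out[0] = acc;
    }
}
```

In this sketch the caller is expected to act on `DWCF_FAILED`, for example by discarding the output, since the voter found no majority for at least one word.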
III. LIMITATIONS TO THE DWCF TECHNIQUE
There are limitations to the DWCF technique, due to multiple-independent upsets (MIUs), and latent faults. When two or more faults occur in the execution of the mitigated sub-routine, it is possible that the voter cannot correct the faults. Latent faults are a special case of MIUs, where one of the faults is in the third replica's input. The DWCF technique is not guaranteed to work under these circumstances, but can work for many MIU and latent fault scenarios.
An accumulation of faults can cause the lazy TMR execution to fail. In this scenario, a second fault in the same variable must occur before the first fault is corrected. Because MIUs are dependent on having two faults in the system at one time, there are three factors that affect the probability of failure: flux, execution time for the sub-routine and the size of the input/output variables. When the flux is high and the size of the input/output variables is large, then the SEU rate for the mitigated sub-routine might be faster than the execution rate. In this case, it is possible to accumulate two faults during the execution of the mitigated sub-routine. The equation for the probability of an MIU is:
where each redundant variable has n words in a memory of m words. This equation is a worst-case scenario. The voter removes faults on a word-by-word basis, so the same word has to be affected for the voter to return an error state. Therefore, the best-case scenario is when each word in the output fails independently, which means that n = 1 in Equation 1. As each algorithm is different, it depends on the algorithm whether the fault causes an isolated, single-word error or a widespread, multiple-word error.
There are specific use cases that cause latent faults in the DWCF technique. It is possible with the DWCF technique that the input variables for the third replica persist for longer than one execution of the mitigated sub-routine. The output variables for the third replica are not a concern, because these values are overwritten when the third sub-routine executes. When the inputs persist, any faults in the third replica would not be detected until a fault in one of the other replicas causes the third sub-routine and voter to execute. On average a third of all SEUs in the replicated input data variables affect the third replica. If the input data in the third replica does not change or is not refreshed frequently, then it is possible that faults could accumulate in the input data. The equation below provides a lower bound for latent faults:
where each redundant variable has n words in a memory of m words. As with MIUs, Equation 2 is a worst-case scenario and the best-case scenario is when n = 1. Likewise, the implementation of the algorithm determines how faults translate into errors in the system. Fig. 5 shows how the two worst-case failure rates and the probability of a single upset in a variable scale with the number of words in the variable. While the failure rate of single variables increases linearly with a slope that is three times the amount of memory used, the probability of having a TMR or DWC failure is parabolic but the quadratic coefficient is very small.
Because of issues with latent faults and MIUs, the error state on the voter should be monitored. How to handle issues with MIUs and latent faults is dependent on how often either is expected to occur. If the probability of latent faults and MIUs is low, then it might be reasonable to not use the output from the failed execution and restart the execution with new data. If the probability of latent faults and MIUs is high, then issues with the sub-routine must be addressed. The sub-routine could be re-written as several shorter sub-routines that are guaranteed to complete quicker, so that the DWCF technique is more effective when applied to each of the shorter subroutines individually. To reduce latent faults, the input data in the third replica needs to be updated more frequently so faults cannot accumulate before it is needed. It is also possible to vote the input data on the third replica before executing the sub-routine for the third time to detect and correct any possible latent faults.
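The last option, voting the input data before the third execution, can be sketched in C. This is one possible reading of that suggestion, not the authors' implementation; the function name `scrub_inputs`, its return convention and the 32-bit word size are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Vote the three input replicas word by word before the third execution.
 * Any single faulty word, including a latent fault in the third replica,
 * is overwritten with the majority value.
 * Returns the number of repaired words, or -1 if some word differs in all
 * three replicas and cannot be resolved. */
long scrub_inputs(uint32_t *r0, uint32_t *r1, uint32_t *r2, size_t n_words)
{
    long repaired = 0;
    int unresolved = 0;

    for (size_t i = 0; i < n_words; i++) {
        if (r0[i] == r1[i]) {
            if (r2[i] != r0[i]) {   /* latent fault in the third replica */
                r2[i] = r0[i];
                repaired++;
            }
        } else if (r2[i] == r0[i]) {
            r1[i] = r2[i];
            repaired++;
        } else if (r2[i] == r1[i]) {
            r0[i] = r2[i];
            repaired++;
        } else {
            unresolved = 1;         /* no majority for this word */
        }
    }
    return unresolved ? -1 : repaired;
}
```

The scrub costs one pass over the input data, so it only pays off when it runs on the rare fail-over path or at a low periodic rate.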
IV. EXPERIMENTAL SETUP
To demonstrate this technique, we focus on radiation and performance testing of the hand-mitigated version of the DWCF technique. In November 2015 and February 2016, we tested the Trikaya technique on four components: TI flash-based MSP430 (MSP430F2619), TI FeRAM-based MSP430 (MSP430FR5969), TI Tiva Cortex-M4F, and Xilinx Zynq-7000 SoC FPGA with Cortex-A9. All of the radiation results were collected at the Los Alamos Neutron Science Center (LANSCE) Irradiation of Chips and Electronics (ICE) House I flight paths. A picture of the test setup at LANSCE is shown in Fig. 6. We have also executed performance measurements for the same components on the bench. In this section we describe the test setups for the radiation and performance tests.
A. Radiation Experiment Test Setup
The radiation test experiments include hardware and software test fixtures for all four components. The software test fixture leverages the benchmark for radiation testing, which has been improved since last year [3]. Each component uses the same design of experiment. These three elements are described below.
1) Hardware Test Fixture:
The hardware setup is the same as the setup we used in previous microcontroller tests [2] . The test board communicates with the test computer through a serial connection for test reporting and a JTAG connection for programming. The JTAG programmer is connected to the test board and can write directly to the component's SRAM or non-volatile memory. The JTAG programmer for the TI components writes to the non-volatile memory and the components self-boot when power cycled or reset. The JTAG programmer for the Zynq component writes to the SRAM memory and the component has to be programmed when power cycled or reset. Programming the codes to the memory is controlled by instrumentation software on the test computer. The boards are independently powered at nominal voltages, are at nominal temperature, and are at a normal incidence to the beam.
2) Software Test Fixture: We implement these codes from the benchmark: AES-128 with NIST test vectors, Cache Test, piFFT, Matrix Multiply (M×M) and Quicksort (Qsort). The MSP430F2619 version of these codes can be found on GitHub [17] . There have been several changes to the benchmark in recent months. These changes include adding new codes, changing the error checkers and standardizing the input value sizes.
The benchmark now includes the piFFT code. The piFFT code uses Takuya Ooura's C code for calculating Pi using fast Fourier transforms (FFT) [18] . The addition of an FFT code with standardized input variables is a good addition to the benchmark, as FFT is a common type of calculation to complete on microcontrollers. Unlike most of the benchmark codes, the piFFT code only has pointers to the input and output variables, which are then allocated on the heap. Finally, it is possible to customize the code to many different types of microprocessors. The heap size grows with the number of digits calculated, so it is possible to scale the code for different classes of microprocessors based on the number of digits.
We also updated the codes to have reduced printing. While printing provides an ever-important "heartbeat" to tests that crash frequently, printing must not overwhelm the computational capacity. The new reporting infrastructure only takes 1% of the execution time.
We also updated the codes to have improved error checking. In the previous implementation of the benchmark, the output values are compressed into a 32-bit cyclic redundancy check (CRC) value, which is checked for correctness. The updated codes keep a golden copy of the output, which is compared word for word against the computed output. This comparison is 50% faster than the CRC, and provides more information about how the software is failing.
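A word-for-word golden check of this kind might look like the following C sketch. The function name `golden_compare` and its reporting of the first mismatching index are assumptions for illustration; the benchmark's actual checker may report failures differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare the computed output against a stored golden copy.
 * Returns the number of mismatching words. If first_bad is not NULL and at
 * least one word differs, the index of the first mismatch is stored there.
 * Unlike a CRC, this shows how many words failed and where. */
size_t golden_compare(const uint32_t *out, const uint32_t *golden,
                      size_t n_words, size_t *first_bad)
{
    size_t bad = 0;
    for (size_t i = 0; i < n_words; i++) {
        if (out[i] != golden[i]) {
            if (bad == 0 && first_bad != NULL) {
                *first_bad = i;
            }
            bad++;
        }
    }
    return bad;
}
```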
We have also normalized the input and output variable sizes based on microprocessor class (PIC, microcontroller, single-core general-purpose microprocessor, multiple-core general-purpose processor, graphics processing unit). The numbers of words in the input and output variables are listed in Table I. The Tiva and the Zynq have similar cache sizes and it is possible to create codes with the same input variable sizes. The MSP430F2619 has nearly the same amount of SRAM as the TI Tiva, and uses input variable sizes that are nearly the same size as used with the TI Tiva. The MSP430FR5969 has significantly less SRAM, so we used smaller input variables on this part. The output variables follow similar trends. It is also clear that the output variables are highly variable from code to code. Cache test reduces all of the input data down into one variable, whereas quicksort does an in-place sort of the input array so the input and output sizes are the same. These differences are important. Any fault in cache test causes the output to be wrong, whereas in quicksort some of the output array might be correct.

TABLE I
THE TABLE LISTS THE NUMBER OF WORDS IN THE INPUT AND OUTPUT VARIABLES IN THE FORMAT "INPUT/OUTPUT"

The TI codes are compiled in Code Composer Studio version 6.1.1 using "-O1" optimization. The Xilinx codes are compiled in the Xilinx Software Development Kit version 14.4 using "-O1" optimization. There are two versions of the code for the Cortex-A9: one that targets the L1 cache and one that targets the on-chip memory (OCM). The codes are identical, except for the cache initialization code, which is removed for the L1 tests.
3) Design of Experiment:
The statistical design of the test has a Latin Squares construct [19]. This construct is useful for tests where a single system is tested under a variety of conditions that could have nuisance factors, such as variations in output caused by the order of the tests or variations in radiation sensitivities caused by total ionizing dose or displacement damage. The test methodology allows for an analysis of variance to be performed. This particular methodology is well suited for neutron tests, where each run can be long but individual tests are short. Because the methodology is designed to switch from one test to the next, it is possible to get an equivalent amount of radiation exposure for each test condition. The Latin Squares setup is implemented in a Python script and includes the code to program each component with the next test.
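The authors' script is written in Python and is not reproduced here. As an illustration of the construct only, the C sketch below builds a cyclic Latin square, one standard construction in which every test condition appears exactly once in each row (run) and each column (position within a run). The function names and the limit `LATIN_MAX_K` are assumptions, and the actual script may use a different or randomized square.

```c
#include <stddef.h>

#define LATIN_MAX_K 31u /* hypothetical limit so a bitmask fits in unsigned long */

/* Fill a k-by-k square (row-major) with the cyclic pattern (r + c) mod k.
 * Returns 0 on success, -1 if k is out of range. */
int latin_square_fill(size_t k, unsigned *sq)
{
    if (k == 0 || k > LATIN_MAX_K) {
        return -1;
    }
    for (size_t r = 0; r < k; r++) {
        for (size_t c = 0; c < k; c++) {
            sq[r * k + c] = (unsigned)((r + c) % k);
        }
    }
    return 0;
}

/* Return 1 if every row and every column contains each of 0..k-1 once. */
int latin_square_is_valid(size_t k, const unsigned *sq)
{
    if (k == 0 || k > LATIN_MAX_K) {
        return 0;
    }
    const unsigned long full = (1UL << k) - 1UL;
    for (size_t r = 0; r < k; r++) {
        unsigned long row_seen = 0, col_seen = 0;
        for (size_t c = 0; c < k; c++) {
            unsigned row_val = sq[r * k + c];
            unsigned col_val = sq[c * k + r];
            if (row_val >= k || col_val >= k) {
                return 0;
            }
            row_seen |= 1UL << row_val;
            col_seen |= 1UL << col_val;
        }
        if (row_seen != full || col_seen != full) {
            return 0;
        }
    }
    return 1;
}
```

With five benchmark codes, row r of the square gives the order in which the codes would be programmed during run r, so each code occupies each position in the sequence once.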
B. Performance Tests
Performance testing is completed on the bench to determine the effect of the mitigation process on execution speed, power consumption and program size. All of the algorithms are measured for changes in overhead. The execution time is determined by the number of sub-routines executed within two minutes. The current consumption is measured on a programmable power supply with the exception of the Zynq. The Zynq test fixture uses an evaluation board that does not allow independent biasing of each power rail, so it is not possible to measure the power consumption. The changes to program sizes are determined by examining the programs for the compiled number of bytes: the data, const and bss sections for the variables; and the text section for the instructions.
V. RADIATION RESULTS
In this section we discuss the results of radiation tests, the efficacy of the mitigation technique and the root causes of software failures based on the algorithm. The advantage of the Latin Squares methodology is that it load balances the amount of fluence. Therefore, the fluence for each test on each component is roughly the same: $2.5 \times 10^{10}$ neutrons/cm$^2$. In addition, because the tests use similar input and output sizes, each test had roughly the same number of faults. The unmitigated codes had 0-20 faults and the mitigated codes had 20-40 faults. The only notable exception is the AES code on the Xilinx Zynq Cortex-A9, which is very fault sensitive due to storing the test vectors in SRAM.
Results from these tests are shown in Figs. 7 to 10. These graphs show the cross sections for both errors and faults, where errors are the fraction of faults not corrected by the mitigation method. By plotting both faults and errors, it is possible to see two aspects: the increase in faults caused by triplicating the data variables and the effectiveness of the mitigation technique. Many of the tests ended with no faults or errors. In these cases, we placed the data point for the null cross section at 1/fluence with the 95% confidence intervals for a null cross section. Because the lower error bar for a null cross section is zero, the lower error bars have a clipped arrow.
A. Efficacy of the DWCF Technique
It is always possible to increase the sensitivity to output errors when mitigation is applied to any system. In modular-redundancy-based systems, the system grows by the number of modules (n), so the mitigated system has an overall failure rate of 3n. For example, the DWCF technique triplicates the input variables, which causes the SEU rate in the input variables to triple. Because of the increase in size, it is possible that the mitigated system has a higher error rate than the unmitigated system, which is why testing is such a key part of the process. During our tests, there is no indication that the cross section increases from the unmitigated codes to the mitigated codes, which means that the mitigation process is able to handle the increased fault rate and that the mitigation technique is decreasing the error rate.

[Figure caption: Cross sections with 95% confidence intervals for the Zynq Cortex-A9.]
In all cases the mitigated cross section is between 1-145 times smaller than the unmitigated cross section. The situation where the mitigated and unmitigated cross sections are the same is a special case: both cross sections are null. In most other cases, the unmitigated cross section is not null and the mitigated cross section is null. Comparing cross sections where one is null is difficult. We compared the known cross section to 1/fluence for the null cross section. It is possible that the actual mitigated cross section is smaller than 1/fluence and that the difference between the mitigated and unmitigated cross sections is larger. It was not possible to measure the mitigated cross section for most of the algorithms, even after several days of testing.
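The comparison rule described above can be written as a short C sketch: a cross section is the event count divided by the fluence, and a test with zero events is placed at 1/fluence. The function names are hypothetical, and the confidence intervals shown in the figures are not computed here.

```c
/* Cross section in cm^2 for a test with the given event count and fluence
 * (particles per cm^2). A null result is reported as 1/fluence, which is
 * the placement used for plotting and comparison in this section. */
double cross_section_cm2(unsigned long events, double fluence_per_cm2)
{
    double count = (events == 0UL) ? 1.0 : (double)events;
    return count / fluence_per_cm2;
}

/* Ratio of unmitigated to mitigated error cross sections. When both tests
 * are null and have the same fluence, the ratio is 1. */
double improvement_factor(unsigned long unmit_errors, double unmit_fluence,
                          unsigned long mit_errors, double mit_fluence)
{
    return cross_section_cm2(unmit_errors, unmit_fluence) /
           cross_section_cm2(mit_errors, mit_fluence);
}
```

Because a null mitigated result is capped at 1/fluence, the factor returned by this rule is a conservative estimate of the improvement.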
These results show that the DWCF technique works for many algorithms. For the quicksort and matrix multiply on all components, DWCF is able to detect and correct all of the faults, leading to null error cross sections. The AES and piFFT codes are naturally resilient to many faults on many of the components and most of the cross sections are null for all implementations. There is also no evidence of MIUs in any of the codes.
The mitigated version of the cache test had latent faults on two components. The cache test is particularly hard to mitigate well. Unlike the other codes, the input data is written to memory once at the beginning of the 10-minute test. As the input variable size for the unmitigated cache test is 1/3 of the SRAM size and the data do not refresh, a latent fault is more likely in cache test than in the other tests. In addition, every single input word is used to calculate a one-word output, so every fault translates into an error with this algorithm. In testing, we find that approximately 10-23% of the faults in the DWCF-mitigated version of the cache test code could not be corrected because of a latent fault in the third data replica. Equation 1 predicts that 11% of the tests run on the MSP430FR5969 cache code should have DWCF failures. We tested two MSP430FR5969 parts: one had zero DWCF failures and one had 12% of the cache tests fail. In comparison, the two Tiva parts are predicted to have 1% of cache tests fail, whereas one had a 7% DWCF failure rate and the other had a 23% DWCF failure rate. We believe that these issues are unique to the cache test and that latent faults will be uncommon for most applications in deployed environments. Firstly, accelerated radiation tests have higher fluxes than the natural environment. Secondly, most algorithms will naturally refresh the third replica's input data more frequently.
B. Root Causes for Failures
We are interested in how software fails, because it may lead to future mitigation possibilities. We specifically look at quicksort, matrix multiply and piFFT, as the algorithms are useful for many real applications.
We redesigned the quicksort test to allow us to measure whether sorting a sorted array would have a different probability of failure than sorting an unsorted array. The algorithm does two forward sorts followed by two reverse sorts, which causes two full sorts and two non-sorts. The number of faults for all four sorts is approximately equal, which indicates that there is no difference between full sorts and non-sorts. The real difference is the location of the SEU within the affected word. When the SEU affects the most significant bit, it is possible that the entire array is in the wrong location. On the other hand, SEUs in the least significant bit might not affect the rest of the array: the wrong value and the correct value are in the same position in the array. It is possible that these faults are not in the input data, but originate in the output array after the sort is completed. Therefore, the effect on the code is dependent on the location within the word and the timing of when the SEU occurs. Even with these issues, the arrays were sorted properly and generally only had one erroneous value in the array. It is possible for some applications that these issues are acceptable and it is unnecessary to protect the sort with any mitigation method.
Matrix multiply had similar results to quicksort. While it is possible that each SEU could cause the resultant matrix to have an entire row or column of faults, that happens only half of the time. The other half of the time, the SEU occurs in the resultant matrix and causes only a single fault. It should be noted that the resultant matrix is the same size as the two input matrices combined, because the resultant uses 64-bit integers and the inputs use 32-bit integers. Therefore, SEUs are equally likely in the output matrix as SEUs in the two input matrices. We also see cases where the SEU occurs during the calculation, causing partial failures of a column or row in the resultant matrix. It should be noted that the effect of the SEUs on the input matrices has larger consequences than with quicksort, because half of the failures translate into several errors. While we would strongly suggest mitigating this type of algorithm, an ABFT algorithm might be straightforward and lighter weight to implement for this particular algorithm.
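To illustrate what a checksum-based ABFT check for this algorithm could look like, the C sketch below verifies the row and column sums of the resultant matrix against values derived from the inputs. It follows the general row/column checksum idea and is not a method evaluated in this paper. The names `matmul` and `abft_check`, the limit `ABFT_MAX_N`, and the assumption that all sums fit in 64-bit signed integers are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define ABFT_MAX_N 16u /* hypothetical upper bound on the matrix dimension */
enum { ABFT_OK = 0, ABFT_MISMATCH = 1, ABFT_BAD_SIZE = -1 };

/* c = a * b for n-by-n row-major matrices: 32-bit inputs, 64-bit result. */
void matmul(size_t n, const int32_t *a, const int32_t *b, int64_t *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            int64_t acc = 0;
            for (size_t k = 0; k < n; k++) {
                acc += (int64_t)a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

/* Check c against checksums derived from a and b in O(n^2) operations.
 * Column j of c must sum to (column sums of a) dot (column j of b);
 * row i of c must sum to (row i of a) dot (row sums of b).
 * On a mismatch, the last failing row/column index is stored, which
 * locates a single faulty word. Assumes no 64-bit overflow. */
int abft_check(size_t n, const int32_t *a, const int32_t *b, const int64_t *c,
               size_t *bad_row, size_t *bad_col)
{
    int64_t col_sum_a[ABFT_MAX_N] = {0};
    int64_t row_sum_b[ABFT_MAX_N] = {0};
    int status = ABFT_OK;

    if (n == 0 || n > ABFT_MAX_N) {
        return ABFT_BAD_SIZE;
    }
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            col_sum_a[k] += a[i * n + k];
            row_sum_b[i] += b[i * n + k];
        }
    }
    for (size_t j = 0; j < n; j++) {
        int64_t expected = 0, actual = 0;
        for (size_t k = 0; k < n; k++) {
            expected += col_sum_a[k] * b[k * n + j];
            actual += c[k * n + j];
        }
        if (expected != actual) { *bad_col = j; status = ABFT_MISMATCH; }
    }
    for (size_t i = 0; i < n; i++) {
        int64_t expected = 0, actual = 0;
        for (size_t k = 0; k < n; k++) {
            expected += a[i * n + k] * row_sum_b[k];
            actual += c[i * n + k];
        }
        if (expected != actual) { *bad_row = i; status = ABFT_MISMATCH; }
    }
    return status;
}
```

This sketch only covers faults in the resultant matrix or during the calculation. Faults already present in the input matrices would need checksums stored with the inputs, which is not shown.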
While the piFFT code has very few failures, the two most common are issues with the heap and issues with converging. The most common failure mode is malloc() failures when the input variables are being allocated on the heap. These malloc() failures indicate that it is possible to corrupt the heap such that it is not possible to instantiate new variables. Finally, as the code is an iterative code, a number of tests ended when the algorithm determined it is not converging. The problems with both convergence and the heap rarely affected multiple executions of the algorithm, though. Even when the first execution of the sub-routine had issues with the heap, the second would usually not have issues with the heap. Therefore, it might be possible to mitigate the allocation of variables on the heap separately, by executing the allocation until it completes correctly. Furthermore, it might be possible to mitigate the entire algorithm by executing it repeatedly until two to three executions complete without heap or convergence issues. 
VI. PERFORMANCE TESTS AND ANALYSIS
Performance testing is a key part of our experimental strategy, as we are redesigning the mitigation algorithm to increase the performance of the mitigated codes. In theory, we expect that the increases in overhead will be based on these values:
• Instructions: The number of words for the DWCF algorithm, comparison and voter code;
• Variables: Three times the number of words for the input and output variables for the unmitigated sub-routine; and
• Execution Time: Two times the execution time for the sub-routine plus the execution time for the comparison for error-free computation; and three times the execution time for the sub-routine plus the execution time for the comparison and the execution time for error correction when a fault is detected.
Table II lists the comparison between the DWCF-mitigated algorithm and the original, unmitigated algorithm for all of these parameters. In practice, some of these values are highly dependent on the microcontroller and the software. We provide a detailed discussion of the overhead for the hardware and software we implemented in the remainder of this section.
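The execution-time expectation in the list above can be turned into a small model. The C sketch below weights the error-free and error-correction cases by the probability that a comparison fails. The weighting by `p_fault` and the function name are assumptions added for illustration; only the two per-case costs come from the list.

```c
/* Expected slowdown of a DWCF-mitigated sub-routine relative to the
 * unmitigated one.
 *   t_sub   : execution time of one sub-routine run
 *   t_cmp   : execution time of the two-way comparison
 *   t_vote  : execution time of the voter (error correction)
 *   p_fault : probability that the comparison detects a mismatch (0..1)
 * Error-free case: 2*t_sub + t_cmp.
 * Fault case:      3*t_sub + t_cmp + t_vote. */
double dwcf_expected_slowdown(double t_sub, double t_cmp, double t_vote,
                              double p_fault)
{
    double t_clean = 2.0 * t_sub + t_cmp;
    double t_fault = 3.0 * t_sub + t_cmp + t_vote;
    return ((1.0 - p_fault) * t_clean + p_fault * t_fault) / t_sub;
}
```

For example, with a comparison that costs 5% of the sub-routine time and no faults, the model gives a slowdown of 2.05. As faults become more frequent the value moves toward the cost of full TMR plus the comparison.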
In practice, the impact of triplicating just the input and output variables is less than triplicating all of the variables. When all of the data sections (bss, data, const) are taken into account, the increase in the amount of memory used to store the variables is 1.9 times on average. While it is possible that the amount of memory needed for variables increases by three times, in most cases that is not true. Only the input and output variables for the mitigated sub-routine are triplicated, which might be a large or small fraction of the memory used for variables in the entire program. Therefore, the amount of memory being triplicated can vary. For example, the AES code uses a lot of memory space for the test vectors and there is only a 1% increase in the amount of memory needed for variables for the mitigated program. The cache test, quicksort and matrix multiply programs are all dominated by large data variables, and the data section increases by 2.30 to 2.9 times with mitigation. The piFFT code uses several large data variables, but all of these values are instantiated on the heap within the sub-routine. There are no input variables and only one output variable in the unmitigated routine, which means that there is a less than 10% increase in the data sections.
Because the sub-routine is reused for all three possible executions, the increase in the text section, where the instructions are defined, should be small. The relative increase is dependent on how many instructions were in the original, unmitigated program. On average the text section increases by 50%.
We find that the overhead for the instructions and variables is dependent on the compiler. The Xilinx SDK compiler includes several runtime libraries, so the overhead associated with mitigating the code and the variables is 10% or less. The inclusion of the runtime libraries by the Xilinx SDK compiler creates text sections that are 30-40 times larger than the TI text section. This same effect is not seen in the TI microcontrollers, including the TI Tiva Cortex-M4F.
The execution time increases by an average of 2.4 times from the unmitigated sub-routine. For most of the programs, the execution increases by 2.0-2.3 times. Three implementations of the cache test increased by 3.5 times. The unmitigated version of cache test is quite fast without any error checking and even the addition of an if statement to check for SEUs in the input array causes the code to execute 50% slower. The effect of the comparison on the execution time is dependent on the execution time of the sub-routine. For the three algorithms that execute the slowest (piFFT, matrix multiply and quicksort), the comparison code adds only 0-5% more overhead. In the faster codes, though, the comparison code adds 5-25% more overhead.
Finally, we analyzed the power consumption of the Texas Instruments components, which are independently biased through programmable power supplies that allow for current monitoring. For the MSP430F2619 the power consumption is 0.005 W. The MSP430FR5969 consumed 0.008 W. The Tiva consumed between 0.066 and 0.069 W. With these code implementations, there are no algorithmic differences in the power consumption. Therefore, neither the different codes nor the mitigation affects power consumption. In the past, we have seen differences in the power consumption, but we suspect those issues stem from printing. With the reduced printing, we see more consistent power consumption across the set of codes for each component.
VII. CONCLUSIONS
Commercially available microprocessors have many advantages for modern spacecraft, as the components are smaller and less expensive than their radiation-hardened counterparts. SEEs can cause these microprocessors to fail in harsh radiation environments, including incorrect outputs and crashes. The DWCF technique is designed to mask the effect of SEUs and SETs in microprocessor systems. The DWCF algorithm is able to execute in a duplex mode until a fault is detected, then transition into a TMR mode to correct the fault. Test results show that the DWCF method is effective in decreasing corrupted computations by 1-145 times with a decrease in performance of only 2.35 times.
