A new technique is proposed for fault-tolerant linear, sesquilinear and
bijective (LSB) operations on M integer data streams (M≥3), such as:
scaling, additions/subtractions, inner or outer vector products, permutations
and convolutions. In the proposed method, the M input integer data streams
are linearly superimposed to form M numerically-entangled integer data
streams that are stored in-place of the original inputs. A series of LSB
operations can then be performed directly using these entangled data streams.
The results are extracted from the M entangled output streams by additions
and arithmetic shifts. Any soft errors affecting any single disentangled output
stream are guaranteed to be detectable via a specific post-computation
reliability check. In addition, when utilizing a separate processor core for
each of the M streams, the proposed approach can recover all outputs after
any single fail-stop failure. Importantly, unlike algorithm-based fault
tolerance (ABFT) methods, the number of operations required for the
entanglement, extraction and validation of the results is linearly related to
the number of the inputs and does not depend on the complexity of the performed
LSB operations. We have validated our proposal in an Intel processor (Haswell
architecture with AVX2 support) via fast Fourier transforms, circular
convolutions, and matrix multiplication operations. Our analysis and
experiments reveal that the proposed approach incurs between 0.03% to 7%
reduction in processing throughput for a wide variety of LSB operations. This
overhead is 5 to 1000 times smaller than that of the equivalent ABFT method
that uses a checksum stream. Thus, our proposal can be used in fault-generating
processor hardware or safety-critical applications, where high reliability is
required without the cost of ABFT or modular redundancy.Comment: to appear in IEEE Trans. on Signal Processing, 201