48,296 research outputs found
Automated Dynamic Error Analysis Methods for Optimization of Computer Arithmetic Systems
Computer arithmetic is one of the more important topics within computer science and engineering. The earliest implementations of computer systems were designed to perform arithmetic operations and cost if not all digital systems will be required to perform some sort of arithmetic as part of their normal operations. This reliance on the arithmetic operations of computers means the accurate representation of real numbers within digital systems is vital, and an understanding of how these systems are implemented and their possible drawbacks is essential in order to design and implement modern high performance systems. At present the most widely implemented system for computer arithmetic is the IEEE754 Floating Point system, while this system is deemed to the be the best available implementation it has several features that can result in serious errors of computation if not implemented correctly. Lack of understanding of these errors and their effects has led to real world disasters in the past on several occasions. Systems for the detection of these errors are highly important and fast, efficient and easy to use implementations of these detection systems is a high priority. Detection of floating point rounding errors normally requires run-time analysis in order to be effective. Several systems have been proposed for the analysis of floating point arithmetic including Interval Arithmetic, Affine Arithmetic and Monte Carlo Arithmetic. While these systems have been well studied using theoretical and software based approaches, implementation of systems that can be applied to real world situations has been limited due to issues with implementation, performance and scalability. The majority of implementations have been software based and have not taken advantage of the performance gains associated with hardware accelerated computer arithmetic systems. This is especially problematic when it is considered that systems requiring high accuracy will often require high performance. The aim of this thesis and associated research is to increase understanding of error and error analysis methods through the development of easy to use and easy to understand implementations of these techniques
Automated Dynamic Error Analysis Methods for Optimization of Computer Arithmetic Systems
Computer arithmetic is one of the more important topics within computer science and engineering. The earliest implementations of computer systems were designed to perform arithmetic operations and cost if not all digital systems will be required to perform some sort of arithmetic as part of their normal operations. This reliance on the arithmetic operations of computers means the accurate representation of real numbers within digital systems is vital, and an understanding of how these systems are implemented and their possible drawbacks is essential in order to design and implement modern high performance systems. At present the most widely implemented system for computer arithmetic is the IEEE754 Floating Point system, while this system is deemed to the be the best available implementation it has several features that can result in serious errors of computation if not implemented correctly. Lack of understanding of these errors and their effects has led to real world disasters in the past on several occasions. Systems for the detection of these errors are highly important and fast, efficient and easy to use implementations of these detection systems is a high priority. Detection of floating point rounding errors normally requires run-time analysis in order to be effective. Several systems have been proposed for the analysis of floating point arithmetic including Interval Arithmetic, Affine Arithmetic and Monte Carlo Arithmetic. While these systems have been well studied using theoretical and software based approaches, implementation of systems that can be applied to real world situations has been limited due to issues with implementation, performance and scalability. The majority of implementations have been software based and have not taken advantage of the performance gains associated with hardware accelerated computer arithmetic systems. This is especially problematic when it is considered that systems requiring high accuracy will often require high performance. The aim of this thesis and associated research is to increase understanding of error and error analysis methods through the development of easy to use and easy to understand implementations of these techniques
Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators
In this paper, we evaluate the error criticality of radiation-induced errors on modern High-Performance Computing (HPC) accelerators (Intel Xeon Phi and NVIDIA K40) through a dedicated set of metrics. We show that, as long as imprecise computing is concerned, the simple mismatch detection is not sufficient to evaluate and compare the radiation sensitivity of HPC devices and algorithms. Our analysis quantifies and qualifies radiation effects on applications’ output correlating the number of corrupted elements with their spatial locality. Also, we provide the mean relative error (dataset-wise) to evaluate radiation-induced error magnitude.
We apply the selected metrics to experimental results obtained in various radiation test campaigns for a total of more than 400 hours of beam time per device. The amount of data we gathered allows us to evaluate the error criticality of a representative set of algorithms from HPC suites. Additionally, based on the characteristics of the tested algorithms, we draw generic reliability conclusions for broader classes of codes. We show that arithmetic operations are less critical for the K40, while Xeon Phi is more reliable when executing particles interactions solved through Finite Difference Methods. Finally, iterative stencil operations seem the most reliable on both architectures.This work was supported by the STIC-AmSud/CAPES scientific cooperation program under the EnergySFE research
project grant 99999.007556/2015-02, EU H2020 Programme, and MCTI/RNP-Brazil under the HPC4E Project, grant agreement
n° 689772. Tested K40 boards were donated thanks to Steve Keckler, Timothy Tsai, and Siva Hari from NVIDIA.Postprint (author's final draft
Reliable Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement
A new technique is proposed for fault-tolerant linear, sesquilinear and
bijective (LSB) operations on integer data streams (), such as:
scaling, additions/subtractions, inner or outer vector products, permutations
and convolutions. In the proposed method, the input integer data streams
are linearly superimposed to form numerically-entangled integer data
streams that are stored in-place of the original inputs. A series of LSB
operations can then be performed directly using these entangled data streams.
The results are extracted from the entangled output streams by additions
and arithmetic shifts. Any soft errors affecting any single disentangled output
stream are guaranteed to be detectable via a specific post-computation
reliability check. In addition, when utilizing a separate processor core for
each of the streams, the proposed approach can recover all outputs after
any single fail-stop failure. Importantly, unlike algorithm-based fault
tolerance (ABFT) methods, the number of operations required for the
entanglement, extraction and validation of the results is linearly related to
the number of the inputs and does not depend on the complexity of the performed
LSB operations. We have validated our proposal in an Intel processor (Haswell
architecture with AVX2 support) via fast Fourier transforms, circular
convolutions, and matrix multiplication operations. Our analysis and
experiments reveal that the proposed approach incurs between to
reduction in processing throughput for a wide variety of LSB operations. This
overhead is 5 to 1000 times smaller than that of the equivalent ABFT method
that uses a checksum stream. Thus, our proposal can be used in fault-generating
processor hardware or safety-critical applications, where high reliability is
required without the cost of ABFT or modular redundancy.Comment: to appear in IEEE Trans. on Signal Processing, 201
Profile-directed specialisation of custom floating-point hardware
We present a methodology for generating
floating-point arithmetic hardware
designs which are, for suitable applications, much reduced in size, while still
retaining performance and IEEE-754 compliance. Our system uses three
key parts: a profiling tool, a set of customisable
floating-point units and a
selection of system integration methods.
We use a profiling tool for
floating-point behaviour to identify arithmetic
operations where fundamental elements of IEEE-754
floating-point may be
compromised, without generating erroneous results in the common case.
In the uncommon case, we use simple detection logic to determine when
operands lie outside the range of capabilities of the optimised hardware.
Out-of-range operations are handled by a separate, fully capable,
floatingpoint
implementation, either on-chip or by returning calculations to a host
processor. We present methods of system integration to achieve this errorcorrection.
Thus the system suffers no compromise in IEEE-754 compliance,
even when the synthesised hardware would generate erroneous results.
In particular, we identify from input operands the shift amounts required
for input operand alignment and post-operation normalisation. For operations
where these are small, we synthesise hardware with reduced-size
barrel-shifters. We also propose optimisations to take advantage of other
profile-exposed behaviours, including removing the hardware required to
swap operands in a floating-point adder or subtractor, and reducing the
exponent range to fit observed values.
We present profiling results for a range of applications, including a selection
of computational science programs, Spec FP 95 benchmarks and the
FFMPEG media processing tool, indicating which would be amenable to
our method. Selected applications which demonstrate potential for optimisation
are then taken through to a hardware implementation. We show up
to a 45% decrease in hardware size for a
floating-point datapath, with a
correctable error-rate of less then 3%, even with non-profiled datasets
- …