884 research outputs found
Paranoia.Ada: A diagnostic program to evaluate Ada floating-point arithmetic
Many essential software functions in the mission critical computer resource application domain depend on floating point arithmetic. Numerically intensive functions associated with the Space Station project, such as emphemeris generation or the implementation of Kalman filters, are likely to employ the floating point facilities of Ada. Paranoia.Ada appears to be a valuabe program to insure that Ada environments and their underlying hardware exhibit the precision and correctness required to satisfy mission computational requirements. As a diagnostic tool, Paranoia.Ada reveals many essential characteristics of an Ada floating point implementation. Equipped with such knowledge, programmers need not tremble before the complex task of floating point computation
Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision
Modern graphics processing units (GPUs) provide impressive computing
resources, which can be accessed conveniently through the CUDA programming
interface. We describe how GPUs can be used to considerably speed up molecular
dynamics (MD) simulations for system sizes ranging up to about 1 million
particles. Particular emphasis is put on the numerical long-time stability in
terms of energy and momentum conservation, and caveats on limited
floating-point precision are issued. Strict energy conservation over 10^8 MD
steps is obtained by double-single emulation of the floating-point arithmetic
in accuracy-critical parts of the algorithm. For the slow dynamics of a
supercooled binary Lennard-Jones mixture, we demonstrate that the use of
single-floating point precision may result in quantitatively and even
physically wrong results. For simulations of a Lennard-Jones fluid, the
described implementation shows speedup factors of up to 80 compared to a serial
implementation for the CPU, and a single GPU was found to compare with a
parallelised MD simulation using 64 distributed cores.Comment: 12 pages, 7 figures, to appear in Comp. Phys. Comm., HALMD package
licensed under the GPL, see http://research.colberg.org/projects/halm
Fast decimal floating-point division
A new implementation for decimal floating-point (DFP) division is introduced. The algorithm is based on high-radix SRT division The SRT division algorithm is named after D. Sweeney, J. E. Robertson, and T. D. Tocher. with the recurrence in a new decimal signed-digit format. Quotient digits are selected using comparison multiples, where the magnitude of the quotient digit is calculated by comparing the truncated partial remainder with limited precision multiples of the divisor. The sign is determined concurrently by investigating the polarity of the truncated partial remainder. A timing evaluation using a logic synthesis shows a significant decrease in the division execution time in contrast with one of the fastest DFP dividers reported in the open literatureHooman Nikmehr, Braden Phillips and Cheng-Chew Li
Recommended from our members
Complex block floating-point format with box encoding in communication systems
This research project entails an efficient numeric digital representation in communication systems design. A complex block floating-point format with box encoding is proposed to encode an array of complex numbers that has better numeric resolution than its IEEE-754 counterpart when the same number of bits are allocated to the dominant value in the array. It is estimated that at least 10% of bit savings could be achieved by the new complex block representation on a quad-precision IEEE-754 format. A further bits savings of up to 18% could potentially be achieved for complex blocks at half-precision and single-precision IEEE-754 representation. The implementation cost of the proposed block floating-point format is evaluated in terms of memory usage, design of arithmetic units, and memory input/output rates for communications system modeling and block diagrams. Further analysis is performed on the limitation and quantization effects of this complex block format relative to complex IEEE-754 format. The coverage of the arithmetic units design include complex block adder and complex block multiplier. The appropriate systems that would be required to perform algorithms such as the fast Fourier transform (forward and inverse) are designed using the proposed complex block format in multi-stages complex block multiply-adder. The proposed block floating-point format is simulated as a new numeric class defined and implemented in MATLAB simulation environment. The MATLAB simulation is divided into two major parts. The first part of MATLAB simulation targets the simulation of complex block addition and complex block multiplication units for arbitrary size of complex samples per input block. The reference output values of complex block arithmetic are those computed with similar precision in IEEE-754 format. The second part of MATLAB simulation is performed on the system model of the single-carrier modulation-based and multi-carrier modulation-based communication systems. The quadrature amplitude modulation (QAM) is the baseband modulation type targeted in this work. The specification identified in the system model is relevant to those specified in the Long-Term Evolution (LTE) Standards for Base Station, Release 12.Electrical and Computer Engineerin
High sample-rate Givens rotations for recursive least squares
The design of an application-specific integrated circuit of a parallel array processor is considered
for recursive least squares by QR decomposition using Givens rotations, applicable
in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation,
which, for this recursive algorithm, means that the time to perform arithmetic operations
is critical. The algorithm, architecture and arithmetic are considered in a single
integrated design procedure to achieve optimum results.
A realisation approach using standard arithmetic operators, add, multiply and divide is
adopted. The design of high-throughput operators with low delay is addressed for fixed- and
floating-point number formats, and the application of redundant arithmetic considered. New
redundant multiplier architectures are presented enabling reductions in area of up to 25%,
whilst maintaining low delay. A technique is presented enabling the use of a conventional
tree multiplier in recursive applications, allowing savings in area and delay. Two new divider
architectures are presented showing benefits compared with the radix-2 modified SRT algorithm.
Givens rotation algorithms are examined to determine their suitability for VLSI implementation.
A novel algorithm, based on the Squared Givens Rotation (SGR) algorithm, is developed
enabling the sample-rate to be increased by a factor of approximately 6 and offering
area reductions up to a factor of 2 over previous approaches. An estimated sample-rate of
136 MHz could be achieved using a standard cell approach and O.35pm CMOS technology.
The enhanced SGR algorithm has been compared with a CORDIC approach and shown to
benefit by a factor of 3 in area and over 11 in sample-rate. When compared with a recent implementation
on a parallel array of general purpose (GP) DSP chips, it is estimated that a single
application specific chip could offer up to 1,500 times the computation obtained from a
single OP DSP chip
Techniques for the realization of ultrareliable spaceborne computers Interim scientific report
Error-free ultrareliable spaceborne computer
- …