Search CORE

884 research outputs found

Paranoia.Ada: A diagnostic program to evaluate Ada floating-point arithmetic

Author: Hjermstad Chris
Publication venue
Publication date
Field of study

Many essential software functions in the mission critical computer resource application domain depend on floating point arithmetic. Numerically intensive functions associated with the Space Station project, such as emphemeris generation or the implementation of Kalman filters, are likely to employ the floating point facilities of Ada. Paranoia.Ada appears to be a valuabe program to insure that Ada environments and their underlying hardware exhibit the precision and correctness required to satisfy mission computational requirements. As a diagnostic tool, Paranoia.Ada reveals many essential characteristics of an Ada floating point implementation. Equipped with such knowledge, programmers need not tremble before the complex task of floating point computation

NASA Technical Reports Server

Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision

Author: Anderson
Block
Dekker
Felix Höfling
Flenner
Frenkel
Götze
Hansen
Harvey
Knuth
Knuth
Kob
Kob
Kob
Lippert
Liu
Mosayebi
Peter H. Colberg
Plimpton
Preis
Rapaport
Sagan
Stone
van Meel
Voelz
Weeks
Xu
Yang
Zagha
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011
Field of study

Modern graphics processing units (GPUs) provide impressive computing resources, which can be accessed conveniently through the CUDA programming interface. We describe how GPUs can be used to considerably speed up molecular dynamics (MD) simulations for system sizes ranging up to about 1 million particles. Particular emphasis is put on the numerical long-time stability in terms of energy and momentum conservation, and caveats on limited floating-point precision are issued. Strict energy conservation over 10^8 MD steps is obtained by double-single emulation of the floating-point arithmetic in accuracy-critical parts of the algorithm. For the slow dynamics of a supercooled binary Lennard-Jones mixture, we demonstrate that the use of single-floating point precision may result in quantitatively and even physically wrong results. For simulations of a Lennard-Jones fluid, the described implementation shows speedup factors of up to 80 compared to a serial implementation for the CPU, and a single GPU was found to compare with a parallelised MD simulation using 64 distributed cores.Comment: 12 pages, 7 figures, to appear in Comp. Phys. Comm., HALMD package licensed under the GPL, see http://research.colberg.org/projects/halm

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Crossref

Division and square root for mobile and scientific computing markets

Author: Holimath Vijaykumar
Publication venue: Universidade de Santiago de Compostela. Servizo de Publicacións e Intercambio Científico
Publication date: 01/01/2007
Field of study

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional da Universidade de Santiago de Compostela

Fast decimal floating-point division

Author: Lim C.
Nikmehr H.
Phillips B.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

A new implementation for decimal floating-point (DFP) division is introduced. The algorithm is based on high-radix SRT division The SRT division algorithm is named after D. Sweeney, J. E. Robertson, and T. D. Tocher. with the recurrence in a new decimal signed-digit format. Quotient digits are selected using comparison multiples, where the magnitude of the quotient digit is calculated by comparing the truncated partial remainder with limited precision multiples of the divisor. The sign is determined concurrently by investigating the polarity of the truncated partial remainder. A timing evaluation using a logic synthesis shows a significant decrease in the division execution time in contrast with one of the fastest DFP dividers reported in the open literatureHooman Nikmehr, Braden Phillips and Cheng-Chew Li

Crossref

Adelaide Research & Scholarship

THE EUCLIDEAN DEFINITION OF THE FUNCTIONS DIV AND MOD

Author: Boute Raymond
Publication venue: ASSOC COMPUTING MACHINERY
Publication date: 01/01/1992
Field of study

Ghent University Academic Bibliography

Recommended from our members

Complex block floating-point format with box encoding in communication systems

Author: Choo Yeong Foong
Publication venue
Publication date: 07/08/2018
Field of study

This research project entails an efficient numeric digital representation in communication systems design. A complex block floating-point format with box encoding is proposed to encode an array of complex numbers that has better numeric resolution than its IEEE-754 counterpart when the same number of bits are allocated to the dominant value in the array. It is estimated that at least 10% of bit savings could be achieved by the new complex block representation on a quad-precision IEEE-754 format. A further bits savings of up to 18% could potentially be achieved for complex blocks at half-precision and single-precision IEEE-754 representation. The implementation cost of the proposed block floating-point format is evaluated in terms of memory usage, design of arithmetic units, and memory input/output rates for communications system modeling and block diagrams. Further analysis is performed on the limitation and quantization effects of this complex block format relative to complex IEEE-754 format. The coverage of the arithmetic units design include complex block adder and complex block multiplier. The appropriate systems that would be required to perform algorithms such as the fast Fourier transform (forward and inverse) are designed using the proposed complex block format in multi-stages complex block multiply-adder. The proposed block floating-point format is simulated as a new numeric class defined and implemented in MATLAB simulation environment. The MATLAB simulation is divided into two major parts. The first part of MATLAB simulation targets the simulation of complex block addition and complex block multiplication units for arbitrary size of complex samples per input block. The reference output values of complex block arithmetic are those computed with similar precision in IEEE-754 format. The second part of MATLAB simulation is performed on the system model of the single-carrier modulation-based and multi-carrier modulation-based communication systems. The quadrature amplitude modulation (QAM) is the baseband modulation type targeted in this work. The specification identified in the system model is relevant to those specified in the Long-Term Evolution (LTE) Standards for Base Station, Release 12.Electrical and Computer Engineerin

Texas ScholarWorks

High sample-rate Givens rotations for recursive least squares

Author: Walke Richard Lewis
Publication venue
Publication date
Field of study

The design of an application-specific integrated circuit of a parallel array processor is considered for recursive least squares by QR decomposition using Givens rotations, applicable in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation, which, for this recursive algorithm, means that the time to perform arithmetic operations is critical. The algorithm, architecture and arithmetic are considered in a single integrated design procedure to achieve optimum results. A realisation approach using standard arithmetic operators, add, multiply and divide is adopted. The design of high-throughput operators with low delay is addressed for fixed- and floating-point number formats, and the application of redundant arithmetic considered. New redundant multiplier architectures are presented enabling reductions in area of up to 25%, whilst maintaining low delay. A technique is presented enabling the use of a conventional tree multiplier in recursive applications, allowing savings in area and delay. Two new divider architectures are presented showing benefits compared with the radix-2 modified SRT algorithm. Givens rotation algorithms are examined to determine their suitability for VLSI implementation. A novel algorithm, based on the Squared Givens Rotation (SGR) algorithm, is developed enabling the sample-rate to be increased by a factor of approximately 6 and offering area reductions up to a factor of 2 over previous approaches. An estimated sample-rate of 136 MHz could be achieved using a standard cell approach and O.35pm CMOS technology. The enhanced SGR algorithm has been compared with a CORDIC approach and shown to benefit by a factor of 3 in area and over 11 in sample-rate. When compared with a recent implementation on a parallel array of general purpose (GP) DSP chips, it is estimated that a single application specific chip could offer up to 1,500 times the computation obtained from a single OP DSP chip

Warwick Research Archives Portal Repository

Techniques for the realization of ultrareliable spaceborne computers Interim scientific report

Author: Goldberg J.
Green M. W.
Levitt K. N.
Stone H. S.
Publication venue
Publication date
Field of study

Error-free ultrareliable spaceborne computer

NASA Technical Reports Server