2,617 research outputs found
Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath
Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This
conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We
show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra
functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area
Radix Conversion for IEEE754-2008 Mixed Radix Floating-Point Arithmetic
Conversion between binary and decimal floating-point representations is
ubiquitous. Floating-point radix conversion means converting both the exponent
and the mantissa. We develop an atomic operation for FP radix conversion with
simple straight-line algorithm, suitable for hardware design. Exponent
conversion is performed with a small multiplication and a lookup table. It
yields the correct result without error. Mantissa conversion uses a few
multiplications and a small lookup table that is shared amongst all types of
conversions. The accuracy changes by adjusting the computing precision
Generic design of Chinese remaindering schemes
We propose a generic design for Chinese remainder algorithms. A Chinese
remainder computation consists in reconstructing an integer value from its
residues modulo non coprime integers. We also propose an efficient linear data
structure, a radix ladder, for the intermediate storage and computations. Our
design is structured into three main modules: a black box residue computation
in charge of computing each residue; a Chinese remaindering controller in
charge of launching the computation and of the termination decision; an integer
builder in charge of the reconstruction computation. We then show that this
design enables many different forms of Chinese remaindering (e.g.
deterministic, early terminated, distributed, etc.), easy comparisons between
these forms and e.g. user-transparent parallelism at different parallel grains
Fast Fourier Transform algorithm design and tradeoffs
The Fast Fourier Transform (FFT) is a mainstay of certain numerical techniques for solving fluid dynamics problems. The Connection Machine CM-2 is the target for an investigation into the design of multidimensional Single Instruction Stream/Multiple Data (SIMD) parallel FFT algorithms for high performance. Critical algorithm design issues are discussed, necessary machine performance measurements are identified and made, and the performance of the developed FFT programs are measured. Fast Fourier Transform programs are compared to the currently best Cray-2 FFT program
Generating and Searching Families of FFT Algorithms
A fundamental question of longstanding theoretical interest is to prove the
lowest exact count of real additions and multiplications required to compute a
power-of-two discrete Fourier transform (DFT). For 35 years the split-radix
algorithm held the record by requiring just 4n log n - 6n + 8 arithmetic
operations on real numbers for a size-n DFT, and was widely believed to be the
best possible. Recent work by Van Buskirk et al. demonstrated improvements to
the split-radix operation count by using multiplier coefficients or "twiddle
factors" that are not n-th roots of unity for a size-n DFT. This paper presents
a Boolean Satisfiability-based proof of the lowest operation count for certain
classes of DFT algorithms. First, we present a novel way to choose new yet
valid twiddle factors for the nodes in flowgraphs generated by common
power-of-two fast Fourier transform algorithms, FFTs. With this new technique,
we can generate a large family of FFTs realizable by a fixed flowgraph. This
solution space of FFTs is cast as a Boolean Satisfiability problem, and a
modern Satisfiability Modulo Theory solver is applied to search for FFTs
requiring the fewest arithmetic operations. Surprisingly, we find that there
are FFTs requiring fewer operations than the split-radix even when all twiddle
factors are n-th roots of unity.Comment: Preprint submitted on March 28, 2011, to the Journal on
Satisfiability, Boolean Modeling and Computatio
- …