8,167 research outputs found
Operand Folding Hardware Multipliers
This paper describes a new accumulate-and-add multiplication algorithm. The
method partitions one of the operands and re-combines the results of
computations done with each of the partitions. The resulting design turns-out
to be both compact and fast.
When the operands' bit-length is 1024, the new algorithm requires only
additions (on average), this is about half the number of additions
required by the classical accumulate-and-add multiplication algorithm
()
Secure and Efficient RNS Approach for Elliptic Curve Cryptography
Scalar multiplication, the main operation in elliptic
curve cryptographic protocols, is vulnerable to side-channel
(SCA) and fault injection (FA) attacks. An efficient countermeasure
for scalar multiplication can be provided by using alternative
number systems like the Residue Number System (RNS). In RNS,
a number is represented as a set of smaller numbers, where each
one is the result of the modular reduction with a given moduli
basis. Under certain requirements, a number can be uniquely
transformed from the integers to the RNS domain (and vice
versa) and all arithmetic operations can be performed in RNS.
This representation provides an inherent SCA and FA resistance
to many attacks and can be further enhanced by RNS arithmetic
manipulation or more traditional algorithmic countermeasures.
In this paper, extending our previous work, we explore the
potentials of RNS as an SCA and FA countermeasure and provide
an description of RNS based SCA and FA resistance means. We
propose a secure and efficient Montgomery Power Ladder based
scalar multiplication algorithm on RNS and discuss its SCAFA
resistance. The proposed algorithm is implemented on an
ARM Cortex A7 processor and its SCA-FA resistance is evaluated
by collecting preliminary leakage trace results that validate our
initial assumptions
Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ordinary differential equations
Although double-precision floating-point arithmetic currently dominates
high-performance computing, there is increasing interest in smaller and simpler
arithmetic types. The main reasons are potential improvements in energy
efficiency and memory footprint and bandwidth. However, simply switching to
lower-precision types typically results in increased numerical errors. We
investigate approaches to improving the accuracy of reduced-precision
fixed-point arithmetic types, using examples in an important domain for
numerical computation in neuroscience: the solution of Ordinary Differential
Equations (ODEs). The Izhikevich neuron model is used to demonstrate that
rounding has an important role in producing accurate spike timings from
explicit ODE solution algorithms. In particular, fixed-point arithmetic with
stochastic rounding consistently results in smaller errors compared to single
precision floating-point and fixed-point arithmetic with round-to-nearest
across a range of neuron behaviours and ODE solvers. A computationally much
cheaper alternative is also investigated, inspired by the concept of dither
that is a widely understood mechanism for providing resolution below the least
significant bit (LSB) in digital signal processing. These results will have
implications for the solution of ODEs in other subject areas, and should also
be directly relevant to the huge range of practical problems that are
represented by Partial Differential Equations (PDEs).Comment: Submitted to Philosophical Transactions of the Royal Society
Parallel Integer Polynomial Multiplication
We propose a new algorithm for multiplying dense polynomials with integer
coefficients in a parallel fashion, targeting multi-core processor
architectures. Complexity estimates and experimental comparisons demonstrate
the advantages of this new approach
Complexity Analysis of Reed-Solomon Decoding over GF(2^m) Without Using Syndromes
For the majority of the applications of Reed-Solomon (RS) codes, hard
decision decoding is based on syndromes. Recently, there has been renewed
interest in decoding RS codes without using syndromes. In this paper, we
investigate the complexity of syndromeless decoding for RS codes, and compare
it to that of syndrome-based decoding. Aiming to provide guidelines to
practical applications, our complexity analysis differs in several aspects from
existing asymptotic complexity analysis, which is typically based on
multiplicative fast Fourier transform (FFT) techniques and is usually in big O
notation. First, we focus on RS codes over characteristic-2 fields, over which
some multiplicative FFT techniques are not applicable. Secondly, due to
moderate block lengths of RS codes in practice, our analysis is complete since
all terms in the complexities are accounted for. Finally, in addition to fast
implementation using additive FFT techniques, we also consider direct
implementation, which is still relevant for RS codes with moderate lengths.
Comparing the complexities of both syndromeless and syndrome-based decoding
algorithms based on direct and fast implementations, we show that syndromeless
decoding algorithms have higher complexities than syndrome-based ones for high
rate RS codes regardless of the implementation. Both errors-only and
errors-and-erasures decoding are considered in this paper. We also derive
tighter bounds on the complexities of fast polynomial multiplications based on
Cantor's approach and the fast extended Euclidean algorithm.Comment: 11 pages, submitted to EURASIP Journal on Wireless Communications and
Networkin
Efficient implementation of the Hardy-Ramanujan-Rademacher formula
We describe how the Hardy-Ramanujan-Rademacher formula can be implemented to
allow the partition function to be computed with softly optimal
complexity and very little overhead. A new implementation
based on these techniques achieves speedups in excess of a factor 500 over
previously published software and has been used by the author to calculate
, an exponent twice as large as in previously reported
computations.
We also investigate performance for multi-evaluation of , where our
implementation of the Hardy-Ramanujan-Rademacher formula becomes superior to
power series methods on far denser sets of indices than previous
implementations. As an application, we determine over 22 billion new
congruences for the partition function, extending Weaver's tabulation of 76,065
congruences.Comment: updated version containing an unconditional complexity proof;
accepted for publication in LMS Journal of Computation and Mathematic
- …