56 research outputs found

Square-rich fixed point polynomial evaluation on FPGAs

Author: Fahmy Suhaib A.
McLoughlin Ian V.
Xu Simin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

Polynomial evaluation is important across a wide range of application domains, so significant work has been done on accelerating its computation. The conventional algorithm, referred to as Horner's rule, involves the fewest steps but can lead to increased latency due to serial computation. Parallel evaluation algorithms such as Estrin's method have shorter latency than Horner's rule, but achieve this at the expense of large hardware overhead. This paper presents an efficient polynomial evaluation algorithm, which restructures the evaluation process to include an increased number of squaring steps. By using a squarer design that is more efficient than general multiplication, this can result in polynomial evaluation with a 57.9% latency reduction over Horner's rule and 14.6% over Estrin's method, while consuming less area than Horner's rule, when implemented on a Xilinx Virtex 6 FPGA. When applied in fixed point function evaluation, where precision requirements limit the rounding of operands, it still achieves a 52.4% performance gain compared to Horner's rule with only a 4% area overhead in evaluating 5th degree polynomials.
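The contrast between the two baseline schemes named in the abstract can be sketched in software. This is a minimal illustration, not the paper's square-rich algorithm: the function names `horner` and `estrin` and the low-to-high coefficient ordering are assumptions made for this example.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule.

    Every multiply-add depends on the previous one, so the work forms
    a single serial chain.
    """
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def estrin(coeffs, x):
    """Evaluate the same polynomial with Estrin's scheme.

    Neighbouring terms are paired as (t0 + t1 * x), then the pairs are
    combined with x**2, x**4, and so on. The operations inside one level
    are independent of each other, which is what parallel hardware exploits.
    """
    terms = list(coeffs)
    if not terms:
        return 0
    power = x
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0)  # pad to an even number of terms
        terms = [terms[i] + terms[i + 1] * power
                 for i in range(0, len(terms), 2)]
        power = power * power  # one squaring per level
    return terms[0]
```

Note that Estrin's scheme already relies on repeated squaring of `x` to move between levels, which hints at why a cheap squarer can matter in hardware.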

Crossref

Kent Academic Repository

Recommended from our members

FPGA Implementations of Elliptic Curve Cryptography and Tate Pairing over Binary Field

Author: Huang Jian
Publication venue: 'University of North Texas Libraries'
Publication date: 01/08/2007
Field of study

Elliptic curve cryptography (ECC) is an alternative to traditional techniques for public key cryptography. It offers smaller key size without sacrificing security level. Tate pairing is a bilinear map used in identity based cryptography schemes. In a typical elliptic curve cryptosystem, elliptic curve point multiplication is the most computationally expensive component. Similarly, Tate pairing is also quite computationally expensive. Therefore, it is more attractive to implement the ECC and Tate pairing using hardware than using software. The bases of both ECC and Tate pairing are Galois field arithmetic units. In this thesis, I propose FPGA implementations of the elliptic curve point multiplication in GF(2^283) as well as Tate pairing computation on a supersingular elliptic curve in GF(2^283). I have designed and synthesized the elliptic curve point multiplication and Tate pairing module using Xilinx's FPGA, as well as synthesized all the Galois arithmetic units used in the designs. Experimental results demonstrate that the FPGA implementation can speed up the elliptic curve point multiplication by 31.6 times compared to a software based implementation. The results also demonstrate that the FPGA implementation can speed up the Tate pairing computation by 152 times compared to a software based implementation.
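The Galois field arithmetic that the abstract describes as the basis of both operations can be illustrated with a bit-serial binary-field multiplier. This is a sketch under stated assumptions: the function name `gf2m_mul` is hypothetical, and the usage below runs in the small field GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x + 1 for readability, not in the thesis's GF(2^283) and not with its actual architecture.

```python
def gf2m_mul(a, b, modulus, m):
    """Multiply a and b in GF(2^m) using shift-and-add.

    Field elements are integers whose bits are polynomial coefficients
    over GF(2). `modulus` is the irreducible reduction polynomial and
    must include its x^m term. Addition in the field is XOR.
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a      # add the current multiple of a
        b >>= 1
        a <<= 1              # multiply a by x
        if a >> m:           # degree reached m: reduce modulo the polynomial
            a ^= modulus
    return result


# Assumed example field: GF(2^8) with x^8 + x^4 + x^3 + x + 1 (0x11B).
product = gf2m_mul(0x57, 0x83, 0x11B, 8)
```

A hardware multiplier would typically process several bits per clock cycle or use a fully parallel structure; the loop above only shows the underlying arithmetic.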

UNT Digital Library

A Digital Integrated Inertial Navigation System For Aerial Vehicles

Author
Publication venue
Publication date
Field of study

KFUPM ePrints

Efficient algorithm and architecture for implementation of multiplier circuits in modern FPGAs

Author: Athow Jacques Laurent
Publication venue
Publication date: 01/01/2008
Field of study

High speed multiplication in Field Programmable Gate Arrays is often performed either using logic cells or with built-in DSP blocks. The latter provide the highest performance for arithmetic operations while also being optimized in terms of power and area utilization. Scalability of input operands is limited to that of a single DSP block, and the current CAD tools provide little help when the designer needs to build larger arithmetic blocks. The present thesis proposes an effective approach to the problem of building large integer multipliers out of smaller ones by giving two algorithms to the system designer, for a given FPGA technology. Large word length is required in applications such as cryptography and video processing. The first proposed algorithm partitions large input multipliers into an architecture-aware design. The second algorithm then places the generated design in an optimal layout minimizing interconnect delay. The thesis concludes with simulation and hardware generated data to support the proposed algorithms.
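The general idea of building a large integer multiplier out of smaller ones can be sketched as a schoolbook tiling of partial products. This is not the thesis's partitioning or placement algorithm: the names `split_limbs` and `tiled_multiply` and the `block_bits` parameter are hypothetical, each small product stands in for one fixed-width hardware multiplier, and operands are assumed to be unsigned.

```python
def split_limbs(value, width, count):
    """Split an unsigned integer into `count` limbs of `width` bits, low limb first."""
    mask = (1 << width) - 1
    return [(value >> (i * width)) & mask for i in range(count)]


def tiled_multiply(a, b, operand_bits, block_bits):
    """Multiply two unsigned operand_bits-wide integers using only
    block_bits x block_bits products, then shift and accumulate them."""
    count = -(-operand_bits // block_bits)  # ceiling division
    a_limbs = split_limbs(a, block_bits, count)
    b_limbs = split_limbs(b, block_bits, count)
    product = 0
    for i, ai in enumerate(a_limbs):
        for j, bj in enumerate(b_limbs):
            # One small multiplier per (i, j) tile, weighted by its position.
            product += (ai * bj) << ((i + j) * block_bits)
    return product
```

With 64-bit operands and an assumed 17-bit block, this uses a 4 x 4 grid of small products. How such tiles are matched to real DSP block shapes and placed on the device is the part the thesis addresses and is not modeled here.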

Concordia University Research Repository

Hardware Implementations of the WG-16 Stream Cipher with Composite Field Arithmetic

Author: Zidaric Nusa
Publication venue: 'University of Waterloo'
Publication date: 18/09/2014
Field of study

The WG stream cipher family consists of stream ciphers based on the Welch-Gong (WG) transformations that are used as a nonlinear filter applied to the output of a linear feedback shift register (LFSR). The aim of this thesis is an exploration of the design space of the WG-16 stream cipher. Five different representations of the field elements were analyzed, namely the polynomial basis representation, the normal basis representation and three isomorphic tower field constructions of F_(2^16): F_((((2^2)^2)^2)^2), F_((2^4)^4) and F_((2^8)^2). Each design option begins with an in-depth description of different field constructions and their impact on the top-level WG transformation circuit. Normal basis representation of elements for each level of the tower was chosen for field constructions F_((((2^2)^2)^2)^2) and F_((2^4)^4), and a mixed basis, with polynomial basis for the lower and normal basis for the higher level of the tower, for F_((2^8)^2). Representation of field elements affects the field arithmetic, which in turn affects the entire design. Targeting high throughput, pipelined architectures were developed, and pipelining was based on the particular field construction: each extension over the prime field offers a new pipelining possibility. Pipelining at a lower level of the tower field reduces the clock period. The most flexible pipelining options are possible for F_((((2^2)^2)^2)^2), a highly regular construction, which permits an algebraic optimization of the WG transformation resulting in two multiplications being removed. High speed, achieved by adequate pipelining granularity, and smaller area due to the removed multipliers make F_((((2^2)^2)^2)^2) the most suitable field construction for the implementation of WG-16.
The best WG-16 modules achieve a throughput of 222 Mbit/s with 476 slices used on the Xilinx Spartan-6 FPGA device xc6slx9 (using Xilinx Synthesis Tool (XST) for synthesis and ISE for implementation [47]) and a throughput of 529 Mbit/s with area cost of 12215 GEs for ASIC implementation, using the 65 nm CMOS technology (using Synopsys Design Compiler for synthesis [45] and Cadence SoC Encounter to complete the Place-and-Route phase).

University of Waterloo's Institutional Repository

Harder, Better, Faster, Stronger - Elliptic Curve Discrete Logarithm Computations on FPGAs

Author: Erich Wenger
Paul Wolfger
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 17/08/2015
Field of study

Computing discrete logarithms takes time. It takes time to develop new algorithms, choose the best algorithms, implement these algorithms correctly and efficiently, keep the system running for several months, and, finally, publish the results. In this paper, we present a highly performant architecture that can be used to compute discrete logarithms of Weierstrass curves defined over binary fields and Koblitz curves using FPGAs. We used the architecture to compute for the first time a discrete logarithm of the elliptic curve sect113r1, a previously standardized binary curve, using 10 Kintex-7 FPGAs. To achieve this result, we investigated different iteration functions, used a negation map, dealt with the fruitless cycle problem, built an efficient FPGA design that processes 900 million iterations per second, and tended the optimized implementations running on the FPGAs for several months.

Cryptology ePrint Archive

Adaptive and hybrid schemes for efficient parallel squaring and cubing units

Author: Bui Son Viet
Publication venue
Publication date: 01/12/2014
Field of study

Squaring (X^2) and cubing (X^3) units are special cases of multiplication used in many applications, such as image compression, equalization, decoding and demodulation, 3D graphics, scientific computing, artificial neural networks, logarithmic number systems, and multimedia applications. They can also be an efficient way to compute other basic functions. Therefore, improving their performance is a goal for many researchers. This dissertation discusses modifications to algorithms for computing parallel squaring and cubing units in both signed and unsigned representation. After that, a truncation technique is applied to improve their performance. Each unit is modeled to estimate its area and delay using a linear evaluation model. A C program was written to generate Hardware Description Language files for each unit. These units are verified in simulation. Moreover, area, delay, and power consumption are calculated for each unit and compared with those of previous approaches for both Virtex 5 Xilinx FPGA and IBM 65nm ASIC technologies.
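One well-known reason a dedicated squarer can be cheaper than a general multiplier is that the partial-product matrix of X^2 is symmetric and can be folded. The sketch below shows that standard identity for unsigned operands only; it is not the dissertation's adaptive or hybrid scheme, and the function name `square_folded` is an assumption for this example.

```python
def square_folded(x, bits):
    """Square an unsigned `bits`-wide integer from folded partial products.

    Uses x^2 = sum_i x_i * 2^(2i) + 2 * sum_{i<j} x_i * x_j * 2^(i+j),
    where x_i * x_i = x_i for single bits. Returns (square, term_count),
    where term_count is the number of partial-product terms examined.
    """
    xs = [(x >> i) & 1 for i in range(bits)]
    total = 0
    terms = 0
    for i in range(bits):
        if xs[i]:
            total += 1 << (2 * i)          # diagonal term: x_i
        terms += 1
        for j in range(i + 1, bits):
            if xs[i] & xs[j]:
                total += 1 << (i + j + 1)  # symmetric pair, doubled by a shift
            terms += 1
    return total, terms


# An 8-bit square needs 8 * 9 / 2 = 36 folded terms instead of 8 * 8 = 64.
value, count = square_folded(13, 8)
```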

SHAREOK repository