7 research outputs found

    Low-latency Hardware Architecture for VDF Evaluation in Class Groups

    The verifiable delay function (VDF), a cryptographic primitive, has recently seen frequent adoption in decentralized systems. Because the security of a VDF is tightly coupled to how fast it can be evaluated, the fastest implementation of VDF evaluation is generally expected to be publicly known. In this paper, we propose, for the first time, a low-latency hardware implementation of the complete VDF evaluation in the class group, obtained by jointly exploiting optimizations. On the one hand, we reduce the required computation cycles by decreasing the number of hardware-unfriendly divisions, and we increase the parallelism of computations by reducing data dependencies. On the other hand, we develop well-optimized low-latency architectures for large-number division, multiplication, and addition, operations that are generally very hard to accelerate. Based on these basic operators, we devise an architecture for the complete VDF evaluation with minimal pipeline stalls. Finally, the proposed design is coded and synthesized in TSMC 28-nm CMOS technology. Experimental results show that our design achieves a 3.6x speedup over the best C++ implementation of VDF evaluation on an advanced CPU. Moreover, compared with the state-of-the-art hardware implementation of the squaring, a key step of the VDF, we achieve about a 2x speedup.
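    The core of VDF evaluation is a chain of squarings in which each step depends on the previous result, which is exactly what makes low-latency squaring hardware valuable. A minimal sketch of that sequential structure, using an RSA-style group of unknown order as a simple stand-in for the class group (the modulus, delay parameter, and input below are illustrative, not from the paper):

    ```python
    # Sketch of VDF evaluation: T sequential modular squarings.
    # The paper targets class groups; here (Z/NZ)^* stands in,
    # since its squaring is a one-liner.
    def vdf_evaluate(x: int, T: int, N: int) -> int:
        """Compute y = x^(2^T) mod N by T sequential squarings."""
        y = x % N
        for _ in range(T):
            y = y * y % N  # each squaring depends on the previous one
        return y

    # Tiny illustrative parameters; real deployments use ~2048-bit moduli
    # whose factorization (hence group order) is unknown to the evaluator.
    print(vdf_evaluate(2, 5, 21))  # 2^(2^5) mod 21 → 4
    ```

    The loop cannot be parallelized across iterations, so wall-clock latency is governed entirely by the latency of one squaring, which is what the paper's hardware attacks.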

    Low-Latency Design and Implementation of the Squaring in Class Groups for Verifiable Delay Function Using Redundant Representation

    A verifiable delay function (VDF) is a function whose evaluation requires running a prescribed number of sequential steps over a group, while the result can be verified efficiently. As cryptographic primitives, VDFs have been adopted in a rapidly growing range of applications for decentralized systems. For the security of VDFs in practical applications, it is widely agreed that the fastest implementation of VDF evaluation, sequential squarings in a group of unknown order, should be publicly available. To this end, we propose a minimum-latency hardware implementation of the squaring in class groups through algorithmic and architectural co-optimization. First, low-latency architectures for large-number division, multiplication, and addition are devised using redundant representation. Second, we present two hardware-friendly algorithms that avoid the time-consuming divisions involved in calculations related to the extended greatest common divisor (XGCD), and we design the corresponding low-latency architectures. In addition, we schedule and reuse these computation modules with compact instruction control to achieve good resource utilization. Finally, we code and synthesize the proposed design in TSMC 28-nm CMOS technology. Experimental results show that our design achieves a 3.6x speedup over the state-of-the-art implementation of the squaring in the class group. Moreover, compared with the best C++ implementation on an advanced CPU, our implementation is 9.1x faster.
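    The divisions the paper eliminates arise in XGCD-related computations during class-group reduction. One well-known division-free approach, shown here only to illustrate the general idea (the paper's two algorithms are its own), is the binary extended GCD, which uses nothing but shifts, additions, and subtractions:

    ```python
    def binary_xgcd(a: int, b: int):
        """Binary extended GCD (cf. HAC Alg. 14.61): returns (g, x, y)
        with a*x + b*y = g = gcd(a, b), using only shifts/adds/subs."""
        assert a > 0 and b > 0
        shift = 0
        while a % 2 == 0 and b % 2 == 0:   # factor out common powers of 2
            a //= 2; b //= 2; shift += 1
        u, v = a, b
        A, B, C, D = 1, 0, 0, 1            # invariants: A*a+B*b=u, C*a+D*b=v
        while u != 0:
            while u % 2 == 0:              # halve u, preserving the invariant
                u //= 2
                if A % 2 == 0 and B % 2 == 0:
                    A //= 2; B //= 2
                else:
                    A = (A + b) // 2; B = (B - a) // 2
            while v % 2 == 0:              # halve v, preserving the invariant
                v //= 2
                if C % 2 == 0 and D % 2 == 0:
                    C //= 2; D //= 2
                else:
                    C = (C + b) // 2; D = (D - a) // 2
            if u >= v:                     # subtract the smaller from the larger
                u, A, B = u - v, A - C, B - D
            else:
                v, C, D = v - u, C - A, D - B
        return v << shift, C, D
    ```

    Shift-and-add structures like this map far better onto low-latency hardware than trial division, which is the motivation the abstract describes.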

    HIGH PERFORMANCE HARDWARE FOR MODULAR DIVISION/INVERSE


    Fast Modular Reduction for Large-Integer Multiplication

    The work contained in this thesis represents a successful attempt to speed up modular reduction as an independent step of modular multiplication, the central operation in public-key cryptosystems. Based on the properties of Mersenne and quasi-Mersenne primes, four distinct sets of moduli are described, which convert the single-precision multiplication prevalent in many of today's techniques into an addition and a few simple shift operations. A novel algorithm is proposed for modular folding. With the backing of the special moduli sets, the proposed algorithm is shown to outperform the modified Barrett algorithm in speed by 80% for 700-bit operands, with the smallest speedup, around 70%, occurring for shorter operands of roughly 100 bits.
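    The fold-to-add idea behind such moduli can be illustrated with a plain Mersenne modulus p = 2^k − 1: since 2^k ≡ 1 (mod p), the high half of a wide product can be folded onto the low half with an addition instead of a division. This is a generic sketch of that principle, not the thesis's exact algorithm or moduli sets:

    ```python
    def mersenne_reduce(x: int, k: int) -> int:
        """Reduce x mod p = 2^k - 1 using only shifts, masks, and adds:
        since 2^k ≡ 1 (mod p), high*2^k + low ≡ high + low (mod p)."""
        p = (1 << k) - 1
        while x > p:
            x = (x >> k) + (x & p)   # one folding step
        return 0 if x == p else x    # canonicalize the p ≡ 0 case

    # Reducing a double-width product mod the Mersenne prime 2^7 - 1 = 127:
    a, b = 100, 85
    print(mersenne_reduce(a * b, 7), (a * b) % 127)  # both give the same residue
    ```

    Each fold replaces a trial division by a shift and an addition, which is exactly the kind of transformation the thesis exploits for its speedups over Barrett reduction.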

    Introducción a la Teoría de Números

    Number Theory studies the integers and, to some extent, the rational and algebraic numbers. Computational Number Theory is synonymous with Algorithmic Number Theory: the study of efficient algorithms for number-theoretic computation. This is an introductory book oriented toward algorithmic number theory. The aim is to show the purely theoretical value of some theorems and how they must be modified when the goal is fast, efficient computation. Some simple algorithms are implemented in VBA Excel or LibreOffice Basic, since these languages are very friendly and spreadsheets are familiar to students; however, these implementations are quite limited and serve only didactic purposes. Other implementations are done in Java (in order to use large integers and rationals). The final chapter develops some Java programs that serve as a basis for implementing other algorithms. Instituto Tecnológico de Costa Rica, Escuela de Matemática.
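    The book's contrast between a theorem's theoretical value and its computational usefulness can be made concrete with a classic pair (a small illustration in Python rather than the book's VBA/Java; the function names are ours): Wilson's theorem characterizes primes exactly but demands a factorial, while the Fermat test is only probabilistic yet runs fast via modular exponentiation.

    ```python
    from math import factorial

    def is_prime_wilson(n: int) -> bool:
        """Wilson's theorem: n > 1 is prime iff (n-1)! ≡ -1 (mod n).
        An exact criterion, but the factorial makes it hopeless for large n."""
        return n > 1 and factorial(n - 1) % n == n - 1

    def fermat_probable_prime(n: int, a: int = 2) -> bool:
        """Fermat test for odd n > 2: fast (one modular exponentiation),
        but only a probable-prime test, not a proof of primality."""
        return pow(a, n - 1, n) == 1

    print(is_prime_wilson(101), fermat_probable_prime(101))   # True True
    print(is_prime_wilson(100), fermat_probable_prime(100))   # False False
    ```

    Both answer the same question for small inputs, but only the second scales, which is precisely the theory-versus-computation distinction the book sets out to teach.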