2,610 research outputs found

    Методи та засоби підвищення ефективності реалізації обчислювальних операцій у скінченних полях

    Get PDF
    У дисертаційній роботі вирішено актуальну науково-прикладну задачу – підвищення продуктивності систем цифрової обробки даних та криптографічних перетворень, забезпечення завадостійкості зберігання і передачі даних за рахунок створення ефективних технічних засобів для виконання обчислень у скінченних полях шляхом структурно-логічної оптимізації архітектур апаратних засобів, що реалізують процеси виконання операцій у полях Галуа. Запропоновано метод виконання операцій над елементами поля GF(2m). Особливістю даного методу, на відміну від існуючих, є застосування табличного зберігання елементів поля у многочленному та степеневому їх поданні з можливістю розрідженого формування таблиці елементів поля, що зменшує витрати пам’яті для її зберігання. Розроблений метод забезпечує зростання швидкодії на 15% порівняно з існуючим методом. Запропоновано модифікацію методу піднесення до степеня елементів поля GF(p) з ковзним вікном, яка забезпечує приріст швидкодії на 7-9 %. Спроектовано на ПЛІС фірми Xilinx процесор Галуа, що орієнтований на виконання операцій у скінченних полях виду GF(p) та GF(2m). Запропоновано програмістську модель процесора Галуа, яка дозволяє розробляти програмне забезпечення довільної складності мовою Асемблера процесора Галуа.The thesis is devoted to the problem of increasing the efficiency of computations in finite fields. The proposed solution to this problem is to develop methods and means for performing operations on elements of finite fields GF(p) or GF(2m). The analysis of the current state of the development of methods of operations in finite fields is carried out and priority points are highlighted. It is best to classify them on the basis of the distinguished features. The classification of the methods of performing the most computational expensive operations (the calculation of the multiplicative inverse element and the exponentiation) in the finite fields was performed. It enables to conduct thorough research and form the directions of development for these methods. The method of high-speed implementation of additive and multiplicative operations on elements of GF(2m) and corresponding hardware structures for its implemen-tation are proposed. Additive operations include addition and subtraction. Multiplica-tive operations include multiplication, exponentiation, multiplicative inverse element calculation, and division. The research has shown that table storage of elements of the GF(2m) in their polynomial and power representation ensures the maximum speed and versatility of the arithmetic logic unit. With the use of long integer operands, the sparse formation of the table of field elements is proposed. It enables reducing the memory consumption for its storage in several times. The algorithms for converting power representa-tion into polynomial one and polynomial representation in power one with the use of a sparse table are constructed. The developed method provides a 15% increase in per-formance comparing with the existing method. Experimental researches were performed to determine the best sparse ratio of the elements table of the GF(2m). It has been discovered that for a value of m < 21, the best sparse ratio is equal to 8. For the exponentiation, it is desirable to increase the sparse ratio to 16. The algorithms for converting the power representation into polynomial one and polynomial representation into power one with the use of a sparse table are devel-oped. The modification of the method of exponentiation elements GF(p) with sliding window and the corresponding hardware structures for its implementation are pro-posed. The difference from the existing methods is that when forming a precomputation table, the exponents are prime numbers, and when analyzing the binary represen-tation of the exponent, blocks of bits forming a prime number are allocated. To achieve high performance, when constructing a precomputation table, it is recom-mended to use pre-calculated additive chains to minimize the number of multiplica-tion operations, which allows each of the following table items to be obtained in one or two modular multiplication operations. With the help of the developed computational model, it has been found out that the proposed modification of the exponentiation method of elements GF(p) with a sliding window provides an increase in speed by 7-9%. The modification of the ex-ponentiation method of elements GF(p) with a sliding window can be used in elliptic-curve cryptography to improve the time characteristics of the scalar multiplication on elliptic curve. The model of the computational process of execution of operations in finite fields is developed, which enables comparison of methods on the basis of given sets of input data and the execution of the optimal choice of parameters and forms of presentation of operands, which ensure an increase in speed for the implementation of computing operations on the FPGA. The research methodologies of new methods hardware implementation of calculations in Galois fields are developed on the basis of the proposed model. Simulation in the development environment of Xilinx ISE and using the Mentor Graphics Precision software showed that the developed hardware structures are characterized by minimal hardware complexity and high performance. A Verilog code generator for FPGA synthesis of a ROM element containing a sparse table of GF(2m) field elements in a polynomial and power representation is created which automatically generates a Verilog code for a given irreducible polyno-mial and a sparse ratio of the table. The architecture and the command system of the specialized Galois processor, focused on operations in finite fields, has been developed. The Galois processor can be used in a universal computing system as a coprocessor, complementing the com-mand system of the central processor unit, or as a special device based on the FPGA, which increases the efficiency of processing information in real time in comparison with the universal computing means. The architecture feature of the developed processor is that a user can change the arithmetic logic unit (ALU) to make the transition to the GF(p) or GF(2m) without changing the processor interface. The research of the developed Galois processor has been carried out, which showed that this processor provides an increase in the productivity of computing by 27% compared with the universal computing means. A program model of the Galois processor is constructed, which allows the user to create software of arbitrary complexity in Assembler of the Galois processor.В диссертационной работе решена актуальная научно-прикладная задача – повышение производительности систем цифровой обработки данных и крипто-графических преобразований, обеспечение помехоустойчивости хранения и передачи данных за счет создания эффективных технических средств для выполнения вычислений в конечных полях путем структурно-логической оптимизации архитектур аппаратных средств, реализующих процессы выполнения операций в полях Галуа. Предложен метод выполнения операций над элементами поля GF(2m). Особенностью данного метода, в отличие от существующих, является применение табличного хранения элементов поля в многочленном и степенном их представлении с возможностью разреженного формирования таблицы элементов поля, что уменьшает затраты памяти для ее хранения. Разработанный метод обеспечивает прирост производительности на 15% по сравнению с существующим методом. Предложена модификация метода возведения в степень элементов поля GF(p) со скользящим окном, обеспечивающая прирост быстродействия на 7-9%. Спроектирован на ПЛИС фирмы Xilinx процессор Галуа, ориентированный на выполнение операций в конечных полях вида GF(p) и GF(2m). Предложена программистская модель процессора Галуа, позволяющая разрабатывать программное обеспечение любой сложности на языке ассемблера процессора Галуа

    Complexity Analysis of Reed-Solomon Decoding over GF(2^m) Without Using Syndromes

    Get PDF
    For the majority of the applications of Reed-Solomon (RS) codes, hard decision decoding is based on syndromes. Recently, there has been renewed interest in decoding RS codes without using syndromes. In this paper, we investigate the complexity of syndromeless decoding for RS codes, and compare it to that of syndrome-based decoding. Aiming to provide guidelines to practical applications, our complexity analysis differs in several aspects from existing asymptotic complexity analysis, which is typically based on multiplicative fast Fourier transform (FFT) techniques and is usually in big O notation. First, we focus on RS codes over characteristic-2 fields, over which some multiplicative FFT techniques are not applicable. Secondly, due to moderate block lengths of RS codes in practice, our analysis is complete since all terms in the complexities are accounted for. Finally, in addition to fast implementation using additive FFT techniques, we also consider direct implementation, which is still relevant for RS codes with moderate lengths. Comparing the complexities of both syndromeless and syndrome-based decoding algorithms based on direct and fast implementations, we show that syndromeless decoding algorithms have higher complexities than syndrome-based ones for high rate RS codes regardless of the implementation. Both errors-only and errors-and-erasures decoding are considered in this paper. We also derive tighter bounds on the complexities of fast polynomial multiplications based on Cantor's approach and the fast extended Euclidean algorithm.Comment: 11 pages, submitted to EURASIP Journal on Wireless Communications and Networkin

    Efficient Implementation on Low-Cost SoC-FPGAs of TLSv1.2 Protocol with ECC_AES Support for Secure IoT Coordinators

    Get PDF
    Security management for IoT applications is a critical research field, especially when taking into account the performance variation over the very different IoT devices. In this paper, we present high-performance client/server coordinators on low-cost SoC-FPGA devices for secure IoT data collection. Security is ensured by using the Transport Layer Security (TLS) protocol based on the TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 cipher suite. The hardware architecture of the proposed coordinators is based on SW/HW co-design, implementing within the hardware accelerator core Elliptic Curve Scalar Multiplication (ECSM), which is the core operation of Elliptic Curve Cryptosystems (ECC). Meanwhile, the control of the overall TLS scheme is performed in software by an ARM Cortex-A9 microprocessor. In fact, the implementation of the ECC accelerator core around an ARM microprocessor allows not only the improvement of ECSM execution but also the performance enhancement of the overall cryptosystem. The integration of the ARM processor enables to exploit the possibility of embedded Linux features for high system flexibility. As a result, the proposed ECC accelerator requires limited area, with only 3395 LUTs on the Zynq device used to perform high-speed, 233-bit ECSMs in 413 µs, with a 50 MHz clock. Moreover, the generation of a 384-bit TLS handshake secret key between client and server coordinators requires 67.5 ms on a low cost Zynq 7Z007S device

    A versatile Montgomery multiplier architecture with characteristic three support

    Get PDF
    We present a novel unified core design which is extended to realize Montgomery multiplication in the fields GF(2n), GF(3m), and GF(p). Our unified design supports RSA and elliptic curve schemes, as well as the identity-based encryption which requires a pairing computation on an elliptic curve. The architecture is pipelined and is highly scalable. The unified core utilizes the redundant signed digit representation to reduce the critical path delay. While the carry-save representation used in classical unified architectures is only good for addition and multiplication operations, the redundant signed digit representation also facilitates efficient computation of comparison and subtraction operations besides addition and multiplication. Thus, there is no need for a transformation between the redundant and the non-redundant representations of field elements, which would be required in the classical unified architectures to realize the subtraction and comparison operations. We also quantify the benefits of the unified architectures in terms of area and critical path delay. We provide detailed implementation results. The metric shows that the new unified architecture provides an improvement over a hypothetical non-unified architecture of at least 24.88%, while the improvement over a classical unified architecture is at least 32.07%

    Arithmetic Operations in Multi-Valued Logic

    Full text link
    This paper presents arithmetic operations like addition, subtraction and multiplications in Modulo-4 arithmetic, and also addition, multiplication in Galois field, using multi-valued logic (MVL). Quaternary to binary and binary to quaternary converters are designed using down literal circuits. Negation in modular arithmetic is designed with only one gate. Logic design of each operation is achieved by reducing the terms using Karnaugh diagrams, keeping minimum number of gates and depth of net in to consideration. Quaternary multiplier circuit is proposed to achieve required optimization. Simulation result of each operation is shown separately using Hspice.Comment: 12 Pages, VLSICS Journal 201

    Analysis of Parallel Montgomery Multiplication in CUDA

    Get PDF
    For a given level of security, elliptic curve cryptography (ECC) offers improved efficiency over classic public key implementations. Point multiplication is the most common operation in ECC and, consequently, any significant improvement in perfor- mance will likely require accelerating point multiplication. In ECC, the Montgomery algorithm is widely used for point multiplication. The primary purpose of this project is to implement and analyze a parallel implementation of the Montgomery algorithm as it is used in ECC. Specifically, the performance of CPU-based Montgomery multiplication and a GPU-based implementation in CUDA are compared

    Efficient unified Montgomery inversion with multibit shifting

    Get PDF
    Computation of multiplicative inverses in finite fields GF(p) and GF(2/sup n/) is the most time-consuming operation in elliptic curve cryptography, especially when affine co-ordinates are used. Since the existing algorithms based on the extended Euclidean algorithm do not permit a fast software implementation, projective co-ordinates, which eliminate almost all of the inversion operations from the curve arithmetic, are preferred. In the paper, the authors demonstrate that affine co-ordinate implementation provides a comparable speed to that of projective co-ordinates with careful hardware realisation of existing algorithms for calculating inverses in both fields without utilising special moduli or irreducible polynomials. They present two inversion algorithms for binary extension and prime fields, which are slightly modified versions of the Montgomery inversion algorithm. The similarity of the two algorithms allows the design of a single unified hardware architecture that performs the computation of inversion in both fields. They also propose a hardware structure where the field elements are represented using a multi-word format. This feature allows a scalable architecture able to operate in a broad range of precision, which has certain advantages in cryptographic applications. In addition, they include statistical comparison of four inversion algorithms in order to help choose the best one amongst them for implementation onto hardware
    corecore