Search CORE

59 research outputs found

Complexity Analysis of Reed-Solomon Decoding over GF(2^m) Without Using Syndromes

Author: Chen Ning
Yan Zhiyuan
Publication venue
Publication date: 01/01/2008
Field of study

For the majority of the applications of Reed-Solomon (RS) codes, hard decision decoding is based on syndromes. Recently, there has been renewed interest in decoding RS codes without using syndromes. In this paper, we investigate the complexity of syndromeless decoding for RS codes, and compare it to that of syndrome-based decoding. Aiming to provide guidelines to practical applications, our complexity analysis differs in several aspects from existing asymptotic complexity analysis, which is typically based on multiplicative fast Fourier transform (FFT) techniques and is usually in big O notation. First, we focus on RS codes over characteristic-2 fields, over which some multiplicative FFT techniques are not applicable. Secondly, due to moderate block lengths of RS codes in practice, our analysis is complete since all terms in the complexities are accounted for. Finally, in addition to fast implementation using additive FFT techniques, we also consider direct implementation, which is still relevant for RS codes with moderate lengths. Comparing the complexities of both syndromeless and syndrome-based decoding algorithms based on direct and fast implementations, we show that syndromeless decoding algorithms have higher complexities than syndrome-based ones for high rate RS codes regardless of the implementation. Both errors-only and errors-and-erasures decoding are considered in this paper. We also derive tighter bounds on the complexities of fast polynomial multiplications based on Cantor's approach and the fast extended Euclidean algorithm.Comment: 11 pages, submitted to EURASIP Journal on Wireless Communications and Networkin

arXiv.org e-Print Archive

CiteSeerX

Springer - Publisher Connector

Directory of Open Access Journals

Speeding up a scalable modular inversion hardware architecture

Author: Gutub AA-A
Kalganova T
Publication venue: KFUPM
Publication date: 01/01/2005
Field of study

The modular inversion is a fundamental process in several cryptographic systems. It can be computed in software or hardware, but hardware computation proven to be faster and more secure. This research focused on improving an old scalable inversion hardware architecture proposed in 2004 for finite field GF(p). The architecture has been made of two parts, a computing unit and a memory unit. The memory unit is to hold all the data bits of computation whereas the computing unit performs all the arithmetic operations in word (digit) by word bases known as scalable method. The main objective of this project was to investigate the cost and benefit of modifying the memory unit to include parallel shifting, which was one of the tasks of the scalable computing unit. The study included remodeling the entire hardware architecture removing the shifter from the scalable computing part embedding it in the memory unit instead. This modification resulted in a speedup to the complete inversion process with an area increase due to the new memory shifting unit. Quantitative measurements of the speed area trade-off have been investigated. The results showed that the extra hardware to be added for this modification compared to the speedup gained, giving the user the complete picture to choose from depending on the application need.the British council in Saudi Arabia, KFUPM, Dr. Tatiana Kalganova at the Electrical & Computer Engineering Department of Brunel University in Uxbridg

CiteSeerX

Brunel University Research Archive

New Hardware Algorithms and Designs for Montgomery Modular Inverse Computation in Galois Fields GF(p) and GF(2n)

Author: Gutub Adnan
Publication venue
Publication date: 11/06/2002
Field of study

The computation of the inverse of a number in finite fields, namely Galois Fields GF(p) or GF(2n), is one of the most complex arithmetic operations in cryptographic applications. In this work, we investigate the GF(p) inversion and present several phases in the design of efficient hardware implementations to compute the Montgomery modular inverse. We suggest a new correction phase for a previously proposed almost Montgomery inverse algorithm to calculate the inversion in hardware. It is also presented how to obtain a fast hardware algorithm to compute the inverse by multi-bit shifting method. The proposed designs have the hardware scalability feature, which means that the design can fit on constrained areas and still handle operands of any size. In order to have long-precision calculations, the module works on small precision words. The word-size, on which the module operates, can be selected based on the area and performance requirements. The upper limit on the operand precision is dictated only by the available memory to store the operands and internal results. The scalable module is in principle capable of performing infinite-precision Montgomery inverse computation of an integer, modulo a prime number. We also propose a scalable and unified architecture for a Montgomery inverse hardware that operates in both GF(p) and GF(2n) fields. We adjust and modify a GF(2n) Montgomery inverse algorithm to benefit from multi-bit shifting hardware features making it very similar to the proposed best design of GF(p) inversion hardware. We compare all scalable designs with fully parallel ones based on the same basic inversion algorithm. All scalable designs consumed less area and in general showed better performance than the fully parallel ones, which makes the scalable design a very efficient solution for computing the long precision Montgomery inverse

KFUPM ePrints

New Hardware Algorithms and Designs for Montgomery Modular Inverse Computation in Galois Fields GF(p) and GF(2n)

Author: Gutub Adnan
Publication venue
Publication date: 11/06/2002
Field of study

High Speed Hardware Architecture to Compute GF(p) Montgomery Inversion with Scalability Features

Author: Gutub Adnan
Publication venue
Publication date: 01/07/2007
Field of study

Modular inversion is a fundamental process in several cryptographic systems. It can be computed in software or hardware, but hardware computation has been proven to be faster and more secure. This research focused on improving an old scalable inversion hardware architecture proposed in 2004 for finite field GF(p). The architecture comprises two parts, a computing unit and a memory unit. The memory unit holds all the data bits of computation whereas the computing unit performs all the arithmetic operations in word (digit) by word bases such that the design is scalable. The main objective of this paper is to show the cost and benefit of modifying the memory unit to include shifting, which was previously one of the tasks of the scalable computing unit. The study included remodeling the entire hardware architecture removing the shifter from the scalable computing part and embedding it in the non-scalable memory unit instead. This modification resulted in a speedup to the complete inversion process with an area increase due to the new memory shifting unit. Several design schemes have been compared giving the user the complete picture to choose from depending on the application need

KFUPM ePrints

New Hardware Algorithms and Designs for Montgomery Modular Inverse Computation in Galois Fields GF(p) and GF(2n)

Author: Gutub Adnan
Publication venue
Publication date: 01/01/2002
Field of study

CiteSeerX

KFUPM ePrints

Efficient Scalable VLSI Architecture for Montgomery Inversion in GF(p)

Author: Gutub Adnan
Tenca Alexandre
Publication venue: 'Elsevier BV'
Publication date: 01/05/2004
Field of study

The multiplicative inversion operation is a fundamental computation in several cryptographic applications. In this work, we propose a scalable VLSI hardware to compute the Montgomery modular inverse in GF(p). We suggest a new correction phase for a previously proposed almost Montgomery inverse algorithm to calculate the inversion in hardware. We also propose an efficient hardware algorithm to compute the inverse by multi-bit shifting method. The intended VLSI hardware is scalable, which means that a fixed-area module can handle operands of any size. The word-size, which the module operates, can be selected based on the area and performance requirements. The upper limit on the operand precision is dictated only by the available memory to store the operands and internal results. The scalable module is in principle capable of performing infinite-precision Montgomery inverse computation of an integer, modulo a prime number. This scalable hardware is compared with a previously proposed fixed (fully parallel) design showing very attractive results

KFUPM ePrints

A Flexible BCH decoder for Flash Memory Systems using Cascaded BCH codes

Author: Subbiah Arul K.
Publication venue: Scholar Commons
Publication date: 04/06/2019
Field of study

NAND ash memories are widely used in consumer electronics, such as tablets, personal computers, smartphones, and gaming systems. However, unlike other standard storage devices, these ash memories suffer from various random errors. In order to address these reliability issues, various error correction codes (ECC) are employed. Bose-Chaudhuri Hocquenghem (BCH) code is the most common ECC used to address the errors in modern ash memories. Because of the limitation of the realization of the BCH codes for more extensive error correction, the modern ash memory devices use Low-density parity-check (LDPC) codes for error correction scheme. The realization of the LDPC decoders have greater complexity than BCH decoders, so these ECC decoders are implemented within the ash memory device. This thesis analyzes the limitation imposed by the state of the art implementation of BCH decoders and proposes a cascaded BCH code to address these limitations. In order to support a variety of ash memory devices, there are three main challenges to be addressed for BCH decoders. First, the latency of the BCH decoders, in the case of no error scenario, should be less than 100us. Second, there should be flexibility in supporting different ECC block size; more precisely, the solution should be able to support 256, 512, 1024, and 2048 bytes of ECC block. Third, there should be flexibility in supporting different bit errors. A recent development with Graphical Processing Units (GPUs) has attracted many researchers to use GPUs for non-graphical implementation. These GPUs are used in many consumer electronics as part of the system on chip (SOC) configuration. In this thesis we studied the limitation imposed by different implementations (VLSI, GPU, and CPU) of BCH decoders, and we propose a cascaded BCH code implemented using a hybrid approach to overcome the limitations of the BCH codes. By splitting the implementation across VLSI and GPUs, we have shown in this thesis that this method can provide flexibility over the block size and the bit error to be corrected

Scholar Commons - Santa Clara University

HIGH PERFORMANCE HARDWARE FOR MODULAR DIVISION/INVERSE

Author
Publication venue
Publication date
Field of study

KFUPM ePrints

Scalable VLSI Architecture for GF(p) Montgomery Modular Inverse Computation

Author: Gutub Adnan
Koc C.
Tenca Alexandre
Publication venue
Publication date: 26/04/2002
Field of study

Modular inverse computation is needed in several public key cryptographic applications. In this work, we present two VLSI hardware implementations used in the calculation of Montgomery modular inverse operation. The implementations are based on the same inversion algorithm, however, one is fixed (fully parallel) and the other is scalable. The scalable design is the novel modification performed on the fixed hardware to make it occupy a small area and operate within better or similar speed. Both hardware designs are compared based on their speed and area. The area of the scalable design is on average 42% smaller than the fixed one. The delay of the designs, however, depends on the actual data size and the maximum numbers the hardware can handle. As the actual data size approach the hardware limit the scalable hardware speedup reduces in comparison to the fixed one, but still its delay is practical