1,926 research outputs found
Complexity Analysis of Reed-Solomon Decoding over GF(2^m) Without Using Syndromes
For the majority of the applications of Reed-Solomon (RS) codes, hard
decision decoding is based on syndromes. Recently, there has been renewed
interest in decoding RS codes without using syndromes. In this paper, we
investigate the complexity of syndromeless decoding for RS codes, and compare
it to that of syndrome-based decoding. Aiming to provide guidelines to
practical applications, our complexity analysis differs in several aspects from
existing asymptotic complexity analysis, which is typically based on
multiplicative fast Fourier transform (FFT) techniques and is usually in big O
notation. First, we focus on RS codes over characteristic-2 fields, over which
some multiplicative FFT techniques are not applicable. Secondly, due to
moderate block lengths of RS codes in practice, our analysis is complete since
all terms in the complexities are accounted for. Finally, in addition to fast
implementation using additive FFT techniques, we also consider direct
implementation, which is still relevant for RS codes with moderate lengths.
Comparing the complexities of both syndromeless and syndrome-based decoding
algorithms based on direct and fast implementations, we show that syndromeless
decoding algorithms have higher complexities than syndrome-based ones for high
rate RS codes regardless of the implementation. Both errors-only and
errors-and-erasures decoding are considered in this paper. We also derive
tighter bounds on the complexities of fast polynomial multiplications based on
Cantor's approach and the fast extended Euclidean algorithm.Comment: 11 pages, submitted to EURASIP Journal on Wireless Communications and
Networkin
Hardware/Software Co-design Applied to Reed-Solomon Decoding for the DMB Standard
This paper addresses the implementation of Reed-
Solomon decoding for battery-powered wireless
devices. The scope of this paper is constrained by the
Digital Media Broadcasting (DMB). The most critical
element of the Reed-Solomon algorithm is implemented
on two different reconfigurable hardware
architectures: an FPGA and a coarse-grained
architecture: the Montium, The remaining parts are
executed on an ARM processor. The results of this
research show that a co-design of the ARM together
with an FPGA or a Montium leads to a substantial
decrease in energy consumption. The energy
consumption of syndrome calculation of the Reed-
Solomon decoding algorithm is estimated for an FPGA
and a Montium by means of simulations. The Montium
proves to be more efficient
Towards a triple mode common operator FFT for Software Radio systems
International audienceA scenario to design a Triple Mode FFT is addressed. Based on a Dual Mode FFT structure, we present a methodology to reach a triple mode FFT operator (TMFFT) able to operate over three different fields: complex number domain C, Galois Fields GF(Ft) and GF(2m). We propose a reconfigurable Triple mode Multiplier that constitutes the core of the Butterflybased FFT. A scalable and flexible unit for the polynomial reduction needed in the GF(2m) multiplication is also proposed. An FPGA implementation of the proposed multiplier is given and the measures show a gain of 18%in terms of performance-to-cost ratio compared to a "Velcro" approach where two self-contained operators are implemented separately
Post-quantum cryptographic hardware primitives
The development and implementation of post-quantum cryptosystems have become a pressing issue in the design of secure computing systems, as general quantum computers have become more feasible in the last two years. In this work, we introduce a set of hardware post-quantum cryptographic primitives (PCPs) consisting of four frequently used security components, i.e., public-key cryptosystem (PKC), key exchange (KEX), oblivious transfer (OT), and zero-knowledge proof (ZKP). In addition, we design a high speed polynomial multiplier to accelerate these primitives. These primitives will aid researchers and designers in constructing quantum-proof secure computing systems in the post-quantum era.Published versio
Advanced digital SAR processing study
A highly programmable, land based, real time synthetic aperture radar (SAR) processor requiring a processed pixel rate of 2.75 MHz or more in a four look system was designed. Variations in range and azimuth compression, number of looks, range swath, range migration and SR mode were specified. Alternative range and azimuth processing algorithms were examined in conjunction with projected integrated circuit, digital architecture, and software technologies. The advaced digital SAR processor (ADSP) employs an FFT convolver algorithm for both range and azimuth processing in a parallel architecture configuration. Algorithm performace comparisons, design system design, implementation tradeoffs and the results of a supporting survey of integrated circuit and digital architecture technologies are reported. Cost tradeoffs and projections with alternate implementation plans are presented
Area and Power Efficient FFT/IFFT Processor for FALCON Post-Quantum Cryptography
Quantum computing is an emerging technology on the verge of reshaping
industries, while simultaneously challenging existing cryptographic algorithms.
FALCON, a recent standard quantum-resistant digital signature, presents a
challenging hardware implementation due to its extensive non-integer polynomial
operations, necessitating FFT over the ring . This paper
introduces an ultra-low power and compact processor tailored for FFT/IFFT
operations over the ring, specifically optimized for FALCON applications on
resource-constrained edge devices. The proposed processor incorporates various
optimization techniques, including twiddle factor compression and conflict-free
scheduling. In an ASIC implementation using a 22 nm GF process, the proposed
processor demonstrates an area occupancy of 0.15 mm and a power consumption
of 12.6 mW at an operating frequency of 167 MHz. Since a hardware
implementation of FFT/IFFT over the ring is currently non-existent, the
execution time achieved by this processor is compared to the software
implementation of FFT/IFFT of FALCON on a Raspberry Pi 4 with Cortex-A72, where
the proposed processor achieves a speedup of up to 2.3. Furthermore, in
comparison to dedicated state-of-the-art hardware accelerators for classic FFT,
this processor occupies 42\% less area and consumes 83\% less power, on
average. This suggests that the proposed hardware design offers a promising
solution for implementing FALCON on resource-constrained devices.Comment: 14 page
FPT: a Fixed-Point Accelerator for Torus Fully Homomorphic Encryption
Fully Homomorphic Encryption is a technique that allows computation on
encrypted data. It has the potential to change privacy considerations in the
cloud, but computational and memory overheads are preventing its adoption. TFHE
is a promising Torus-based FHE scheme that relies on bootstrapping, the
noise-removal tool invoked after each encrypted logical/arithmetical operation.
We present FPT, a Fixed-Point FPGA accelerator for TFHE bootstrapping. FPT is
the first hardware accelerator to exploit the inherent noise present in FHE
calculations. Instead of double or single-precision floating-point arithmetic,
it implements TFHE bootstrapping entirely with approximate fixed-point
arithmetic. Using an in-depth analysis of noise propagation in bootstrapping
FFT computations, FPT is able to use noise-trimmed fixed-point representations
that are up to 50% smaller than prior implementations.
FPT is built as a streaming processor inspired by traditional streaming DSPs:
it instantiates directly cascaded high-throughput computational stages, with
minimal control logic and routing networks. We explore throughput-balanced
compositions of streaming kernels with a user-configurable streaming width in
order to construct a full bootstrapping pipeline. Our approach allows 100%
utilization of arithmetic units and requires only a small bootstrapping key
cache, enabling an entirely compute-bound bootstrapping throughput of 1 BS /
35us. This is in stark contrast to the classical CPU approach to FHE
bootstrapping acceleration, which is typically constrained by memory and
bandwidth.
FPT is implemented and evaluated as a bootstrapping FPGA kernel for an Alveo
U280 datacenter accelerator card. FPT achieves two to three orders of magnitude
higher bootstrapping throughput than existing CPU-based implementations, and
2.5x higher throughput compared to recent ASIC emulation experiments.Comment: ACM CCS 202
Fixed-point realization of fast nonlinear Fourier transform algorithm for FPGA implementation of optical data processing
The nonlinear Fourier transform (NFT) based signal processing has attracted considerable attention as a promising tool for fibre nonlinearity mitigation in optical transmission. However, the mathematical complexity of NFT algorithms and the noticeable distinction of the latter from the “conventional” (Fourier-based) methods make it difficult to adapt this approach for practical applications. In our work, we demonstrate a hardware implementation of the fast direct NFT operation: it is used to map the optical signal onto its nonlinear Fourier spectrum, i.e. to demodulate the data. The main component of the algorithm is the matrix-multiplier unit, implemented on field-programmable gate arrays (FPGA) and used in our study for the estimation of required hardware resources. To design the best performing implementation in limited resources, we carry out the processing accuracy analysis to estimate the optimal bit width. The fast NFT algorithm that we analyse, is based on the FFT, which leads to the O(N log^{2}_{2} N) method’s complexity for the signal consisting of N samples. Our analysis revealed the significant demand in DSP blocks on the used board, which is caused by the complex-valued matrix operations and FFTs. Nevertheless, it seems to be possible to utilise further the parallelisation of our NFT-processing implementation for the more efficient NFT hardware realisation
- …