137 research outputs found

    Significance of Logic Synthesis in FPGA-Based Design of Image and Signal Processing Systems

    Full text link
    This chapter, taking FIR filters as an example, presents the discussion on efficiency of different implementation methodologies of DSP algorithms targeting modern FPGA architectures. Nowadays, programmable technology provides the possibility to implement digital systems with the use of specialized embedded DSP blocks. However, this technology gives the designer the possibility to increase efficiency of designed systems by exploitation of parallelisms of implemented algorithms. Moreover, it is possible to apply special techniques, such as distributed arithmetic (DA). Since in this approach, general-purpose multipliers are replaced by combinational LUT blocks, it is possible to construct digital filters of very high performance. Additionally, application of the functional decomposition-based method to LUT blocks optimization, and mapping has been investigated. The chapter presents results of the comparison of various design approaches in these areas

    High-Performance VLSI Architectures for Lattice-Based Cryptography

    Get PDF
    Lattice-based cryptography is a cryptographic primitive built upon the hard problems on point lattices. Cryptosystems relying on lattice-based cryptography have attracted huge attention in the last decade since they have post-quantum-resistant security and the remarkable construction of the algorithm. In particular, homomorphic encryption (HE) and post-quantum cryptography (PQC) are the two main applications of lattice-based cryptography. Meanwhile, the efficient hardware implementations for these advanced cryptography schemes are demanding to achieve a high-performance implementation. This dissertation aims to investigate the novel and high-performance very large-scale integration (VLSI) architectures for lattice-based cryptography, including the HE and PQC schemes. This dissertation first presents different architectures for the number-theoretic transform (NTT)-based polynomial multiplication, one of the crucial parts of the fundamental arithmetic for lattice-based HE and PQC schemes. Then a high-speed modular integer multiplier is proposed, particularly for lattice-based cryptography. In addition, a novel modular polynomial multiplier is presented to exploit the fast finite impulse response (FIR) filter architecture to reduce the computational complexity of the schoolbook modular polynomial multiplication for lattice-based PQC scheme. Afterward, an NTT and Chinese remainder theorem (CRT)-based high-speed modular polynomial multiplier is presented for HE schemes whose moduli are large integers

    PaReNTT: Low-Latency Parallel Residue Number System and NTT-Based Long Polynomial Modular Multiplication for Homomorphic Encryption

    Full text link
    High-speed long polynomial multiplication is important for applications in homomorphic encryption (HE) and lattice-based cryptosystems. This paper addresses low-latency hardware architectures for long polynomial modular multiplication using the number-theoretic transform (NTT) and inverse NTT (iNTT). Chinese remainder theorem (CRT) is used to decompose the modulus into multiple smaller moduli. Our proposed architecture, namely PaReNTT, makes four novel contributions. First, parallel NTT and iNTT architectures are proposed to reduce the number of clock cycles to process the polynomials. This can enable real-time processing for HE applications, as the number of clock cycles to process the polynomial is inversely proportional to the level of parallelism. Second, the proposed architecture eliminates the need for permuting the NTT outputs before their product is input to the iNTT. This reduces latency by n/4 clock cycles, where n is the length of the polynomial, and reduces buffer requirement by one delay-switch-delay circuit of size n. Third, an approach to select special moduli is presented where the moduli can be expressed in terms of a few signed power-of-two terms. Fourth, novel architectures for pre-processing for computing residual polynomials using the CRT and post-processing for combining the residual polynomials are proposed. These architectures significantly reduce the area consumption of the pre-processing and post-processing steps. The proposed long modular polynomial multiplications are ideal for applications that require low latency and high sample rate as these feed-forward architectures can be pipelined at arbitrary levels

    Multiplierless Algorithm for Multivariate Gaussian Random Number Generation in FPGAs

    Get PDF

    Decomposition and encoding of finite state machines for FPGA implementation

    Get PDF
    xii+187hlm.;24c

    A Flexible NTT-based multiplier for Post-Quantum Cryptography

    Get PDF
    In this work an NTT-based (Number Theoretic Transform) multiplier for code-based Post-Quantum Cryptography (PQC) is presented, supporting Quasi Cyclic Low/Moderate-Density Parity-Check (QC LDPC/MDPC) codes. The cyclic matrix product, which is the fundamental operation required in this application, is treated as a polynomial product and adapted to the specific case of QC-MDPC codes proposed for Round 3 and 4 in the National Institute of Standards and Technology (NIST) competition for PQC. The multiplier is a fundamental component in both encryption and decryption, and the proposed solution leads to a flexible NTT-based multiplier, which can efficiently handle all types of required products, where the vectors have a length ≈104 and can be moderately sparse. The proposed architecture is implemented using both Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) technologies and, when compared with the best published results, it features a 10 times reduction of the encryption times with the area increased by 3 times. The proposed multiplier, incorporated in the encryption and decryption stages of a code-based PQC cryptosystem, leads to an improvement over the best published results between 3 to 10 times in terms of LC product (LUT times latency)

    Self-timed field programmmable gate array architectures

    Get PDF

    Time domain based image generation for synthetic aperture radar on field programmable gate arrays

    Get PDF
    Aerial images are important in different scenarios including surface cartography, surveillance, disaster control, height map generation, etc. Synthetic Aperture Radar (SAR) is one way to generate these images even through clouds and in the absence of daylight. For a wide and easy usage of this technology, SAR systems should be small, mounted to Unmanned Aerial Vehicles (UAVs) and process images in real-time. Since UAVs are small and lightweight, more robust (but also more complex) time-domain algorithms are required for good image quality in case of heavy turbulence. Typically the SAR data set size does not allow for ground transmission and processing, while the UAV size does not allow for huge systems and high power consumption to process the data. A small and energy-efficient signal processing system is therefore required. To fill the gap between existing systems that are capable of either high-speed processing or low power consumption, the focus of this thesis is the analysis, design, and implementation of such a system. A survey shows that most architectures either have to high power budgets or too few processing capabilities to match real-time requirements for time-domain-based processing. Therefore, a Field Programmable Gate Array (FPGA) based system is designed, as it allows for high performance and low-power consumption. The Global Backprojection (GBP) is implemented, as it is the standard time-domain-based algorithm which allows for highest image quality at arbitrary trajectories at the complexity of O(N3). To satisfy real-time requirements under all circumstances, the accelerated Fast Factorized Backprojection (FFBP) algorithm with a complexity of O(N2logN) is implemented as well, to allow for a trade-off between image quality and processing time. Additionally, algorithm and design are enhanced to correct the failing assumptions for Frequency Modulated Continuous Wave (FMCW) Radio Detection And Ranging (Radar) data at high velocities. Such sensors offer high-resolution data at considerably low transmit power which is especially interesting for UAVs. A full analysis of all algorithms is carried out, to design a highly utilized architecture for maximum throughput. The process covers the analysis of mathematical steps and approximations for hardware speedup, the analysis of code dependencies for instruction parallelism and the analysis of streaming capabilities, including memory access and caching strategies, as well as parallelization considerations and pipeline analysis. Each architecture is described in all details with its surrounding control structure. As proof of concepts, the architectures are mapped on a Virtex 6 FPGA and results on resource utilization, runtime and image quality are presented and discussed. A special framework allows to scale and port the design to other FPGAs easily and to enable for maximum resource utilization and speedup. The result is streaming architectures that are capable of massive parallelization with a minimum in system stalls. It is shown that real-time processing on FPGAs with strict power budgets in time-domain is possible with the GBP (mid-sized images) and the FFBP (any image size with a trade-off in quality), allowing for a UAV scenario
    • …
    corecore