Search CORE

137 research outputs found

Significance of Logic Synthesis in FPGA-Based Design of Image and Signal Processing Systems

Author: Falkowski Bogdan J.
Luba Tadeusz
Rawski Mariusz
Selvaraj Henry
Publication venue: Digital Scholarship@UNLV
Publication date: 11/06/2008
Field of study

This chapter, taking FIR filters as an example, presents the discussion on efficiency of different implementation methodologies of DSP algorithms targeting modern FPGA architectures. Nowadays, programmable technology provides the possibility to implement digital systems with the use of specialized embedded DSP blocks. However, this technology gives the designer the possibility to increase efficiency of designed systems by exploitation of parallelisms of implemented algorithms. Moreover, it is possible to apply special techniques, such as distributed arithmetic (DA). Since in this approach, general-purpose multipliers are replaced by combinational LUT blocks, it is possible to construct digital filters of very high performance. Additionally, application of the functional decomposition-based method to LUT blocks optimization, and mapping has been investigated. The chapter presents results of the comparison of various design approaches in these areas

University of Nevada, Las Vegas Repository

High-Performance VLSI Architectures for Lattice-Based Cryptography

Author: Tan Weihang
Publication venue: Clemson University Libraries
Publication date: 01/12/2022
Field of study

Lattice-based cryptography is a cryptographic primitive built upon the hard problems on point lattices. Cryptosystems relying on lattice-based cryptography have attracted huge attention in the last decade since they have post-quantum-resistant security and the remarkable construction of the algorithm. In particular, homomorphic encryption (HE) and post-quantum cryptography (PQC) are the two main applications of lattice-based cryptography. Meanwhile, the efficient hardware implementations for these advanced cryptography schemes are demanding to achieve a high-performance implementation. This dissertation aims to investigate the novel and high-performance very large-scale integration (VLSI) architectures for lattice-based cryptography, including the HE and PQC schemes. This dissertation first presents different architectures for the number-theoretic transform (NTT)-based polynomial multiplication, one of the crucial parts of the fundamental arithmetic for lattice-based HE and PQC schemes. Then a high-speed modular integer multiplier is proposed, particularly for lattice-based cryptography. In addition, a novel modular polynomial multiplier is presented to exploit the fast finite impulse response (FIR) filter architecture to reduce the computational complexity of the schoolbook modular polynomial multiplication for lattice-based PQC scheme. Afterward, an NTT and Chinese remainder theorem (CRT)-based high-speed modular polynomial multiplier is presented for HE schemes whose moduli are large integers

Clemson University: TigerPrints

PaReNTT: Low-Latency Parallel Residue Number System and NTT-Based Long Polynomial Modular Multiplication for Homomorphic Encryption

Author: Chiu Sin-Wei
Lao Yingjie
Parhi Keshab K.
Tan Weihang
Wang Antian
Publication venue
Publication date: 06/07/2023
Field of study

High-speed long polynomial multiplication is important for applications in homomorphic encryption (HE) and lattice-based cryptosystems. This paper addresses low-latency hardware architectures for long polynomial modular multiplication using the number-theoretic transform (NTT) and inverse NTT (iNTT). Chinese remainder theorem (CRT) is used to decompose the modulus into multiple smaller moduli. Our proposed architecture, namely PaReNTT, makes four novel contributions. First, parallel NTT and iNTT architectures are proposed to reduce the number of clock cycles to process the polynomials. This can enable real-time processing for HE applications, as the number of clock cycles to process the polynomial is inversely proportional to the level of parallelism. Second, the proposed architecture eliminates the need for permuting the NTT outputs before their product is input to the iNTT. This reduces latency by n/4 clock cycles, where n is the length of the polynomial, and reduces buffer requirement by one delay-switch-delay circuit of size n. Third, an approach to select special moduli is presented where the moduli can be expressed in terms of a few signed power-of-two terms. Fourth, novel architectures for pre-processing for computing residual polynomials using the CRT and post-processing for combining the residual polynomials are proposed. These architectures significantly reduce the area consumption of the pre-processing and post-processing steps. The proposed long modular polynomial multiplications are ideal for applications that require low latency and high sample rate as these feed-forward architectures can be pipelined at arbitrary levels

arXiv.org e-Print Archive

Multiplierless Algorithm for Multivariate Gaussian Random Number Generation in FPGAs

Author: Luk W
Thomas DB
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2013
Field of study

Spiral - Imperial College Digital Repository

Recommended from our members

Formally Verifiable Synthesis Flow In FPGAs

Author: Muttur Anurag V
Publication venue: ScholarWorks@UMass Amherst
Publication date: 28/10/2022
Field of study

FPGAs are used in a wide variety of digital systems. Due to their ability to support parallelism and specialization, these devices are becoming more commonplace in fields such as machine learning. One of the biggest benefits of FPGAs, logic specialization, can lead to security risks. Prior research has shown that a large variety of malicious circuits can snoop on sensitive user data, induce circuit faults, or physically damage the FPGA. These Trojan circuits can easily be crafted and embedded in FPGA designs. Often, these Trojans are small, consume little power in comparison to the target circuit, and are hard to detect via simulation or physical inspection. Computer-aided design (CAD) software in FPGAs has been the subject of extensive research and development of FPGAs for the past thirty-five years. The current FPGA software landscape includes vendors that provide widely used software flows to convert behavioral and register-transfer level (RTL) descriptions to bitstreams needed to program an FPGA device. Given the complexity of the algorithms needed to perform this translation, these CAD tool flows are generally structured as black boxes with limited transparency regarding design conversion steps or the logical equivalence of the generated design and initial design specification. vi This work explores the enhancement of open-source FPGA software, SymbiFlow, that focuses on FPGA RTL synthesis, place and route and bitstream generation. SymbiFlow uses Yosys for synthesis, VPR for place and route, and Project X-Ray for bitstream generation. We focus on synthesis using Yosys and formal verification using the Cadence Conformal Logic Equivalence Checker (LEC) for Xilinx Artix-7 FPGAs. Yosys is used to synthesize 160 benchmarks written in Verilog. We implement required code modifications to Yosys for designs to pass the equivalence checker. For Conformal, this work involves processing 160 benchmark designs with the equivalence checker. Parameters can be toggled on or off to obtain results that indicates if a design has passed formal verification when comparing RTL and synthesized netlists

ScholarWorks@UMass Amherst

Decomposition and encoding of finite state machines for FPGA implementation

Author: Slusarczyk A.S.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2004
Field of study

xii+187hlm.;24c

Pure OAI Repository

uilis.unsyiah.ac.id

A Flexible NTT-based multiplier for Post-Quantum Cryptography

Author: Guido Masera
Kristjane Koleci
Maurizio Martina
Paolo Mazzetti
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2023
Field of study

In this work an NTT-based (Number Theoretic Transform) multiplier for code-based Post-Quantum Cryptography (PQC) is presented, supporting Quasi Cyclic Low/Moderate-Density Parity-Check (QC LDPC/MDPC) codes. The cyclic matrix product, which is the fundamental operation required in this application, is treated as a polynomial product and adapted to the specific case of QC-MDPC codes proposed for Round 3 and 4 in the National Institute of Standards and Technology (NIST) competition for PQC. The multiplier is a fundamental component in both encryption and decryption, and the proposed solution leads to a flexible NTT-based multiplier, which can efficiently handle all types of required products, where the vectors have a length ≈104 and can be moderately sparse. The proposed architecture is implemented using both Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) technologies and, when compared with the best published results, it features a 10 times reduction of the encryption times with the area increased by 3 times. The proposed multiplier, incorporated in the encryption and decryption stages of a code-based PQC cryptosystem, leads to an improvement over the best published results between 3 to 10 times in terms of LC product (LUT times latency)

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Self-timed field programmmable gate array architectures

Author: Payne Robert
Publication venue: The University of Edinburgh
Publication date: 01/01/1997
Field of study

Edinburgh Research Archive

Time domain based image generation for synthetic aperture radar on field programmable gate arrays

Author: Cholewa Fabian
Publication venue: Hannover : Institutionelles Repositorium der Leibniz Universität Hannover
Publication date: 01/01/2019
Field of study

Aerial images are important in different scenarios including surface cartography, surveillance, disaster control, height map generation, etc. Synthetic Aperture Radar (SAR) is one way to generate these images even through clouds and in the absence of daylight. For a wide and easy usage of this technology, SAR systems should be small, mounted to Unmanned Aerial Vehicles (UAVs) and process images in real-time. Since UAVs are small and lightweight, more robust (but also more complex) time-domain algorithms are required for good image quality in case of heavy turbulence. Typically the SAR data set size does not allow for ground transmission and processing, while the UAV size does not allow for huge systems and high power consumption to process the data. A small and energy-efficient signal processing system is therefore required. To fill the gap between existing systems that are capable of either high-speed processing or low power consumption, the focus of this thesis is the analysis, design, and implementation of such a system. A survey shows that most architectures either have to high power budgets or too few processing capabilities to match real-time requirements for time-domain-based processing. Therefore, a Field Programmable Gate Array (FPGA) based system is designed, as it allows for high performance and low-power consumption. The Global Backprojection (GBP) is implemented, as it is the standard time-domain-based algorithm which allows for highest image quality at arbitrary trajectories at the complexity of O(N3). To satisfy real-time requirements under all circumstances, the accelerated Fast Factorized Backprojection (FFBP) algorithm with a complexity of O(N2logN) is implemented as well, to allow for a trade-off between image quality and processing time. Additionally, algorithm and design are enhanced to correct the failing assumptions for Frequency Modulated Continuous Wave (FMCW) Radio Detection And Ranging (Radar) data at high velocities. Such sensors offer high-resolution data at considerably low transmit power which is especially interesting for UAVs. A full analysis of all algorithms is carried out, to design a highly utilized architecture for maximum throughput. The process covers the analysis of mathematical steps and approximations for hardware speedup, the analysis of code dependencies for instruction parallelism and the analysis of streaming capabilities, including memory access and caching strategies, as well as parallelization considerations and pipeline analysis. Each architecture is described in all details with its surrounding control structure. As proof of concepts, the architectures are mapped on a Virtex 6 FPGA and results on resource utilization, runtime and image quality are presented and discussed. A special framework allows to scale and port the design to other FPGAs easily and to enable for maximum resource utilization and speedup. The result is streaming architectures that are capable of massive parallelization with a minimum in system stalls. It is shown that real-time processing on FPGAs with strict power budgets in time-domain is possible with the GBP (mid-sized images) and the FFBP (any image size with a trade-off in quality), allowing for a UAV scenario

Institutionelles Repositorium der Leibniz Universität Hannover