17 research outputs found

    On Board Georeferencing Using FPGA-Based Optimized Second Order Polynomial Equation

    Get PDF
    For real-time monitoring of natural disasters, such as fire, volcano, flood, landslide, and coastal inundation, highly-accurate georeferenced remotely sensed imagery is needed. Georeferenced imagery can be fused with geographic spatial data sets to provide geographic coordinates and positing for regions of interest. This paper proposes an on-board georeferencing method for remotely sensed imagery, which contains five modules: input data, coordinate transformation, bilinear interpolation, and output data. The experimental results demonstrate multiple benefits of the proposed method: (1) the computation speed using the proposed algorithm is 8 times faster than that using PC computer; (2) the resources of the field programmable gate array (FPGA) can meet the requirements of design. In the coordinate transformation scheme, 250,656 LUTs, 499,268 registers, and 388 DSP48s are used. Furthermore, 27,218 LUTs, 45,823 registers, 456 RAM/FIFO, and 267 DSP48s are used in the bilinear interpolation module; (3) the values of root mean square errors (RMSEs) are less than one pixel, and the other statistics, such as maximum error, minimum error, and mean error are less than one pixel; (4) the gray values of the georeferenced image when implemented using FPGA have the same accuracy as those implemented using MATLAB and Visual studio (C++), and have a very close accuracy implemented using ENVI software; and (5) the on-chip power consumption is 0.659W. Therefore, it can be concluded that the proposed georeferencing method implemented using FPGA with second-order polynomial model and bilinear interpolation algorithm can achieve real-time geographic referencing for remotely sensed imagery

    Designing Efficient and Flexible NTT Accelerators

    Get PDF
    The Number Theoretic Transform (NTT) is a powerful mathematical tool with a wide range of applications in various fields, including signal processing, cryptography, and error correction codes. In recent years, there has been a growing interest in efficiently implementing the NTT on hardware platforms for lattice-based cryptography within the context of NIST\u27s Post-Quantum Cryptography (PQC) competition. The implementation of NTT in cryptography stands as a pivotal advancement, revolutionizing various security protocols. By enabling efficient arithmetic operations in polynomial rings, NTT significantly enhances the speed and security of lattice-based cryptographic schemes, contributing to the development of robust homomorphic encryption, key exchange, and digital signature systems. This article presents a new implementation of the Number Theoretic Transform for FPGA platforms. The focus of the implementation lies in achieving a flexible trade-off between resource usage and computation speed. By strategically adjusting the allocation of BRAM and DSP resources, the NTT computation can be optimized for either high-speed processing or resource conservation. The proposed implementation is specifically designed for polynomial multiplication with a degree of 256, accommodating coefficients of various bit sizes. Furthermore, the constant-geometry (Pease) method was utilized as an alternative to the Cooley-Tukey graph method, resulting in a notable simplification of BRAM addressing procedures. This adaptability renders it suitable for cryptographic algorithms like CRYSTALS-Dilithium and CRYSTALS-Kyber, which use 256-degree polynomials

    Multi-Unit Serial Polynomial Multiplier to Accelerate NTRU-Based Cryptographic Schemes in IoT Embedded Systems

    Get PDF
    Concern for the security of embedded systems that implement IoT devices has become a crucial issue, as these devices today support an increasing number of applications and services that store and exchange information whose integrity, privacy, and authenticity must be adequately guaranteed. Modern lattice-based cryptographic schemes have proven to be a good alternative, both to face the security threats that arise as a consequence of the development of quantum computing and to allow efficient implementations of cryptographic primitives in resource-limited embedded systems, such as those used in consumer and industrial applications of the IoT. This article describes the hardware implementation of parameterized multi-unit serial polynomial multipliers to speed up time-consuming operations in NTRU-based cryptographic schemes. The flexibility in selecting the design parameters and the interconnection protocol with a general-purpose processor allow them to be applied both to the standardized variants of NTRU and to the new proposals that are being considered in the post-quantum contest currently held by the National Institute of Standards and Technology, as well as to obtain an adequate cost/performance/security-level trade-off for a target application. The designs are provided as AXI4 bus-compliant intellectual property modules that can be easily incorporated into embedded systems developed with the Vivado design tools. The work provides an extensive set of implementation and characterization results in devices of the Xilinx Zynq-7000 and Zynq UltraScale+ families for the different sets of parameters defined in the NTRUEncrypt standard. It also includes details of their plug and play inclusion as hardware accelerators in the C implementation of this public-key encryption scheme codified in the LibNTRU library, showing that acceleration factors of up to 3.1 are achieved when compared to pure software implementations running on the processing systems included in the programmable devices.European Union 952622Ministerio de Ciencia e Innovación PID2020-116664RB100, 10.13039/50110001103

    A model‐based design floating‐point accumulator. Case of study: FPGA implementation of a support vector machine kernel function

    Get PDF
    Recent research in wearable sensors have led to the development of an advanced platform capable of embedding complex algorithms such as machine learning algorithms, which are known to usually be resource‐demanding. To address the need for high computational power, one solution is to design custom hardware platforms dedicated to the specific application by exploiting, for example, Field Programmable Gate Array (FPGA). Recently, model‐based techniques and automatic code generation have been introduced in FPGA design. In this paper, a new model‐based floating‐point accumulation circuit is presented. The architecture is based on the state‐of‐the‐art delayed buffering algorithm. This circuit was conceived to be exploited in order to compute the kernel function of a support vector machine. The implementation of the proposed model was carried out in Simulink, and simulation results showed that it had better performance in terms of speed and occupied area when compared to other solutions. To better evaluate its figure, a practical case of a polynomial kernel function was considered. Simulink and VHDL post‐implementation timing simulations and measurements on FPGA confirmed the good results of the stand‐alone accumulator

    High-Speed Hardware Architectures and FPGA Benchmarking of CRYSTALS-Kyber, NTRU, and Saber

    Get PDF
    Performance in hardware has typically played a significant role in differentiating among leading candidates in cryptographic standardization efforts. Winners of two past NIST cryptographic contests (Rijndael in case of AES and Keccak in case of SHA-3) were ranked consistently among the two fastest candidates when implemented using FPGAs and ASICs. Hardware implementations of cryptographic operations may quite easily outperform software implementations for at least a subset of major performance metrics, such as latency, number of operations per second, power consumption, and energy usage, as well as in terms of security against physical attacks, including side-channel analysis. Using hardware also permits much higher flexibility in trading one subset of these properties for another. This paper presents high-speed hardware architectures for four lattice-based CCA-secure Key Encapsulation Mechanisms (KEMs), representing three NIST PQC finalists: CRYSTALS-Kyber, NTRU (with two distinct variants, NTRU-HPS and NTRU-HRSS), and Saber. We rank these candidates among each other and compare them with all other Round 3 KEMs based on the data from the previously reported work

    Rethinking FPGA Architectures for Deep Neural Network applications

    Get PDF
    The prominence of machine learning-powered solutions instituted an unprecedented trend of integration into virtually all applications with a broad range of deployment constraints from tiny embedded systems to large-scale warehouse computing machines. While recent research confirms the edges of using contemporary FPGAs to deploy or accelerate machine learning applications, especially where the latency and energy consumption are strictly limited, their pre-machine learning optimised architectures remain a barrier to the overall efficiency and performance. Realizing this shortcoming, this thesis demonstrates an architectural study aiming at solutions that enable hidden potentials in the FPGA technology, primarily for machine learning algorithms. Particularly, it shows how slight alterations to the state-of-the-art architectures could significantly enhance the FPGAs toward becoming more machine learning-friendly while maintaining the near-promised performance for the rest of the applications. Eventually, it presents a novel systematic approach to deriving new block architectures guided by designing limitations and machine learning algorithm characteristics through benchmarking. First, through three modifications to Xilinx DSP48E2 blocks, an enhanced digital signal processing (DSP) block for important computations in embedded deep neural network (DNN) accelerators is described. Then, two tiers of modifications to FPGA logic cell architecture are explained that deliver a variety of performance and utilisation benefits with only minor area overheads. Eventually, with the goal of exploring this new design space in a methodical manner, a problem formulation involving computing nested loops over multiply-accumulate (MAC) operations is first proposed. A quantitative methodology for deriving efficient coarse-grained compute block architectures from benchmarks is then suggested together with a family of new embedded blocks, called MLBlocks

    Recent Advances in Embedded Computing, Intelligence and Applications

    Get PDF
    The latest proliferation of Internet of Things deployments and edge computing combined with artificial intelligence has led to new exciting application scenarios, where embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately contribute to the fostering of the offloading processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems

    Applications in Electronics Pervading Industry, Environment and Society

    Get PDF
    This book features the manuscripts accepted for the Special Issue “Applications in Electronics Pervading Industry, Environment and Society—Sensing Systems and Pervasive Intelligence” of the MDPI journal Sensors. Most of the papers come from a selection of the best papers of the 2019 edition of the “Applications in Electronics Pervading Industry, Environment and Society” (APPLEPIES) Conference, which was held in November 2019. All these papers have been significantly enhanced with novel experimental results. The papers give an overview of the trends in research and development activities concerning the pervasive application of electronics in industry, the environment, and society. The focus of these papers is on cyber physical systems (CPS), with research proposals for new sensor acquisition and ADC (analog to digital converter) methods, high-speed communication systems, cybersecurity, big data management, and data processing including emerging machine learning techniques. Physical implementation aspects are discussed as well as the trade-off found between functional performance and hardware/system costs

    27th Annual European Symposium on Algorithms: ESA 2019, September 9-11, 2019, Munich/Garching, Germany

    Get PDF
    corecore