648 research outputs found

    High-Performance VLSI Architectures for Lattice-Based Cryptography

    Get PDF
    Lattice-based cryptography is a cryptographic primitive built upon the hard problems on point lattices. Cryptosystems relying on lattice-based cryptography have attracted huge attention in the last decade since they have post-quantum-resistant security and the remarkable construction of the algorithm. In particular, homomorphic encryption (HE) and post-quantum cryptography (PQC) are the two main applications of lattice-based cryptography. Meanwhile, the efficient hardware implementations for these advanced cryptography schemes are demanding to achieve a high-performance implementation. This dissertation aims to investigate the novel and high-performance very large-scale integration (VLSI) architectures for lattice-based cryptography, including the HE and PQC schemes. This dissertation first presents different architectures for the number-theoretic transform (NTT)-based polynomial multiplication, one of the crucial parts of the fundamental arithmetic for lattice-based HE and PQC schemes. Then a high-speed modular integer multiplier is proposed, particularly for lattice-based cryptography. In addition, a novel modular polynomial multiplier is presented to exploit the fast finite impulse response (FIR) filter architecture to reduce the computational complexity of the schoolbook modular polynomial multiplication for lattice-based PQC scheme. Afterward, an NTT and Chinese remainder theorem (CRT)-based high-speed modular polynomial multiplier is presented for HE schemes whose moduli are large integers

    Approaches to low-power implementations of DSP systems

    Full text link

    Hardware implementation of daubechies wavelet transforms using folded AIQ mapping

    Get PDF
    The Discrete Wavelet Transform (DWT) is a popular tool in the field of image and video compression applications. Because of its multi-resolution representation capability, the DWT has been used effectively in applications such as transient signal analysis, computer vision, texture analysis, cell detection, and image compression. Daubechies wavelets are one of the popular transforms in the wavelet family. Daubechies filters provide excellent spatial and spectral locality-properties which make them useful in image compression. In this thesis, we present an efficient implementation of a shared hardware core to compute two 8-point Daubechies wavelet transforms. The architecture is based on a new two-level folded mapping technique, an improved version of the Algebraic Integer Quantization (AIQ). The scheme is developed on the factorization and decomposition of the transform coefficients that exploits the symmetrical and wrapping structure of the matrices. The proposed architecture is parallel, pipelined, and multiplexed. Compared to existing designs, the proposed scheme reduces significantly the hardware cost, critical path delay and power consumption with a higher throughput rate. Later, we have briefly presented a new mapping scheme to error-freely compute the Daubechies-8 tap wavelet transform, which is the next transform of Daubechies-6 in the Daubechies wavelet series. The multidimensional technique maps the irrational transformation basis coefficients with integers and results in considerable reduction in hardware and power consumption, and significant improvement in image reconstruction quality

    How to efficiently reconfigure tunable lookup tables for dynamic circuit specialization

    Get PDF
    Dynamic Circuit Specialization is used to optimize the implementation of a parameterized application on an FPGA. Instead of implementing the parameters as regular inputs, in the DCS approach these inputs are implemented as constants. When the parameter values change, the design is reoptimized for the new constant values by reconfiguring the FPGA. This allows faster and more resource-efficient implementation but investigations have shown that reconfiguration time is the major limitation for DCS implementation on Xilinx FPGAs. The limitation arises from the use of inefficient reconfiguration methods in conventional DCS implementation. To address this issue, we propose different approaches to reduce the reconfiguration time drastically and improve the reconfiguration speed. In this context, this paper presents the use of custom reconfiguration controllers and custom reconfiguration software drivers, along with placement constraints to shorten the reconfiguration time. Our results show an improvement in the reconfiguration speed by at least a factor 14 by using Xilinx reconfiguration controller along with placement constraints. However, the improvement can go up to a factor 40 with the combination of a custom reconfiguration controller, custom software drivers, and placement constraints. We also observe depreciation in the system’s performance by at least 6% due to placement constraints

    Low power digital signal processing

    Get PDF
    • …
    corecore