124 research outputs found

    A Brand-New, Area-Efficient Architecture for the FFT Algorithm Designed for Implementation of FPGAs

    Elliptic curve cryptography (ECC) is widely regarded as one of the most effective forms of cryptography developed in recent times, primarily because it offers excellent performance across a wide range of hardware configurations together with shorter key lengths. A high-throughput multiplier design has been described for elliptic curve cryptographic applications that depend on concurrent computations. This work proposes and explains a carry-select division architecture, which significantly improves the divider. The design halves the length of the adder carry chain, at the expense of additional adders and control logic. In high-throughput FFT designs, the number of butterfly units implemented determines the area required by the FFT processor. Each butterfly unit contains blocks that add or subtract numbers as well as blocks that multiply numbers. The area occupied by these two kinds of arithmetic block is determined by the bit resolution of the models: as the bit resolution increases, so does the area. The standard FFT approach requires that each stage contain  times as many butterfly units as the stage before it. This requirement must be met before moving on to the next stage
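
The carry-select idea mentioned in this abstract can be illustrated with a small software model. This is only a sketch of a generic two-block carry-select adder, not the divider architecture proposed in the paper; the function names `ripple_add` and `carry_select_add` are hypothetical. The upper half is computed twice (once for each possible incoming carry), which is where the extra adders and control come from, while the longest carry chain is cut to half the word width.

```python
def ripple_add(a, b, carry_in, width):
    """Add two width-bit values one bit at a time; return (sum, carry_out)."""
    result = 0
    carry = carry_in
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry


def carry_select_add(a, b, width):
    """Two-block carry-select addition; return (sum, carry_out)."""
    half = width // 2
    mask = (1 << half) - 1
    # Lower block: an ordinary ripple chain of length `half`.
    lo_sum, lo_carry = ripple_add(a & mask, b & mask, 0, half)
    hi_a, hi_b = a >> half, b >> half
    hi_width = width - half
    # Upper block: both candidates are computed (in parallel in hardware),
    # one assuming carry-in 0 and one assuming carry-in 1.
    hi0, c0 = ripple_add(hi_a, hi_b, 0, hi_width)
    hi1, c1 = ripple_add(hi_a, hi_b, 1, hi_width)
    # A multiplexer picks the right candidate once the lower carry is known.
    hi_sum, carry_out = (hi1, c1) if lo_carry else (hi0, c0)
    return (hi_sum << half) | lo_sum, carry_out
```

For 8-bit operands, `carry_select_add(200, 100, 8)` yields the low 8 bits of 300 together with a carry-out of 1.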

    RESOURCE EFFICIENT DESIGN OF QUANTUM CIRCUITS FOR CRYPTANALYSIS AND SCIENTIFIC COMPUTING APPLICATIONS

    Quantum computers offer the potential to extend our abilities to tackle computational problems in fields such as number theory, encryption, search and scientific computation. Speedups of up to superpolynomial order have been reported for quantum algorithms in these areas. Motivated by the promise of faster computations, the development of quantum machines has caught the attention of both academic and industry researchers. Quantum machines are now at sizes where implementations of quantum algorithms or their components are becoming possible. In order to implement quantum algorithms on quantum machines, resource efficient circuits and functional blocks must be designed. In this work, we propose quantum circuits for Galois and integer arithmetic. These quantum circuits are necessary building blocks to realize quantum algorithms. The design of resource efficient quantum circuits requires that the designer take into account the gate cost, quantum bit (qubit) cost, depth and garbage outputs of a quantum circuit. Existing quantum machines do not have many qubits, meaning that circuits with high qubit cost cannot be implemented. In addition, quantum circuits are more prone to errors, and garbage output removal adds to overall cost. As more gates are used, a quantum circuit sees an increased rate of failure. Failures and error rates can be countered by using quantum error correcting codes and fault tolerant implementations of universal gate sets (such as Clifford+T gates). However, Clifford+T gates are costly to implement, with the T gate being significantly more costly than the Clifford gates. As a result, designers working with Clifford+T gates seek to minimize the number of T gates (T-count) and the depth of T gates (T-depth). In this work, we propose quantum circuits for Galois and integer arithmetic with lower T-count, T-depth and qubit cost than existing work.
This work presents novel quantum circuits for squaring and exponentiation over binary extension fields (Galois fields of the form GF(2^m)). The proposed circuits are shown to have lower depth, qubit cost and gate cost than existing work. We also present quantum circuits for the core operations of multiplication and division, which have lower T-count, T-depth and qubit cost than existing work. This work also illustrates a T-count and qubit cost efficient design for the square root. This work concludes with an illustration of how the arithmetic circuits can be combined into a functional block to implement quantum image processing algorithms

    Reliable and Fault-Resilient Schemes for Efficient Radix-4 Complex Division

    Complex division is commonly used in various applications in signal processing and control theory, including astronomy and nonlinear RF measurements. Nevertheless, unless reliability and assurance are embedded into the architectures of such structures, suboptimal (and thus erroneous) results could undermine the objectives of such applications. In this thesis, we therefore present complex number division architectures based on SRT (Sweeney, Robertson, and Tocher) division with fault diagnosis mechanisms. Different fault-resilient architectures are proposed, which can be tailored to the eventual objectives of the designs in terms of area and time requirements. Among these, we highlight the schemes based on recomputing with shifted operands (RESO), which can detect both natural and malicious faults and, with proper modification, achieve high throughput. The design also implements a minimized look-up table approach, which suits error-detection-based designs and provides high fault coverage with relatively low overhead. Additionally, to benchmark the effectiveness of the proposed schemes, extensive fault diagnosis assessments are performed for the proposed designs through fault simulations and FPGA implementations; the design is implemented on Xilinx Spartan-VI and Xilinx Virtex-VI FPGA families
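
Recomputing with shifted operands (RESO) can be illustrated on a much simpler operation than SRT division. The sketch below is an assumption-laden toy, not the thesis's architecture: `faulty_add` is a hypothetical adder model with an optional stuck-at fault on one output bit, and `reso_add` runs the operation twice, once on the original operands and once on left-shifted operands, then compares the two results after shifting back. Because the fault sits at a fixed physical bit position, it corrupts different logical bits in the two runs, so the mismatch exposes it.

```python
def faulty_add(a, b, stuck_bit=None, stuck_value=0):
    """Adder model with an optional stuck-at fault on one output bit."""
    result = a + b
    if stuck_bit is not None:
        if stuck_value:
            result |= 1 << stuck_bit      # bit forced to 1
        else:
            result &= ~(1 << stuck_bit)   # bit forced to 0
    return result


def reso_add(a, b, shift=1, **fault):
    """Return (sum, error_flag) using recomputation with shifted operands."""
    first = faulty_add(a, b, **fault)
    # Second pass: same hardware, operands shifted left, result shifted back.
    second = faulty_add(a << shift, b << shift, **fault) >> shift
    return first, first != second


print(reso_add(5, 3))                              # fault-free run
print(reso_add(5, 3, stuck_bit=3, stuck_value=0))  # output bit 3 stuck at 0
```

In the fault-free run the two passes agree; with bit 3 stuck at 0 the first pass returns a corrupted sum and the error flag is raised.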

    Residue Number Systems: a Survey


    Hardware realization of discrete wavelet transform cauchy Reed Solomon minimal instruction set computer architecture for wireless visual sensor networks

    Transmitting large amounts of image data across Wireless Visual Sensor Networks (WVSNs) increases the data transmission rate and thus the transmission power. This inevitably decreases the operating lifespan of the sensor nodes and affects the overall operation of WVSNs. Limiting power consumption to prolong battery lifespan is one of the most important goals in WVSNs. To achieve this goal, this thesis presents a novel low complexity Discrete Wavelet Transform (DWT) Cauchy Reed Solomon (CRS) Minimal Instruction Set Computer (MISC) architecture that performs data compression and data encoding (encryption) in a single architecture. Four programme instructions were developed to programme the MISC processor: Subtract and Branch if Negative (SBN), Galois Field Multiplier (GF MULT), XOR and 11TO8. Using these instructions, the developed DWT CRS MISC was programmed to perform DWT image compression to reduce the image size and then to encode the DWT coefficients with CRS code to ensure data security and reliability. Both compression and CRS encoding were performed by a single architecture rather than in two separate modules, which would require a lot of hardware resources (logic slices). By reducing the number of logic slices, the power consumption can subsequently be reduced. Results show that the proposed new DWT CRS MISC architecture implementation requires 142 slices (Xilinx Virtex-II), 129 slices (Xilinx Spartan-3E), 144 slices (Xilinx Spartan-3L) and 66 slices (Xilinx Spartan-6). The developed DWT CRS MISC architecture has lower hardware complexity than other existing systems, such as Crypto-Processor in Xilinx Spartan-6 (4828 slices), Low-Density Parity-Check in Xilinx Virtex-II (870 slices) and ECBC in Xilinx Spartan-3E (1691 slices).
Using the RC10 development board, the developed DWT CRS MISC architecture can be implemented on the Xilinx Spartan-3L FPGA to simulate an actual visual sensor node. This verifies the feasibility of developing a joint compression, encryption and error correction processing framework in WVSNs
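
To give a feel for how a Subtract and Branch if Negative (SBN) instruction can carry general computation, here is a minimal interpreter sketch. The operand convention is an assumption, since the abstract does not spell it out: each instruction `(a, b, target)` performs `memory[b] -= memory[a]` and jumps to `target` if the result is negative, otherwise falls through. The name `run_sbn` and the program encoding are hypothetical and do not reflect the thesis's actual instruction format.

```python
def run_sbn(memory, program, max_steps=1000):
    """Execute a list of (a, b, target) SBN instructions on `memory`.

    Assumed semantics: memory[b] -= memory[a]; if the result is negative,
    jump to `target`, otherwise continue with the next instruction.
    Execution stops when the program counter leaves the program.
    """
    pc = 0
    steps = 0
    while 0 <= pc < len(program) and steps < max_steps:
        a, b, target = program[pc]
        memory[b] -= memory[a]
        pc = target if memory[b] < 0 else pc + 1
        steps += 1
    return memory


# Addition built from two subtractions: cell 2 is a scratch cell set to 0.
#   step 0: scratch -= x   (scratch becomes -x)
#   step 1: y -= scratch   (y becomes y + x)
memory = [5, 3, 0]                   # x, y, scratch
program = [(0, 2, 1), (2, 1, 2)]
print(run_sbn(memory, program))      # cell 1 now holds x + y
```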

    Efficient Implementation of Elliptic Curve Cryptography on FPGAs

    This work presents the design strategies of an FPGA-based elliptic curve co-processor. Elliptic curve cryptography is an important topic in cryptography due to its relatively short key length and higher efficiency compared to other well-known public key crypto-systems like RSA. The most important contributions of this work are:
    - Analyzing how different representations of finite fields and points on elliptic curves affect the performance of an elliptic curve co-processor, and implementing a high performance co-processor.
    - Proposing a novel dynamic programming approach to find the optimum combination of different recursive polynomial multiplication methods. Here optimum means the method which has the smallest number of bit operations.
    - Designing a new normal-basis multiplier which is based on polynomial multipliers. The most important part of this multiplier is a circuit of size O(n log n) for changing the representation between polynomial and normal basis
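
As a rough illustration of what a "recursive polynomial multiplication method" looks like over GF(2), the sketch below combines Karatsuba recursion with a schoolbook base case. It is a simplification: the abstract describes choosing the optimum combination of methods by dynamic programming, whereas this sketch uses a single fixed `threshold` parameter, which is purely an assumption for illustration. Polynomials are stored as Python integers, with bit i holding the coefficient of x^i; the function names are hypothetical.

```python
def schoolbook_mul(a, b):
    """Multiply two GF(2)[x] polynomials stored as integer bit vectors."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def karatsuba_mul(a, b, n, threshold=8):
    """Multiply two GF(2)[x] polynomials with fewer than n coefficients each.

    Below `threshold` coefficients the schoolbook method is used; above it,
    one Karatsuba step replaces four half-size products with three.
    """
    if n <= threshold:
        return schoolbook_mul(a, b)
    half = (n + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    low = karatsuba_mul(a_lo, b_lo, half, threshold)
    high = karatsuba_mul(a_hi, b_hi, n - half, threshold)
    mid = karatsuba_mul(a_lo ^ a_hi, b_lo ^ b_hi, half, threshold)
    mid ^= low ^ high     # recover the cross term a_lo*b_hi + a_hi*b_lo
    return low ^ (mid << half) ^ (high << (2 * half))
```

A dynamic programming search in the spirit of the abstract would replace the fixed threshold with a per-size table recording which method costs the fewest bit operations.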

    Design Of Multi-Modulation Baseband Modulator And Demodulator For Software Defined Radio

    In contrast to hardware-based radio, which delivers only a single communication service using a particular standard, software defined radio (SDR) provides a highly reconfigurable platform that integrates various functions for multi-modulation, multiband and multi-standard wireless communication systems. This project, however, covers only multi-modulation SDR, with 4-PAM, BPSK, QPSK and 16-QAM. The configurable multi-modulation baseband modulator (MMBM) and demodulator (MMBD) are designed using digital signal processing (DSP) algorithms based on common features shared by single-modulation structures, and then implemented on a Xilinx Virtex-4 FPGA. Comparing the real-time and simulation results shows that the timings are equivalent, and the sign and magnitude changes are significant

    Survey of FPGA applications in the period 2000 – 2015 (Technical Report)

    Romoth J, Porrmann M, Rückert U. Survey of FPGA applications in the period 2000 – 2015 (Technical Report). 2017. Since their introduction, FPGAs have appeared in more and more fields of application. Their key advantage is the combination of software-like flexibility with the performance otherwise common to hardware. Nevertheless, every application field places special requirements on the computational architecture used. This paper provides an overview of the different topics FPGAs have been used for in the last 15 years of research and why they have been chosen over other processing units such as CPUs

    HEAX: An Architecture for Computing on Encrypted Data

    With the rapid increase in cloud computing, concerns surrounding data privacy, security, and confidentiality have also increased significantly. Not only are cloud providers susceptible to internal and external hacks, but in some scenarios data owners cannot outsource the computation due to privacy laws such as GDPR, HIPAA, or CCPA. Fully Homomorphic Encryption (FHE) is a groundbreaking invention in cryptography that, unlike traditional cryptosystems, enables computation on encrypted data without ever decrypting it. However, the most critical obstacle to deploying FHE at large scale is the enormous computation overhead. In this paper, we present HEAX, a novel hardware architecture for FHE that achieves unprecedented performance improvement. HEAX leverages multiple levels of parallelism, ranging from ciphertext-level to fine-grained modular arithmetic level. Our first contribution is a new highly parallelizable architecture for the number-theoretic transform (NTT), which can be of independent interest as NTT is frequently used in many lattice-based cryptography systems. Building on top of the NTT engine, we design a novel architecture for computation on homomorphically encrypted data. We also introduce several techniques to enable an end-to-end, fully pipelined design as well as to reduce on-chip memory consumption. Our implementation on reconfigurable hardware demonstrates 164-268x performance improvement for a wide range of FHE parameters. Comment: To appear in proceedings of ACM ASPLOS 202
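
For readers unfamiliar with the number-theoretic transform, the following sketch shows the basic idea in software. It is a naive quadratic-time reference with toy parameters (modulus 17, length 8, root of unity 2), all chosen here purely for illustration; it says nothing about how HEAX structures its NTT engine, and the function names are hypothetical. The transform is a discrete Fourier transform over integers modulo a prime, so pointwise multiplication in the transformed domain corresponds to cyclic convolution of the inputs.

```python
def ntt(values, root, modulus):
    """Naive O(n^2) number-theoretic transform of `values`.

    `root` must be a primitive n-th root of unity modulo `modulus`,
    where n = len(values).
    """
    n = len(values)
    return [
        sum(values[j] * pow(root, i * j, modulus) for j in range(n)) % modulus
        for i in range(n)
    ]


def inverse_ntt(values, root, modulus):
    """Inverse transform; assumes `modulus` is prime (Fermat inversion)."""
    n = len(values)
    inv_root = pow(root, modulus - 2, modulus)
    inv_n = pow(n, modulus - 2, modulus)
    return [(x * inv_n) % modulus for x in ntt(values, inv_root, modulus)]


def cyclic_convolution(a, b, root, modulus):
    """Multiply two length-n coefficient vectors modulo x^n - 1."""
    fa = ntt(a, root, modulus)
    fb = ntt(b, root, modulus)
    pointwise = [(x * y) % modulus for x, y in zip(fa, fb)]
    return inverse_ntt(pointwise, root, modulus)


# Toy parameters: 2 is a primitive 8th root of unity modulo 17.
a = [1, 2, 3, 0, 0, 0, 0, 0]   # 1 + 2x + 3x^2
b = [4, 5, 0, 0, 0, 0, 0, 0]   # 4 + 5x
print(cyclic_convolution(a, b, 2, 17))
```

Practical implementations use fast butterfly-based algorithms and much larger moduli; the quadratic version is kept here only because it makes the definition easy to check by hand.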