1,293 research outputs found

    Hardware Implementation of Bit-Parallel Finite Field Multipliers Based on Overlap-free Algorithm on FPGA

    Get PDF
    Cryptography can be divided into two fundamentally different classes: symmetric-key and public-key. Compared with symmetric-key cryptography, where the complexity of the security system relies on a single key between receiver and sender, public-key cryptographic system using two separate but mathematically related keys. Finite field multiplication is a key operation used in all cryptographic systems relied on finite field arithmetic as it not only is computationally complex but also one of the most frequently used finite field operations. Karatsuba algorithm and its generalization are most often used to construct multiplication architectures with significantly improved in these decades. However, one of its optimized architecture called Overlap-free Karatsuba algorithm has been mentioned by fewer people and even its implementation on FPGA has not been mentioned by anyone. After completion of a detailed study of this specific algorithm, this thesis has proposed implementation of modified Overlap-free Karatsuba algorithm on Xilinx Spartan-605. Applied this algorithm and its specific architecture, reduced gates or shorten critical path will be achieved for the given value of n.Optimized multiplication architecture, generated from proposed modified Overlap-free Karatsuba algorithm and applied on FPGA board,over NIST recommended fields (n = 128), are presented and analysed in detail. Compared with existing works with sub-quadratic space and time complexities, the proposed modified algorithm is highly recommended module and have improved on both space and time complexities. At last, generalization of proposed modified algorithm is suitable for much larger size of finite fields, and improvements of FPGA itself have been discussed

    Real-Time Lossless Compression of SoC Trace Data

    Get PDF
    Nowadays, with the increasing complexity of System-on-Chip (SoC), traditional debugging approaches are not enough in multi-core architecture systems. Hardware tracing becomes necessary for performance analysis in these systems. The problem is that the size of collected trace data through hardware-based tracing techniques is usually extremely large due to the increasing complexity of System-on-Chips. Hence on-chip trace compression performed in hardware is needed to reduce the amount of transferred or stored data. In this dissertation, the feasibility of different types of lossless data compression algorithms in hardware implementation are investigated and examined. A lossless data compression algorithm LZ77 is selected, analyzed, and optimized to Nexus traces data. In order to meet the hardware cost and compression performances requirements for the real-time compression, an optimized LZ77 compression algorithm is proposed based on the characteristics of Nexus trace data. This thesis presents a hardware implementation of LZ77 encoder described in Very High Speed Integrated Circuit Hardware Description Language (VHDL). Test results demonstrate that the compression speed can achieve16 bits/clock cycle and the average compression ratio is 1.35 for the minimal hardware cost case, which is a suitable trade-off between the hardware cost and the compression performances effectively

    Performance and area optimization for reliable FPGA-based shifter design

    Get PDF
    This thesis addresses the problem of implementing reliable FPGA-based shifters. An FPGA-based design requires optimization between performance and resource utilization, and an effective verification methodology to validate design behavior. The FPGA-based implementation of a large shifter design is restricted by an I/O resource bottleneck. The verification of the design behavior presents a further challenge due to the \u27black-box\u27 nature of FPGAs. To tackle these design challenges, we propose a novel approach to implement FPGA-based shifters. The proposed design alleviates the I/O bottleneck while significantly reducing the logic resources required. This is achieved with a minimal increase in the design delay. The design is seamlessly scalable to a multi-FPGA chip setup to improve performance or to implement larger shifters. It is configured using assertion checkers for efficient design verification. The assertion-based design is further optimized to alleviate the performance degradation caused by the assertion checkers

    Efficient Implementation of Elliptic Curve Cryptography on FPGAs

    Get PDF
    This work presents the design strategies of an FPGA-based elliptic curve co-processor. Elliptic curve cryptography is an important topic in cryptography due to its relatively short key length and higher efficiency as compared to other well-known public key crypto-systems like RSA. The most important contributions of this work are: - Analyzing how different representations of finite fields and points on elliptic curves effect the performance of an elliptic curve co-processor and implementing a high performance co-processor. - Proposing a novel dynamic programming approach to find the optimum combination of different recursive polynomial multiplication methods. Here optimum means the method which has the smallest number of bit operations. - Designing a new normal-basis multiplier which is based on polynomial multipliers. The most important part of this multiplier is a circuit of size O(nlogn)O(n \log n) for changing the representation between polynomial and normal basis

    Research in the effective implementation of guidance computers with large scale arrays Interim report

    Get PDF
    Functional logic character implementation in breadboard design of NASA modular compute

    Investigations into the feasibility of an on-line test methodology

    Get PDF
    This thesis aims to understand how information coding and the protocol that it supports can affect the characteristics of electronic circuits. More specifically, it investigates an on-line test methodology called IFIS (If it Fails It Stops) and its impact on the design, implementation and subsequent characteristics of circuits intended for application specific lC (ASIC) technology. The first study investigates the influences of information coding and protocol on the characteristics of IFIS systems. The second study investigates methods of circuit design applicable to IFIS cells and identifies the· technique possessing the characteristics most suitable for on-line testing. The third study investigates the characteristics of a 'real-life' commercial UART re-engineered using the techniques resulting from the previous two studies. The final study investigates the effects of the halting properties endowed by the protocol on failure diagnosis within IFIS systems. The outcome of this work is an identification and characterisation of the factors that influence behaviour, implementation costs and the ability to test and diagnose IFIS designs

    A Novel Logic Function Acquisition Methodology

    Get PDF
    This record was migrated from the OpenDepot repository service in June, 2017 before shutting down

    Transparent code authentication at the processor level

    Get PDF
    The authors present a lightweight authentication mechanism that verifies the authenticity of code and thereby addresses the virus and malicious code problems at the hardware level eliminating the need for trusted extensions in the operating system. The technique proposed tightly integrates the authentication mechanism into the processor core. The authentication latency is hidden behind the memory access latency, thereby allowing seamless on-the-fly authentication of instructions. In addition, the proposed authentication method supports seamless encryption of code (and static data). Consequently, while providing the software users with assurance for authenticity of programs executing on their hardware, the proposed technique also protects the software manufacturers’ intellectual property through encryption. The performance analysis shows that, under mild assumptions, the presented technique introduces negligible overhead for even moderate cache sizes

    Decimal Floating-point Fused Multiply Add with Redundant Number Systems

    Get PDF
    The IEEE standard of decimal floating-point arithmetic was officially released in 2008. The new decimal floating-point (DFP) format and arithmetic can be applied to remedy the conversion error caused by representing decimal floating-point numbers in binary floating-point format and to improve the computing performance of the decimal processing in commercial and financial applications. Nowadays, many architectures and algorithms of individual arithmetic functions for decimal floating-point numbers are proposed and investigated (e.g., addition, multiplication, division, and square root). However, because of the less efficiency of representing decimal number in binary devices, the area consumption and performance of the DFP arithmetic units are not comparable with the binary counterparts. IBM proposed a binary fused multiply-add (FMA) function in the POWER series of processors in order to improve the performance of floating-point computations and to reduce the complexity of hardware design in reduced instruction set computing (RISC) systems. Such an instruction also has been approved to be suitable for efficiently implementing not only stand-alone addition and multiplication, but also division, square root, and other transcendental functions. Additionally, unconventional number systems including digit sets and encodings have displayed advantages on performance and area efficiency in many applications of computer arithmetic. In this research, by analyzing the typical binary floating-point FMA designs and the design strategy of unconventional number systems, ``a high performance decimal floating-point fused multiply-add (DFMA) with redundant internal encodings" was proposed. First, the fixed-point components inside the DFMA (i.e., addition and multiplication) were studied and investigated as the basis of the FMA architecture. The specific number systems were also applied to improve the basic decimal fixed-point arithmetic. The superiority of redundant number systems in stand-alone decimal fixed-point addition and multiplication has been proved by the synthesis results. Afterwards, a new DFMA architecture which exploits the specific redundant internal operands was proposed. Overall, the specific number system improved, not only the efficiency of the fixed-point addition and multiplication inside the FMA, but also the architecture and algorithms to build up the FMA itself. The functional division, square root, reciprocal, reciprocal square root, and many other functions, which exploit the Newton's or other similar methods, can benefit from the proposed DFMA architecture. With few necessary on-chip memory devices (e.g., Look-up tables) or even only software routines, these functions can be implemented on the basis of the hardwired FMA function. Therefore, the proposed DFMA can be implemented on chip solely as a key component to reduce the hardware cost. Additionally, our research on the decimal arithmetic with unconventional number systems expands the way of performing other high-performance decimal arithmetic (e.g., stand-alone division and square root) upon the basic binary devices (i.e., AND gate, OR gate, and binary full adder). The proposed techniques are also expected to be helpful to other non-binary based applications
    corecore