356 research outputs found

    Implementação de um co-processador RSA

    Get PDF

    A Horizontally Reconfigurable Architecture for Extended Precision Arithmetic (Parallel Computing, Condition Codes Factoring).

    Get PDF
    A special computer for high-precision arithmetic and parallel processing which features an ALU that is dynamically reconfigurable under program control has been designed and a prototype machine constructed. The 256-bit ALU consists of eight 32-bit slices each of which has its own ALU operation code in each microinstruction. The slices can remain logically separated from each other, or can be dynamically connected to either or both of their neighbors under control of a segment control code that is part of each microinstruction. The result is a unique parallel architecture which provides real parallelism to user programs at the instruction level while globally retaining a sequential control structure. Management of parallelism is achieved through a two level hierarchy of condition codes and extended instruction sets to support conditional instruction execution. New types of parallel micro-programming tools introduce a system for reconfiguration management and parallel programming. An assembler, debug simulator, and interactive operating environment have been implemented. An analysis of the instruction times to execute arithmetic operations on the machine show that it will be exceptionally fast for problems in computational number theory and factoring of integers

    Software and hardware implementation of the RSA public key cipher

    Get PDF
    Cryptographic systems and their use in communications are presented. The advantages obtained by the use of a public key cipher and the importance of this in a commercial environment are stressed. Two two main public key ciphers are considered. The RSA public key cipher is introduced and various methods for implementing this cipher on a standard, nondedicated, 8 bit microprocessor are investigated. The performance of the different algorithms are evaluated and compared. Various ways of increasing the performance are considered. The limitations imposed by the performance on the practical use of the cipher are discussed. The importance of the key to the security of the cipher is assessed. Different forms of attack are mentioned and a procedure for generating keys, which minimise the probability of a sucessful attack is presented. This procedure is implemented on a minicomputer. Use of the method on personal computers or microprocessors is examined. Methods for performing multiplication in hardware, with particular emphasis on the use of these methods in modular multiplication, are detailed. An algorithm for performing part of the encryption function in hardware and the hardware necessary for it is described. Different methods for implementing the hardware are discussed and one is choosen. A description of the hardware unit is given. The design and development of an application specific integrated circuit (ASIC) to perform key elements of the encryption function is described. The various stages of the design process are detailed. The results expected from this device and its integration into the overall encryption scheme are presented

    Generating and Searching Families of FFT Algorithms

    Full text link
    A fundamental question of longstanding theoretical interest is to prove the lowest exact count of real additions and multiplications required to compute a power-of-two discrete Fourier transform (DFT). For 35 years the split-radix algorithm held the record by requiring just 4n log n - 6n + 8 arithmetic operations on real numbers for a size-n DFT, and was widely believed to be the best possible. Recent work by Van Buskirk et al. demonstrated improvements to the split-radix operation count by using multiplier coefficients or "twiddle factors" that are not n-th roots of unity for a size-n DFT. This paper presents a Boolean Satisfiability-based proof of the lowest operation count for certain classes of DFT algorithms. First, we present a novel way to choose new yet valid twiddle factors for the nodes in flowgraphs generated by common power-of-two fast Fourier transform algorithms, FFTs. With this new technique, we can generate a large family of FFTs realizable by a fixed flowgraph. This solution space of FFTs is cast as a Boolean Satisfiability problem, and a modern Satisfiability Modulo Theory solver is applied to search for FFTs requiring the fewest arithmetic operations. Surprisingly, we find that there are FFTs requiring fewer operations than the split-radix even when all twiddle factors are n-th roots of unity.Comment: Preprint submitted on March 28, 2011, to the Journal on Satisfiability, Boolean Modeling and Computatio

    Formal verification of a fully IEEE compliant floating point unit

    Get PDF
    In this thesis we describe the formal verification of a fully IEEE compliant floating point unit (FPU). The hardware is verified on the gate-level against a formalization of the IEEE standard. The verification is performed using the theorem proving system PVS. The FPU supports both single and double precision floating point numbers, normal and denormal numbers, all four IEEE rounding modes, and exceptions as required by the standard. Beside the verification of the combinatorial correctness of the FPUs we pipeline the FPUs to allow the integration into an out-of-order processor. We formally define the correctness criterion the pipelines must obey in order to work properly within the processor. We then describe a new methodology based on combining model checking and theorem proving for the verification of the pipelines.Die vorliegende Arbeit behandelt die formale Verifikation einer vollständig IEEE konformen Floating Point Unit (FPU). Die Hardware wird auf Gatter-Ebene gegen eine Formalisierung des IEEE Standards verifiziert. Zur Verifikation wird das Beweis-System PVS benutzt. Die FPU unterstützt Fließkommazahlen mit einfacher und doppelter Genauigkeit, normale und denormale Zahlen, alle vier Rundungsmodi und alle Exception-Signale. Neben der Verifikation der kombinatorischen Schaltkreise werden die FPUs gepipelined, um sie in einen Out-of-order Prozessor zu integrieren. Die Korrektheits- Kriterien, die die gepipelineten FPUs befolgen müssen, um im Prozessor korrekt zu arbeiten, werden formal definiert. Es wird eine neue Methode zur Verifikation solcher Pipelines beschrieben. Die Methode beruht auf der Kombination von Model-Checking und Theorem-Proving

    Data structures for algebraic manipulation

    Get PDF
    Imperial Users onl

    Generalised Mersenne Numbers Revisited

    Get PDF
    Generalised Mersenne Numbers (GMNs) were defined by Solinas in 1999 and feature in the NIST (FIPS 186-2) and SECG standards for use in elliptic curve cryptography. Their form is such that modular reduction is extremely efficient, thus making them an attractive choice for modular multiplication implementation. However, the issue of residue multiplication efficiency seems to have been overlooked. Asymptotically, using a cyclic rather than a linear convolution, residue multiplication modulo a Mersenne number is twice as fast as integer multiplication; this property does not hold for prime GMNs, unless they are of Mersenne's form. In this work we exploit an alternative generalisation of Mersenne numbers for which an analogue of the above property --- and hence the same efficiency ratio --- holds, even at bitlengths for which schoolbook multiplication is optimal, while also maintaining very efficient reduction. Moreover, our proposed primes are abundant at any bitlength, whereas GMNs are extremely rare. Our multiplication and reduction algorithms can also be easily parallelised, making our arithmetic particularly suitable for hardware implementation. Furthermore, the field representation we propose also naturally protects against side-channel attacks, including timing attacks, simple power analysis and differential power analysis, which is essential in many cryptographic scenarios, in constrast to GMNs.Comment: 32 pages. Accepted to Mathematics of Computatio

    A survey of compiler development aids

    Get PDF
    A theoretical background was established for the compilation process by dividing it into five phases and explaining the concepts and algorithms that underpin each. The five selected phases were lexical analysis, syntax analysis, semantic analysis, optimization, and code generation. Graph theoretical optimization techniques were presented, and approaches to code generation were described for both one-pass and multipass compilation environments. Following the initial tutorial sections, more than 20 tools that were developed to aid in the process of writing compilers were surveyed. Eight of the more recent compiler development aids were selected for special attention - SIMCMP/STAGE2, LANG-PAK, COGENT, XPL, AED, CWIC, LIS, and JOCIT. The impact of compiler development aids were assessed some of their shortcomings and some of the areas of research currently in progress were inspected

    Formal verification of a fully IEEE compliant floating point unit

    Get PDF
    In this thesis we describe the formal verification of a fully IEEE compliant floating point unit (FPU). The hardware is verified on the gate-level against a formalization of the IEEE standard. The verification is performed using the theorem proving system PVS. The FPU supports both single and double precision floating point numbers, normal and denormal numbers, all four IEEE rounding modes, and exceptions as required by the standard. Beside the verification of the combinatorial correctness of the FPUs we pipeline the FPUs to allow the integration into an out-of-order processor. We formally define the correctness criterion the pipelines must obey in order to work properly within the processor. We then describe a new methodology based on combining model checking and theorem proving for the verification of the pipelines.Die vorliegende Arbeit behandelt die formale Verifikation einer vollständig IEEE konformen Floating Point Unit (FPU). Die Hardware wird auf Gatter-Ebene gegen eine Formalisierung des IEEE Standards verifiziert. Zur Verifikation wird das Beweis-System PVS benutzt. Die FPU unterstützt Fließkommazahlen mit einfacher und doppelter Genauigkeit, normale und denormale Zahlen, alle vier Rundungsmodi und alle Exception-Signale. Neben der Verifikation der kombinatorischen Schaltkreise werden die FPUs gepipelined, um sie in einen Out-of-order Prozessor zu integrieren. Die Korrektheits- Kriterien, die die gepipelineten FPUs befolgen müssen, um im Prozessor korrekt zu arbeiten, werden formal definiert. Es wird eine neue Methode zur Verifikation solcher Pipelines beschrieben. Die Methode beruht auf der Kombination von Model-Checking und Theorem-Proving

    Revisiting ECM on GPUs

    Get PDF
    Modern public-key cryptography is a crucial part of our contemporary life where a secure communication channel with another party is needed. With the advance of more powerful computing architectures – especially Graphics Processing Units (GPUs) – traditional approaches like RSA and Diffie-Hellman schemes are more and more in danger of being broken. We present a highly optimized implementation of Lenstra’s ECM algorithm customized for GPUs. Our implementation uses state-of-the-art elliptic curve arithmetic and optimized integer arithmetic while providing the possibility of arbitrarily scaling ECM’s parameters allowing an application even for larger discrete logarithm problems. Furthermore, the proposed software is not limited to any specific GPU generation and is to the best of our knowledge the first implementation supporting multiple device computation. To this end, for a bound of B1=8,192 and a modulus size of 192 bit, we achieve a throughput of 214 thousand ECM trials per second on a modern RTX 2080 Ti GPU considering only the first stage of ECM. To solve the Discrete Logarithm Problem for larger bit sizes, our software can easily support larger parameter sets such that a throughput of 2,781 ECM trials per second is achieved using B1=50,000, B2=5,000,000, and a modulus size of 448 bit
    corecore