191 research outputs found

    Coupled FPGA/ASIC Implementation of Elliptic Curve Crypto-Processor

    Full text link

    Performance Analysis of Montgomery Multiplier using 32nm CNTFET Technology

    Get PDF
    In VLSI design vacillating the parameters results in variation of critical factors like area, power and delay. The dominant sources of power dissipation in digital systems are the digital multipliers. A digital multiplier plays a major role in a mixture of arithmetic operations in digital signal processing applications hinge on add and shift algorithms. In order to accomplish high execution speed, parallel array multipliers are comprehensively put into application. The crucial drawback of these multipliers is that it exhausts more power than any other multiplier architectures. Montgomery Multiplication is the popularly used algorithm as it is the most efficient technique to perform arithmetic based calculations. A high-speed multiplier is greatly coveted for its extraordinary leverage. The primary blocks of a multiplier are basically comprised of adders. Thus, in order to attain a significant reduction in power consumption at the chip level the power utilization in adders can be decreased. To obtain desired results in performance parameters of the multiplier an efficient and dynamic adder is proposed and incorporated in the Montgomery multiplier. The Carbon Nanotube field effect transistor (CNTFET) is a promising new device that may supersede some of the fundamental limitations of a silicon based MOSFET. The architecture has been designed in 130nm and 32nm CMOS and CNTFET technology in Synopsys HSpice. The analysed parameters that are considered in determining the performance are power delay product, power and delay and comparison is made with both the technologies.The simulation results of this paper affirmed the CNTFET based Montgomery multiplier improved power consumption by 76.47% ,speed by 72.67% and overall energy by 67.76% as compared to MOSFET-based Montgomery multiplier

    Speeding up a scalable modular inversion hardware architecture

    Get PDF
    The modular inversion is a fundamental process in several cryptographic systems. It can be computed in software or hardware, but hardware computation proven to be faster and more secure. This research focused on improving an old scalable inversion hardware architecture proposed in 2004 for finite field GF(p). The architecture has been made of two parts, a computing unit and a memory unit. The memory unit is to hold all the data bits of computation whereas the computing unit performs all the arithmetic operations in word (digit) by word bases known as scalable method. The main objective of this project was to investigate the cost and benefit of modifying the memory unit to include parallel shifting, which was one of the tasks of the scalable computing unit. The study included remodeling the entire hardware architecture removing the shifter from the scalable computing part embedding it in the memory unit instead. This modification resulted in a speedup to the complete inversion process with an area increase due to the new memory shifting unit. Quantitative measurements of the speed area trade-off have been investigated. The results showed that the extra hardware to be added for this modification compared to the speedup gained, giving the user the complete picture to choose from depending on the application need.the British council in Saudi Arabia, KFUPM, Dr. Tatiana Kalganova at the Electrical & Computer Engineering Department of Brunel University in Uxbridg

    Cryptographic primitives on reconfigurable platforms.

    Get PDF
    Tsoi Kuen Hung.Thesis (M.Phil.)--Chinese University of Hong Kong, 2002.Includes bibliographical references (leaves 84-92).Abstracts in English and Chinese.Chapter 1 --- Introduction --- p.1Chapter 1.1 --- Motivation --- p.1Chapter 1.2 --- Objectives --- p.3Chapter 1.3 --- Contributions --- p.3Chapter 1.4 --- Thesis Organization --- p.4Chapter 2 --- Background and Review --- p.6Chapter 2.1 --- Introduction --- p.6Chapter 2.2 --- Cryptographic Algorithms --- p.6Chapter 2.3 --- Cryptographic Applications --- p.10Chapter 2.4 --- Modern Reconfigurable Platforms --- p.11Chapter 2.5 --- Review of Related Work --- p.14Chapter 2.5.1 --- Montgomery Multiplier --- p.14Chapter 2.5.2 --- IDEA Cipher --- p.16Chapter 2.5.3 --- RC4 Key Search --- p.17Chapter 2.5.4 --- Secure Random Number Generator --- p.18Chapter 2.6 --- Summary --- p.19Chapter 3 --- The IDEA Cipher --- p.20Chapter 3.1 --- Introduction --- p.20Chapter 3.2 --- The IDEA Algorithm --- p.21Chapter 3.2.1 --- Cipher Data Path --- p.21Chapter 3.2.2 --- S-Box: Multiplication Modulo 216 + 1 --- p.23Chapter 3.2.3 --- Key Schedule --- p.24Chapter 3.3 --- FPGA-based IDEA Implementation --- p.24Chapter 3.3.1 --- Multiplication Modulo 216 + 1 --- p.24Chapter 3.3.2 --- Deeply Pipelined IDEA Core --- p.26Chapter 3.3.3 --- Area Saving Modification --- p.28Chapter 3.3.4 --- Key Block in Memory --- p.28Chapter 3.3.5 --- Pipelined Key Block --- p.30Chapter 3.3.6 --- Interface --- p.31Chapter 3.3.7 --- Pipelined Design in CBC Mode --- p.31Chapter 3.4 --- Summary --- p.32Chapter 4 --- Variable Radix Montgomery Multiplier --- p.33Chapter 4.1 --- Introduction --- p.33Chapter 4.2 --- RSA Algorithm --- p.34Chapter 4.3 --- Montgomery Algorithm - Ax B mod N --- p.35Chapter 4.4 --- Systolic Array Structure --- p.36Chapter 4.5 --- Radix-2k Core --- p.37Chapter 4.5.1 --- The Original Kornerup Method (Bit-Serial) --- p.37Chapter 4.5.2 --- The Radix-2k Method --- p.38Chapter 4.5.3 --- Time-Space Relationship of Systolic Cells --- p.38Chapter 4.5.4 --- Design Correctness --- p.40Chapter 4.6 --- Implementation Details --- p.40Chapter 4.7 --- Summary --- p.41Chapter 5 --- Parallel RC4 Engine --- p.42Chapter 5.1 --- Introduction --- p.42Chapter 5.2 --- Algorithms --- p.44Chapter 5.2.1 --- RC4 --- p.44Chapter 5.2.2 --- Key Search --- p.46Chapter 5.3 --- System Architecture --- p.47Chapter 5.3.1 --- RC4 Cell Design --- p.47Chapter 5.3.2 --- Key Search --- p.49Chapter 5.3.3 --- Interface --- p.50Chapter 5.4 --- Implementation --- p.50Chapter 5.4.1 --- RC4 cell --- p.51Chapter 5.4.2 --- Floorplan --- p.53Chapter 5.5 --- Summary --- p.53Chapter 6 --- Blum Blum Shub Random Number Generator --- p.55Chapter 6.1 --- Introduction --- p.55Chapter 6.2 --- RRNG Algorithm . . --- p.56Chapter 6.3 --- PRNG Algorithm --- p.58Chapter 6.4 --- Architectural Overview --- p.59Chapter 6.5 --- Implementation --- p.59Chapter 6.5.1 --- Hardware RRNG --- p.60Chapter 6.5.2 --- BBS PRNG --- p.61Chapter 6.5.3 --- Interface --- p.66Chapter 6.6 --- Summary --- p.66Chapter 7 --- Experimental Results --- p.68Chapter 7.1 --- Design Platform --- p.68Chapter 7.2 --- IDEA Cipher --- p.69Chapter 7.2.1 --- Size of IDEA Cipher --- p.70Chapter 7.2.2 --- Performance of IDEA Cipher --- p.70Chapter 7.3 --- Variable Radix Systolic Array --- p.71Chapter 7.4 --- Parallel RC4 Engine --- p.75Chapter 7.5 --- BBS Random Number Generator --- p.76Chapter 7.5.1 --- Size --- p.76Chapter 7.5.2 --- Speed --- p.76Chapter 7.5.3 --- External Clock --- p.77Chapter 7.5.4 --- Random Performance --- p.78Chapter 7.6 --- Summary --- p.78Chapter 8 --- Conclusion --- p.81Chapter 8.1 --- Future Development --- p.83Bibliography --- p.8

    Overview of research results on hardware-accelerated cryptography and security

    Get PDF
    This paper provides an overview of the research findings related to cryptographic hardware, acceleration of cryptanalytical algorithms, FPGA design automation and testing, as well as security service provisioning achieved by the author until the time of writing. The paper also refers to a few results developed in the framework of funded research projects which involved the author as a team member. The text briefly describes the implications of the main research results, indicating the corresponding publication and the essential insights behind each work

    Evaluation of Large Integer Multiplication Methods on Hardware

    Get PDF

    Research Activities on FPGA Design, Cryptographic Hardware, and Security Services

    Get PDF
    This paper reports on the main research results achieved by the author, including activities carried out in the context of funded Research Projects, until year 2012. The report presents an overview of the findings involving cryptographic hardware, as well as the results related to the acceleration of cryptanalytical algorithms. Another major research line involved FPGA design automation and testing. The above results were complemented by works on security service provisioning in distributed environments. The report presents an exhaustive description of all the scientific works derived from the above activities, indicating the essential insights behind each of them and the main results collected from the experimental evaluation

    Research works on electronic system-level design, FPGA testing, and security building blocks

    Get PDF
    This document presents an overview of the research activity carried out by the author until the date of writing. It is also meant to report on the main results generated by a few funded project involving the author as a team member. The activity covered a range of topics involving automated generation of on-chip multiprocessor systems from high-level code, with particular emphasis on the system interconnect and the memory subsystems, design automation and test techniques for hardware-reconfigurable technologies, the design of advanced hardware blocks for cryptographic and cryptanalytical applications, the implementation and evaluation of security services in distributed environments, with special focus on time-stamping and public-key certification services, as well as the interplay between security services and hardware reconfigurability. The document presents the main highlights from the published works spawned by each of the above research threads

    Architectural Solutions for NanoMagnet Logic

    Get PDF
    The successful era of CMOS technology is coming to an end. The limit on minimum fabrication dimensions of transistors and the increasing leakage power hinder the technological scaling that has characterized the last decades. In several different ways, this problem has been addressed changing the architectures implemented in CMOS, adopting parallel processors and thus increasing the throughput at the same operating frequency. However, architectural alternatives cannot be the definitive answer to a continuous increase in performance dictated by Moore’s law. This problem must be addressed from a technological point of view. Several alternative technologies that could substitute CMOS in next years are currently under study. Among them, magnetic technologies such as NanoMagnet Logic (NML) are interesting because they do not dissipate any leakage power. More- over, magnets have memory capability, so it is possible to merge logic and memory in the same device. However, magnetic circuits, and NML in this specific research, have also some important drawbacks that need to be addressed: first, the circuit clock frequency is limited to 100 MHz, to avoid errors in data propagation; second, there is a connection between circuit layout and timing, and in particular, longer wires will have longer latency. These drawbacks are intrinsic to the technology and for this reason they cannot be avoided. The only chance is to limit their impact from an architectural point of view. The first step followed in the research path of this thesis is indeed the choice and optimization of architectures able to deal with the problems of NML. Systolic Ar- rays are identified as an ideal solution for this technology, because they are regular structures with local interconnections that limit the long latency of wires; more- over they are composed of several Processing Elements that work in parallel, thus exploit parallelization to increase throughput (limiting the impact of the low clock frequency). Through the analysis of Systolic Arrays for NML, several possible im- provements have been identified and addressed: 1) it has been defined a rigorous way to increase throughput with interleaving, providing equations that allow to esti- mate the number of operations to be interleaved and the rules to provide inputs; 2) a latency insensitive circuit has been designed, that exploits a data communication protocol between processing elements to avoid data synchronization problems. This feature has been exploited to design a latency insensitive Systolic Array that is able to execute the Floyd-Steinberg dithering algorithm. All the improvements presented in this framework apply to Systolic Arrays implemented in any technology. So, they can also be exploited to increase performance of today’s CMOS parallel circuits. This research path is presented in Chapter 3. While Systolic Arrays are an interesting solution for NML, their usage could be quite limited because they are normally application-specific. The second re- search path addresses this problem. A Reconfigurable Systolic Array is presented, that can be programmed to execute several algorithms. This architecture has been tested implementing many algorithms, including FIR and IIR filters, Discrete Cosine Transform and Matrix Multiplication. This research path is presented in Chapter 4. In common Von Neumann architectures, the logic part of the circuit and the memory one are separated. Today bus communication between logic and memory represents the bottleneck of the system. This problem is addressed presenting Logic- In-Memory (LIM), an architecture where memory elements are merged in logic ones. This research path aims at defining a real LIM architectures. This has been done in two steps. The first step is represented by an architecture composed of three layers: memory, routing and logic. In the second step instead the routing plane is no more present, and its features are inherited by the memory plane. In this solution, a pyramidal memory model is used, where memories near logic elements contain the most probably used data, and other memory layers contain the remaining data and instruction set. This circuit has been tested with odd-even sort algorithms and it has been benchmarked against GPUs and ASIC. This research path is presented in Chapter 5. MagnetoElastic NML (ME-NML) is a technological improvement of the NML principle, proposed by researchers of Politecnico di Torino, where the clock system is based on the induced stretch of a piezoelectric substrate when a voltage is ap- plied to its boundaries. The main advantage of this solution is that it consumes much less power than the classic clock implementation. This technology has not yet been investigated from an architectural point of view and considering complex circuits. In this research field, a standard methodology for the design of ME-NML circuits has been proposed. It is based on a Standard Cell Library and an enhanced VHDL model. The effectiveness of this methodology has been proved designing a Galois Field Multiplier. Moreover the serial-parallel trade-off in ME-NML has been investigated, designing three different solutions for the Multiply and Accumulate structure. This research path is presented in Chapter 6. While ME-NML is an extremely interesting technology, it needs to be combined with other faster technologies to have a real competitive system. Signal interfaces between NML and other technologies (mainly CMOS) have been rarely presented in literature. A mixed-technology multiplexer is designed and presented as the basis for a CMOS to NML interface. The reverse interface (from ME-NML to CMOS) is instead based on a sensing circuit for the Faraday effect: a change in the polarization of a magnet induces an electric field that can be used to generate an input signal for a CMOS circuit. This research path is presented in Chapter 7. The research work presented in this thesis represents a fundamental milestone in the path towards nanotechnologies. The most important achievement is the de- sign and simulation of complex circuits with NML, benchmarking this technology with real application examples. The characterization of a technology considering complex functions is a major step to be performed and that has not yet been ad- dressed in literature for NML. Indeed, only in this way it is possible to intercept in advance any weakness of NanoMagnet Logic that cannot be discovered consid- ering only small circuits. Moreover, the architectural improvements introduced in this thesis, although technology-driven, can be actually applied to any technology. We have demonstrated the advantages that can derive applying them to CMOS cir- cuits. This thesis represents therefore a major step in two directions: the first is the enhancement of NML technology; the second is a general improvement of parallel architectures and the development of the new Logic-In-Memory paradigm
    • …
    corecore