45 research outputs found

    Hardware implementation of elliptic curve Diffie-Hellman key agreement scheme in GF(p)

    Get PDF
    With the advent of technology there are many applications that require secure communication. Elliptic Curve Public-key Cryptosystems are increasingly becoming popular due to their small key size and efficient algorithm. Elliptic curves are widely used in various key exchange techniques including Diffie-Hellman Key Agreement scheme. Modular multiplication and modular division are one of the basic operations in elliptic curve cryptography. Much effort has been made in developing efficient modular multiplication designs, however few works has been proposed for the modular division. Nevertheless, these operations are needed in various cryptographic systems. This thesis examines various scalable implementations of elliptic curve scalar multiplication employing multiplicative inverse or field division in GF(p) focussing mainly on modular divison architectures. Next, this thesis presents a new architecture for modular division based on the variant of Extended Binary GCD algorithm. The main contribution at system level architecture to the modular division unit is use of counters in place of shift registers that are basis of the algorithm and modifying the algorithm to introduce a modular correction unit for the output logic. This results in 62% increase in speed with respect to a prototype design. Finally, using the modular division architecture an Elliptic Curve ALU in GF(p) was implemented which can be used as the core arithmetic unit of an elliptic curve processor. The resulting architecture was targeted to Xilinx Vertex2v6000-bf957 FPGA device and can be implemented for different elliptic curves for almost all practical values of field p. The frequency of the ALU is 58.8 MHz for 128-bits utilizing 20% of the device at 27712 gates which is 30% faster than a prototype implementation with a 2% increase in area utilization. The ALU was tested to perform Diffie-Hellman Key Agreement Scheme and is suitable for other public-key cryptographic algorithms

    Efficient Implementation on Low-Cost SoC-FPGAs of TLSv1.2 Protocol with ECC_AES Support for Secure IoT Coordinators

    Get PDF
    Security management for IoT applications is a critical research field, especially when taking into account the performance variation over the very different IoT devices. In this paper, we present high-performance client/server coordinators on low-cost SoC-FPGA devices for secure IoT data collection. Security is ensured by using the Transport Layer Security (TLS) protocol based on the TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 cipher suite. The hardware architecture of the proposed coordinators is based on SW/HW co-design, implementing within the hardware accelerator core Elliptic Curve Scalar Multiplication (ECSM), which is the core operation of Elliptic Curve Cryptosystems (ECC). Meanwhile, the control of the overall TLS scheme is performed in software by an ARM Cortex-A9 microprocessor. In fact, the implementation of the ECC accelerator core around an ARM microprocessor allows not only the improvement of ECSM execution but also the performance enhancement of the overall cryptosystem. The integration of the ARM processor enables to exploit the possibility of embedded Linux features for high system flexibility. As a result, the proposed ECC accelerator requires limited area, with only 3395 LUTs on the Zynq device used to perform high-speed, 233-bit ECSMs in 413 µs, with a 50 MHz clock. Moreover, the generation of a 384-bit TLS handshake secret key between client and server coordinators requires 67.5 ms on a low cost Zynq 7Z007S device

    Efficient Side-Channel Aware Elliptic Curve Cryptosystems over Prime Fields

    Get PDF
    Elliptic Curve Cryptosystems (ECCs) are utilized as an alternative to traditional public-key cryptosystems, and are more suitable for resource limited environments due to smaller parameter size. In this dissertation we carry out a thorough investigation of side-channel attack aware ECC implementations over finite fields of prime characteristic including the recently introduced Edwards formulation of elliptic curves, which have built-in resiliency against simple side-channel attacks. We implement Joye\u27s highly regular add-always scalar multiplication algorithm both with the Weierstrass and Edwards formulation of elliptic curves. We also propose a technique to apply non-adjacent form (NAF) scalar multiplication algorithm with side-channel security using the Edwards formulation. Our results show that the Edwards formulation allows increased area-time performance with projective coordinates. However, the Weierstrass formulation with affine coordinates results in the simplest architecture, and therefore has the best area-time performance as long as an efficient modular divider is available

    A Brand-New, Area - Efficient Architecture for the FFT Algorithm Designed for Implementation of FPGAs

    Get PDF
    Elliptic curve cryptography, which is more commonly referred to by its acronym ECC, is widely regarded as one of the most effective new forms of cryptography developed in recent times. This is primarily due to the fact that elliptic curve cryptography utilises excellent performance across a wide range of hardware configurations in addition to having shorter key lengths. A High Throughput Multiplier design was described for Elliptic Cryptographic applications that are dependent on concurrent computations. A Proposed (Carry-Select) Division Architecture is explained and proposed throughout the whole of this work. Because of the carry-select architecture that was discussed in this article, the functionality of the divider has been significantly enhanced. The adder carry chain is reduced in length by this design by a factor of two, however this comes at the expense of additional adders and control. When it comes to designs for high throughput FFT, the total number of butterfly units that are implemented is what determines the amount of space that is needed by an FFT processor. In addition to blocks that may either add or subtract numbers, each butterfly unit also features blocks that can multiply numbers. The size of the region that is covered by these dual mathematical blocks is decided by the bit resolution of the models. When the bit resolution is increased, the area will also increase. The standard FFT approach requires that each stage contain  times as many butterfly units as the stage before it. This requirement must be met before moving on to the next stage

    Prime Field ECDSA Signature Processing for Reconfigurable Embedded Systems

    Get PDF
    Growing ubiquity and safety relevance of embedded systems strengthen the need to protect their functionality against malicious attacks. Communication and system authentication by digital signature schemes is a major issue in securing such systems. This contribution presents a complete ECDSA signature processing system over prime fields for bit lengths of up to 256 on reconfigurable hardware. By using dedicated hardware implementation, the performance can be improved by up to two orders of magnitude compared to microcontroller implementations. The flexible system is tailored to serve as an autonomous subsystem providing authentication transparent for any application. Integration into a vehicle-to-vehicle communication system is shown as an application example

    Differential Power Analysis Resistant Hardware Implementation Of The Rsa Cryptosystem

    Get PDF
    Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2007Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2007Bu çalışmada, RSA kripto sistemi donanımsal olarak gerçeklenmiş ve daha sonra bir Yan-Kanal Analizi çeşidi olan Diferansiyel Güç Analizi (DGA) ile yapılacak saldırılara karşı dayanıklı hale getirilmiştir. RSA kripto sisteminde şifreleme ve şifre çözmede modüler üs alma işlemi yapılır: M^E(mod N). Bu çalışmadaki RSA kripto sisteminde, Xilinx Sahada Programlanabilir Kapı Dizisi (FPGA) donanım olarak kullanılmıştır. Modüler üs alma işlemi, art arda çarpmalar ile yapılır. Bu gerçeklemede kullanılan Montgomery modüler çarpıcı, Elde Saklamalı Toplayıcılar ile gerçeklenmiştir. Saldırgan, Güç Analizi yaparak kripto sistemin gizli anahtarını ele geçirebilir. Bu tezde ilk gerçekleştirilen RSA devresi DGA’ya karşı korumasızdır. XCV1000E üzerinde gerçeklendiğinde, 81,06 MHz maksimum saat frekansı, 104,85 Kb/s işlem hacmi ve 4,88 ms toplam üs alma süresine sahip olduğu ve 9037 dilimlik alan kapladığı görülmüştür. Itoh ve diğ. tarafından önerilen Rastgele Tablolu Pencere Yöntemi (RT-WM) algoritması ile RSA şifreleme algoritmasına getirilen değişiklik, algoritmik karşı durma yöntemlerinden biridir ve donanım üzerinde gerçeklenmemiştir. Yapılan ikinci gerçeklemede, ilk gerçeklemenin üzerine bu algoritmanın getirdiği değişiklikler uygulanmıştır. RT-WM’nin donanım gerçeklemesi, 512-bit anahtar uzunluğu, 2-bit pencere genişliği ve 3-bitlik bir rastgele sayı kullanarak, XCV1000E üzerinde yapıldığında, 66,66 MHz maksimum saat frekansı, 84,42 Kb/s işlem hacmi ve 6,06 ms toplam üs alma süresine sahip olduğu ve XCV1000E içinde hazır bulunan blok SelectRAM yapısının kullanılmasıyla birlikte 10986 dilimlik alan kapladığı görülmüştür. Korumalı gerçekleme, korumasız ile karşılaştırıldığında, toplam sürenin %24,2 arttığı, işlem hacminin de %19,5 azaldığı görülmektedir.In this study, RSA cryptosystem was implemented on hardware and afterwards it was modified to be resistant against Differential Power Analysis (DPA) attacks, which are a type of Side-Channel Analysis Attacks. The encryption and decryption in an RSA cryptosystem is modular exponentiation. In this study, Xilinx Field Programmable Gate Array (FPGA) devices have been used as hardware. Modular exponentiation is realized with sequential multiplications. The Montgomery modular multiplier in this implementation has been realized with Carry-Save Adders. By doing a Power Analysis, the attacker can extract the secret key of the cryptosystem. In this thesis, the primarily implemented RSA circuit is unprotected against DPA attacks. Implemented on XCV1000E, it has 81,06 MHz maximum clock frequency, 104,85 Kb/s of throughput, and 4,88 ms of total exponentiation time, occupying an area of 9037 slices. The modification to the RSA encryption algorithm that comes with the Randomized Table Window Method (RT-WM), proposed by Itoh et al., is one of the algorithmic countermeasures against DPA and has not been implemented on hardware. Realized using 512-bit key length, 2-bit window length, and, a 3-bit random number, on XCV1000E, the RT-WM hardware implementation resulted in 66,66 MHz maximum clock frequency, 84,42 Kb/s of throughput, and 6,06 ms of total exponentiation time and occupied an area of 10986 slices with the use of the built-in block SelectRAM structure inside XCV1000E. When comparing the protected implementation with the unprotected, it can be seen that the total time has increased by 24,2% while the throughput has decreased 19,5%.Yüksek LisansM.Sc

    The Design Space of Ultra-low Energy Asymmetric Cryptography

    Get PDF
    The energy cost of asymmetric cryptography, a vital component of modern secure communications, inhibits its wide spread adoption within the ultra-low energy regimes such as Implantable Medical Devices (IMDs), Wireless Sensor Networks (WSNs), and Radio Frequency Identification tags (RFIDs). In literature, a plethora of hardware and software acceleration techniques exists for improving the performance of asymmetric cryptography. However, very little attention has been focused on the energy efficiency. Therefore, in this dissertation, I explore the design space thoroughly, evaluating proposed hardware acceleration techniques in terms of energy cost and showing how effective they are at reducing the energy per cryptographic operation. To do so, I estimate the energy consumption for six different hardware/software configurations across five levels of security, including both GF(p) and GF(2^m) computation. First, we design and evaluate an efficient baseline architecture for pure software-based cryptography, which is centered around a pipelined RISC processor with 256KB of program ROM and 16KB of RAM. Then, we augment our processor design with simple, yet beneficial instruction set extensions for GF(p) computation and evaluate the improvement in terms of energy per cryptographic operation compared to the baseline microarchitecture. While examining the energy breakdown of the system, it became clear that fetching instructions from program memory was contributing significantly to the overall energy consumption. Thus, we implement a parameterizable instruction cache and simulate various configurations. We determine that for our working set, the energy-optimal instruction cache is 4KB, providing a 25% energy improvement over the baseline architecture for a 192-bit key-size. Next, we introduce a reconfigurable GF(p) accelerator to our microarchitecture and mea sure the energy per operation against the baseline and the ISA extensions. For ISA extensions, we show between 1.32 and 1.45 factor improvement in energy efficiency over baseline, while for full acceleration we demonstrate a 5.17 to 6.34 factor improvement. Continuing towards greater efficiency, we investigate the energy efficiency of different arithmetic by first adding GF(2^m) instruction set extensions to our processor architecture and comparing them to their GF(p) counterpart. Finally, we design a non-configurable 163-bit GF(2^m) accelerator and perform some initial energy estimates, comparing them with our prior work. In the end, we discuss our ongoing research and make suggestions for future work. The work presented here, along with proposed future work, will aid in bringing asymmetric cryptography within reach of ultra-low energy devices

    Reconsidering big data security and privacy in cloud and mobile cloud systems

    Get PDF
    Large scale distributed systems in particular cloud and mobile cloud deployments provide great services improving people\u27s quality of life and organizational efficiency. In order to match the performance needs, cloud computing engages with the perils of peer-to-peer (P2P) computing and brings up the P2P cloud systems as an extension for federated cloud. Having a decentralized architecture built on independent nodes and resources without any specific central control and monitoring, these cloud deployments are able to handle resource provisioning at a very low cost. Hence, we see a vast amount of mobile applications and services that are ready to scale to billions of mobile devices painlessly. Among these, data driven applications are the most successful ones in terms of popularity or monetization. However, data rich applications expose other problems to consider including storage, big data processing and also the crucial task of protecting private or sensitive information. In this work, first, we go through the existing layered cloud architectures and present a solution addressing the big data storage. Secondly, we explore the use of P2P Cloud System (P2PCS) for big data processing and analytics. Thirdly, we propose an efficient hybrid mobile cloud computing model based on cloudlets concept and we apply this model to health care systems as a case study. Then, the model is simulated using Mobile Cloud Computing Simulator (MCCSIM). According to the experimental power and delay results, the hybrid cloud model performs up to 75% better when compared to the traditional cloud models. Lastly, we enhance our proposals by presenting and analyzing security and privacy countermeasures against possible attacks

    Hardware Implementations of Scalable and Unified Elliptic Curve Cryptosystem Processors

    Get PDF
    As the amount of information exchanged through the network grows, so does the demand for increased security over the transmission of this information. As the growth of computers increased in the past few decades, more sophisticated methods of cryptography have been developed. One method of transmitting data securely over the network is by using symmetric-key cryptography. However, a drawback of symmetric-key cryptography is the need to exchange the shared key securely. One of the solutions is to use public-key cryptography. One of the modern public-key cryptography algorithms is called Elliptic Curve Cryptography (ECC). The advantage of ECC over some older algorithms is the smaller number of key sizes to provide a similar level of security. As a result, implementations of ECC are much faster and consume fewer resources. In order to achieve better performance, ECC operations are often offloaded onto hardware to alleviate the workload from the servers' processors. The most important and complex operation in ECC schemes is the elliptic curve point multiplication (ECPM). This thesis explores the implementation of hardware accelerators that offload the ECPM operation to hardware. These processors are referred to as ECC processors, or simply ECPs. This thesis targets the efficient hardware implementation of ECPs specifically for the 15 elliptic curves recommended by the National Institute of Standards and Technology (NIST). The main contribution of this thesis is the implementation of highly efficient hardware for scalable and unified finite field arithmetic units that are used in the design of ECPs. In this thesis, scalability refers to the processor's ability to support multiple key sizes without the need to reconfigure the hardware. By doing so, the hardware does not need to be redesigned for the server to handle different levels of security. Unified refers to the ability of the ECP to handle both prime and binary fields. The resultant designs are valuable to the research community and industry, as a single hardware device is able to handle a wide range of ECC operations efficiently and at high speeds. Thus, improving the ability of network servers to handle secure transaction more quickly and improve productivity at lower costs
    corecore