582 research outputs found

    Implementation of Generic and Efficient Architecture of Elliptic Curve Cryptography over Various GF(p) for Higher Data Security

    Get PDF
    Elliptic Curve Cryptography (ECC) has recognized much more attention over the last few years and has time-honored itself among the renowned public key cryptography schemes. The main feature of ECC is that shorter keys can be used as the best option for implementation of public key cryptography in resource-constrained (memory, power, and speed) devices like the Internet of Things (IoT), wireless sensor based applications, etc. The performance of hardware implementation for ECC is affected by basic design elements such as a coordinate system, modular arithmetic algorithms, implementation target, and underlying finite fields. This paper shows the generic structure of the ECC system implementation which allows the different types of designing parameters like elliptic curve, Galois prime finite field GF(p), and input type. The ECC system is analyzed with performance parameters such as required memory, elapsed time, and process complexity on the MATLAB platform. The simulations are carried out on the 8th generation Intel core i7 processor with the specifications of 8 GB RAM, 3.1 GHz, and 64-bit architecture. This analysis helps to design an efficient and high performance architecture of the ECC system on Application Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA).Elliptic Curve Cryptography (ECC) has recognized much more attention over the last few years and has time-honored itself among the renowned public key cryptography schemes. The main feature of ECC is that shorter keys can be used as the best option for implementation of public key cryptography in resource-constrained (memory, power, and speed) devices like the Internet of Things (IoT), wireless sensor based applications, etc. The performance of hardware implementation for ECC is affected by basic design elements such as a coordinate system, modular arithmetic algorithms, implementation target, and underlying finite fields. This paper shows the generic structure of the ECC system implementation which allows the different types of designing parameters like elliptic curve, Galois prime finite field GF(p), and input type. The ECC system is analyzed with performance parameters such as required memory, elapsed time, and process complexity on the MATLAB platform. The simulations are carried out on the 8th generation Intel core i7 processor with the specifications of 8 GB RAM, 3.1 GHz, and 64-bit architecture. This analysis helps to design an efficient and high performance architecture of the ECC system on Application Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA)

    An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

    Full text link
    Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline. Using encryption to protect sensitive data at the boundary of the on-chip analytics engine is a way to address data security issues. To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks for compute-intensive data processing and encryption functions, supporting software programmability for regular computing tasks. The Fulmine SoC, fabricated in 65nm technology, consumes less than 20mW on average at 0.8V achieving an efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to 25MIPS/mW in software. As a strong argument for real-life flexible application of our platform, we show experimental results for three secure analytics use cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition in 5.74pJ/op; and seizure detection with encrypted data collection from EEG within 12.7pJ/op.Comment: 15 pages, 12 figures, accepted for publication to the IEEE Transactions on Circuits and Systems - I: Regular Paper

    Designing Flexible, Energy Efficient and Secure Wireless Solutions for the Internet of Things

    Full text link
    The Internet of Things (IoT) is an emerging concept where ubiquitous physical objects (things) consisting of sensor, transceiver, processing hardware and software are interconnected via the Internet. The information collected by individual IoT nodes is shared among other often heterogeneous devices and over the Internet. This dissertation presents flexible, energy efficient and secure wireless solutions in the IoT application domain. System design and architecture designs are discussed envisioning a near-future world where wireless communication among heterogeneous IoT devices are seamlessly enabled. Firstly, an energy-autonomous wireless communication system for ultra-small, ultra-low power IoT platforms is presented. To achieve orders of magnitude energy efficiency improvement, a comprehensive system-level framework that jointly optimizes various system parameters is developed. A new synchronization protocol and modulation schemes are specified for energy-scarce ultra-small IoT nodes. The dynamic link adaptation is proposed to guarantee the ultra-small node to always operate in the most energy efficiency mode, given an operating scenario. The outcome is a truly energy-optimized wireless communication system to enable various new applications such as implanted smart-dust devices. Secondly, a configurable Software Defined Radio (SDR) baseband processor is designed and shown to be an efficient platform on which to execute several IoT wireless standards. It is a custom SIMD execution model coupled with a scalar unit and several architectural optimizations: streaming registers, variable bitwidth, dedicated ALUs, and an optimized reduction network. Voltage scaling and clock gating are employed to further reduce the power, with a more than a 100% time margin reserved for reliable operation in the near-threshold region. Two upper bound systems are evaluated. A comprehensive power/area estimation indicates that the overhead of realizing SDR flexibility is insignificant. The benefit of baseband SDR is quantified and evaluated. To further augment the benefits of a flexible baseband solution and to address the security issue of IoT connectivity, a light-weight Galois Field (GF) processor is proposed. This processor enables both energy-efficient block coding and symmetric/asymmetric cryptography kernel processing for a wide range of GF sizes (2^m, m = 2, 3, ..., 233) and arbitrary irreducible polynomials. Program directed connections among primitive GF arithmetic units enable dynamically configured parallelism to efficiently perform either four-way SIMD GF operations, including multiplicative inverse, or a long bit-width GF product in a single cycle. This demonstrates the feasibility of a unified architecture to enable error correction coding flexibility and secure wireless communication in the low power IoT domain.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/137164/1/yajchen_1.pd

    A Brand-New, Area - Efficient Architecture for the FFT Algorithm Designed for Implementation of FPGAs

    Get PDF
    Elliptic curve cryptography, which is more commonly referred to by its acronym ECC, is widely regarded as one of the most effective new forms of cryptography developed in recent times. This is primarily due to the fact that elliptic curve cryptography utilises excellent performance across a wide range of hardware configurations in addition to having shorter key lengths. A High Throughput Multiplier design was described for Elliptic Cryptographic applications that are dependent on concurrent computations. A Proposed (Carry-Select) Division Architecture is explained and proposed throughout the whole of this work. Because of the carry-select architecture that was discussed in this article, the functionality of the divider has been significantly enhanced. The adder carry chain is reduced in length by this design by a factor of two, however this comes at the expense of additional adders and control. When it comes to designs for high throughput FFT, the total number of butterfly units that are implemented is what determines the amount of space that is needed by an FFT processor. In addition to blocks that may either add or subtract numbers, each butterfly unit also features blocks that can multiply numbers. The size of the region that is covered by these dual mathematical blocks is decided by the bit resolution of the models. When the bit resolution is increased, the area will also increase. The standard FFT approach requires that each stage contain  times as many butterfly units as the stage before it. This requirement must be met before moving on to the next stage

    MorphIC: A 65-nm 738k-Synapse/mm2^2 Quad-Core Binary-Weight Digital Neuromorphic Processor with Stochastic Spike-Driven Online Learning

    Full text link
    Recent trends in the field of neural network accelerators investigate weight quantization as a means to increase the resource- and power-efficiency of hardware devices. As full on-chip weight storage is necessary to avoid the high energy cost of off-chip memory accesses, memory reduction requirements for weight storage pushed toward the use of binary weights, which were demonstrated to have a limited accuracy reduction on many applications when quantization-aware training techniques are used. In parallel, spiking neural network (SNN) architectures are explored to further reduce power when processing sparse event-based data streams, while on-chip spike-based online learning appears as a key feature for applications constrained in power and resources during the training phase. However, designing power- and area-efficient spiking neural networks still requires the development of specific techniques in order to leverage on-chip online learning on binary weights without compromising the synapse density. In this work, we demonstrate MorphIC, a quad-core binary-weight digital neuromorphic processor embedding a stochastic version of the spike-driven synaptic plasticity (S-SDSP) learning rule and a hierarchical routing fabric for large-scale chip interconnection. The MorphIC SNN processor embeds a total of 2k leaky integrate-and-fire (LIF) neurons and more than two million plastic synapses for an active silicon area of 2.86mm2^2 in 65nm CMOS, achieving a high density of 738k synapses/mm2^2. MorphIC demonstrates an order-of-magnitude improvement in the area-accuracy tradeoff on the MNIST classification task compared to previously-proposed SNNs, while having no penalty in the energy-accuracy tradeoff.Comment: This document is the paper as accepted for publication in the IEEE Transactions on Biomedical Circuits and Systems journal (2019), the fully-edited paper is available at https://ieeexplore.ieee.org/document/876400

    Implementation and Benchmarking of a Crypto Processor for a NB-IoT SoC Platform

    Get PDF
    The goal of this Master’s Thesis is to investigate the implementation of cryptographic algorithms for IoT and how these encryption systems can be integrated in a NarrowBand IoT platform. Following 3rd Generation Partnership Project (3GPP) specifications, the Evolved Packet System (EPS) Encryption Algorithms (EEA) and EPS Integrity Algorithms (EIA) have been implemented and tested. The latter are based on three different ciphering algorithms, used as keystream generators: Advanced Encryption Standard (AES), SNOW 3G and ZUC. These algorithms are used in Long Term Evolution (LTE) terminals to perform user data confidentiality and integrity protection. In the first place, a thorough study of the algorithms has been conducted. Then, we have used Matlab to generate a reference model of the algorithms and the High-Level Synthesis (HLS) design flow to generate the Register-Transfer Level (RTL) description from algorithmic descriptions in C++. The keystream generation and integrity blocks have been tested at RTL level. The confidentiality block has been described along with the control, datapath and interface block at a RTL level using System C language. The hardware blocks have been integrated into a processor capable of performing hardware confidentiality and integrity protection: the crypto processor. This Intellectual Property (IP) has been integrated and tested in a cycle accurate virtual platform. The outcome of this Master’s Thesis is a crypto processor capable of performing the proposed confidentiality and integrity algorithms under request.The Internet of Things (IoT) is one of the big revolutions that our society is expected to go through in the near future. This represents the inter-connection of devices, sensors, controllers, and any items, refereed as things, through a network that enables machine-to-machine communication. The number of connected devices will greatly increase. The applications taking advantage of IoT will enable to develop a great amount of technologies such as smart homes, smart cities and intelligent transportation. The possibilities allowed are huge and not yet fully explored. Picture yourself in the near future having a nice dinner with some friends. Then, you suddenly recall that your parking ticket expires in five minutes and unfortunately your car is parked some blocks away. You are having a good time and feel lazy to walk all the way to where you parked your car to pay for a time extension. Luckily enough, the parking meter is part of the IoT network and allows you, with the recently installed new application in your smart-phone, to pay this bill from anywhere you are. This payment will be sent to the parking meter and your time will be extended. Problem solved, right? Well, the risk comes when you perform your payment, not knowing that your "worst enemy" has interceded this communication and is able to alter your transaction. Perhaps, this individual decides to cancel your payment and you will have to pay a fine. Or even worse, this person steals your banking details and uses your money to take the vacations you’ve always wanted. There are many examples in our everyday life where we expose our personal information. With an increasing number of devices existing and using wireless communications without the action of an human, the security is a key aspect of IoT. This Master’s Thesis addresses the need to cover these security breaches in a world where an increasing amount of devices are communicating with each other. With the expansion of IoT where billions of devices will be connected wirelessly, our data will be widely spread over the air. The user will not be able to protect their sensible data without these securing capabilities. Therefore, different security algorithms used in today’s and tomorrow’s wireless technologies have been implemented on a chip to secure the communication. The confidentiality and integrity algorithms aim to solve the two aspects of the problem: protect the secrecy of banking details and prevent the alteration of the communication’s information. In this Master’s Thesis we have developed a hardware processor for securing data during a wireless communication, specifically designed for IoT applications. The developed system is realized with minimal area and power in mind, so that they can be fitted even in the smallest devices. We have compared many different hardware architectures, and after exploring many possible implementations, we have implemented the security algorithms on a hardware platform. We believe the content of this Thesis work is of great interest to anybody interested in hardware security applied to the IoT field. Furthermore, due to the processes and methodology used in this work, it will also be of interest to people who want to know more about how higher level programming languages can be used to describe such a specialized circuit, like one performing security algorithms. Finally, people interested in hardware and software co-simulation will find in this project a good example of the utilization of such system modeling technique

    A High-Throughput Hardware Implementation of NAT Traversal For IPSEC VPN

    Get PDF
    In this paper, we present a high-throughput FPGA implementation of IPSec core. The core supports both NAT and non-NAT mode and can be used in high speed security gateway devices. Although IPSec ESP is very computing intensive for its cryptography process, our implementation shows that it can achieve high throughput and low lantency. The system is realized on the Zynq XC7Z045 from Xilinx and was verified and tested in practice. Results show that the design can gives a peak throughput of 5.721 Gbps for the IPSec ESP tunnel mode in NAT mode and 7.753 Gbps in non-NAT mode using one single AES encrypt core. We also compare the performance of the core when running in other mode of encryption

    Transparent code authentication at the processor level

    Get PDF
    The authors present a lightweight authentication mechanism that verifies the authenticity of code and thereby addresses the virus and malicious code problems at the hardware level eliminating the need for trusted extensions in the operating system. The technique proposed tightly integrates the authentication mechanism into the processor core. The authentication latency is hidden behind the memory access latency, thereby allowing seamless on-the-fly authentication of instructions. In addition, the proposed authentication method supports seamless encryption of code (and static data). Consequently, while providing the software users with assurance for authenticity of programs executing on their hardware, the proposed technique also protects the software manufacturers’ intellectual property through encryption. The performance analysis shows that, under mild assumptions, the presented technique introduces negligible overhead for even moderate cache sizes

    Improved throughput of Elliptic Curve Digital Signature Algorithm (ECDSA) processor implementation over Koblitz curve k-163 on Field Programmable Gate Array (FPGA)

    Get PDF
    يقـدم البحث دراسة عن تصميم وتنفيذ دائرة الكترونية لتوليد التوقيع الالكتروني والتاكد من صحته ,بالاعتماد على مواصفات المنحني الاهليجي الموصى بها من  قبل المعهد الوطني للمعايير والتكنولوجيا(NIST) .حيث أرتكز العمل على إختيار منحني كوبلتز وتطبيقه على الحقول المنتهية أو ما تسمى بحقول غالو(2163)GF، ونظراً لأهمية تحسين الأداء في المعالجات الحديثة المبنية في بيئة البوابات المنطقية القابلة للبرمجة (FPGA)،  فقد أظهرت نتائج المحاكاة والتنفيذ للتصميم المقترح على الجهاز نوع Virtex5-xc5vlx155t-3ff1738  زيادة في معدل البيانات التي يتم معالجتها اثناء عمليتي توليد التوقيع واثبات صحته الى 0.08187 Mbit/s وبنسبة تصل الى 6.95% ,بالمقارنة مع التصميمات السابقة ، كما أستغرقت مدة تنفيذ العمليتين 1.66 ملي ثانية وبتردد أقصاه 83.477 ميكاهرتز. تم الاخذ بنظرالاعتبار تصميم المنفذ التسلسلي غير المتزامن (UART) والمستخدم في عملية نقل البيانات بين الحاسبة وFPGA .            The widespread use of the Internet of things (IoT) in different aspects of an individual’s life like banking, wireless intelligent devices and smartphones has led to new security and performance challenges under restricted resources. The Elliptic Curve Digital Signature Algorithm (ECDSA) is the most suitable choice for the environments due to the smaller size of the encryption key and changeable security related parameters. However, major performance metrics such as area, power, latency and throughput are still customisable and based on the design requirements of the device. The present paper puts forward an enhancement for the throughput performance metric by proposing a more efficient design for the hardware implementation of ECDSA. The design raised the throughput to 0.08207 Mbit/s, leading to an increase of 6.95% from the existing design. It also includes the design and implementation of the Universal Asynchronous Receiver Transmitter (UART) module. The present work is based on a 163-bit key-size over Koblitz curve k-163 and secure hash function SHA-1. A serial module for the underlying modular layer, high-speed architecture of Koblitz point addition and Koblitz point multiplication have been considered in this work, in addition to utilising the carry-save-multiplier, modular adder-subtractor and Extended Euclidean module for ECDSA protocols. All modules are designed using VHDL and implemented on the platform Virtex5 xc5vlx155t-3ff1738. Signature generation requires 0.55360ms, while its validation consumes 1.10947288ms. Thus, the total time required to complete both processes is equal to 1.66ms and the maximum frequency is approximately 83.477MHZ, consuming a power of 99mW with the efficiency approaching 3.39 * 10-6

    Efficient implementation of elliptic curve cryptography.

    Get PDF
    Elliptic Curve Cryptosystems (ECC) were introduced in 1985 by Neal Koblitz and Victor Miller. Small key size made elliptic curve attractive for public key cryptosystem implementation. This thesis introduces solutions of efficient implementation of ECC in algorithmic level and in computation level. In algorithmic level, a fast parallel elliptic curve scalar multiplication algorithm based on a dual-processor hardware system is developed. The method has an average computation time of n3 Elliptic Curve Point Addition on an n-bit scalar. The improvement is n Elliptic Curve Point Doubling compared to conventional methods. When a proper coordinate system and binary representation for the scalar k is used the average execution time will be as low as n Elliptic Curve Point Doubling, which makes this method about two times faster than conventional single processor multipliers using the same coordinate system. In computation level, a high performance elliptic curve processor (ECP) architecture is presented. The processor uses parallelism in finite field calculation to achieve high speed execution of scalar multiplication algorithm. The architecture relies on compile-time detection rather than of run-time detection of parallelism which results in less hardware. Implemented on FPGA, the proposed processor operates at 66MHz in GF(2 167) and performs scalar multiplication in 100muSec, which is considerably faster than recent implementations.Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .A57. Source: Masters Abstracts International, Volume: 44-03, page: 1446. Thesis (M.A.Sc.)--University of Windsor (Canada), 2005
    corecore