10 research outputs found

    Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath

    Get PDF
    Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area

    Comprehensive Efficient Implementations of ECC on C54xx Family of Low-cost Digital Signal Processors

    Get PDF
    Resource constraints in smart devices demand an efficient cryptosystem that allows for low power and memory consumption. This has led to popularity of comparatively efficient Elliptic curve cryptog-raphy (ECC). Prior to this paper, much of ECC is implemented on re-configurable hardware i.e. FPGAs, which are costly and unfavorable as low-cost solutions. We present comprehensive yet efficient implementations of ECC on fixed-point TMS54xx series of digital signal processors (DSP). 160-bit prime field GF(p) ECC is implemented over a wide range of coordinate choices. This paper also implements windowed recoding technique to provide better execution times. Stalls in the programming are mini-mized by utilization of loop unrolling and by avoiding data dependence. Complete scalar multiplication is achieved within 50 msec in coordinate implementations, which is further reduced till 25 msec for windowed-recoding method. These are the best known results for fixed-point low power digital signal processor to date

    Vampire Attacks: Draining Life from Wireless Ad Hoc Sensor Networks

    Full text link

    A high performance pseudo-multi-core elliptic curve cryptographic processor over GF(2^163)

    Get PDF
    Elliptic curve cryptosystem is one type of public-key system, and it can guarantee the same security level with Rivest, Shamir and Adleman (RSA) with a smaller key size. Therefore, the key of elliptic curve cryptography (ECC) can be more compact, and it brings many advantages such as circuit area, memory requirement, power consumption, performance and bandwidth. However, compared to private key system, like Advanced Encryption Standard (AES), ECC is still much more complicated and computationally intensive. In some real applications, people usually combine private-key system with public-key system to achieve high performance. The ultimate goal of this research is to architect a high performance ECC processor for high performance applications such as network server and cellular sites. In this thesis, a high performance processor for ECC over Galois field (GF)(2^163) by using polynomial presentation is proposed for high-performance applications. It has three finite field (FF) reduced instruction set computer (RISC) cores and a main controller to achieve instruction-level parallelism (ILP) with pipeline so that the largely parallelized algorithm for elliptic curve point multiplication (PM) can be well suited on this platform. Instructions for combined FF operation are proposed to decrease clock cycles in the instruction set. The interconnection among three FF cores and the main controller is obtained by analyzing the data dependency in the parallelized algorithm. Five-stage pipeline is employed in this architecture. Finally, the u-code executed on these three FF cores is manually optimized to save clock cycles. The proposed design can reach 185 MHz with 20; 807 slices when implemented on Xilinx XC4VLX80 FPGA device and 263 MHz with 217,904 gates when synthesized with TSMC .18um CMOS technology. The implementation of the proposed architecture can complete one ECC PM in 1428 cycles, and is 1.3 times faster than the current fastest implementation over GF(2^163) reported in literature while consumes only 14:6% less area on the same FPGA device

    Hardware/software co-design of elliptic curve cryptography on an 8051 microcontroller

    No full text
    8-bit microcontrollers like the 8051 still hold a considerable share of the embedded systems market and dominate in the smart card industry. The performance of 8-bit microcontrollers is often too poor for the implementation of public-key cryptography in software. In this paper we present a minimalist hardware accelerator for enabling elliptic curve cryptography (ECC) on an 8051 microcontroller. We demonstrate the importance of removing system-level performance bottlenecks caused by the transfer of operands between hardware accelerator and external RAM. The integration of a small direct memory access (DMA) unit proves vital to exploit the full potential of the hardware accelerator. Our design allows to perform a scalar multiplication over the binary extension field GF(2 191) in 118 msec at a clock frequency of 12 MHz. Considering performance and hardware cost, our system compares favorably with previous work on similar 8-bit platforms

    Side-channel Analysis of Subscriber Identity Modules

    Get PDF
    Subscriber identity modules (SIMs) contain useful forensic data but are often locked with a PIN code that restricts access to this data. If an invalid PIN is entered several times, the card locks and may even destroy its stored data. This presents a challenge to the retrieval of data from the SIM when the PIN is unknown. The field of side-channel analysis (SCA) collects, identifies, and processes information leaked via inadvertent channels. One promising side-channel leakage is that of electromagnetic (EM) emanations; by monitoring the SIM\u27s emissions, it may be possible to determine the correct PIN to unlock the card. This thesis uses EM SCA techniques to attempt to discover the SIM card\u27s PIN. The tested SIM is subjected to simple and differential electromagnetic analysis. No clear data dependency or correlation is apparent. The SIM does reveal information pertaining to its validation routine, but the value of the card\u27s stored PIN does not appear to leak via EM emissions. Two factors contributing to this result are the black-box nature of PIN validation and the hardware and software SCA countermeasures. Further experimentation on SIMs with known operational characteristics is recommended to determine the viability of future SCA attacks on these devices

    The Design Space of Ultra-low Energy Asymmetric Cryptography

    Get PDF
    The energy cost of asymmetric cryptography, a vital component of modern secure communications, inhibits its wide spread adoption within the ultra-low energy regimes such as Implantable Medical Devices (IMDs), Wireless Sensor Networks (WSNs), and Radio Frequency Identification tags (RFIDs). In literature, a plethora of hardware and software acceleration techniques exists for improving the performance of asymmetric cryptography. However, very little attention has been focused on the energy efficiency. Therefore, in this dissertation, I explore the design space thoroughly, evaluating proposed hardware acceleration techniques in terms of energy cost and showing how effective they are at reducing the energy per cryptographic operation. To do so, I estimate the energy consumption for six different hardware/software configurations across five levels of security, including both GF(p) and GF(2^m) computation. First, we design and evaluate an efficient baseline architecture for pure software-based cryptography, which is centered around a pipelined RISC processor with 256KB of program ROM and 16KB of RAM. Then, we augment our processor design with simple, yet beneficial instruction set extensions for GF(p) computation and evaluate the improvement in terms of energy per cryptographic operation compared to the baseline microarchitecture. While examining the energy breakdown of the system, it became clear that fetching instructions from program memory was contributing significantly to the overall energy consumption. Thus, we implement a parameterizable instruction cache and simulate various configurations. We determine that for our working set, the energy-optimal instruction cache is 4KB, providing a 25% energy improvement over the baseline architecture for a 192-bit key-size. Next, we introduce a reconfigurable GF(p) accelerator to our microarchitecture and mea sure the energy per operation against the baseline and the ISA extensions. For ISA extensions, we show between 1.32 and 1.45 factor improvement in energy efficiency over baseline, while for full acceleration we demonstrate a 5.17 to 6.34 factor improvement. Continuing towards greater efficiency, we investigate the energy efficiency of different arithmetic by first adding GF(2^m) instruction set extensions to our processor architecture and comparing them to their GF(p) counterpart. Finally, we design a non-configurable 163-bit GF(2^m) accelerator and perform some initial energy estimates, comparing them with our prior work. In the end, we discuss our ongoing research and make suggestions for future work. The work presented here, along with proposed future work, will aid in bringing asymmetric cryptography within reach of ultra-low energy devices
    corecore