193 research outputs found

    Adaptable Security in Wireless Sensor Networks by Using Reconfigurable ECC Hardware Coprocessors

    Get PDF
    Specific features of Wireless Sensor Networks (WSNs) like the open accessibility to nodes, or the easy observability of radio communications, lead to severe security challenges. The application of traditional security schemes on sensor nodes is limited due to the restricted computation capability, low-power availability, and the inherent low data rate. In order to avoid dependencies on a compromised level of security, a WSN node with a microcontroller and a Field Programmable Gate Array (FPGA) is used along this work to implement a state-of-the art solution based on ECC (Elliptic Curve Cryptography). In this paper it is described how the reconfiguration possibilities of the system can be used to adapt ECC parameters in order to increase or reduce the security level depending on the application scenario or the energy budget. Two setups have been created to compare the software- and hardware-supported approaches. According to the results, the FPGA-based ECC implementation requires three orders of magnitude less energy, compared with a low power microcontroller implementation, even considering the power consumption overhead introduced by the hardware reconfiguratio

    Prime Field ECDSA Signature Processing for Reconfigurable Embedded Systems

    Get PDF
    Growing ubiquity and safety relevance of embedded systems strengthen the need to protect their functionality against malicious attacks. Communication and system authentication by digital signature schemes is a major issue in securing such systems. This contribution presents a complete ECDSA signature processing system over prime fields for bit lengths of up to 256 on reconfigurable hardware. By using dedicated hardware implementation, the performance can be improved by up to two orders of magnitude compared to microcontroller implementations. The flexible system is tailored to serve as an autonomous subsystem providing authentication transparent for any application. Integration into a vehicle-to-vehicle communication system is shown as an application example

    Under Quantum Computer Attack: Is Rainbow a Replacement of RSA and Elliptic Curves on Hardware?

    Get PDF
    Among cryptographic systems, multivariate signature is one of the most popular candidates since it has the potential to resist quantum computer attacks. Rainbow belongs to the multivariate signature, which can be viewed as a multilayer unbalanced Oil-Vinegar system. In this paper, we present techniques to exploit Rainbow signature on hardware meeting the requirements of efficient high-performance applications. We propose a general architecture for efficient hardware implementations of Rainbow and enhance our design in three directions. First, we present a fast inversion based on binary trees. Second, we present an efficient multiplication based on compact construction in composite fields. Third, we present a parallel solving system of linear equations based on Gauss-Jordan elimination. Via further other minor optimizations and by integrating the major improvement above, we implement our design in composite fields on standard cell CMOS Application Specific Integrated Circuits (ASICs). The experimental results show that our implementation takes 4.9 us and 242 clock cycles to generate a Rainbow signature with the frequency of 50 MHz. Comparison results show that our design is more efficient than the RSA and ECC implementations

    Exploring Parallelism to Improve the Performance of FrodoKEM in Hardware

    Get PDF
    FrodoKEM is a lattice-based key encapsulation mechanism, currently a semi-finalist in NIST’s post-quantum standardisation effort. A condition for these candidates is to use NIST standards for sources of randomness (i.e. seed-expanding), and as such most candidates utilise SHAKE, an XOF defined in the SHA-3 standard. However, for many of the candidates, this module is a significant implementation bottleneck. Trivium is a lightweight, ISO standard stream cipher which performs well in hardware and has been used in previous hardware designs for lattice-based cryptography. This research proposes optimised designs for FrodoKEM, concentrating on high throughput by parallelising the matrix multiplication operations within the cryptographic scheme. This process is eased by the use of Trivium due to its higher throughput and lower area consumption. The parallelisations proposed also complement the addition of first-order masking to the decapsulation module. Overall, we significantly increase the throughput of FrodoKEM; for encapsulation we see a 16 × speed-up, achieving 825 operations per second, and for decapsulation we see a 14 × speed-up, achieving 763 operations per second, compared to the previous state of the art, whilst also maintaining a similar FPGA area footprint of less than 2000 slices.</p

    Implementation of ECC on FPGA using Scalable Architecture With equal Data and Key for WSN

    Get PDF
    Security of data transferred on the Wireless Sensor Network is of vital importance. In public key cryptography RSA algorithm has been used for a long time, but it does not meet the constraints of WSNs. Elliptic Curve Cryptography(ECC) has been employed recently because of its highest security for same length bit. ECC point multiplication operation is time consuming which affects the speed of encryption and decryption of data. Security in WSNs is addressed in our work, where a modified ECC is designed by performing the point multiplication using Montgomery multiplication technique that achieves considerable speed and with reduced area utilization. The ECC is first simulated on different FPGA devices, with key length 11, 112, 131 and 163 bits and the area-speed tradeoff is compared. ECC algorithm is implemented with software and hardware choosing Artix 7 XC7a100t-3csg324 FPGA which supports key lengths of 11, 112, 131 and 163 bits. When implemented on a Artix 7 FPGA, it completes 163 bit data encryption operation over GF(2163 ) in 1ms with the maximum frequency of 229MHz. The ECC algorithm is reconfigurable with low level to high level security with different bit key sizes. The proposed ECC algorithm modeled using VHDL and synthesized on Spartan 3 and 6, Virtex 4, 5 and 6 and Artix7 before the hardware implementation on Atrix 7. The design satisfies the needs of resource constrained devices by decreasing the encryption and decryption time to 1 ms with equal keylength and datasize, while device utilization is within 13%

    Efficient Design and implementation of Elliptic Curve Cryptography on FPGA

    Get PDF

    An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

    Get PDF
    We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.Comment: To appear at the DSN 2020 conferenc

    Multi-LSTM Acceleration and CNN Fault Tolerance

    Get PDF
    This thesis addresses the following two problems related to the field of Machine Learning: the acceleration of multiple Long Short Term Memory (LSTM) models on FPGAs and the fault tolerance of compressed Convolutional Neural Networks (CNN). LSTMs represent an effective solution to capture long-term dependencies in sequential data, like sentences in Natural Language Processing applications, video frames in Scene Labeling tasks or temporal series in Time Series Forecasting. In order to further boost their efficacy, especially in presence of long sequences, multiple LSTM models are utilized in a Hierarchical and Stacked fashion. However, because of their memory-bounded nature, efficient mapping of multiple LSTMs on a computing device becomes even more challenging. The first part of this thesis addresses the problem of mapping multiple LSTM models to a FPGA device by introducing a framework that modifies their memory requirements according to the target architecture. For the similar accuracy loss, the proposed framework maps multiple LSTMs with a performance improvement of 3x to 5x over state-of-the-art approaches. In the second part of this thesis, we investigate the fault tolerance of CNNs, another effective deep learning architecture. CNNs represent a dominating solution in image classification tasks, but suffer from a high performance cost, due to their computational structure. In fact, due to their large parameter space, fetching their data from main memory typically becomes a performance bottleneck. In order to tackle the problem, various techniques for their parameters compression have been developed, such as weight pruning, weight clustering and weight quantization. However, reducing the memory footprint of an application can lead to its data becoming more sensitive to faults. For this thesis work, we have conducted an analysis to verify the conditions for applying OddECC, a mechanism that supports variable strength and size ECCs for different memory regions. Our experiments reveal that compressed CNNs, which have their memory footprint reduced up to 86.3x by utilizing the aforementioned compression schemes, exhibit accuracy drops up to 13.56% in presence of random single bit faults

    Low Power Montgomery Modular Multiplication on Reconfigurable Systems

    Get PDF
    This paper presents an area-optimized FPGA architecture of the Montgomery modular multiplication algorithm on a low power reconfigurable IGLOO® 2 FPGA of Microsemi®. Our contributions consist of the mapping of the Montgomery algorithm to the specific architecture of the target FPGA, using the pipelined Math blocks and the embedded memory blocks. We minimize the occupation of these blocks as well as the usage of the regular FPGA cells (LUT4 and Flip Flops) through an dedicated scheduling algorithm. The obtained results suggest that a 224-bit modular multiplication can be computed in 2.42 µs, at a cost of 444 LUT4, 160 Flip Flops, 1 Math Block and 1 64x18 RAM, with a power consumption of 25.35 mW. If more area resources are considered, modular multiplication can be performed in 1.30 µs at a cost of 658 LUT4, 268 Flip Flops, 2 Math Blocks, 2 64x18 RAMs and a power consumption of 36.02 mW
    • …
    corecore