193 research outputs found
Adaptable Security in Wireless Sensor Networks by Using Reconfigurable ECC Hardware Coprocessors
Specific features of Wireless Sensor Networks (WSNs), such as open accessibility to nodes and the easy observability of radio communications, lead to severe security challenges. The application of traditional security schemes on sensor nodes is limited by restricted computation capability, low power availability, and the inherently low data rate. To avoid depending on a compromised level of security, this work uses a WSN node with a microcontroller and a Field Programmable Gate Array (FPGA) to implement a state-of-the-art solution based on Elliptic Curve Cryptography (ECC). The paper describes how the reconfiguration capabilities of the system can be used to adapt ECC parameters in order to increase or reduce the security level depending on the application scenario or the energy budget. Two setups have been created to compare the software- and hardware-supported approaches. According to the results, the FPGA-based ECC implementation requires three orders of magnitude less energy than a low-power microcontroller implementation, even considering the power consumption overhead introduced by the hardware reconfiguration.
Prime Field ECDSA Signature Processing for Reconfigurable Embedded Systems
Growing ubiquity and safety relevance of embedded
systems strengthen the need to protect their functionality against
malicious attacks. Communication and system authentication
by digital signature schemes is a major issue in securing such
systems. This contribution presents a complete ECDSA signature
processing system over prime fields for bit lengths of up to 256
on reconfigurable hardware. By using dedicated hardware implementation,
the performance can be improved by up to two orders
of magnitude compared to microcontroller implementations. The
flexible system is tailored to serve as an autonomous subsystem
providing authentication transparent for any application. Integration
into a vehicle-to-vehicle communication system is shown
as an application example
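The core operation behind prime-field ECDSA is elliptic-curve scalar multiplication. The following is a minimal software sketch of the double-and-add method, not the hardware design from the abstract. The toy curve y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1) of order 19 is an assumption chosen for readability; real systems use standardized curves of up to 256 bits. The names `add` and `scalar_mul` are hypothetical.

```python
from typing import Optional, Tuple

# Illustrative toy parameters (assumed, not from the paper):
# curve y^2 = x^3 + 2x + 2 over F_17, base point G of order 19.
P_MOD = 17
A = 2
G = (5, 1)

Point = Optional[Tuple[int, int]]  # None represents the point at infinity


def add(p: Point, q: Point) -> Point:
    """Add two points on the toy curve (affine coordinates)."""
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    # P + (-P) is the point at infinity (also covers doubling with y = 0).
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p == q:
        # Tangent slope for point doubling.
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Chord slope for adding distinct points.
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mul(k: int, p: Point) -> Point:
    """Compute k * p with right-to-left double-and-add."""
    result: Point = None
    addend = p
    while k > 0:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result
```

This sketch is not constant-time and is unsuitable for real keys; it only illustrates the arithmetic that dedicated hardware accelerates. It requires Python 3.8 or later for the modular inverse via `pow(x, -1, m)`.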
Under Quantum Computer Attack: Is Rainbow a Replacement of RSA and Elliptic Curves on Hardware?
Among cryptographic systems, multivariate signatures are among the most popular candidates because they have the potential to resist quantum computer attacks. Rainbow is a multivariate signature scheme that can be viewed as a multilayer unbalanced Oil-Vinegar system. In this paper, we present techniques for implementing Rainbow signatures on hardware that meet the requirements of efficient high-performance applications. We propose a general architecture for efficient hardware implementations of Rainbow and enhance our design in three directions. First, we present a fast inversion based on binary trees. Second, we present an efficient multiplication based on compact construction in composite fields. Third, we present a parallel solver for systems of linear equations based on Gauss-Jordan elimination. With further minor optimizations and by integrating the major improvements above, we implement our design in composite fields on standard cell CMOS Application Specific Integrated Circuits (ASICs). The experimental results show that our implementation takes 4.9 µs and 242 clock cycles to generate a Rainbow signature at a frequency of 50 MHz. Comparison results show that our design is more efficient than the RSA and ECC implementations
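As a quick sanity check on the reported figures, latency follows from the cycle count divided by the clock frequency. The sketch below uses only the 242 cycles and 50 MHz quoted in the abstract; the function name `latency_us` is hypothetical.

```python
def latency_us(cycles: int, freq_hz: float) -> float:
    """Return the latency in microseconds for a given cycle count and clock."""
    return cycles / freq_hz * 1e6


# Figures quoted in the abstract: 242 clock cycles at 50 MHz.
signature_latency = latency_us(242, 50e6)

# Upper bound on throughput if signatures are generated back to back.
signatures_per_second = 1e6 / signature_latency
```

Under these numbers one signature takes about 4.84 µs, close to the 4.9 µs stated in the abstract, which corresponds to roughly 200,000 signatures per second if nothing else limits throughput.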
Exploring Parallelism to Improve the Performance of FrodoKEM in Hardware
FrodoKEM is a lattice-based key encapsulation mechanism, currently a semi-finalist in NIST’s post-quantum standardisation effort. A condition for these candidates is to use NIST standards for sources of randomness (i.e. seed-expanding), and as such most candidates utilise SHAKE, an XOF defined in the SHA-3 standard. However, for many of the candidates, this module is a significant implementation bottleneck. Trivium is a lightweight, ISO standard stream cipher which performs well in hardware and has been used in previous hardware designs for lattice-based cryptography. This research proposes optimised designs for FrodoKEM, concentrating on high throughput by parallelising the matrix multiplication operations within the cryptographic scheme. This process is eased by the use of Trivium due to its higher throughput and lower area consumption. The parallelisations proposed also complement the addition of first-order masking to the decapsulation module. Overall, we significantly increase the throughput of FrodoKEM; for encapsulation we see a 16× speed-up, achieving 825 operations per second, and for decapsulation we see a 14× speed-up, achieving 763 operations per second, compared to the previous state of the art, whilst also maintaining a similar FPGA area footprint of less than 2000 slices.
Implementation of ECC on FPGA using Scalable Architecture With equal Data and Key for WSN
Security of data transferred on the Wireless Sensor Network is of vital importance. In public
key cryptography, the RSA algorithm has been used for a long time, but it does not meet the constraints of
WSNs. Elliptic Curve Cryptography (ECC) has been employed recently because it offers higher security for
the same key length. The ECC point multiplication operation is time consuming, which affects the speed of
encryption and decryption of data. Our work addresses security in WSNs with a modified ECC
design that performs point multiplication using the Montgomery multiplication technique, which
achieves considerable speed with reduced area utilization. The ECC is first simulated on different
FPGA devices, with key lengths of 11, 112, 131 and 163 bits, and the area-speed tradeoff is compared. The ECC
algorithm is implemented in software and hardware on an Artix 7 XC7a100t-3csg324 FPGA, which
supports key lengths of 11, 112, 131 and 163 bits. When implemented on the Artix 7 FPGA, it completes a
163-bit data encryption operation over GF(2^163) in 1 ms at a maximum frequency of 229 MHz.
The ECC algorithm is reconfigurable from low-level to high-level security with different key sizes.
The proposed ECC algorithm was modeled using VHDL and synthesized on Spartan 3 and 6, Virtex 4, 5 and
6, and Artix 7 before the hardware implementation on Artix 7. The design satisfies the needs of resource
constrained devices by decreasing the encryption and decryption time to 1 ms with equal key length and
data size, while device utilization is within 13%
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
We empirically evaluate an undervolting technique, i.e., underscaling the
circuit supply voltage below the nominal level, to improve the power-efficiency
of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable
Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing
faults due to excessive circuit latency increase. We evaluate the
reliability-power trade-off for such accelerators. Specifically, we
experimentally study the reduced-voltage operation of multiple components of
real FPGAs, characterize the corresponding reliability behavior of CNN
accelerators, propose techniques to minimize the drawbacks of reduced-voltage
operation, and combine undervolting with architectural CNN optimization
techniques, i.e., quantization and pruning. We investigate the effect of
environmental temperature on the reliability-power trade-off of such
accelerators. We perform experiments on three identical samples of modern
Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification
CNN benchmarks. This approach allows us to study the effects of our
undervolting technique for both software and hardware variability. We achieve
more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain
is the result of eliminating the voltage guardband region, i.e., the safe
voltage region below the nominal level that is set by the FPGA vendor to ensure
correct functionality in worst-case environmental and circuit conditions. 43%
of the power-efficiency gain is due to further undervolting below the
guardband, which comes at the cost of accuracy loss in the CNN accelerator. We
evaluate an effective frequency underscaling technique that prevents this
accuracy loss, and find that it reduces the power-efficiency gain from 43% to
25%. Comment: To appear at the DSN 2020 conference
Multi-LSTM Acceleration and CNN Fault Tolerance
This thesis addresses the following two problems related to the field of Machine Learning: the acceleration of multiple Long Short Term Memory (LSTM) models on FPGAs and the fault tolerance of compressed Convolutional Neural Networks (CNN). LSTMs represent an effective solution to capture long-term dependencies in sequential data, like sentences in Natural Language Processing applications, video frames in Scene Labeling tasks or temporal series in Time Series Forecasting. In order to further boost their efficacy, especially in the presence of long sequences, multiple LSTM models are utilized in a Hierarchical and Stacked fashion. However, because of their memory-bounded nature, efficient mapping of multiple LSTMs on a computing device becomes even more challenging. The first part of this thesis addresses the problem of mapping multiple LSTM models to an FPGA device by introducing a framework that modifies their memory requirements according to the target architecture. For a similar accuracy loss, the proposed framework maps multiple LSTMs with a performance improvement of 3x to 5x over state-of-the-art approaches. In the second part of this thesis, we investigate the fault tolerance of CNNs, another effective deep learning architecture. CNNs represent a dominating solution in image classification tasks, but suffer from a high performance cost, due to their computational structure. In fact, due to their large parameter space, fetching their data from main memory typically becomes a performance bottleneck. In order to tackle the problem, various techniques for compressing their parameters have been developed, such as weight pruning, weight clustering and weight quantization. However, reducing the memory footprint of an application can lead to its data becoming more sensitive to faults. For this thesis work, we have conducted an analysis to verify the conditions for applying OddECC, a mechanism that supports variable strength and size ECCs for different memory regions. Our experiments reveal that compressed CNNs, which have their memory footprint reduced up to 86.3x by utilizing the aforementioned compression schemes, exhibit accuracy drops up to 13.56% in the presence of random single bit faults
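To illustrate why compressed parameters become more sensitive to faults, the sketch below flips a single bit in a quantized weight. It assumes signed 8-bit two's-complement weights, which is only one possible quantization format and is not stated in the abstract; the function name `flip_bit` is hypothetical.

```python
def flip_bit(weight: int, bit: int) -> int:
    """Flip one bit of a signed 8-bit weight stored in two's complement.

    Assumption for illustration: weights are quantized to int8.
    """
    if not -128 <= weight <= 127:
        raise ValueError("weight must fit in a signed 8-bit integer")
    if not 0 <= bit <= 7:
        raise ValueError("bit index must be in the range 0..7")
    raw = weight & 0xFF          # reinterpret as an unsigned byte
    raw ^= 1 << bit              # inject the single-bit fault
    return raw - 256 if raw >= 128 else raw  # back to a signed value


# A fault in the least significant bit barely changes the weight,
# while a fault in the sign bit changes it drastically.
low_bit_fault = flip_bit(5, 0)   # 5 becomes 4
sign_bit_fault = flip_bit(5, 7)  # 5 becomes -123
```

With fewer bits per weight after compression, each stored bit carries more of the value, so a random single-bit fault is more likely to land on a high-impact position.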
Low Power Montgomery Modular Multiplication on Reconfigurable Systems
This paper presents an area-optimized FPGA architecture of the Montgomery modular multiplication algorithm on a low power reconfigurable IGLOO® 2 FPGA of Microsemi®. Our contributions consist of the mapping of the Montgomery algorithm to the specific architecture of the target FPGA, using the pipelined Math blocks and the embedded memory blocks. We minimize the occupation of these blocks as well as the usage of the regular FPGA cells (LUT4 and Flip Flops) through a dedicated scheduling algorithm. The obtained results suggest that a 224-bit modular multiplication can be computed in 2.42 µs, at a cost of 444 LUT4, 160 Flip Flops, 1 Math Block and 1 64x18 RAM, with a power consumption of 25.35 mW. If more area resources are considered, modular multiplication can be performed in 1.30 µs at a cost of 658 LUT4, 268 Flip Flops, 2 Math Blocks, 2 64x18 RAMs and a power consumption of 36.02 mW
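For readers unfamiliar with the algorithm, the following is a minimal software sketch of Montgomery modular multiplication. It is a plain big-integer version, not the scheduled, word-serial FPGA architecture described in the abstract. The 224-bit width matches the abstract, but the function names and the choice of modulus in the usage note are assumptions made for illustration.

```python
def montgomery_setup(n: int, k: int):
    """Precompute constants for an odd modulus n and R = 2**k > n."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # satisfies n * n_prime = -1 (mod R)
    r2 = (r * r) % n                # used to convert into Montgomery form
    return n_prime, r2


def redc(t: int, n: int, k: int, n_prime: int) -> int:
    """Montgomery reduction: return t * R^-1 mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> k               # exact division by R is a shift
    return u - n if u >= n else u      # single conditional subtraction


def mont_mul(a: int, b: int, n: int, k: int) -> int:
    """Compute (a * b) mod n via Montgomery form, for 0 <= a, b < n."""
    n_prime, r2 = montgomery_setup(n, k)
    a_bar = redc(a * r2, n, k, n_prime)            # a * R mod n
    b_bar = redc(b * r2, n, k, n_prime)            # b * R mod n
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)  # a * b * R mod n
    return redc(prod_bar, n, k, n_prime)           # leave Montgomery form
```

For example, with the assumed modulus `n = 2**224 - 2**96 + 1` (the NIST P-224 prime) and `k = 224`, `mont_mul(a, b, n, 224)` agrees with `(a * b) % n`. The benefit in hardware is that the reduction uses only multiplications, additions, and shifts, with no trial division by `n`. The sketch requires Python 3.8 or later for `pow(n, -1, r)`.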