127 research outputs found
Cryptanalysis of the McEliece Cryptosystem on GPGPUs
The linear code based McEliece cryptosystem is potentially promising as a so-called post-quantum public key cryptosystem because thus far it has resisted quantum cryptanalysis, but to be considered secure, the cryptosystem must resist other attacks as well. In 2011, Bernstein et al. introduced the Ball Collision Decoding (BCD) attack on McEliece which is a significant improvement in asymptotic complexity over the previous best known attack. We implement this attack on GPUs, which offer a parallel architecture that is well-suited to the matrix operations used in the attack and decrease the asymptotic run-time. Our implementation executes the attack more than twice as fast as the reference implementation and could be used for a practical attack on the original McEliece parameters
Elliptic Curve Cryptography on Modern Processor Architectures
Abstract
Elliptic Curve Cryptography (ECC) has been adopted by the US National Security Agency (NSA) in Suite "B" as part of its "Cryptographic Modernisation Program ". Additionally,
it has been favoured by an entire host of mobile devices due to its superior performance characteristics. ECC is also the building block on which the exciting field of pairing/identity based cryptography is based. This widespread use means that there is potentially a lot to be gained by researching efficient implementations on modern processors such as IBM's Cell Broadband Engine and Philip's next generation smart card cores. ECC operations can be thought of as a pyramid of building blocks, from instructions on a core, modular operations on a finite field, point addition & doubling, elliptic curve scalar
multiplication to application level protocols. In this thesis we examine an implementation of these components for ECC focusing on a range of optimising techniques for the Cell's SPU and the MIPS smart card. We show significant performance improvements that can be achieved through of adoption of EC
Efficient Computation and FPGA implementation of Fully Homomorphic Encryption with Cloud Computing Significance
Homomorphic Encryption provides unique security solution for cloud computing. It ensures not only that data in cloud have confidentiality but also that data processing by cloud server does not compromise data privacy. The Fully Homomorphic Encryption (FHE) scheme proposed by Lopez-Alt, Tromer, and Vaikuntanathan (LTV), also known as NTRU(Nth degree truncated polynomial ring) based method, is considered one of the most important FHE methods suitable for practical implementation. In this thesis, an efficient algorithm and architecture for LTV Fully Homomorphic Encryption is proposed. Conventional linear feedback shift register (LFSR) structure is expanded and modified for performing the truncated polynomial ring multiplication in LTV scheme in parallel. Novel and efficient modular multiplier, modular adder and modular subtractor are proposed to support high speed processing of LFSR operations. In addition, a family of special moduli are selected for high speed computation of modular operations. Though the area keeps the complexity of O(Nn^2) with no advantage in circuit level. The proposed architecture effectively reduces the time complexity from O(N log N) to linear time, O(N), compared to the best existing works. An FPGA implementation of the proposed architecture for LTV FHE is achieved and demonstrated. An elaborate comparison of the existing methods and the proposed work is presented, which shows the proposed work gains significant speed up over existing works
Secure Biometric Cryptosystem for Distributed System
Information (biometric) security is concerned with the assurance of confidentiality, integrity, and availability of information in all forms, biometric information is very sophisticated in terms of all, in this work we are focusing on data pattern along with all security assurance, so that we can improve the matching performance with good security assurance, here one of the most effective RSA algorithm use with biometric (fingerprint) data. Our work includes the determination of appropriate key sizes with security issues and determines the matching performance using MATLAB and JDK1.6, performance of this system is more than 86.7% and when combines this with blind authentication techniques then we get all security assurance with high performance biometric cryptosystem
A hardware-accelerated ecdlp with highperformance modular multiplication
Elliptic curve cryptography (ECC) has become a popular public key cryptography standard. The security of ECC is due to the difficulty of solving the elliptic curve discrete logarithm problem (ECDLP). In this paper, we demonstrate a successful attack on ECC over prime field using the Pollard rho algorithm implemented on a hardware-software cointegrated platform. We propose a high-performance architecture for multiplication over prime field using specialized DSP blocks in the FPGA. We characterize this architecture by exploring the design space to determine the optimal integer basis for polynomial representation and we demonstrate an efficient mapping of this design to multiple standard prime field elliptic curves. We use the resulting modular multiplier to demonstrate low-latency multiplications for curves secp112r1 and P-192. We apply our modular multiplier to implement a complete attack on secp112r1 using a Nallatech FSB-Compute platform with Virtex-5 FPGA. The measured performance of the resulting design is 114 cycles per Pollard rho step at 100 MHz, which gives 878 K iterations per second per ECC core. We extend this design to a multicore ECDLP implementation that achieves 14.05 M iterations per second with 16 parallel point addition cores
Hardware accelerated authentication system for dynamic time-critical networks
The secure and efficient operation of time-critical networks, such as vehicular networks, smart-grid and other smart-infrastructures, is of primary importance in today’s society. It is crucial to minimize the impact of security mechanisms over such networks so that the safe and reliable operations of time-critical systems are not being interfered.
Even though there are several security mechanisms, their application to smart-infrastructure and Internet of Things (IoT) deployments may not meet the ubiquitous and time-sensitive needs of these systems. That is, existing security mechanisms either introduce a significant computation and communication overhead, or they are not scalable for a large number of IoT components. In particular, as a primary authentication mechanism, existing digital signatures cannot meet the real-time processing requirements of time-critical networks, and also do not fully benefit from advancements in the underlying hardware/software of IoTs.
As a part of this thesis, we create a reliable and scalable authentication system to ensure secure and reliable operation of dynamic time-critical networks like vehicular networks through hardware acceleration. The system is implemented on System-On-Chips (SoC) leveraging the parallel processing capabilities of the embedded Graphical Processing Units (GPUs) along with the CPUs (Central Processing Units). We identify a set of cryptographic authentication mechanisms, which consist of operations that are highly parallelizable while still maintain high standards of security and are also secure against various malicious adversaries. We also focus on creating a fully functional prototype of the system which we call a “Dynamic Scheduler” which will take care of scheduling the messages for signing or verification on the basis of their priority level and the number of messages currently in the system, so as to derive maximum throughput or minimum latency from the system, whatever the requirement may be
Fast Lattice Basis Reduction Suitable for Massive Parallelization and Its Application to the Shortest Vector Problem
The hardness of the shortest vector problem for lattices is a fundamental assumption underpinning the security of many lattice-based cryptosystems, and therefore, it is important to evaluate its difficulty.
Here, recent advances in studying the hardness of problems in large-scale lattice computing have pointed to need to study the design and methodology for exploiting the performance of massive parallel computing environments.
In this paper, we propose a lattice basis reduction algorithm suitable for massive parallelization.
Our parallelization strategy is an extension of the Fukase-Kashiwabara algorithm~(J. Information Processing, Vol. 23, No. 1, 2015).
In our algorithm, given a lattice basis as input, variants of the lattice basis are generated, and then each process reduces its lattice basis; at this time, the processes cooperate and share auxiliary information with each other to accelerate lattice basis reduction.
In addition, we propose a new strategy based on our evaluation function of a lattice basis in order to decrease the sum of squared lengths of orthogonal basis vectors.
We applied our algorithm to problem instances from the SVP Challenge.
We solved a 150-dimension problem instance in about 394 days by using large clusters, and we also solved problem instances of dimensions 134, 138, 140, 142, 144, 146, and 148.
Since the previous world record is the problem of dimension 132, these results demonstrate the effectiveness of our proposal
Recommended from our members
Accelerating RSA Public Key Cryptography via Hardware Acceleration
A large number and a variety of sensors and actuators, also known as edge devices of the Internet of Things, belonging to various industries - health care monitoring, home automation, industrial automation, have become prevalent in today\u27s world. These edge devices need to communicate data collected to the central system occasionally and often in burst mode which is then used for monitoring and control purposes. To ensure secure connections, Asymmetric or Public Key Cryptography (PKC) schemes are used in combination with Symmetric Cryptography schemes. RSA (Rivest - Shamir- Adleman) is one of the most prevalent public key cryptosystems, and has computationally intensive operations which might have a high latency when implemented in resource constrained environments. The objective of this thesis is to design an accelerator capable of increasing the speed of execution of the RSA algorithm in such resource constrained environments. The bottleneck of the algorithm is determined by analyzing the performance of the algorithm in various platforms - Intel Linux Machine, Raspberry Pi, Nios soft core processor. In designing the accelerator to speedup bottleneck function, we realize that the accelerator architecture will need to be changed according to the resources available to the accelerator. We use high level synthesis tools to explore the design space of the accelerator by taking into consideration system level aspects like the number of ports available to transfer inputs to the accelerator, the word size of the processor, etc. We also propose a new accelerator architecture for the bottleneck function and the algorithm it implements and compare the area and latency requirements of it with other designs obtained from design space exploration. The functionality of the design proposed is verified and prototyped in Zynq SoC of Xilinx Zedboard
- …