784 research outputs found

    Analysis of Parallel Montgomery Multiplication in CUDA

    Get PDF
    For a given level of security, elliptic curve cryptography (ECC) offers improved efficiency over classic public key implementations. Point multiplication is the most common operation in ECC and, consequently, any significant improvement in perfor- mance will likely require accelerating point multiplication. In ECC, the Montgomery algorithm is widely used for point multiplication. The primary purpose of this project is to implement and analyze a parallel implementation of the Montgomery algorithm as it is used in ECC. Specifically, the performance of CPU-based Montgomery multiplication and a GPU-based implementation in CUDA are compared

    A Survey on Homomorphic Encryption Schemes: Theory and Implementation

    Full text link
    Legacy encryption systems depend on sharing a key (public or private) among the peers involved in exchanging an encrypted message. However, this approach poses privacy concerns. Especially with popular cloud services, the control over the privacy of the sensitive data is lost. Even when the keys are not shared, the encrypted material is shared with a third party that does not necessarily need to access the content. Moreover, untrusted servers, providers, and cloud operators can keep identifying elements of users long after users end the relationship with the services. Indeed, Homomorphic Encryption (HE), a special kind of encryption scheme, can address these concerns as it allows any third party to operate on the encrypted data without decrypting it in advance. Although this extremely useful feature of the HE scheme has been known for over 30 years, the first plausible and achievable Fully Homomorphic Encryption (FHE) scheme, which allows any computable function to perform on the encrypted data, was introduced by Craig Gentry in 2009. Even though this was a major achievement, different implementations so far demonstrated that FHE still needs to be improved significantly to be practical on every platform. First, we present the basics of HE and the details of the well-known Partially Homomorphic Encryption (PHE) and Somewhat Homomorphic Encryption (SWHE), which are important pillars of achieving FHE. Then, the main FHE families, which have become the base for the other follow-up FHE schemes are presented. Furthermore, the implementations and recent improvements in Gentry-type FHE schemes are also surveyed. Finally, further research directions are discussed. This survey is intended to give a clear knowledge and foundation to researchers and practitioners interested in knowing, applying, as well as extending the state of the art HE, PHE, SWHE, and FHE systems.Comment: - Updated. (October 6, 2017) - This paper is an early draft of the survey that is being submitted to ACM CSUR and has been uploaded to arXiv for feedback from stakeholder

    Utilizing the Double-Precision Floating-Point Computing Power of GPUs for RSA Acceleration

    Get PDF
    Asymmetric cryptographic algorithm (e.g., RSA and Elliptic Curve Cryptography) implementations on Graphics Processing Units (GPUs) have been researched for over a decade. The basic idea of most previous contributions is exploiting the highly parallel GPU architecture and porting the integer-based algorithms from general-purpose CPUs to GPUs, to offer high performance. However, the great potential cryptographic computing power of GPUs, especially by the more powerful floating-point instructions, has not been comprehensively investigated in fact. In this paper, we fully exploit the floating-point computing power of GPUs, by various designs, including the floating-point-based Montgomery multiplication/exponentiation algorithm and Chinese Remainder Theorem (CRT) implementation in GPU. And for practical usage of the proposed algorithm, a new method is performed to convert the input/output between octet strings and floating-point numbers, fully utilizing GPUs and further promoting the overall performance by about 5%. The performance of RSA-2048/3072/4096 decryption on NVIDIA GeForce GTX TITAN reaches 42,211/12,151/5,790 operations per second, respectively, which achieves 13 times the performance of the previous fastest floating-point-based implementation (published in Eurocrypt 2009). The RSA-4096 decryption precedes the existing fastest integer-based result by 23%

    Weak RSA Key Discovery on GPGPU

    Get PDF
    We address one of the weaknesses of the RSA ciphering systems \textit{i.e.} the existence of the private keys that are relatively easy to compromise by the attacker. The problem can be mitigated by the Internet services providers, but it requires some computational effort. We propose the proof of concept of the GPGPU-accelerated system that can help detect and eliminate users' weak keys.We have proposed the algorithms and developed the GPU-optimised program code that is now publicly available and substantially outperforms the tested CPU processor. The source code of the OpenSSL library was adapted for GPGPU, and the resulting code can perform both on the GPU and CPU processors. Additionally, we present the solution how to map a triangular grid into the GPU rectangular grid \textendash{} the basic dilemma in many problems that concern pair-wise analysis for the set of elements. Also, the comparison of two data caching methods on GPGPU leads to the interesting general conclusions. We present the results of the experiments of the performance analysis of the selected algorithms for the various RSA key length, configurations of GPU grid, and size of the tested key set

    Cryptanalysis of the McEliece Cryptosystem on GPGPUs

    Get PDF
    The linear code based McEliece cryptosystem is potentially promising as a so-called post-quantum public key cryptosystem because thus far it has resisted quantum cryptanalysis, but to be considered secure, the cryptosystem must resist other attacks as well. In 2011, Bernstein et al. introduced the Ball Collision Decoding (BCD) attack on McEliece which is a significant improvement in asymptotic complexity over the previous best known attack. We implement this attack on GPUs, which offer a parallel architecture that is well-suited to the matrix operations used in the attack and decrease the asymptotic run-time. Our implementation executes the attack more than twice as fast as the reference implementation and could be used for a practical attack on the original McEliece parameters

    ESTABLISHED WAYS TO ATTACK EVEN THE BEST ENCRYPTION ALGORITHM

    Get PDF
    Which solution is the best – public key or private key encryption? This question cannot have a very rigorous, logical and definitive answer, so that the matter be forever settled :). The question supposes that the two methods could be compared on completely the same indicators – well, from my point of view, the comparison is not very relevant. Encryption specialists have demonstrated that the sizes of public key encrypted messages are much bigger than the encrypted message using private key algorithms. From this point of view, we can say that private key algorithms are more efficient than their newer counterparts. Looking at the issue through the eyeglass of the security level, the public key encryption have a great advantage of the private key variants, their level of protection, in the most pessimistic scenarios, being at least 35 time higher. As a general rule, each type of algorithm has managed to find its own market niche where could be applicable as a best solution and be more efficient than the other encryption model.Encryption, decryption, key, cryptanalysis, brute-force, linear, differential, algebra

    Hardware accelerated authentication system for dynamic time-critical networks

    Get PDF
    The secure and efficient operation of time-critical networks, such as vehicular networks, smart-grid and other smart-infrastructures, is of primary importance in today’s society. It is crucial to minimize the impact of security mechanisms over such networks so that the safe and reliable operations of time-critical systems are not being interfered. Even though there are several security mechanisms, their application to smart-infrastructure and Internet of Things (IoT) deployments may not meet the ubiquitous and time-sensitive needs of these systems. That is, existing security mechanisms either introduce a significant computation and communication overhead, or they are not scalable for a large number of IoT components. In particular, as a primary authentication mechanism, existing digital signatures cannot meet the real-time processing requirements of time-critical networks, and also do not fully benefit from advancements in the underlying hardware/software of IoTs. As a part of this thesis, we create a reliable and scalable authentication system to ensure secure and reliable operation of dynamic time-critical networks like vehicular networks through hardware acceleration. The system is implemented on System-On-Chips (SoC) leveraging the parallel processing capabilities of the embedded Graphical Processing Units (GPUs) along with the CPUs (Central Processing Units). We identify a set of cryptographic authentication mechanisms, which consist of operations that are highly parallelizable while still maintain high standards of security and are also secure against various malicious adversaries. We also focus on creating a fully functional prototype of the system which we call a “Dynamic Scheduler” which will take care of scheduling the messages for signing or verification on the basis of their priority level and the number of messages currently in the system, so as to derive maximum throughput or minimum latency from the system, whatever the requirement may be
    • …
    corecore