    Accelerating NTRUEncrypt for in-browser cryptography utilising graphical processing units and WebGL

    One of the challenges encryption faces is that it is computationally intensive and therefore slow, so it is vital to find faster methods of accelerating modern encryption algorithms that keep performance high whilst preserving information security. Users do not want to wait for applications to become responsive, and applications on limited devices such as mobiles often compromise security to keep execution times short, using algorithms and key sizes which are not considered cryptographically secure in order to maintain a smooth user experience. Emerging approaches offload some of the computational burden from the Central Processing Unit (CPU) to the device's Graphics Processing Unit (GPU) in an effort to parallelize and accelerate encryption algorithms. Programming a GPU usually involves CUDA or OpenCL; however, these approaches are platform-dependent. This research focuses on utilising a GPU to perform in-browser cryptography using WebGL and JavaScript, which allows any GPU-enabled device capable of launching an OpenGL-compatible browser to perform GPU-accelerated cryptography. A GPU-based implementation of the NTRUEncrypt algorithm was created and tested against a CPU-based version on a range of hardware devices, with results, challenges and limitations discussed.
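
    The core operation such an implementation parallelizes is NTRU's ring multiplication: cyclic convolution in Z_q[x]/(x^N - 1), where every output coefficient is an independent dot product and therefore maps naturally onto one GPU thread or fragment. A minimal Python sketch of that core, with toy parameters rather than the paper's WebGL shader code:

```python
# Minimal sketch of the operation NTRUEncrypt offloads to the GPU:
# cyclic convolution in Z_q[x]/(x^N - 1). Each output coefficient is
# an independent dot product, so a WebGL fragment shader can compute
# one coefficient per fragment in parallel.
def cyclic_convolution(a, b, q):
    """(a * b) mod (x^N - 1), coefficients reduced mod q."""
    n = len(a)
    out = [0] * n
    for k in range(n):                 # on the GPU: one thread per k
        acc = 0
        for i in range(n):
            acc += a[i] * b[(k - i) % n]
        out[k] = acc % q
    return out

# Encryption then reduces to e = r*h + m (mod q), with r a random
# blinding polynomial, h the public key and m the message:
def ntru_encrypt(r, h, m, q):
    conv = cyclic_convolution(r, h, q)
    return [(c + mi) % q for c, mi in zip(conv, m)]
```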

    Sampling methods for cryptographic tests

    Modern cryptographic algorithms have an enormous key diversity, so testing their strength for every key would take a practically infinite time. To avoid this, we use a sampling method: we examine a much smaller number of keys n and then extrapolate an estimate for the total key population N with a predetermined sampling error. For the generation of the n cipher outputs (samples) with the n corresponding keys, the critical questions are how many samples to test and how large each sample must be. The general rule is that the sampling error decreases as the number of samples increases. But since the tests must be executed in an acceptable time, this rule must be balanced against additional factors, such as the type of the cryptographic cipher, the kind and size of the plain information and, of course, the available computing power. In this study we examine the interrelations of all these factors and propose applicable solutions. Keywords: Cryptography, Data encryption, Communication security, Computer security, Data security, Information security.
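
    One standard way to relate the number of samples n, the key population N and a target sampling error is Cochran's sample-size formula with a finite-population correction; the paper's exact estimator may differ, but the trade-off it describes looks like this:

```python
import math

# Cochran's formula with finite-population correction: an assumed,
# standard estimator, not necessarily the one used in the paper.
def required_sample_size(N, e, z=1.96, p=0.5):
    """Samples needed to estimate a proportion p within error e
    at confidence z over a population of N keys."""
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)      # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / N))   # correct for finite N

# e.g. a 1% error at 95% confidence over a population of 2**40 keys:
print(required_sample_size(N=2**40, e=0.01))    # -> 9604 samples
```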

    HETEROGENEOUS GPU&CPU CLUSTER FOR HIGH PERFORMANCE COMPUTING IN CRYPTOGRAPHY

    This paper addresses issues associated with distributed computing systems and the application of mixed GPU&CPU technology to data encryption and decryption algorithms. We describe a heterogeneous cluster, HGCC, formed by two types of nodes: an Intel processor with an NVIDIA graphics processing unit and an AMD processor with an AMD graphics processing unit (formerly ATI), together with a novel software framework that hides the heterogeneity of our cluster and provides tools for solving complex scientific and engineering problems. Finally, we present the results of numerical experiments. The considered case study is concerned with parallel implementations of selected cryptanalysis algorithms. The main goal of the paper is to show the wide applicability of GPU&CPU technology to large-scale computation and data processing.
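
    A minimal sketch of the idea behind such a framework: every node, whatever its backend (NVIDIA GPU, AMD GPU or CPU), pulls work units from one shared queue, so the caller never sees which hardware ran a given chunk. The names below are illustrative, not the HGCC API:

```python
import queue
import threading

class CpuBackend:
    """Stand-in for a node-specific cryptanalysis kernel; a GPU
    backend would expose the same interface."""
    name = "cpu"
    def process(self, chunk):
        return sum(chunk)

def worker(tasks, results, backend):
    while True:
        chunk = tasks.get()
        if chunk is None:              # poison pill: shut this worker down
            break
        results.put((backend.name, backend.process(chunk)))
        tasks.task_done()

tasks, results = queue.Queue(), queue.Queue()
backends = [CpuBackend(), CpuBackend()]   # heterogeneous backends slot in here
threads = [threading.Thread(target=worker, args=(tasks, results, b))
           for b in backends]
for t in threads:
    t.start()
for chunk in ([1, 2], [3, 4], [5, 6]):
    tasks.put(chunk)
tasks.join()                           # wait for all chunks to be processed
for _ in backends:
    tasks.put(None)
while not results.empty():
    print(results.get())
```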

    NTRU Modular Lattice Signature Scheme on CUDA GPUs

    In this work we show how to use Graphics Processing Units (GPUs) with the Compute Unified Device Architecture (CUDA) to accelerate a lattice-based signature scheme, namely the NTRU modular lattice signature (NTRU-MLS) scheme. Lattice-based schemes require operations on large vectors that are perfect candidates for GPU implementations. In addition, like most lattice-based signature schemes, NTRU-MLS provides transcript security with a rejection sampling technique. With a GPU implementation, we are able to generate many candidates simultaneously and hence mitigate the performance slowdown from rejection sampling. Our implementation results show that for the original NTRU-MLS parameter sets we obtain a 2x improvement in signing speed; for the revised parameter sets, where the acceptance rate of rejection sampling is down to around 1%, our implementation can be as much as 47x faster than a CPU implementation.
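
    The batching argument can be made concrete: if each candidate signature is accepted with probability p (around 1% for the revised parameter sets), a batch of B independent candidates contains at least one acceptable signature with probability 1 - (1 - p)^B, and a GPU produces the whole batch in one parallel pass. A toy sketch, with candidate generation and the real acceptance test elided:

```python
import random

def sign_with_batching(generate_candidate, accepts, batch_size=512):
    """Keep producing batches until some candidate passes the
    rejection test; on a GPU each batch is generated in parallel."""
    while True:
        batch = [generate_candidate() for _ in range(batch_size)]
        for cand in batch:
            if accepts(cand):          # e.g. a norm bound in NTRU-MLS
                return cand

# Toy instantiation with a ~1% acceptance rate; a batch of 512 succeeds
# on the first pass with probability ~99.4%:
sig = sign_with_batching(lambda: random.random(), lambda c: c < 0.01)
```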

    Montgomery Multiplication Using Vector Instructions

    In this paper we present a parallel approach to computing interleaved Montgomery multiplication. This approach is particularly suitable for 2-way single-instruction, multiple-data (SIMD) platforms, as found on most modern computer architectures in the form of vector instruction set extensions. We have implemented this approach for tablet devices running the x86 architecture (Intel Atom Z2760) using SSE2 instructions, as well as devices running the ARM platform (Qualcomm MSM8960, NVIDIA Tegra 3 and 4) using NEON instructions. When instantiating modular exponentiation with this parallel version of Montgomery multiplication, we observed a performance increase of more than a factor of 1.5 over the sequential implementation in OpenSSL for the classical arithmetic logic unit on the Atom platform for 2048-bit moduli.
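
    For reference, here is the word-serial interleaved Montgomery multiplication in scalar Python; the paper's contribution is splitting work inside this loop across two SIMD lanes, which this sketch does not attempt to show:

```python
def montgomery_multiply(a, b, m, w=32):
    """Return a*b*R^{-1} mod m for R = 2^(w*n), m odd: the interleaved
    (word-serial) form, absorbing one word of a per iteration."""
    n = (m.bit_length() + w - 1) // w   # number of w-bit words in m
    base = 1 << w
    m_inv = pow(-m, -1, base)           # -m^{-1} mod 2^w (precomputable)
    t = 0
    for i in range(n):
        ai = (a >> (w * i)) & (base - 1)
        t += ai * b
        u = (t * m_inv) & (base - 1)    # make t + u*m divisible by 2^w
        t = (t + u * m) >> w            # exact division by 2^w
    return t - m if t >= m else t

# Sanity check against the definition:
m = (1 << 61) - 1                       # toy odd modulus
n = (m.bit_length() + 31) // 32
R = 1 << (32 * n)
a, b = 123456789, 987654321
assert montgomery_multiply(a, b, m) == (a * b * pow(R, -1, m)) % m
```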

    Cofactorization on Graphics Processing Units

    We show how the cofactorization step, a compute-intensive part of the relation collection phase of the number field sieve (NFS), can be farmed out to a graphics processing unit. Our implementation on a GTX 580 GPU, which is integrated with a state-of-the-art NFS implementation, can serve as a cryptanalytic co-processor for several Intel i7-3770K quad-core CPUs simultaneously. This allows those processors to focus on the memory-intensive sieving and results in more useful NFS relations being found in less time.
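
    Cofactorization amounts to splitting the modest-sized composites that survive sieving, typically with methods such as Pollard's rho, p-1 and ECM; because thousands of cofactors are independent, the step is embarrassingly parallel and well suited to a GPU. A scalar Pollard-rho sketch, not the paper's GPU code:

```python
import math
import random

def pollard_rho(n):
    """Find a non-trivial factor of composite n (loops forever on primes)."""
    if n % 2 == 0:
        return 2
    while True:
        c = random.randrange(1, n)
        f = lambda x: (x * x + c) % n    # pseudo-random map mod n
        x = y = random.randrange(2, n)
        d = 1
        while d == 1:
            x = f(x)                     # tortoise: one step
            y = f(f(y))                  # hare: two steps
            d = math.gcd(abs(x - y), n)
        if d != n:                       # cycle found a proper factor
            return d

print(pollard_rho(8051))                 # 8051 = 83 * 97
```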

    Doctor of Philosophy

    As the base of the software stack, system-level software is expected to provide efficient and scalable storage, communication, security and resource management functionalities. However, there are many computationally expensive functionalities at the system level, such as encryption, packet inspection, and error correction, all of which require substantial computing power. What is more, today's application workloads have entered gigabyte and terabyte scales, which demand even more computing power. To meet this rapidly growing demand for computing power at the system level, this dissertation proposes using parallel graphics processing units (GPUs) in system software. GPUs excel at parallel computing and also show a much faster trend of improvement in parallel performance than central processing units (CPUs). However, system-level software was originally designed to be latency-oriented, whereas GPUs are designed for long-running computation and large-scale data processing, which are throughput-oriented. This mismatch makes it difficult to fit system-level software to GPUs. This dissertation presents generic principles of system-level GPU computing developed during the process of creating two general frameworks for integrating GPU computing into storage and network packet processing. The principles are generic design techniques and abstractions for dealing with common system-level GPU computing challenges. They have been evaluated in concrete cases, including storage and network packet processing applications that have been augmented with GPU computing. The significant performance improvement found in the evaluation shows the effectiveness and efficiency of the proposed techniques and abstractions. The dissertation also presents a literature survey of the relatively young field of system-level GPU computing, introducing the state of the art in both applications and techniques as well as their future potential.
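
    One generic technique implied by that latency/throughput mismatch is request batching: individually arriving system-level requests are collected for a short window and handed to the GPU as one large batch. A minimal sketch, with all names illustrative rather than taken from the dissertation's frameworks:

```python
import queue
import time

def batching_loop(requests, process_batch, max_batch=256, window_s=0.001):
    """Turn a latency-oriented request stream into throughput-oriented
    batches: wait for the first request, then gather more until the
    batch is full or the time window closes."""
    while True:
        batch = [requests.get()]                   # block until work arrives
        deadline = time.monotonic() + window_s
        while len(batch) < max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(requests.get(timeout=remaining))
            except queue.Empty:
                break
        process_batch(batch)                       # one GPU launch per batch
```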

    The Proceedings of 14th Australian Information Security Management Conference, 5-6 December 2016, Edith Cowan University, Perth, Australia

    The annual Security Congress, run by the Security Research Institute at Edith Cowan University, includes the Australian Information Security and Management Conference. Now in its fourteenth year, the conference remains popular for its diverse content and mixture of technical research and discussion papers. The area of information security and management continues to be varied, as is reflected by the wide variety of subject matter covered by the papers this year. The conference has drawn interest and papers from within Australia and internationally. All submitted papers were subject to a double-blind peer review process. Fifteen papers were submitted from Australia and overseas, of which ten were accepted for final presentation and publication. We wish to thank the reviewers for kindly volunteering their time and expertise in support of this event, and the conference committee who have organised yet another successful congress. Events such as this are impossible without the tireless efforts of such people in reviewing and editing the conference papers and assisting with the planning, organisation and execution of the conference. We also extend a vote of thanks to our sponsors for the financial and moral support provided to the conference. Finally, thank you to the administrative and technical staff, and students, of the ECU Security Research Institute for their contributions to the running of the conference.

    Efficient Acceleration of Asymmetric Cryptography on Graphics Hardware

    Graphics processing units (GPUs) are increasingly being used for general-purpose computing. We present implementations of large-integer modular exponentiation, the core of public-key cryptosystems such as RSA, on a DirectX 10 compliant GPU. DirectX 10 compliant graphics processors are the latest generation of GPU architecture, providing increased programming flexibility and support for integer operations. We present high-performance modular exponentiation implementations based on integers represented in both standard radix form and residue number system (RNS) form. We show how a GPU implementation of a 1024-bit RSA decrypt primitive can outperform a comparable CPU implementation by up to 4 times, and also improve on previous GPU implementations by decreasing latency by up to 7 times and doubling throughput. We show how an adaptive approach that draws on both the radix and the residue number system implementations gives the best all-round performance on the GPU, in terms of both latency and throughput. We also highlight the usage criteria necessary for the GPU to reach peak performance on public-key cryptographic operations.
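
    The underlying primitive is modular exponentiation, which RSA decryption instantiates with the private exponent; the GPU's advantage comes from executing many independent exponentiations, or the 1024-bit word arithmetic inside them, in parallel. A scalar square-and-multiply sketch for reference (the radix and RNS representation choices the paper studies are not shown):

```python
def mod_exp(base, exp, mod):
    """Right-to-left binary (square-and-multiply) modular exponentiation,
    the core of an RSA decrypt primitive."""
    result = 1
    base %= mod
    while exp:
        if exp & 1:                    # fold in this bit of the exponent
            result = (result * base) % mod
        base = (base * base) % mod     # square for the next bit
        exp >>= 1
    return result

# Toy check (real RSA moduli are 1024 bits and larger):
assert mod_exp(7, 560, 561) == 1       # 561 is a Carmichael number
```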