Analysis of KECCAK Tree Hashing on GPU Architectures

Abstract

To provide security and data integrity, hashing algorithms are designed to consume an input of any length and produce a fixed-length output. KECCAK was selected by NIST to become the next Secure Hash Algorithm (SHA-3) after nearly five years of competition. In addition to a sequential operating mode, KECCAK offers a tree mode that allows large input messages to be hashed in parallel. This thesis explores and analyzes the KECCAK tree hashing mode on a GPU platform. The implementation uses core features of the GPU's massively parallel architecture to reduce the time needed to complete a hash. In addition to measuring the speed of the algorithm, the underlying hardware is profiled to identify the bottlenecks that limit that speed. The results show that tree hashing can hash data at rates of up to 3 GB/s for the fixed size tree mode. On a 3.40 GHz CPU, this is the equivalent of 1.03 cycles per byte, more than six times faster than a sequential implementation for a very large input. For the variable size tree mode, the throughput was 500 MB/s. The performance analysis showed that modifying the input rate of the KECCAK sponge resulted in a negligible change to the overall speed. The hardware profiling showed that register and L1 cache usage in the GPU was a major bottleneck to the overall throughput. In a simulated GPU environment, increasing the L1 cache by 25 percent increased the throughput by up to 30 percent for a small tree and by 15 percent for the tree that achieves the greatest throughput on a real GPU. When this modification is combined with an increase of the L2 cache, performance can be improved by up to 20 percent.
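As a rough illustration of how a tree mode exposes parallelism, the following minimal Python sketch splits a message into fixed-size leaves, hashes each leaf independently, and then hashes the concatenated leaf digests into a single root. It is not the KECCAK tree mode evaluated in this thesis: the leaf size, the one-byte leaf/root prefixes, the two-level layout, and the names `leaf_digest` and `tree_hash` are all assumptions made for the example, and a CPU thread pool merely stands in for GPU threads.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

LEAF_SIZE = 1024  # hypothetical leaf size in bytes, chosen only for this sketch


def leaf_digest(chunk: bytes) -> bytes:
    # The 0x00 prefix marks a leaf node; this domain separation is an
    # assumption of the sketch, not the encoding used by KECCAK tree hashing.
    return hashlib.sha3_256(b"\x00" + chunk).digest()


def tree_hash(message: bytes, leaf_size: int = LEAF_SIZE) -> bytes:
    # Split the message into independent leaves; an empty message yields
    # a single empty leaf so that the root is always defined.
    chunks = [message[i:i + leaf_size]
              for i in range(0, len(message), leaf_size)] or [b""]

    # Each leaf depends only on its own chunk, so the leaves can be hashed
    # concurrently. Here a thread pool plays the role of the GPU threads.
    with ThreadPoolExecutor() as pool:
        digests = list(pool.map(leaf_digest, chunks))

    # The root node (0x01 prefix, also an assumption) absorbs the leaf
    # digests in order and produces the final fixed-length output.
    return hashlib.sha3_256(b"\x01" + b"".join(digests)).digest()


if __name__ == "__main__":
    data = bytes(range(256)) * 4096  # 1 MiB of sample input
    print(tree_hash(data).hex())
```

Only the final root hash is sequential in this layout, which is why larger inputs with more leaves leave more work available to run in parallel.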
