International Association for Cryptologic Research (IACR)
Abstract
In this paper there are considered several approaches for the increasing performance of software implementation of integer multiplication algorithm for the 32-bit & 64-bit platforms via parallelization. The main idea of algorithm parallelization consists in delayed carry mechanism using which authors have proposed earlier [11]. The delayed carry allows to get rid of connectivity in loop iterations for sums accumulation of products, which allows parallel execution of loops iterations in separate threads. Upon completion of sum accumulation threads, it is necessary to make corrections in final result via assimilation of carries. First approach consists in optimization of parallelization for the two execution threads and second approach is an evolution of the first approach and is oriented on three and more execution threads. Proposed approaches for parallelization allow increasing the total algorithm computational complexity, as for one execution thread, but decrease total execution time on multi-core CPU