6 research outputs found

    A Randomized Sublinear Time Parallel GCD Algorithm for the EREW PRAM

    We present a randomized parallel algorithm that, with probability 1 - o(1), computes the greatest common divisor of two n-bit integers in O(n log log n / log n) expected time using n^{6+ε} processors on the EREW PRAM model of parallel computation. We believe this to be the first randomized sublinear-time algorithm for this problem on the EREW PRAM.
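
For orientation, the sequential baseline such parallel algorithms compete with is the classical Euclidean GCD, which performs a number of division steps linear in the bit length n. The sketch below (plain Haskell, not the paper's EREW PRAM algorithm) only fixes the problem being solved; the function name gcdEuclid is illustrative.

-- Classical sequential Euclidean GCD, for non-negative inputs.
-- This is only the sequential reference point, not the paper's
-- randomized EREW PRAM algorithm.
gcdEuclid :: Integer -> Integer -> Integer
gcdEuclid u 0 = u
gcdEuclid u v = gcdEuclid v (u `mod` v)

main :: IO ()
main = print (gcdEuclid 123456789012345678901234567890 98765432109876543210)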

    Low-latency Hardware Architecture for VDF Evaluation in Class Groups

    The verifiable delay function (VDF), a kind of cryptographic primitive, has recently been adopted quite often in decentralized systems. Because the security of a VDF is closely tied to how fast it can be evaluated, the fastest implementation of VDF evaluation is generally desired to be publicly known. In this paper, for the first time, we propose a low-latency hardware implementation of the complete VDF evaluation in the class group by jointly exploiting several optimizations. On the one hand, we reduce the required computation cycles by cutting down on hardware-unfriendly divisions, and we increase the parallelism of the computation by reducing data dependencies. On the other hand, we develop well-optimized low-latency architectures for large-number division, multiplication, and addition, operations that are generally hard to accelerate. Based on these basic operators, we devise an architecture for the complete VDF evaluation with minimal pipeline stalls. Finally, the proposed design is coded and synthesized in a TSMC 28-nm CMOS technology. The experimental results show that our design achieves a 3.6x speedup over the optimized C++ implementation of VDF evaluation on an advanced CPU. Moreover, compared to the state-of-the-art hardware implementation of squaring, a key step of VDF evaluation, we achieve about a 2x speedup.
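
The sequential core of VDF evaluation is a long chain of T group squarings computing y = x^(2^T), which is the workload the architecture above accelerates. The sketch below illustrates that chain only; it substitutes modular arithmetic for the paper's class-group arithmetic, and vdfEval together with its toy parameters is purely hypothetical.

import Data.List (foldl')

-- Illustration of the sequential core of VDF evaluation: T repeated
-- squarings of a group element.  Class-group VDFs square in a class
-- group of an imaginary quadratic field; here a modulus stands in for
-- the group of unknown order, for illustration only.
vdfEval :: Integer   -- modulus standing in for the group (hypothetical)
        -> Integer   -- input element x
        -> Int       -- time parameter T (number of squarings)
        -> Integer   -- output y = x^(2^T) in the stand-in group
vdfEval n x t = foldl' (\acc _ -> acc * acc `mod` n) (x `mod` n) [1 .. t]

main :: IO ()
main = print (vdfEval 1000000007 3 20)  -- toy parameters only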

    Implementation and Evaluation of Algorithmic Skeletons: Parallelisation of Computer Algebra Algorithms

    This thesis presents design and implementation approaches for parallel computer algebra algorithms. We use algorithmic skeletons as well as further approaches, such as data parallel arithmetic and actors. We have implemented skeletons for divide and conquer algorithms and for a special kind of parallel loop that we call 'repeated computation with a possibility of premature termination'. We also introduce a rational data parallel arithmetic. We focus on parallel symbolic computation algorithms, for which our arithmetic provides a generic parallelisation approach. The implementation is carried out in Eden, a parallel functional programming language based on Haskell. This choice enables us to encode both the skeletons and the programs in the same language, and it allows us to refrain from using two different languages, one for the implementation and one for the interface, in our implementation of computer algebra algorithms.

    Further, this thesis presents methods for the evaluation and estimation of parallel execution times. We partition the parallel execution time into two components: one accounts for the quality of the parallelisation, which we call the 'parallel penalty'; the other is the sequential execution time. For the estimation, we predict both components separately, using statistical methods. This enables very confident estimations while using drastically fewer measurement points than other methods. We have applied both our evaluation and our estimation approaches to the parallel programs presented in this thesis, and we have also used existing estimation methods.

    We developed divide and conquer skeletons for the implementation of fast parallel multiplication. We have implemented the Karatsuba algorithm, Strassen's matrix multiplication algorithm, and the fast Fourier transform. The latter was used to implement polynomial convolution, which leads to a further fast multiplication algorithm. Specifically for our implementation of Strassen's algorithm, we designed and implemented a divide and conquer skeleton based on actors. For the parallel fast Fourier transform, we not only used new divide and conquer skeletons but also developed a map-and-transpose skeleton, which enables good parallelisation of the Fourier transform. The parallelisation of Karatsuba multiplication shows very good performance. We have analysed the parallel penalty of our programs and compared it to the serial fraction, an approach known from the literature. We also performed execution time estimations of our divide and conquer programs.

    This thesis also presents a parallel map+reduce skeleton scheme. It allows us to combine the usual parallel map skeletons, such as parMap, farm, and workpool, with a premature termination property. We use this to implement the so-called 'parallel repeated computation', a special form of a speculative parallel loop. We have implemented two probabilistic primality tests, the Rabin–Miller test and the Jacobi sum test, and parallelised both with our approach. We analysed the task distribution and determined fitting configurations for the Jacobi sum test. We have shown formally that the Jacobi sum test can be implemented in parallel. Subsequently, we parallelised it, analysed the load balancing issues, and produced an optimisation; the latter enabled a good implementation, as verified using the parallel penalty. We have also estimated the performance of the tests for further input sizes and numbers of processing elements.

    The parallelisation of the Jacobi sum test and our generic parallelisation scheme for the repeated computation are our original contributions. The data parallel arithmetic was defined not only for integers, which is already known, but also for rationals. We handle the common factors of the numerator or denominator of a fraction with the modulus in a novel manner; this is required to obtain a true multiple-residue arithmetic, a novel result of our research. Using these mathematical advances, we have parallelised the determinant computation using Gauß elimination. As before, we performed a task distribution analysis and an estimation of the parallel execution time of our implementation. A similar computation in Maple emphasised the potential of our approach. Data parallel arithmetic enables the parallelisation of entire classes of computer algebra algorithms. Summarising, this thesis presents and thoroughly evaluates new and existing design decisions for high-level parallelisations of computer algebra algorithms.
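
To make the divide and conquer skeleton idea concrete, here is a minimal, purely sequential Haskell sketch of such a skeleton together with a toy instance. The names divConq and sumDC and the parameter order are illustrative, not the thesis's Eden API; in Eden, the recursive map over subproblems would be replaced by a parallel map skeleton.

-- A minimal sequential sketch of a divide and conquer skeleton.
-- The four parameters follow the usual pattern: trivial-case test,
-- direct solver, problem splitter, and solution combiner.
divConq :: (p -> Bool)        -- is the problem trivial?
        -> (p -> s)           -- solve a trivial problem directly
        -> (p -> [p])         -- split into subproblems
        -> (p -> [s] -> s)    -- combine sub-solutions
        -> p -> s
divConq isTrivial solve split combine = go
  where
    go prob
      | isTrivial prob = solve prob
      | otherwise      = combine prob (map go (split prob))

-- Toy instance: summing a list by repeated halving.
sumDC :: [Integer] -> Integer
sumDC = divConq ((<= 1) . length)
                (\xs -> if null xs then 0 else head xs)
                (\xs -> let (a, b) = splitAt (length xs `div` 2) xs in [a, b])
                (\_ subs -> sum subs)

main :: IO ()
main = print (sumDC [1 .. 100])  -- 5050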

    A parallel extended GCD algorithm

    A new parallel extended GCD algorithm is proposed. It matches the best existing parallel integer GCD algorithms of Sorenson and of Chor and Goldreich, since it can be achieved in Oɛ(n / log n) time using at most n^{1+ɛ} processors on a CRCW PRAM. Sorenson's algorithm and Chor and Goldreich's algorithm both use a modular approach that considers the least significant bits. By contrast, our algorithm only deals with the leading bits of the integers u and v, with u ≥ v. This approach is more suitable for extended GCD algorithms, since the coefficients a and b of the extended version, such that au + bv = gcd(u, v), are deeply linked with the order of magnitude of the rational v/u and its continuants. Consequently, the computation of such coefficients is much easier.
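
For reference, the relation au + bv = gcd(u, v) is what the textbook sequential extended Euclidean algorithm produces; a minimal Haskell sketch is below. It is not the paper's parallel algorithm, which instead derives the coefficients from the leading bits of u and v; extGcd is just an illustrative name.

-- Sequential extended Euclidean algorithm for non-negative u and v:
-- returns (g, a, b) with a*u + b*v == g == gcd(u, v).
extGcd :: Integer -> Integer -> (Integer, Integer, Integer)
extGcd u 0 = (u, 1, 0)
extGcd u v = let (q, r)    = u `divMod` v
                 (g, a, b) = extGcd v r
             in (g, b, a - q * b)

main :: IO ()
main = let (g, a, b) = extGcd 240 46
       in print (g, a, b, a * 240 + b * 46 == g)  -- (2,-9,47,True)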
