51,857 research outputs found

    Faster truncated integer multiplication

    Full text link
    We present new algorithms for computing the low n bits or the high n bits of the product of two n-bit integers. We show that these problems may be solved in asymptotically 75% of the time required to compute the full 2n-bit product, assuming that the underlying integer multiplication algorithm relies on computing cyclic convolutions of real sequences.Comment: 28 page

    Even faster integer multiplication

    Full text link
    We give a new proof of F\"urer's bound for the cost of multiplying n-bit integers in the bit complexity model. Unlike F\"urer, our method does not require constructing special coefficient rings with "fast" roots of unity. Moreover, we prove the more explicit bound O(n log n K^(log^* n))$ with K = 8. We show that an optimised variant of F\"urer's algorithm achieves only K = 16, suggesting that the new algorithm is faster than F\"urer's by a factor of 2^(log^* n). Assuming standard conjectures about the distribution of Mersenne primes, we give yet another algorithm that achieves K = 4

    Division and conquer

    Get PDF
    Integer division is an important arithmetic operation on microprocessors. To derive integer division algorithms we present an unconvential approach: a derivation technique in a calculational style, that guarantees that the derived algorithms are correct. Four different algorithms are derived using this method: restoring division, non-restoring divsion, radix-4 division and division by multiplication. We translate these to descriptions into combinatorial circuits, expressed in Verilog code. Then the circuits are compiled on a Spartan-3 Generation FPGA. At the end, we compare the propagation delays and area requirements for these circuits. We show that the division by multiplication is much faster than the other methods, however it only works for 18 bit integers. Integer division is an important arithmetic operation on microprocessors. To derive integer division algorithms we present an unconvential approach: a derivation technique in a calculational style, that guarantees that the derived algorithms are correct. Four different algorithms are derived using this method: restoring division, non-restoring divsion, radix-4 division and division by multiplication. We translate these to descriptions into combinatorial circuits, expressed in Verilog code. Then the circuits are compiled on a Spartan-3 Generation FPGA. At the end, we compare the propagation delays and area requirements for these circuits. We show that the division by multiplication is much faster than the other methods, however it only works for 18 bit integers

    Easy scalar decompositions for efficient scalar multiplication on elliptic curves and genus 2 Jacobians

    Get PDF
    The first step in elliptic curve scalar multiplication algorithms based on scalar decompositions using efficient endomorphisms-including Gallant-Lambert-Vanstone (GLV) and Galbraith-Lin-Scott (GLS) multiplication, as well as higher-dimensional and higher-genus constructions-is to produce a short basis of a certain integer lattice involving the eigenvalues of the endomorphisms. The shorter the basis vectors, the shorter the decomposed scalar coefficients, and the faster the resulting scalar multiplication. Typically, knowledge of the eigenvalues allows us to write down a long basis, which we then reduce using the Euclidean algorithm, Gauss reduction, LLL, or even a more specialized algorithm. In this work, we use elementary facts about quadratic rings to immediately write down a short basis of the lattice for the GLV, GLS, GLV+GLS, and Q-curve constructions on elliptic curves, and for genus 2 real multiplication constructions. We do not pretend that this represents a significant optimization in scalar multiplication, since the lattice reduction step is always an offline precomputation---but it does give a better insight into the structure of scalar decompositions. In any case, it is always more convenient to use a ready-made short basis than it is to compute a new one

    DGEMM on Integer Matrix Multiplication Unit

    Full text link
    Deep learning hardware achieves high throughput and low power consumption by reducing computing precision and specializing in matrix multiplication. For machine learning inference, fixed-point value computation is commonplace, where the input and output values and the model parameters are quantized. Thus, many processors are now equipped with fast integer matrix multiplication units (IMMU). It is of significant interest to find a way to harness these IMMUs to improve the performance of HPC applications while maintaining accuracy. We focus on the Ozaki scheme, which computes a high-precision matrix multiplication by using lower-precision computing units, and show the advantages and disadvantages of using IMMU. The experiment using integer Tensor Cores shows that we can compute double-precision matrix multiplication faster than cuBLAS and an existing Ozaki scheme implementation on FP16 Tensor Cores on NVIDIA consumer GPUs. Furthermore, we demonstrate accelerating a quantum circuit simulation by up to 4.33 while maintaining the FP64 accuracy

    On Nondeterministic Derandomization of Freivalds\u27 Algorithm: Consequences, Avenues and Algorithmic Progress

    Get PDF
    Motivated by studying the power of randomness, certifying algorithms and barriers for fine-grained reductions, we investigate the question whether the multiplication of two n x n matrices can be performed in near-optimal nondeterministic time O~(n^2). Since a classic algorithm due to Freivalds verifies correctness of matrix products probabilistically in time O(n^2), our question is a relaxation of the open problem of derandomizing Freivalds\u27 algorithm. We discuss consequences of a positive or negative resolution of this problem and provide potential avenues towards resolving it. Particularly, we show that sufficiently fast deterministic verifiers for 3SUM or univariate polynomial identity testing yield faster deterministic verifiers for matrix multiplication. Furthermore, we present the partial algorithmic progress that distinguishing whether an integer matrix product is correct or contains between 1 and n erroneous entries can be performed in time O~(n^2) - interestingly, the difficult case of deterministic matrix product verification is not a problem of "finding a needle in the haystack", but rather cancellation effects in the presence of many errors. Our main technical contribution is a deterministic algorithm that corrects an integer matrix product containing at most t errors in time O~(sqrt{t} n^2 + t^2). To obtain this result, we show how to compute an integer matrix product with at most t nonzeroes in the same running time. This improves upon known deterministic output-sensitive integer matrix multiplication algorithms for t = Omega(n^{2/3}) nonzeroes, which is of independent interest
    • …
    corecore