Fast in-place accumulation

Dumas, Jean-Guillaume; Grenet, Bruno

Research article; journal article

Publication date: 1 January 2026
Publisher: Elsevier
Doi

Abstract

This paper deals with algorithms that are simultaneously fast and in-place for formulae where the result has to be linearly accumulated: some output variables are also input variables, linked by a linear dependency. Fundamental examples include the in-place (that is, using only O(1) extra space) accumulated multiplication of polynomials or matrices, C += AB. The difficulty is to combine in-place computations with fast algorithms: the latter usually come at the expense of (potentially large) extra temporary space, but with accumulation the output variables are not even available to store intermediate values. We first propose a novel automatic design of fast and in-place accumulating algorithms for any bilinear formulae (and thus for polynomial and matrix multiplication), and then extend it to any linear accumulation of a collection of functions. For this, we relax the in-place model to any algorithm allowed to modify its inputs, provided that those are restored to their initial state afterwards. This allows us to ultimately derive unprecedented in-place accumulating algorithms for fast polynomial multiplications and for Strassen-like matrix multiplications.

We then consider the simultaneously fast and in-place computation of the Euclidean polynomial modular remainder R(X) ≡ A(X) mod B(X). Fast algorithms for this usually also come at the expense of a linear amount of extra temporary space. In particular, they require one to first compute and store the whole quotient Q(X) such that A = BQ + R. We here propose an *in-place* algorithm to compute the remainder only. If A and B have respective degree m+n and n, and M(k) denotes the complexity of a (not-in-place) algorithm to multiply two degree-k polynomials, our algorithm uses at most O((m/n) M(n) log(n)) arithmetic operations, that is, at most a factor log(n) more than the not-in-place algorithm. If M(n) = Θ(n^{1+ε}) for some ε > 0, then our algorithm does match the not-in-place complexity bound of O((m/n) M(n)). We also propose variants that compute, still in-place and with the same kind of complexity bounds, the over-place remainder A(X) ≡ A(X) mod B(X), the accumulated remainder R(X) += A(X) mod B(X), and the accumulated modular multiplication R(X) += A(X)C(X) mod B(X), that is, multiplication in a polynomial extension of a finite field.

To achieve this, we develop techniques for Toeplitz matrix operations, generalized convolutions, short products, and power series division and remainder whose output is also part of the input.
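The relaxed in-place model (inputs may be modified if they are restored afterwards) can be illustrated on Karatsuba multiplication. The Python sketch below is our own illustration under stated assumptions, not the authors' algorithm. The hypothetical function `karatsuba_acc` accumulates C += sign·A·B for coefficient lists of power-of-two length n without any temporary array. Only the recursion stack of O(log n) indices is used. Writing A = A0 + X^h A1 and B = B0 + X^h B1, we have AB = P0 + X^h (P1 - P0 - P2) + X^{2h} P2. The contribution (1 - X^h) P0 is accumulated by dividing a window of C by (1 - X^h), adding P0, and multiplying back by (1 - X^h). The term -X^h (1 - X^h) P2 is handled the same way with the opposite sign. The middle product P1 = (A0 + A1)(B0 + B1) is formed by temporarily overwriting the low halves of A and B, which are then restored. Exact integer coefficients (Python ints) are assumed, but only ring operations are used.

```python
def div_one_minus_xh(C, off, length, h):
    """C[off:off+length] <- C[off:off+length] / (1 - X^h) mod X^length, in place."""
    for i in range(h, length):  # increasing order reuses already-updated values
        C[off + i] += C[off + i - h]


def mul_one_minus_xh(C, off, length, h):
    """C[off:off+length] <- C[off:off+length] * (1 - X^h) mod X^length, in place."""
    for i in range(length - 1, h - 1, -1):  # decreasing order reads original values
        C[off + i] -= C[off + i - h]


def karatsuba_acc(C, co, A, ao, B, bo, n, sign=1):
    """C[co:co+2n-1] += sign * A[ao:ao+n] * B[bo:bo+n], with no temporary buffers.

    A and B are modified during the call but restored before returning.
    n must be a power of two (an assumption of this sketch).
    """
    assert n >= 1 and n & (n - 1) == 0
    if n == 1:
        C[co] += sign * A[ao] * B[bo]
        return
    h = n // 2
    w = 3 * h - 1  # window long enough to hold (1 - X^h) * P with deg P <= 2h - 2

    # Low product P0 = A0*B0 contributes (1 - X^h) * P0 at offset 0.
    div_one_minus_xh(C, co, w, h)
    karatsuba_acc(C, co, A, ao, B, bo, h, sign)
    mul_one_minus_xh(C, co, w, h)

    # High product P2 = A1*B1 contributes X^h (X^h - 1) P2 = -X^h (1 - X^h) P2.
    div_one_minus_xh(C, co + h, w, h)
    karatsuba_acc(C, co + h, A, ao + h, B, bo + h, h, -sign)
    mul_one_minus_xh(C, co + h, w, h)

    # Middle product P1 = (A0 + A1)(B0 + B1) at offset h: overwrite A0 and B0 ...
    for i in range(h):
        A[ao + i] += A[ao + h + i]
        B[bo + i] += B[bo + h + i]
    karatsuba_acc(C, co + h, A, ao, B, bo, h, sign)
    # ... then restore the inputs to their initial state.
    for i in range(h):
        A[ao + i] -= A[ao + h + i]
        B[bo + i] -= B[bo + h + i]


if __name__ == "__main__":
    A, B, C = [1, 2, 3, 4], [5, 6, 7, 8], [1] * 7
    karatsuba_acc(C, 0, A, 0, B, 0, 4)
    print(C, A, B)
```

For example, with A = [1, 2, 3, 4], B = [5, 6, 7, 8] and C = [1] * 7 (coefficients listed from low to high degree), the call leaves C = [6, 17, 35, 61, 62, 53, 33], that is, C + AB, and leaves A and B unchanged. Each of the three recursive calls accumulates directly into C, so no intermediate product is ever stored.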
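The same relaxed model gives a simple way to accumulate a remainder without storing the quotient elsewhere. The sketch below is deliberately naive: it uses schoolbook, quadratic-time division and is therefore not the paper's fast algorithm. The hypothetical functions below compute R(X) += A(X) mod B(X) over Z/pZ using O(1) extra ring elements. During long division, each quotient coefficient is stored in the slot of A that it has just cleared, so A temporarily holds (R, Q). The division is then undone step by step to restore A. The sketch makes three assumptions. Coefficient lists run from low to high degree, with entries in [0, p). The leading coefficient of B is invertible mod p. Python 3.8 or later is needed for `pow(x, -1, p)`.

```python
def divide_in_place(A, B, p):
    """Overwrite A so that A[:n] = A mod B and A[n:] = quotient (all mod p)."""
    n = len(B) - 1
    inv_lead = pow(B[n], -1, p)  # leading coefficient of B must be invertible mod p
    for k in range(len(A) - 1, n - 1, -1):  # eliminate top coefficients, high to low
        q = A[k] * inv_lead % p
        for j in range(n):
            A[k - n + j] = (A[k - n + j] - q * B[j]) % p
        A[k] = q  # the cleared slot now stores quotient coefficient q_{k-n}


def undo_divide_in_place(A, B, p):
    """Inverse of divide_in_place: rebuild A = B*Q + R in place."""
    n = len(B) - 1
    for k in range(n, len(A)):  # undo the division steps in reverse order
        q = A[k]
        for j in range(n):
            A[k - n + j] = (A[k - n + j] + q * B[j]) % p
        A[k] = q * B[n] % p


def accumulate_remainder(R, A, B, p):
    """R += A mod B over Z/pZ; A is modified during the call and restored on return."""
    n = len(B) - 1
    divide_in_place(A, B, p)
    for j in range(n):
        R[j] = (R[j] + A[j]) % p
    undo_divide_in_place(A, B, p)


if __name__ == "__main__":
    R, A, B = [0, 0], [1, 0, 0, 0, 1], [1, 0, 1]  # X^4 + 1 mod X^2 + 1 over Z/5Z
    accumulate_remainder(R, A, B, 5)
    print(R, A)
```

Restoration is exact because the forward step at index k only touches slots k - n through k. Every later forward step works strictly below k, so undoing the steps in increasing order of k recovers each intermediate state in turn. Calling `divide_in_place` on its own leaves the remainder in A[:n], which is close to the over-place setting of the abstract, except that the quotient occupies the high part.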

Hal - Université Grenoble Alpes

oai:HAL:hal-05000159v2

Last time updated on 08/11/2025

This paper was published in Hal - Université Grenoble Alpes.

Licence: info:eu-repo/semantics/OpenAccess