    The border support rank of two-by-two matrix multiplication is seven

    We show that the border support rank of the tensor corresponding to two-by-two matrix multiplication is seven over the complex numbers. We do this by constructing two polynomials that vanish on all complex tensors with format four-by-four-by-four and border rank at most six, but that do not vanish simultaneously on any tensor with the same support as the two-by-two matrix multiplication tensor. This extends the work of Hauenstein, Ikenmeyer, and Landsberg. We also give two proofs that the support rank of the two-by-two matrix multiplication tensor is seven over any field: one proof using a result of De Groote saying that the decomposition of this tensor is unique up to sandwiching, and another proof via the substitution method. These results answer a question asked by Cohn and Umans. Studying the border support rank of the matrix multiplication tensor is relevant to the design of matrix multiplication algorithms: by a construction of Cohn and Umans, upper bounds on it yield upper bounds on the computational complexity of matrix multiplication. Moreover, support rank has applications in quantum communication complexity.
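
    For orientation, the objects involved can be stated concretely (standard definitions, summarized here rather than quoted from the paper): the two-by-two matrix multiplication tensor, viewed as a four-by-four-by-four tensor, and the support rank of a tensor $T$.

        % The 2x2 matrix multiplication tensor, written in terms of the
        % standard matrix units e_{ij}; grouping the indices (i,j), (j,k),
        % (k,i) exhibits it as a 4x4x4 tensor:
        \[
          \langle 2,2,2 \rangle \;=\; \sum_{i,j,k=1}^{2} e_{ij} \otimes e_{jk} \otimes e_{ki}.
        \]
        % Support rank: the minimum rank over tensors with the same support,
        % i.e. the same pattern of nonzero coefficients; border support rank
        % replaces rank by border rank:
        \[
          \operatorname{rank}_{\mathrm{s}}(T) \;=\; \min \bigl\{ \operatorname{rank}(T') \;:\; \operatorname{supp}(T') = \operatorname{supp}(T) \bigr\}.
        \]

    Since $T$ has the same support as itself, support rank never exceeds rank; the result above says that for this tensor the relaxation gains nothing below seven.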

    Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Communication Lower Bounds

    A parallel algorithm has perfect strong scaling if its running time on P processors, including all communication costs, is linear in 1/P. Distributed-memory parallel algorithms for matrix multiplication with perfect strong scaling have only recently been found: one based on classical matrix multiplication (Solomonik and Demmel, 2011), and one based on Strassen's fast matrix multiplication (Ballard, Demmel, Holtz, Lipshitz, and Schwartz, 2012). Both algorithms scale perfectly, but only up to a certain number of processors, beyond which the inter-processor communication cost no longer decreases. We obtain memory-independent communication cost lower bounds for classical and Strassen-based distributed-memory matrix multiplication algorithms. These bounds imply that no classical or Strassen-based parallel matrix multiplication algorithm can achieve perfect strong scaling beyond the ranges already attained by the two parallel algorithms mentioned above. Both the memory-independent bounds and the strong scaling bounds generalize to other algorithms.
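
    In the notation usual for such bounds ($n$ the matrix dimension, $P$ the number of processors, $M$ the local memory size; our hedged summary, not the paper's verbatim statement), the memory-independent lower bounds on the bandwidth cost $W$ take the form

        \[
          W_{\mathrm{classical}} \;=\; \Omega\!\left( \frac{n^{2}}{P^{2/3}} \right),
          \qquad
          W_{\mathrm{Strassen}} \;=\; \Omega\!\left( \frac{n^{2}}{P^{2/\omega_{0}}} \right),
          \qquad \omega_{0} = \log_{2} 7.
        \]

    Because these decay more slowly than $1/P$, perfect strong scaling can persist only while the memory-dependent bounds dominate, i.e. for $P = O(n^{3}/M^{3/2})$ in the classical case and $P = O(n^{\omega_0}/M^{\omega_0/2})$ in the Strassen case.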

    Optical implementation of systolic array processing

    Algorithms for matrix-vector multiplication are implemented using acousto-optic cells for multiplication and input data transfer, and charge-coupled device (CCD) detector arrays for accumulation and output of the results. No two-dimensional matrix mask is required; matrix changes are implemented electronically. A system for multiplying a 50-component nonnegative real vector by a 50 by 50 nonnegative real matrix is described. Modifications for bipolar real and complex-valued processing are possible, as are extensions to matrix-matrix multiplication and to multiplication of a vector by multiple matrices.
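
    As a dataflow illustration of the scheme just described (a minimal digital sketch, not the optical hardware; the function name is ours): each clock cycle broadcasts one vector component, forms all products with the corresponding matrix column in parallel, and accumulates the results, mirroring the acousto-optic multipliers feeding a CCD accumulator array.

        import numpy as np

        def systolic_matvec(M, x):
            # Simulate a systolic matrix-vector multiply y = M @ x.
            # Per clock cycle, one vector component enters (acousto-optic
            # input transfer); every output cell multiplies it by its
            # matrix entry and adds the product to a running accumulator
            # (CCD accumulation). After M.shape[1] cycles, y is complete.
            n_rows, n_cols = M.shape
            acc = np.zeros(n_rows)
            for t in range(n_cols):
                acc += M[:, t] * x[t]  # all cells update in parallel
            return acc

        # Example matching the described system: a 50 by 50 nonnegative
        # real matrix times a 50-component nonnegative real vector.
        rng = np.random.default_rng(0)
        M = rng.random((50, 50))
        x = rng.random(50)
        assert np.allclose(systolic_matvec(M, x), M @ x)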

    Fast QMC matrix-vector multiplication

    Quasi-Monte Carlo (QMC) rules $\frac{1}{N} \sum_{n=0}^{N-1} f(\boldsymbol{y}_n A)$ can be used to approximate integrals of the form $\int_{[0,1]^s} f(\boldsymbol{y} A) \,\mathrm{d}\boldsymbol{y}$, where $A$ is a matrix and $\boldsymbol{y}$ is a row vector. Integrals of this type arise, for example, from the simulation of a normal distribution with a general covariance matrix, from the approximation of the expected value of solutions of PDEs with random coefficients, and from applications in statistics. In this paper we design QMC quadrature points $\boldsymbol{y}_0, \ldots, \boldsymbol{y}_{N-1} \in [0,1]^s$ such that, for the matrix $Y = (\boldsymbol{y}_{0}^\top, \ldots, \boldsymbol{y}_{N-1}^\top)^\top$ whose rows are the quadrature points, the matrix-vector product $Y \boldsymbol{a}^\top$ with $\boldsymbol{a} \in \mathbb{R}^s$ can be computed with the fast Fourier transform in $\mathcal{O}(N \log N)$ operations plus at most $s-1$ extra additions. The proposed method applies to lattice rules, polynomial lattice rules, and a certain type of Korobov $p$-set. The approach is illustrated by three numerical experiments: the first considers the generation of points with a normal distribution and a general covariance matrix, the second applies QMC to high-dimensional, affine-parametric, elliptic partial differential equations with uniformly distributed random coefficients, and the third addresses finite-element discretizations of elliptic partial differential equations with high-dimensional, log-normal random input data. All tests show a significant speed-up of the fast QMC matrix method over a conventional implementation as the dimension becomes large.
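
    The structural fact behind the $\mathcal{O}(N \log N)$ count is that, after a suitable reordering of the quadrature points, $Y$ (or a large block of it) is circulant, so the product $Y \boldsymbol{a}^\top$ reduces to circular convolution. A minimal sketch of that primitive (our illustration, assuming nothing beyond NumPy; not the paper's code):

        import numpy as np

        def circulant_matvec(c, x):
            # Multiply the circulant matrix C with first column c by x in
            # O(N log N): C = F^{-1} diag(F c) F for the Fourier matrix F,
            # so C @ x is the circular convolution of c and x.
            return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

        # Check against the explicitly assembled circulant matrix.
        rng = np.random.default_rng(1)
        N = 8
        c, x = rng.random(N), rng.random(N)
        C = c[(np.arange(N)[:, None] - np.arange(N)[None, :]) % N]
        assert np.allclose(circulant_matvec(c, x), C @ x)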