18 research outputs found
Fast Matrix Multiplication Without Tears: A Constraint Programming Approach
It is known that the multiplication of an matrix with an matrix can be performed using fewer multiplications than what the
naive approach suggests. The most famous instance of this is Strassen's
algorithm for multiplying two matrices in 7 instead of 8
multiplications. This gives rise to the constraint satisfaction problem of fast
matrix multiplication, where a set of multiplication terms must be
chosen and combined such that they satisfy correctness constraints on the
output matrix. Despite its highly combinatorial nature, this problem has not
been exhaustively examined from that perspective, as evidenced for example by
the recent deep reinforcement learning approach of AlphaTensor. In this work,
we propose a simple yet novel Constraint Programming approach to find
non-commutative algorithms for fast matrix multiplication or provide proof of
infeasibility otherwise. We propose a set of symmetry-breaking constraints and
valid inequalities that are particularly helpful in proving infeasibility. On
the feasible side, we find that exploiting solver performance variability in
conjunction with a sparsity-based problem decomposition enables finding
solutions for larger (feasible) instances of fast matrix multiplication. Our
experimental results using CP Optimizer demonstrate that we can find fast
matrix multiplication algorithms for matrices up to in a short
amount of time
An adaptive prefix-assignment technique for symmetry reduction
This paper presents a technique for symmetry reduction that adaptively
assigns a prefix of variables in a system of constraints so that the generated
prefix-assignments are pairwise nonisomorphic under the action of the symmetry
group of the system. The technique is based on McKay's canonical extension
framework [J.~Algorithms 26 (1998), no.~2, 306--324]. Among key features of the
technique are (i) adaptability---the prefix sequence can be user-prescribed and
truncated for compatibility with the group of symmetries; (ii)
parallelizability---prefix-assignments can be processed in parallel
independently of each other; (iii) versatility---the method is applicable
whenever the group of symmetries can be concisely represented as the
automorphism group of a vertex-colored graph; and (iv) implementability---the
method can be implemented relying on a canonical labeling map for
vertex-colored graphs as the only nontrivial subroutine. To demonstrate the
practical applicability of our technique, we have prepared an experimental
open-source implementation of the technique and carry out a set of experiments
that demonstrate ability to reduce symmetry on hard instances. Furthermore, we
demonstrate that the implementation effectively parallelizes to compute
clusters with multiple nodes via a message-passing interface.Comment: Updated manuscript submitted for revie
Improving the numerical stability of fast matrix multiplication
Fast algorithms for matrix multiplication, namely those that perform
asymptotically fewer scalar operations than the classical algorithm, have been
considered primarily of theoretical interest. Apart from Strassen's original
algorithm, few fast algorithms have been efficiently implemented or used in
practical applications. However, there exist many practical alternatives to
Strassen's algorithm with varying performance and numerical properties. Fast
algorithms are known to be numerically stable, but because their error bounds
are slightly weaker than the classical algorithm, they are not used even in
cases where they provide a performance benefit.
We argue in this paper that the numerical sacrifice of fast algorithms,
particularly for the typical use cases of practical algorithms, is not
prohibitive, and we explore ways to improve the accuracy both theoretically and
empirically. The numerical accuracy of fast matrix multiplication depends on
properties of the algorithm and of the input matrices, and we consider both
contributions independently. We generalize and tighten previous error analyses
of fast algorithms and compare their properties. We discuss algorithmic
techniques for improving the error guarantees from two perspectives:
manipulating the algorithms, and reducing input anomalies by various forms of
diagonal scaling. Finally, we benchmark performance and demonstrate our
improved numerical accuracy