Search CORE


Fast Matrix Multiplication Without Tears: A Constraint Programming Approach

Authors: Deza Arnaud, Khalil Elias B., Liu Chang, Vaezipoor Pashootan
Publication date: 01/06/2023

It is known that the multiplication of an $N \times M$ matrix with an $M \times P$ matrix can be performed using fewer multiplications than the $NMP$ that the naive approach suggests. The most famous instance of this is Strassen's algorithm for multiplying two $2 \times 2$ matrices in 7 instead of 8 multiplications. This gives rise to the constraint satisfaction problem of fast matrix multiplication, where a set of $R < NMP$ multiplication terms must be chosen and combined such that they satisfy correctness constraints on the output matrix. Despite its highly combinatorial nature, this problem has not been exhaustively examined from that perspective, as evidenced for example by the recent deep reinforcement learning approach of AlphaTensor. In this work, we propose a simple yet novel Constraint Programming approach to find non-commutative algorithms for fast matrix multiplication or provide proof of infeasibility otherwise. We propose a set of symmetry-breaking constraints and valid inequalities that are particularly helpful in proving infeasibility. On the feasible side, we find that exploiting solver performance variability in conjunction with a sparsity-based problem decomposition enables finding solutions for larger (feasible) instances of fast matrix multiplication. Our experimental results using CP Optimizer demonstrate that we can find fast matrix multiplication algorithms for matrices up to $3 \times 3$ in a short amount of time.
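To make the "fewer than $NMP$ multiplications" idea concrete, here is a minimal Python sketch of Strassen's well-known $2 \times 2$ scheme, i.e. one feasible solution with $R = 7 < 8$. It is not the Constraint Programming model from the paper; it only shows what such a solution looks like. The function names `naive_matmul` and `strassen_2x2` are hypothetical, and the inner `mul` helper exists only to count the scalar products that are formed.

```python
def naive_matmul(A, B):
    """Schoolbook product of an N x M and an M x P matrix: N*M*P scalar multiplications."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def strassen_2x2(A, B):
    """Strassen's scheme for 2 x 2 matrices (hypothetical helper name).

    Returns (C, count), where count is the number of scalar multiplications used.
    """
    count = 0

    def mul(x, y):
        # Every product of a linear form in A with a linear form in B goes
        # through here, so `count` ends up equal to the rank R of the scheme.
        # The A-factor is always on the left, so commutativity is never used.
        nonlocal count
        count += 1
        return x * y

    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    # R = 7 products, each pairing a sum/difference of A-entries
    # with a sum/difference of B-entries.
    m1 = mul(a11 + a22, b11 + b22)
    m2 = mul(a21 + a22, b11)
    m3 = mul(a11, b12 - b22)
    m4 = mul(a22, b21 - b11)
    m5 = mul(a11 + a12, b22)
    m6 = mul(a21 - a11, b11 + b12)
    m7 = mul(a12 - a22, b21 + b22)

    # The output entries are recovered with additions and subtractions only.
    c11 = m1 + m4 - m5 + m7
    c12 = m3 + m5
    c21 = m2 + m4
    c22 = m1 - m2 + m3 + m6
    return [[c11, c12], [c21, c22]], count


A, B = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
C, n = strassen_2x2(A, B)
print(C, n)                      # [[19, 22], [43, 50]] 7
print(C == naive_matmul(A, B))   # True
```

Because no product ever swaps an A-factor with a B-factor, the same seven-product scheme stays valid when the entries are themselves matrix blocks; this is why non-commutative algorithms are the ones of interest for recursive use.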

arXiv.org e-Print Archive

Development and Evaluation with AI Tools & Devices: Google Edge TPU for General-Purpose Computing

Author: Sakos Chronis (Σάκος Χρόνης)
Publication date: 14/10/2022

DSpace at NTUA

Number theoretic techniques applied to algorithms and architectures for digital signal processing

Author: Ward Jeremy S.
Publication date: 01/01/1983

Many of the techniques for the computation of a two-dimensional convolution of a small fixed window with a picture are reviewed. It is demonstrated that Winograd's cyclic convolution and Fourier Transform Algorithms, together with Nussbaumer's two-dimensional cyclic convolution algorithms, have a common general form. Many of these algorithms use the theoretical minimum number of general multiplications. A novel implementation of these algorithms is proposed which is based upon one-bit systolic arrays. These systolic arrays are networks of identical cells, with each cell sharing a common control and timing function. Each cell is connected only to its nearest neighbours. These are all attractive features for implementation using Very Large Scale Integration (VLSI). The throughput rate is limited only by the time needed to perform a one-bit full addition. In order to assess the usefulness of these systolic arrays, a 'cost function' is developed to compare them with more conventional techniques, such as the Cooley-Tukey radix-2 Fast Fourier Transform (FFT). The cost function shows that these systolic arrays offer a good way of implementing the Discrete Fourier Transform for transforms up to about 30 points in length. The cost function is a general tool and allows comparisons to be made between different implementations of the same algorithm and between dissimilar algorithms. Finally, a technique is developed for the derivation of Discrete Cosine Transform (DCT) algorithms from the Winograd Fourier Transform Algorithm. These DCT algorithms may be implemented by modified versions of the systolic arrays proposed earlier, but require half the number of cells.
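As a small illustration of what "the theoretical minimum number of general multiplications" means, the sketch below computes a 2-point cyclic convolution with two data-dependent multiplications instead of the four used by the direct formula. This is the standard textbook 2-point case, not the specific algorithms or one-bit systolic implementation from the thesis, and the function names are hypothetical. By Winograd's result, an $N$-point cyclic convolution needs at least $2N - k$ general multiplications, where $k$ is the number of irreducible factors of $z^N - 1$ over the rationals; for $N = 2$, $z^2 - 1 = (z - 1)(z + 1)$ gives $k = 2$ and hence 2 multiplications.

```python
def cyclic_conv_direct(x, h):
    """Direct N-point cyclic convolution, y[i] = sum_k x[k] * h[(i - k) mod N].

    Uses N*N multiplications (4 for N = 2).
    """
    n = len(x)
    return [sum(x[k] * h[(i - k) % n] for k in range(n)) for i in range(n)]


def precompute_window(h):
    """Fold the constant 1/2 factors into the fixed window once, offline.

    Since the window is fixed, these operations are not counted as
    general (data-dependent) multiplications.
    """
    h0, h1 = h
    return ((h0 + h1) / 2, (h0 - h1) / 2)


def cyclic_conv2_fast(x, g):
    """2-point cyclic convolution with 2 general multiplications.

    Uses (x0 + x1)(h0 + h1) + (x0 - x1)(h0 - h1) = 2(x0*h0 + x1*h1)
    and  (x0 + x1)(h0 + h1) - (x0 - x1)(h0 - h1) = 2(x0*h1 + x1*h0).
    """
    x0, x1 = x
    g0, g1 = g
    a = (x0 + x1) * g0   # general multiplication 1
    b = (x0 - x1) * g1   # general multiplication 2
    return [a + b, a - b]


x, h = [1, 2], [3, 4]
print(cyclic_conv_direct(x, h))                    # [11, 10]
print(cyclic_conv2_fast(x, precompute_window(h)))  # [11.0, 10.0]
```

Precomputing from the fixed window is what makes the saving real in the "small fixed window with a picture" setting: the window-side work is done once, while the two data-dependent products are repeated for every output position.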

Durham e-Theses

OpenGrey Repository

Evaluation of the PlayStation 2 as a cluster computing node

Author: Nigro Christopher R.
Publication venue: RIT Scholar Works
Publication date: 01/01/2004

Cluster computing is currently a popular, cost-effective solution to the increasing computational demands of many applications in scientific computing and image processing. A cluster computer is composed of several networked computers known as nodes. Since the goal of cluster computing is to provide a cost-effective means of processing computationally demanding applications, nodes that can be obtained at a low price with minimal performance tradeoff are always attractive. Presently, the most common cluster computers are networks of workstations built from commodity components. Recent trends have shown that computers developed and deployed for purposes other than traditional personal computing or workstation use have become new candidates for cluster computing nodes. These candidates may provide a competitive and even less expensive alternative to the cluster computing nodes used today. Video game consoles, whose prices are kept extremely low by intense marketplace competition, are a prime example. The Sony PlayStation 2, in particular, provides the user with low-level hardware devices that are often found in more expensive machines. This work presents an evaluation of the PlayStation 2 video game console as a cluster computing node for scientific and image processing applications. Based on this evaluation, the work determines whether the PlayStation 2 is a viable alternative to the cluster computing nodes used today.

RIT Scholar Works