
    Fast matrix multiplication techniques based on the Adleman-Lipton model

    On distributed-memory electronic computers, the implementation and adaptation of fast parallel matrix multiplication algorithms has yielded astounding results and insights. In this discourse, we use the tools of molecular biology to demonstrate the theoretical encoding of Strassen's fast matrix multiplication algorithm with DNA, based on an n-moduli set in the residue number system, thereby demonstrating the viability of computational mathematics with DNA. As a result, a general scalable implementation of this model in the DNA computing paradigm is presented, one that generalizes to the application of all fast matrix multiplication algorithms on a DNA computer. We also discuss the practical capabilities and issues of this scalable implementation. Fast methods of matrix computation with DNA are important because they also allow for the efficient implementation of other algorithms (e.g., matrix inversion, computing determinants, and graph algorithms) with DNA.
    Comment: To appear in the International Journal of Computer Engineering Research. Minor changes made to make the preprint as similar as possible to the published version.
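    The residue-number-system idea underlying the encoding is easy to see in software: arithmetic is carried out independently in each modulus channel, with no carries between channels, and the result is recovered via the Chinese remainder theorem. The following C sketch uses an illustrative four-moduli set, not the paper's DNA-specific n-moduli construction:

        #include <stdio.h>
        #include <stdint.h>

        /* Pairwise-coprime moduli (an illustrative choice; the paper's
         * n-moduli set is specific to its DNA encoding). */
        static const int64_t M[] = {3, 5, 7, 11};      /* product = 1155 */
        enum { NMOD = 4 };

        /* Encode x as its vector of residues. */
        static void rns_encode(int64_t x, int64_t r[NMOD]) {
            for (int i = 0; i < NMOD; i++) r[i] = x % M[i];
        }

        /* Modular inverse by brute force (the moduli are tiny). */
        static int64_t inv_mod(int64_t a, int64_t m) {
            for (int64_t t = 1; t < m; t++)
                if ((a * t) % m == 1) return t;
            return 0;  /* unreachable for a coprime to m */
        }

        /* Decode residues back to an integer in [0, 1155) by the CRT. */
        static int64_t rns_decode(const int64_t r[NMOD]) {
            int64_t prod = 1, x = 0;
            for (int i = 0; i < NMOD; i++) prod *= M[i];
            for (int i = 0; i < NMOD; i++) {
                int64_t p = prod / M[i];   /* product of the other moduli */
                x = (x + r[i] * p % prod * inv_mod(p % M[i], M[i])) % prod;
            }
            return x;
        }

        int main(void) {
            int64_t a[NMOD], b[NMOD], c[NMOD];
            rns_encode(23, a);
            rns_encode(31, b);
            /* Carry-free digit-wise multiply: each residue channel is
             * independent, which makes the arithmetic fully parallel. */
            for (int i = 0; i < NMOD; i++) c[i] = (a[i] * b[i]) % M[i];
            printf("23 * 31 = %lld\n", (long long)rns_decode(c)); /* 713 */
            return 0;
        }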

    Parallelizing Strassen's method for matrix multiplication on distributed-memory MIMD architectures

    We present a parallel method for matrix multiplication on distributed-memory MIMD architectures based on Strassen's method. Our timing tests, performed on a 56-node Intel Paragon, demonstrate the realization of the potential of Strassen's method, with a complexity of 4.7M^2.807 at the system level rather than at the node level, on which several earlier works focused. Parallel efficiency is nearly perfect when the number of processors is a power of 7. The parallelized Strassen's method is consistently faster than traditional matrix multiplication methods of complexity 2M^3 coupled with the BMR and Ring methods at the system level. The speed gain depends on the matrix order M: about 20% for M ≈ 1000 and more than 100% for M ≈ 5000.
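    For context (this is the standard analysis, not something specific to this paper), the exponent 2.807 comes from Strassen's recurrence, in which seven half-size multiplications replace the classical eight:

        \[
        T(M) = 7\,T\!\left(\tfrac{M}{2}\right) + c\,M^2
        \;\Longrightarrow\;
        T(M) = \Theta\!\left(M^{\log_2 7}\right) \approx \Theta\!\left(M^{2.807}\right)
        \]

    The sevenfold branching is also why parallel efficiency peaks when the processor count is a power of 7: each of the seven recursive subproducts maps evenly onto one seventh of the machine.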

    Choosing a Better Algorithm for Matrix Multiplication

    Matrix multiplication is a basic operation of linear algebra with numerous applications in the theory and practice of computation. Many applications can be solved quickly if matrix multiplication is fast, because it constitutes a substantial part of their work. This thesis studies three algorithms: the straightforward algorithm, Winograd's algorithm, and Strassen's algorithm; analyzes their time complexities; and compares the three using graphs. The thesis also briefly describes two asymptotic improvements: Pan's of 1983 and Strassen's of 1986.
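    A minimal C sketch of Winograd's inner-product scheme, one of the three algorithms compared (the matrix size and layout are illustrative assumptions, and this variant requires an even dimension):

        #define N 4  /* illustrative size; must be even for this variant */

        /* Winograd's (1968) inner-product scheme: roughly halves the
         * number of multiplications per dot product at the cost of
         * extra additions. */
        void winograd_mm(const double A[N][N], const double B[N][N],
                         double C[N][N]) {
            double rowf[N], colf[N];

            /* Precompute row factors of A and column factors of B. */
            for (int i = 0; i < N; i++) {
                rowf[i] = 0.0;
                for (int k = 0; k < N / 2; k++)
                    rowf[i] += A[i][2*k] * A[i][2*k + 1];
            }
            for (int j = 0; j < N; j++) {
                colf[j] = 0.0;
                for (int k = 0; k < N / 2; k++)
                    colf[j] += B[2*k][j] * B[2*k + 1][j];
            }

            /* Each C[i][j] needs only N/2 multiplications here. */
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++) {
                    double s = -rowf[i] - colf[j];
                    for (int k = 0; k < N / 2; k++)
                        s += (A[i][2*k]     + B[2*k + 1][j])
                           * (A[i][2*k + 1] + B[2*k][j]);
                    C[i][j] = s;
                }
        }

    Expanding each product term shows the unwanted A-by-A and B-by-B cross terms are exactly rowf[i] and colf[j], which is why they are subtracted up front.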

    A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction


    Fast multiplication of multiple-precision integers

    Multiple-precision multiplication algorithms are of fundamental interest for both theoretical and practical reasons. The conventional method requires O(n^2) bit operations, whereas the fastest known multiplication algorithm runs in O(n log n log log n). The price paid for the increase in speed is a considerably more sophisticated theory and programming code. This work presents an extensive study of the best known multiple-precision multiplication algorithms. The algorithms are implemented in C, and their performance is analyzed in detail and compared. The break-even points, which are essential for selecting the fastest algorithm for a particular task, are determined for a given hardware/software combination.
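    A single level of Karatsuba's splitting illustrates where both the speedup and the break-even points come from: three half-width multiplications replace the four schoolbook ones, but at the cost of extra additions that only pay off above some operand size. This C sketch shows the fixed-width 32-bit case for clarity; it is illustrative, not the thesis code:

        #include <stdio.h>
        #include <stdint.h>

        /* One level of Karatsuba on 32-bit operands split into 16-bit
         * halves. Applied recursively to n-digit numbers, the three-way
         * split yields the O(n^log2 3) ~ O(n^1.585) bound. */
        uint64_t karatsuba32(uint32_t x, uint32_t y) {
            uint64_t x1 = x >> 16, x0 = x & 0xFFFF;
            uint64_t y1 = y >> 16, y0 = y & 0xFFFF;

            uint64_t z2 = x1 * y1;                         /* high halves */
            uint64_t z0 = x0 * y0;                         /* low halves  */
            uint64_t z1 = (x1 + x0) * (y1 + y0) - z2 - z0; /* cross terms,
                                                              one multiply */
            return (z2 << 32) + (z1 << 16) + z0;
        }

        int main(void) {
            uint32_t a = 123456789u, b = 987654321u;
            printf("%llu\n", (unsigned long long)karatsuba32(a, b));
            printf("%llu\n", (unsigned long long)((uint64_t)a * b)); /* check */
            return 0;
        }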

    Effective Implementation of DGEMM on Modern Multicore CPU

    In this paper we present a detailed study of tuning double-precision matrix-matrix multiplication (DGEMM) on the Intel Xeon E5-2680 CPU. We selected an algorithm that is optimal from the instruction-set perspective, as well as software tools optimized for Intel Advanced Vector Extensions (AVX). Our optimizations included the use of vector memory operations and AVX instructions. The proposed algorithm achieves a performance improvement of 33% compared to the latest results achieved using the Intel Math Kernel Library DGEMM subroutine.
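    A minimal sketch of the kind of AVX inner kernel such tuning builds on; the blocking, packing, and register tiling of a production DGEMM are omitted, and the routine name and row-major layout are assumptions (compile with -mavx):

        #include <immintrin.h>

        /* Illustrative AVX inner kernel for C += A*B, row-major, with n a
         * multiple of 4. Each iteration updates four doubles of C at once
         * via 256-bit vector loads, multiplies, adds, and stores. */
        void dgemm_avx(int n, const double *A, const double *B, double *C) {
            for (int i = 0; i < n; i++)
                for (int k = 0; k < n; k++) {
                    __m256d a = _mm256_set1_pd(A[i*n + k]); /* broadcast */
                    for (int j = 0; j < n; j += 4) {
                        __m256d b = _mm256_loadu_pd(&B[k*n + j]);
                        __m256d c = _mm256_loadu_pd(&C[i*n + j]);
                        c = _mm256_add_pd(c, _mm256_mul_pd(a, b));
                        _mm256_storeu_pd(&C[i*n + j], c);
                    }
                }
        }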

    Randomized word-parallel algorithms for detection of small induced subgraphs

    Induced subgraph detection is a widely studied set of problems in theoretical computer science, with applications in, e.g., social networks, molecular biology, and other domains that use graph representations. Our focus is a practical comparison of some well-known deterministic algorithms with recent Monte Carlo algorithms for detecting subgraphs on three and four vertices. For algorithms that involve operations on adjacency matrices, we study the gain from applying word parallelism, i.e., exploiting the parallel nature of common processor operations such as bitwise conjunction and disjunction. We present empirical running times for our implementations of the algorithms. The results reveal when the Monte Carlo algorithms trump their deterministic counterparts, and include statistically significant improvements to several algorithms when word parallelism is applied.
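    A minimal C sketch of the word-parallelism idea for one concrete case, triangle counting; the graph size and bitset representation are illustrative, and __builtin_popcountll is the GCC/Clang intrinsic:

        #include <stdint.h>
        #include <stdio.h>

        #define NV 64  /* illustrative: one 64-bit word per adjacency row */

        /* Word-parallel triangle counting over a symmetric, loop-free
         * adjacency matrix: intersecting two neighborhoods is a single
         * bitwise AND plus a popcount rather than an NV-iteration loop.
         * Each triangle is seen once per edge, hence the division by 3. */
        long count_triangles(const uint64_t adj[NV]) {
            long t = 0;
            for (int u = 0; u < NV; u++)
                for (int v = u + 1; v < NV; v++)
                    if (adj[u] >> v & 1)          /* edge {u,v} present */
                        t += __builtin_popcountll(adj[u] & adj[v]);
            return t / 3;
        }

        int main(void) {
            uint64_t adj[NV] = {0};
            adj[0] = 0x6; adj[1] = 0x5; adj[2] = 0x3; /* triangle on 0,1,2 */
            printf("%ld\n", count_triangles(adj));    /* prints 1 */
            return 0;
        }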

    Fast and Memory Efficient Strassen’s Matrix Multiplication on GPU Cluster

    Prior implementations of Strassen's matrix multiplication algorithm on GPUs traded additional workspace, in the form of global memory or registers, for time. Although Strassen's algorithm offers a reduction in computational complexity compared to the classical algorithm, the memory overhead associated with it limits its practical utility. While there have been past attempts to reduce the memory footprint of Strassen's algorithm by compromising parallelism, no prior implementation, to our knowledge, was able to hide the workspace requirement successfully. This thesis presents an implementation of Strassen's matrix multiplication in CUDA, titled Multi-Stage Memory Efficient Strassen (MSMES), that eliminates additional workspace requirements by reusing and recovering input matrices. MSMES organizes the steps involved in Strassen's algorithm into five stages, where multiple steps in the same stage can be executed in parallel. Two additional stages that allow recovery of the input matrices are also discussed. Unlike previous works, MSMES has no additional memory requirement irrespective of the recursion depth of Strassen's algorithm. Experiments performed with MSMES (with the recovery stages) on an NVIDIA Tesla V100 GPU and an NVIDIA GTX 1660 Ti GPU yielded higher compute performance and lower memory requirements than the NVIDIA library function for double-precision matrix multiplication, cublasDgemm. In the multi-GPU adaptation of matrix multiplication, we explore the performance of a Strassen-based and a tile-based global decomposition scheme, and evaluate both MSMES and cublasDgemm for the local matrix multiplications within each scheme. The experiments identify the combination of Strassen-Winograd decomposition with MSMES as yielding the highest speedup among all tested combinations.
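    For reference, these are the seven quadrant products that create Strassen's workspace pressure, shown here in Strassen's original form (the thesis uses the Winograd variant, which needs fewer additions):

        \[
        \begin{aligned}
        M_1 &= (A_{11}+A_{22})(B_{11}+B_{22}) & M_2 &= (A_{21}+A_{22})B_{11}\\
        M_3 &= A_{11}(B_{12}-B_{22})          & M_4 &= A_{22}(B_{21}-B_{11})\\
        M_5 &= (A_{11}+A_{12})B_{22}          & M_6 &= (A_{21}-A_{11})(B_{11}+B_{12})\\
        M_7 &= (A_{12}-A_{22})(B_{21}+B_{22}) & &\\
        C_{11} &= M_1+M_4-M_5+M_7             & C_{12} &= M_3+M_5\\
        C_{21} &= M_2+M_4                     & C_{22} &= M_1-M_2+M_3+M_6
        \end{aligned}
        \]

    Each M_i is a full quadrant-sized intermediate, which is exactly the workspace MSMES hides by reusing, and later recovering, the input quadrants.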