
32 research outputs found


Parallel programming on General Block Min Max Criterion

Author: Lee ChuanChe
Publication venue: CSUSB ScholarWorks
Publication date: 01/01/2006

The purpose of this thesis is to develop a parallel implementation of the General Block Min Max Criterion (GBMM). The thesis deals with two kinds of parallel overhead: Redundant Calculations Parallel Overhead (RCPO) and Communication Parallel Overhead (CPO).

CSUSB ScholarWorks

Computational Complexity and Numerical Stability of Linear Problems

Authors: Holtz Olga, Shomron Noam
Publication venue: 'European Mathematical Society Publishing House'
Publication date: 01/01/2009

We survey classical and recent developments in numerical linear algebra, focusing on two issues: computational complexity, or arithmetic costs, and numerical stability, or performance under roundoff error. We present a brief account of the algebraic complexity theory as well as the general error analysis for matrix multiplication and related problems. We emphasize the central role played by the matrix multiplication problem and discuss historical and modern approaches to its solution. Comment: 16 pages; updated to reflect referees' remarks; to appear in Proceedings of the 5th European Congress of Mathematics.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Parallelizing Strassen's method for matrix multiplication on distributed-memory MIMD architectures

Authors: Chou C.-C., Deng Y.-F., Li G., Wang Y.
Publication venue: Published by Elsevier Ltd.
Publication date: 01/07/1995

We present a parallel method for matrix multiplication on distributed-memory MIMD architectures based on Strassen's method. Our timing tests, performed on a 56-node Intel Paragon, demonstrate that the potential of Strassen's method, with a complexity of 4.7 M^2.807, can be realized at the system level rather than at the node level, on which several earlier works have focused. The parallel efficiency is nearly perfect when the number of processors is a power of 7. The parallelized Strassen's method appears to be always faster than the traditional matrix multiplication methods, whose complexity is 2 M^3, coupled with the BMR method and the Ring method at the system level. The speed gain depends on the matrix order M: 20% for M ≈ 1000 and more than 100% for M ≈ 5000.

Elsevier - Publisher Connector

Deakin Research Online
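
For readers unfamiliar with the method named in the entry above, the following is a minimal single-process sketch of Strassen's recursion in Python. It is not the paper's parallel implementation; it only illustrates why seven sub-products appear at each level, which is consistent with the abstract's remark about processor counts that are powers of 7. The function names (`strassen`, `split`, `add`, `sub`) are illustrative, and the sketch assumes square matrices whose order is a power of two.

```python
def add(A, B):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(A):
    """Split a square matrix of even order into four quadrants."""
    h = len(A) // 2
    return ([r[:h] for r in A[:h]], [r[h:] for r in A[:h]],
            [r[:h] for r in A[h:]], [r[h:] for r in A[h:]])


def strassen(A, B):
    """Multiply two n x n matrices, n a power of two (assumption)."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]

    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)

    # Seven recursive products instead of the eight used by the
    # straightforward block algorithm.
    m1 = strassen(add(a11, a22), add(b11, b22))
    m2 = strassen(add(a21, a22), b11)
    m3 = strassen(a11, sub(b12, b22))
    m4 = strassen(a22, sub(b21, b11))
    m5 = strassen(add(a11, a12), b22)
    m6 = strassen(sub(a21, a11), add(b11, b12))
    m7 = strassen(sub(a12, a22), add(b21, b22))

    # Recombine the products into the four quadrants of the result.
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)

    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because each level replaces eight half-size products with seven, the multiplication count grows as M^(log2 7), roughly M^2.807, which matches the exponent quoted in the abstract. Practical implementations usually switch to the straightforward algorithm below some block size; that cutoff is omitted here for brevity.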

A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction

Publication venue: 'Hindawi Limited'
Publication date: 01/01/1995

Crossref

Choosing a Better Algorithm for Matrix Multiplication

Author: Zhang Xing
Publication venue: 'Oklahoma State University Library'
Publication date: 01/07/2000

Matrix multiplication is a basic operation of linear algebra and has numerous applications to the theory and practice of computation. Many applications run quickly if the matrix multiplication algorithm is fast, because matrix multiplication makes up a substantial part of them. This thesis studies three algorithms (the straightforward algorithm, Winograd's algorithm, and Strassen's algorithm) and their time complexities, and compares the three using graphs. The thesis also briefly describes two asymptotic improvements: Pan's of 1983 and Strassen's of 1986.

SHAREOK repository

Computational Complexity and Numerical Stability of Linear Problems

Authors: Holtz Olga, Shomron Noam
Publication date: 01/01/2000

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

ATCOM: Automatically tuned collective communication system for SMP clusters.

Author: Wu Meng-Shiou
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2005

Conventional implementations of collective communications are based on point-to-point communications, and their optimizations have focused on the efficiency of those communication algorithms. However, point-to-point communications are not the optimal choice for modern computing clusters of SMPs because of their two-level communication structure. In recent years, a few research efforts have investigated efficient collective communications for SMP clusters. This dissertation focuses on platform-independent algorithms and implementations in this area. There are two main approaches to implementing efficient collective communications for clusters of SMPs: using shared memory operations for intra-node communications, and overlapping inter-node and intra-node communications. The former fully utilizes the hardware-based shared memory of an SMP, and the latter takes advantage of the inherent hierarchy of the communications within a cluster of SMPs. Previous studies focused on clusters of SMPs from certain vendors, and the methods they proposed are not portable to other systems. Because performance optimization is very complicated and the development process is very time-consuming, self-tuning, platform-independent implementations are highly desirable. As shown in this dissertation, such an implementation can significantly outperform the other portable point-to-point based implementations and some platform-specific implementations. The dissertation describes in detail the architecture of the platform-independent implementation. There are four system components: shared memory-based collective communications, overlapping mechanisms for inter-node and intra-node communications, a prediction-based tuning module, and a micro-benchmark based tuning module. Each component is carefully designed with the goal of automatic tuning in mind.

Digital Repository @ Iowa State University (ISU)

UNT Digital Library

Literature Study on Analyzing and Designing of Algorithms

Authors: Aishwarya, Sneha Kumari
Publication venue: Global Journals Inc. (US)
Publication date: 28/10/2023

The fundamental goal is problem solving under numerous limitations, such as those imposed by problem size, performance, and cost in terms of both space and time. Designing a quick, effective, and efficient solution to a problem domain is the objective. Certain problems are simple to resolve, while others are challenging. To develop a quick and effective answer, much intelligence is needed. A new technology is required for system design, and the foundation of the new technology is the improvement of an already existing algorithm. The goal of algorithm research is to create effective algorithms that improve scalability, dependability, and availability in addi

Global Journal of Computer Science and Technology (GJCST)