Solving large scale linear programming
The interior point method (IPM) is now well established as a competitive technique for solving very large scale linear programming problems. The leading variant of the interior point method is the primal-dual predictor-corrector algorithm due to Mehrotra. The main computational step of this algorithm is the repeated formation and solution of a large sparse positive definite system of equations.
We describe an implementation of the predictor-corrector IPM algorithm on MasPar, a massively parallel SIMD computer. At the heart of the implementation is a parallel Cholesky factorization algorithm for sparse matrices. Our implementation uses a new scheme for mapping the matrix onto the processor grid of the MasPar, which results in a more efficient Cholesky factorization than previously suggested schemes.
The IPM implementation uses the parallel unit of the MasPar to speed up the factorization and other computationally intensive parts of the IPM. An important part of this implementation is the judicious division of data and computation between the front-end computer, which runs the main IPM algorithm, and the parallel unit. Performance…
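The kernel the abstract describes — forming and Cholesky-factorizing a positive definite system at every IPM iteration — can be sketched as follows. This is a dense, serial stand-in (the paper's contribution is the sparse, parallel version); the matrix `M = A D Aᵀ` with positive diagonal `D` is the standard normal-equations form in primal-dual IPMs, and the function name is mine:

```python
import numpy as np

def normal_equations_step(A, d, rhs):
    """One core kernel of a primal-dual IPM iteration: form
    M = A D A^T (D = diag(d), d > 0, changing every iteration)
    and solve M x = rhs via Cholesky factorization.
    Dense, serial stand-in for the sparse parallel factorization."""
    M = (A * d) @ A.T           # A @ diag(d) @ A.T without forming diag(d)
    L = np.linalg.cholesky(M)   # M = L L^T; M is symmetric positive definite
    y = np.linalg.solve(L, rhs)       # forward substitution
    return np.linalg.solve(L.T, y)    # back substitution

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))        # full row rank -> M is SPD
d = rng.uniform(0.5, 2.0, size=6)
x = normal_equations_step(A, d, np.ones(3))
```

Because `d` changes each iteration while the sparsity pattern of `M` does not, the symbolic factorization (and, on the MasPar, the mapping of the matrix onto the processor grid) can be computed once and reused.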
Memory Access Optimizations for High-Performance Computing
This paper discusses the importance of memory access optimizations, which are shown to be highly effective on the MasPar architecture. The study is based on two MasPar machines, a 16K-processor MP-1 and a 4K-processor MP-2. A software pipelining technique overlaps memory accesses with computation and/or communication. Another optimization, called the register-window technique, reduces the number of loads in a loop. These techniques are evaluated using three parallel matrix multiplication algorithms on both MasPar machines. The matrix multiplication study shows that for a highly computation-intensive problem, reducing interprocessor communication can become a secondary issue compared to memory access optimization. It is also shown that memory access optimizations can play a more important role than the choice of a superior parallel algorithm. Keywords: load/store architecture, memory accesses, matrix multiplication, parallel programming
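The load-reduction idea behind the register-window technique can be illustrated with a register-tiled matrix multiply. The sketch below is my own minimal illustration, not the paper's code: each loaded element of `A` and `B` feeds two multiply-adds instead of one, halving loads per FLOP relative to the naive triple loop (in compiled code the accumulators `c00..c11` would live in registers):

```python
def matmul_tiled(A, B):
    """2x2 register-tiled matrix multiply (row counts of A and
    column counts of B assumed even). Each loaded element of A
    and B is reused for two multiply-adds, halving loads per
    FLOP versus the naive triple loop -- the effect that
    load-reducing optimizations such as register windows target."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(0, n, 2):
        for j in range(0, m, 2):
            c00 = c01 = c10 = c11 = 0.0     # accumulators (registers)
            for p in range(k):
                a0, a1 = A[i][p], A[i + 1][p]   # 2 loads of A ...
                b0, b1 = B[p][j], B[p][j + 1]   # ... 2 loads of B ...
                c00 += a0 * b0; c01 += a0 * b1  # ... feed 4 multiply-adds
                c10 += a1 * b0; c11 += a1 * b1
            C[i][j], C[i][j + 1] = c00, c01
            C[i + 1][j], C[i + 1][j + 1] = c10, c11
    return C
```

The same reuse argument explains the paper's conclusion: once arithmetic dominates, the load count per inner iteration, not the communication pattern, sets the performance ceiling.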
FEM Mesh Mapping to a SIMD Machine Using Genetic Algorithms
The Finite Element Method is a computationally expensive method used to perform engineering analyses. By performing such computations on a parallel machine using a SIMD paradigm, the run time of these analyses can be drastically reduced. However, the mapping of FEM mesh elements to the SIMD machine's processing elements is an NP-complete problem. This thesis examines the use of Genetic Algorithms as a search technique for finding quality solutions to the mapping problem. A hill-climbing algorithm is compared to a traditional genetic algorithm, as well as to a messy genetic algorithm. The results and comparative advantages of these approaches are discussed.
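The hill-climbing baseline the thesis compares against can be sketched for this mapping problem. The cost model here — counting mesh-adjacency edges whose endpoints land on different processing elements, as a proxy for interprocessor communication — and all names are my assumptions, not the thesis's formulation:

```python
import random

def comm_cost(mapping, edges):
    """Edges whose endpoint elements land on different PEs:
    a simple stand-in for interprocessor communication cost."""
    return sum(mapping[u] != mapping[v] for u, v in edges)

def hill_climb(n_elems, n_pes, edges, iters=500, seed=1):
    """Simple hill climber for the element-to-PE mapping:
    move one random element to a random PE and keep the move
    whenever the cost does not increase."""
    rng = random.Random(seed)
    mapping = [rng.randrange(n_pes) for _ in range(n_elems)]
    cost = comm_cost(mapping, edges)
    for _ in range(iters):
        e, pe = rng.randrange(n_elems), rng.randrange(n_pes)
        old = mapping[e]
        mapping[e] = pe
        new = comm_cost(mapping, edges)
        if new <= cost:
            cost = new          # accept improving or neutral moves
        else:
            mapping[e] = old    # reject worsening moves
    return mapping, cost
```

A genetic algorithm replaces the single candidate with a population, adding crossover between mappings — which is what lets it escape the local optima a climber like this gets stuck in.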
Efficient Implementation of Mesh Generation and FDTD Simulation of Electromagnetic Fields
This thesis presents an implementation of the Finite Difference Time Domain (FDTD) method on a massively parallel computer system for the analysis of electromagnetic phenomena. In addition, the implementation of an efficient mesh generator is presented. For this research we selected the MasPar system, as it is a relatively low-cost, reliable, high-performance computer system. In this thesis we are primarily concerned with selecting an efficient algorithm for each of the programs written for our application, and with devising clever ways to make the best use of the MasPar system. The thesis places a strong emphasis on examining application performance.
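The FDTD method maps naturally onto a SIMD machine because every grid cell performs the same update each time step. A minimal 1D sketch of the standard Yee update (free space, normalized units; the thesis's actual implementation, grid, and source are not reproduced here — all parameters below are illustrative):

```python
import numpy as np

def fdtd_1d(steps=200, n=200, src=100):
    """1D Yee-scheme FDTD update in free space, normalized
    units, Courant factor 0.5. On a SIMD machine each grid
    cell maps naturally onto one processing element, with the
    shifted differences becoming nearest-neighbor communication."""
    ez = np.zeros(n)   # electric field samples
    hy = np.zeros(n)   # magnetic field samples (staggered half-cell)
    c = 0.5            # Courant factor (stable in 1D for c <= 1)
    for t in range(steps):
        hy[:-1] += c * (ez[1:] - ez[:-1])   # update H from the curl of E
        ez[1:]  += c * (hy[1:] - hy[:-1])   # update E from the curl of H
        ez[src] += np.exp(-((t - 30) / 10.0) ** 2)  # soft Gaussian source
    return ez
```

In 2D/3D the structure is the same — vectorized shifted differences per component — which is why the data layout produced by the mesh generator largely determines how efficiently the update runs on the processor grid.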
A Massively Parallel MIMD Implemented by SIMD Hardware?
Both conventional wisdom and engineering practice hold that a massively parallel MIMD machine should be constructed using a large number of independent processors and an asynchronous interconnection network. In this paper, we suggest that it may be beneficial to implement a massively parallel MIMD using microcode on a massively parallel SIMD microengine; the synchronous nature of the system allows much higher performance to be obtained with simpler hardware. The primary disadvantage is simply that the SIMD microengine must serialize execution of different types of instructions - but again the static nature of the machine allows various optimizations that can minimize this detrimental effect. In addition to presenting the theory behind construction of efficient MIMD machines using SIMD microengines, this paper discusses how the techniques were applied to create a 16,384-processor shared memory barrier MIMD using a SIMD MasPar MP-1. Both the MIMD structure and benchmark results are presented. Even though the MasPar hardware is not ideal for implementing a MIMD and our microinterpreter was written in a high-level language (MPL), peak MIMD performance was 280 MFLOPS as compared to 1.2 GFLOPS for the native SIMD instruction set. Of course, comparing peak speeds is of dubious value; hence, we have also included a number of more realistic benchmark results.
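The serialization the abstract mentions — the SIMD engine executing each instruction *type* in turn, with only the matching processors enabled — is the essence of a MIMD-on-SIMD microinterpreter. A toy sketch of one interpreter cycle (a tiny two-opcode instruction set of my own invention, nothing like the paper's actual microcode):

```python
import numpy as np

def simd_mimd_step(pc, acc, progs, opcodes=("ADD", "SUB", "NOP")):
    """One cycle of a MIMD-on-SIMD microinterpreter sketch.
    Every PE fetches its own instruction; the SIMD engine then
    broadcasts each opcode type in turn, and only PEs whose
    current instruction matches execute (enable mask). A cycle
    therefore costs one pass per distinct opcode type present."""
    fetched = [progs[p][pc[p]] for p in range(len(pc))]  # per-PE fetch
    for op in opcodes:                    # serialized over instruction types
        mask = np.array([f[0] == op for f in fetched])
        if not mask.any():
            continue                      # skip types no PE is executing
        arg = np.array([f[1] for f in fetched])
        if op == "ADD":
            acc[mask] += arg[mask]
        elif op == "SUB":
            acc[mask] -= arg[mask]
    pc += 1                               # all PEs advance in lockstep
    return pc, acc
```

The cost model is visible directly: a cycle takes time proportional to the number of distinct opcode types in flight, which is why the paper's static optimizations aim to keep that number small.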
Parallel Computer Needs at Dartmouth College
To determine the need for a parallel computer on campus, a committee of the Graduate Program in Computer Science surveyed selected Dartmouth College faculty and students in December 1991 and January 1992. We hope that the information in this report can be used by many groups on campus, including the Computer Science graduate program and DAGS summer institute, Kiewit's NH Supercomputer Initiative, and by numerous researchers hoping to collaborate with people in other disciplines.
We found significant interest in parallel supercomputing on campus. An on-campus parallel supercomputing facility would not only support numerous courses and research projects, but would also provide a locus for intellectual activity in parallel computing, encouraging interdisciplinary collaboration. We believe that this report is a first step in that direction.
Parallel computing for image processing problems.
by Kin-wai Mak. Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. Includes bibliographical references (leaves 52-54).
Contents:
Chapter 1: Introduction to Parallel Computing
  1.1 Parallel Computer Models
  1.2 Forms of Parallelism
  1.3 Performance Evaluation
    1.3.1 Finding Machine Parameters
    1.3.2 Amdahl's Law
    1.3.3 Gustafson's Law
    1.3.4 Scalability Analysis
Chapter 2: Introduction to Image Processing
  2.1 Image Restoration Problem
    2.1.1 Toeplitz Least Squares Problems
    2.1.2 The Need for Regularization
    2.1.3 Guide Star Image
Chapter 3: Toeplitz Solvers
  3.1 Introduction
  3.2 Parallel Implementation
    3.2.1 Overview of MasPar
    3.2.2 Design Methodology
    3.2.3 Implementation Details
    3.2.4 Application to Ground Based Astronomy
    3.2.5 Performance Analysis
    3.2.6 The Graphical Interface
Bibliography
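The fast kernel underlying iterative Toeplitz least-squares solvers like those in Chapter 3 is the Toeplitz matrix-vector product, computed in O(n log n) by embedding the Toeplitz matrix in a circulant and applying the FFT. A generic sketch of that standard technique (not the thesis's MasPar implementation):

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """Toeplitz-times-vector in O(n log n) via circulant
    embedding and FFT. c is the first column, r the first row
    (c[0] == r[0]). A 2n circulant whose first column is
    [c, 0, r[n-1], ..., r[1]] agrees with T on its top-left
    n-by-n block, and a circulant matvec is diagonalized by
    the FFT."""
    n = len(x)
    col = np.concatenate([c, [0.0], r[1:][::-1]])        # circulant embedding
    xpad = np.concatenate([x, np.zeros(n)])              # zero-pad x to 2n
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(xpad))  # circulant matvec
    return y[:n].real                                    # top block = T @ x
```

Inside a conjugate-gradient loop this replaces the O(n^2) dense product, which is what makes large regularized image-restoration systems tractable.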