228 research outputs found

    Memory Access Optimizations for High-Performance Computing

    This paper discusses memory access optimizations, which are shown to be highly effective on the MasPar architecture. The study uses two MasPar machines, a 16K-processor MP-1 and a 4K-processor MP-2. A software pipelining technique overlaps memory accesses with computation and/or communication. Another optimization, called the register window technique, reduces the number of loads in a loop. These techniques are evaluated using three parallel matrix multiplication algorithms on both MasPar machines. The matrix multiplication study shows that for a highly computation-intensive problem, reducing interprocessor communication can become a secondary issue compared to memory access optimization. It is also shown that memory access optimizations can play a more important role than the choice of a superior parallel algorithm. Keywords: load/store architecture, memory accesses, matrix multiplication, parallel programming.
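    The load-reduction idea behind the register window technique can be illustrated in plain Python (the paper's actual implementation is MasPar-specific; this is only an analogy in which locals stand in for registers):

```python
def matmul_naive(A, B):
    """Triple-loop matrix multiply: A[i] and C[i][j] are re-loaded
    from memory on every iteration of the inner loop."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_hoisted(A, B):
    """Analogous to the register window technique: values reused across
    the inner loop (the row A[i] and the running sum) are held in locals,
    replacing repeated memory loads with 'register' accesses."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Ai = A[i]              # hoist the row load out of the j/k loops
        for j in range(n):
            acc = 0.0          # accumulate locally, store C[i][j] once
            for k in range(n):
                acc += Ai[k] * B[k][j]
            C[i][j] = acc
    return C
```

    Both versions compute the same product; the second performs far fewer redundant loads per inner-loop iteration, which is the effect the paper measures on a load/store architecture.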

    Compiling machine-independent parallel programs


    FEM Mesh Mapping to a SIMD Machine Using Genetic Algorithms

    The Finite Element Method (FEM) is a computationally expensive method used to perform engineering analyses. By performing such computations on a parallel machine using a SIMD paradigm, the run time of these analyses can be drastically reduced. However, the mapping of FEM mesh elements to the SIMD machine's processing elements is an NP-complete problem. This thesis examines the use of Genetic Algorithms as a search technique for finding quality solutions to the mapping problem. A hill-climbing algorithm is compared to a traditional genetic algorithm as well as a messy genetic algorithm. The results and comparative advantages of these approaches are discussed.
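    A minimal sketch of a genetic algorithm for this kind of mapping problem might look as follows. Everything here is illustrative: the chromosome encoding, the cut-plus-imbalance cost model, and all GA parameters are assumptions, not the thesis's actual design (which also includes the messy-GA variant):

```python
import random

def ga_map(edges, n_elems, n_pes, pop_size=30, gens=100, seed=0):
    """Toy GA for mapping mesh elements to processing elements.
    A chromosome assigns each element a PE index; the (hypothetical)
    cost counts mesh edges cut between different PEs plus a
    load-imbalance penalty. Lower cost is better."""
    rng = random.Random(seed)

    def cost(assign):
        cut = sum(1 for a, b in edges if assign[a] != assign[b])
        loads = [assign.count(p) for p in range(n_pes)]
        return cut + (max(loads) - min(loads))   # assumed cost model

    pop = [[rng.randrange(n_pes) for _ in range(n_elems)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=cost)
        next_pop = pop[:2]                       # elitism: keep two best
        while len(next_pop) < pop_size:
            p1, p2 = rng.sample(pop[:pop_size // 2], 2)  # fitter half
            point = rng.randrange(1, n_elems)    # one-point crossover
            child = p1[:point] + p2[point:]
            if rng.random() < 0.2:               # mutate one gene
                child[rng.randrange(n_elems)] = rng.randrange(n_pes)
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=cost)
    return best, cost(best)
```

    A hill climber, by contrast, would repeatedly reassign single elements and keep only improving moves, which is the comparison the thesis draws.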

    Efficient Implementation of Mesh Generation and FDTD Simulation of Electromagnetic Fields

    This thesis presents an implementation of the Finite Difference Time Domain (FDTD) method on a massively parallel computer system for the analysis of electromagnetic phenomena. In addition, the implementation of an efficient mesh generator is presented. For this research we selected the MasPar system, as it is a relatively low-cost, reliable, high-performance computer system. In this thesis we are primarily concerned with selecting an efficient algorithm for each of the programs written for our application, and with devising clever ways to make the best use of the MasPar system. The thesis places a strong emphasis on examining application performance.
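    The core of any FDTD code is a leapfrog update of staggered electric and magnetic field grids. A minimal one-dimensional sketch in normalized units (not the thesis's implementation; on a SIMD machine each PE would own a slice of the grid and exchange boundary values with its neighbours every step):

```python
import numpy as np

def fdtd_1d(steps=200, n=200, src=100):
    """1-D FDTD (Yee) sketch with a Courant factor of 0.5: E and H live
    on staggered grids and are updated in leapfrog fashion. A soft
    Gaussian source is injected at grid point `src` (assumed setup)."""
    E = np.zeros(n)
    H = np.zeros(n - 1)
    for t in range(steps):
        H += 0.5 * (E[1:] - E[:-1])           # update H from the curl of E
        E[1:-1] += 0.5 * (H[1:] - H[:-1])     # update E from the curl of H
        E[src] += np.exp(-((t - 30) ** 2) / 100.0)  # soft Gaussian source
    return E
```

    Because every grid point runs the same two update expressions, the scheme maps naturally onto SIMD hardware, which is why the MasPar was a reasonable target for this application.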

    A Massively Parallel MIMD Implemented by SIMD Hardware?

    Both conventional wisdom and engineering practice hold that a massively parallel MIMD machine should be constructed using a large number of independent processors and an asynchronous interconnection network. In this paper, we suggest that it may be beneficial to implement a massively parallel MIMD using microcode on a massively parallel SIMD microengine; the synchronous nature of the system allows much higher performance to be obtained with simpler hardware. The primary disadvantage is simply that the SIMD microengine must serialize execution of different types of instructions - but again the static nature of the machine allows various optimizations that can minimize this detrimental effect. In addition to presenting the theory behind construction of efficient MIMD machines using SIMD microengines, this paper discusses how the techniques were applied to create a 16,384-processor shared-memory barrier MIMD using a SIMD MasPar MP-1. Both the MIMD structure and benchmark results are presented. Even though the MasPar hardware is not ideal for implementing a MIMD and our microinterpreter was written in a high-level language (MPL), peak MIMD performance was 280 MFLOPS as compared to 1.2 GFLOPS for the native SIMD instruction set. Of course, comparing peak speeds is of dubious value; hence, we have also included a number of more realistic benchmark results.
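    The serialization the paper describes can be modeled in a few lines: each cycle, the SIMD control unit broadcasts one opcode class at a time, and only the PEs whose next instruction matches are enabled while the rest are masked off. This toy model (the real interpreter is MPL microcode on the MP-1; the three-opcode instruction set here is invented for illustration) captures that mechanism:

```python
def run_simd_mimd(programs, cycles=100):
    """Toy MIMD-on-SIMD interpreter: each PE has its own program counter
    and accumulator, but instruction types execute serially, one opcode
    broadcast at a time, with non-matching PEs masked off."""
    state = [{"pc": 0, "acc": 0, "halt": False, "ran": False}
             for _ in programs]
    for _ in range(cycles):
        for pe in state:
            pe["ran"] = False                   # one instruction per cycle
        for opcode in ("ADD", "SUB", "HALT"):   # serialized broadcast
            for pe, prog in zip(state, programs):
                if pe["halt"] or pe["ran"] or pe["pc"] >= len(prog):
                    continue                    # PE is masked off
                op, *args = prog[pe["pc"]]
                if op != opcode:
                    continue                    # waits for its opcode's turn
                if op == "ADD":
                    pe["acc"] += args[0]
                elif op == "SUB":
                    pe["acc"] -= args[0]
                else:                           # HALT
                    pe["halt"] = True
                pe["pc"] += 1
                pe["ran"] = True
        if all(pe["halt"] for pe in state):
            break
    return [pe["acc"] for pe in state]
```

    The cost of a cycle is the sum over all opcode classes that any PE needs, which is exactly the serialization penalty the paper argues can be minimized by static optimization.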

    A parallel programming model for irregular dynamic neural networks


    Parallel Computer Needs at Dartmouth College

    To determine the need for a parallel computer on campus, a committee of the Graduate Program in Computer Science surveyed selected Dartmouth College faculty and students in December 1991 and January 1992. We hope that the information in this report can be used by many groups on campus, including the Computer Science graduate program and DAGS summer institute, Kiewit's NH Supercomputer Initiative, and by numerous researchers hoping to collaborate with people in other disciplines. We found significant interest in parallel supercomputing on campus. An on-campus parallel supercomputing facility would not only support numerous courses and research projects, but would provide a locus for intellectual activity in parallel computing, encouraging interdisciplinary collaboration. We believe that this report is a first step in that direction.

    Parallel computing for image processing problems.

    by Kin-wai Mak. Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. Includes bibliographical references (leaves 52-54).
    Chapter 1: Introduction to Parallel Computing
    1.1 Parallel Computer Models
    1.2 Forms of Parallelism
    1.3 Performance Evaluation
    1.3.1 Finding Machine Parameters
    1.3.2 Amdahl's Law
    1.3.3 Gustafson's Law
    1.3.4 Scalability Analysis
    Chapter 2: Introduction to Image Processing
    2.1 Image Restoration Problem
    2.1.1 Toeplitz Least Squares Problems
    2.1.2 The Need for Regularization
    2.1.3 Guide Star Image
    Chapter 3: Toeplitz Solvers
    3.1 Introduction
    3.2 Parallel Implementation
    3.2.1 Overview of MasPar
    3.2.2 Design Methodology
    3.2.3 Implementation Details
    3.2.4 Application to Ground Based Astronomy
    3.2.5 Performance Analysis
    3.2.6 The Graphical Interface
    Bibliography
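    The two classical speedup models covered in Chapter 1, Amdahl's law and Gustafson's law, are simple enough to state as one-line functions (standard formulas, not code from the thesis):

```python
def amdahl(serial_frac, p):
    """Amdahl's law: speedup on p processors for a FIXED problem size
    with serial fraction f is S = 1 / (f + (1 - f) / p)."""
    return 1.0 / (serial_frac + (1.0 - serial_frac) / p)

def gustafson(serial_frac, p):
    """Gustafson's law: scaled speedup when the parallel workload GROWS
    with p is S = p - f * (p - 1)."""
    return p - serial_frac * (p - 1)
```

    The contrast between the two is the usual scalability-analysis starting point: Amdahl bounds speedup by 1/f no matter how many processors are added, while Gustafson's scaled speedup keeps growing with p.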