Search CORE

3,106 research outputs found

Recommended from our members

Fine-grain loop scheduling for MIMD machines

Author: Brownhill Carrie J.
Kim Ki-chang
Nicolau Alexandru
Publication venue: eScholarship, University of California
Publication date: 02/10/1990

Previous algorithms for parallelizing loops on MIMD machines have been based on assigning one or more loop iterations to each processor, introducing synchronization as required. These methods exploit only iteration-level parallelism and ignore the parallelism that may exist at a lower level. In order to exploit parallelism both within and across iterations, our algorithm analyzes and schedules the loop at the statement level. The loop schedule reflects the expected communication and synchronization costs of the target machine. We provide test results showing that this algorithm can produce good speedup of loops on an MIMD machine.

eScholarship - University of California

Adapting the interior point method for the solution of LPs on serial, coarse grain parallel and massively parallel computers

Author: Andersen J
Levkovitz R
Mitra G
Tamiz M
Publication venue: Brunel University
Publication date: 01/01/1990

In this paper we describe a unified scheme for implementing an interior point algorithm (IPM) over a range of computer architectures. In the inner iteration of the IPM a search direction is computed using Newton's method. Computationally this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system, and the design of data structures to take advantage of serial, coarse grain parallel and massively parallel computer architectures, are considered in detail. We put forward arguments as to why integration of the system within a sparse simplex solver is important and outline how the system is designed to achieve this integration
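
The inner-iteration solve mentioned in this abstract can be illustrated with a small sketch. It assumes the SSPD system takes the common normal-equations form M = A D Aᵀ with a positive diagonal D, which the abstract does not state, and it uses a dense Cholesky factorization as one example of a direct method. The names `normal_matrix`, `cholesky`, and `solve_spd` are hypothetical, and the dense lists stand in for the sparse data structures the paper is concerned with.

```python
import math


def normal_matrix(a, d):
    """Form M = A * diag(d) * A^T for a dense row-major A (assumed form)."""
    rows, cols = len(a), len(a[0])
    return [
        [sum(a[i][k] * d[k] * a[j][k] for k in range(cols)) for j in range(rows)]
        for i in range(rows)
    ]


def cholesky(m):
    """Return lower-triangular L with L * L^T = m for a symmetric positive definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve_spd(m, rhs):
    """Solve m * x = rhs by Cholesky factorization and two triangular solves."""
    low = cholesky(m)
    n = len(rhs)
    # Forward substitution: L * y = rhs
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    # Back substitution: L^T * x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


# Illustrative data only: a 2x3 constraint matrix and a positive scaling vector.
m = normal_matrix([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]], [1.0, 2.0, 1.0])
direction = solve_spd(m, [5.0, 5.0])
```

An indirect (iterative) method such as conjugate gradients would replace `solve_spd` while leaving the formation of the system unchanged.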

CiteSeerX

Brunel University Research Archive


Percolation scheduling for non-VLIW machines

Author: Brownhill Carrie J.
Nicolau Alexandru
Publication venue: eScholarship, University of California
Publication date: 15/01/1990

Percolation Scheduling, a technique for compile-time code parallelization, has proven very successful for exploiting fine-grain irregular parallelism in ordinary programs. Currently, this technology is targeted only to VLIW (Very Long Instruction Word) machines, which have the advantages of 'free' synchronization and communication. Shared memory multi-processors can simulate the execution characteristics of VLIW machines with the use of static barriers. Preliminary results show that Percolation Scheduling can be used with good results on this type of architecture by increasing the granularity from operation level to source statement level, removing any redundant synchronization, and providing an efficient implementation of multi-way jumps
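
The idea of simulating VLIW lockstep execution with static barriers can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: each thread plays one processor, each "statement" is a hypothetical Python callable acting on a shared dictionary, and a barrier after every step stands in for the implicit synchronization of a long instruction word. It does not model the removal of redundant barriers or multi-way jumps.

```python
import threading


def run_lockstep(programs, memory):
    """Run one statement list per thread, with a barrier after every step."""
    steps = len(programs[0])
    if any(len(p) != steps for p in programs):
        raise ValueError("all programs must have the same number of steps")
    barrier = threading.Barrier(len(programs))

    def worker(program):
        for statement in program:
            statement(memory)
            # No thread starts step k+1 until every thread has finished step k.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(p,)) for p in programs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return memory


# Two processors, two steps each; step 2 reads values written in step 1.
processor_0 = [
    lambda m: m.update(a=2),
    lambda m: m.update(c=m["a"] + m["b"]),
]
processor_1 = [
    lambda m: m.update(b=3),
    lambda m: m.update(d=m["a"] * m["b"]),
]
result = run_lockstep([processor_0, processor_1], {})
```

Without the barrier, the second step on either thread could read `a` or `b` before the other thread has written it; the barrier makes the schedule's step boundaries explicit at the cost of one synchronization per step.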

eScholarship - University of California

Parallel matrix inversion techniques

Author: Kumar M. J.
Lau K. K.
Venkatesh S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1996

In this paper, we present techniques for inverting sparse, symmetric and positive definite matrices on parallel and distributed computers. We propose two algorithms, one for SIMD implementation and the other for MIMD implementation. These algorithms are modified versions of Gaussian elimination and they take into account the sparseness of the matrix. Our algorithms perform better than the general parallel Gaussian elimination algorithm. In order to demonstrate the usefulness of our technique, we implemented the snake problem using our sparse matrix algorithm. Our studies reveal that the proposed sparse matrix inversion algorithm significantly reduces the time taken for obtaining the solution of the snake problem. In this paper, we present the results of our experimental work

Deakin Research Online

An assessment of the connection machine

Author: Schreiber Robert

The CM-2 is an example of a connection machine. The strengths and problems of this implementation are considered, as well as important issues in the architecture and programming environment of connection machines in general. These are contrasted with the same issues in Multiple Instruction/Multiple Data (MIMD) multiprocessors and multicomputers.

NASA Technical Reports Server

Ambisonic audio system optimization using a HPC cluster

Author: Mair Quentin
Moore David
Wakefield Jonathan
Publication date: 01/01/2011

ResearchOnline@GCU

A communication-ordered task graph allocation algorithm

Author: Evans John D.
Kessler Robert R.
Publication venue: University of Utah
Publication date: 01/01/1992

Technical report. The inherently asynchronous nature of the data flow computation model allows the exploitation of maximum parallelism in program execution. While this computational model holds great promise, several problems must be solved in order to achieve a high degree of program performance. The allocation and scheduling of programs on MIMD distributed memory parallel hardware is necessary for the implementation of efficient parallel systems. Finding optimal solutions requires that maximum parallelism be achieved consistent with resource limits while minimizing communication costs, and has been proven to be in the class of NP-complete problems. This paper addresses the problem of static allocation of tasks to distributed memory MIMD systems where simultaneous computation and communication is a factor. It discusses similarities and differences between several recent heuristic allocation approaches and identifies common problems inherent in these approaches. It then presents a new algorithm scheme and heuristics that resolve the identified problems and show significant performance benefits.

The University of Utah: J. Willard Marriott Digital Library

Modula-2*: An extension of Modula-2 for highly parallel programs

Author: Herter Christian G.
Tichy Walter F.

Parallel programs should be machine-independent, i.e., independent of properties that are likely to differ from one parallel computer to the next. Extensions of Modula-2 are described for writing highly parallel, portable programs that meet these requirements. The extensions are synchronous and asynchronous forms of the forall statement, and control of the allocation of data to processors. Sample programs written with the extensions demonstrate the clarity of parallel programs when machine-dependent details are omitted. The principles of efficiently implementing the extensions on SIMD, MIMD, and MSIMD machines are discussed. The extensions are small enough to be integrated easily into other imperative languages.

NASA Technical Reports Server

Highly parallel computation

Author: Denning Peter J.
Tichy Walter F.

Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines, and current research focuses on which architectures perform best. The architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.

NASA Technical Reports Server