3,106 research outputs found
Recommended from our members
Fine-grain loop scheduling for MIMD machines
Previous algorithms for parallelizing loops on MIMD machines have been based on assigning one or more loop iterations to each processor, introducing synchronization as required. These methods exploit only iteration level parallelism, and ignore the parallelism that may exist at a lower level.In order to exploit parallelism both within and across iterations, our algorithm analyzes and schedules the loop at the statement level. The loop schedule reflects the expected communication and synchronization costs of the target machine. We provide test results that show that this algorithm can produce good speedup of loops on an MIMD machine
Adapting the interior point method for the solution of LPs on serial, coarse grain parallel and massively parallel computers
In this paper we describe a unified scheme for implementing an interior point algorithm (IPM) over a range of computer architectures. In the inner iteration of the IPM a search direction is computed using Newton's method. Computationally this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system, and the design of data structures to take advantage of serial, coarse grain parallel and massively parallel computer architectures, are considered in detail. We put forward arguments as to why integration of the system within a sparse simplex solver is important and outline how the system is designed to achieve this integration
Recommended from our members
Percolation scheduling for non-VLIW machines
Percolation Scheduling, a technique for compile-time code parallelization, has proven very successful for exploiting fine-grain irregular parallelism in ordinary programs. Currently, this technology is targeted only to VLIW (Very Long Instruction Word) machines, which have the advantages of 'free' synchronization and communication. Shared memory multi-processors can simulate the execution characteristics of VLIW machines with the use of static barriers. Preliminary results show that Percolation Scheduling can be used with good results on this type of architecture by increasing the granularity from operation level to source statement level, removing any redundant synchronization, and providing an efficient implementation of multi-way jumps
Parallel matrix inversion techniques
In this paper, we present techniques for inverting sparse, symmetric and positive definite matrices on parallel and distributed computers. We propose two algorithms, one for SIMD implementation and the other for MIMD implementation. These algorithms are modified versions of Gaussian elimination and they take into account the sparseness of the matrix. Our algorithms perform better than the general parallel Gaussian elimination algorithm. In order to demonstrate the usefulness of our technique, we implemented the snake problem using our sparse matrix algorithm. Our studies reveal that the proposed sparse matrix inversion algorithm significantly reduces the time taken for obtaining the solution of the snake problem. In this paper, we present the results of our experimental work
An assessment of the connection machine
The CM-2 is an example of a connection machine. The strengths and problems of this implementation are considered as well as important issues in the architecture and programming environment of connection machines in general. These are contrasted to the same issues in Multiple Instruction/Multiple Data (MIMD) microprocessors and multicomputers
A communication-ordered task graph allocation algorithm
technical reportThe inherently asynchronous nature of the data flow computation model allows the exploitation of maximum parallelism in program execution. While this computational model holds great promise, several problems must be solved in order to achieve a high degree of program performance. The allocation and scheduling of programs on MIMD distributed memory parallel hardware, is necessary for the implementation of efficient parallel systems. Finding optimal solutions requires that maximum parallelism be achieved consistent with resource limits and minimizing communication costs, and has been proven to be in the class of NP-complete problems. This paper addresses the problem of static allocation of tasks to distributed memory MIMD systems where simultaneous computation and communication is a factor. This paper discusses similarities and differences between several recent heuristic allocation approaches and identifies common problems inherent in these approaches. This paper presents a new algorithm scheme and heuristics that resolves the identified problems and shows significant performance benefits
Modula-2*: An extension of Modula-2 for highly parallel programs
Parallel programs should be machine-independent, i.e., independent of properties that are likely to differ from one parallel computer to the next. Extensions are described of Modula-2 for writing highly parallel, portable programs meeting these requirements. The extensions are: synchronous and asynchronous forms of forall statement; and control of the allocation of data to processors. Sample programs written with the extensions demonstrate the clarity of parallel programs when machine-dependent details are omitted. The principles of efficiently implementing the extensions on SIMD, MIMD, and MSIMD machines are discussed. The extensions are small enough to be integrated easily into other imperative languages
Highly parallel computation
Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines and current research focuses on which architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed
- …