243 research outputs found

An assessment of the connection machine

Author: Schreiber Robert

The CM-2 is an example of a connection machine. The strengths and problems of this implementation are considered, as well as important issues in the architecture and programming environment of connection machines in general. These are contrasted with the same issues in Multiple Instruction/Multiple Data (MIMD) multiprocessors and multicomputers.

NASA Technical Reports Server

A compiler approach to scalable concurrent program design

Authors: Foster Ian; Taylor Stephen
Publication venue: California Institute of Technology Library
Publication date: 01/01/1992

The programmer's most powerful tool for controlling complexity in program design is abstraction. We seek to use abstraction in the design of concurrent programs, so as to separate design decisions concerned with decomposition, communication, synchronization, mapping, granularity, and load balancing. This paper describes programming and compiler techniques intended to facilitate this design strategy. The programming techniques are based on a core programming notation with two important properties: the ability to separate concurrent programming concerns, and extensibility with reusable programmer-defined abstractions. The compiler techniques are based on a simple transformation system together with a set of compilation transformations and portable run-time support. The transformation system allows programmer-defined abstractions to be defined as source-to-source transformations that convert abstractions into the core notation. The same transformation system is used to apply compilation transformations that incrementally transform the core notation toward an abstract concurrent machine. This machine can be implemented on a variety of concurrent architectures using simple run-time support. The transformation, compilation, and run-time system techniques have been implemented and are incorporated in a public-domain program development toolkit. This toolkit operates on a wide variety of networked workstations, multicomputers, and shared-memory multiprocessors. It includes a program transformer, concurrent compiler, syntax checker, debugger, performance analyzer, and execution animator. A variety of substantial applications have been developed using the toolkit, in areas such as climate modeling and fluid dynamics.

CiteSeerX

Caltech Authors

Compiler Techniques for Optimizing Communication and Data Distribution for Distributed-Memory Computers

Author: Palermo Daniel Joseph
Publication venue: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/05/1996

Advanced Research Projects Agency (ARPA); National Aeronautics and Space Administration; Ope

Illinois Digital Environment for Access to Learning and Scholarship Repository

Programming a Distributed System Using Shared Objects

Authors: Bal H.E.; Kaashoek M.F.; Tanenbaum A.S.
Publication date: 01/01/1993

Building the hardware for a high-performance distributed computer system is a lot easier than building its software. The authors describe a model for programming distributed systems based on abstract data types that can be replicated on all machines that need them. Read operations are done locally, without requiring network traffic. Writes can be done using a reliable broadcast algorithm if the hardware supports broadcasting; otherwise, a point-to-point protocol is used. The authors have built such a system based on the Amoeba microkernel, and implemented a language, Orca, on top of it. For Orca applications that have a high ratio of reads to writes, they measure good speedups on a system with 16 processors.

VU Research Portal

Towards the Teraflop CFD

Authors: Schreiber Robert; Simon Horst D.

We are surveying current projects in the area of parallel supercomputers. The machines considered here will become commercially available in the 1990-1992 time frame. All are suitable for exploring the critical issues in applying parallel processors to large-scale scientific computations, in particular CFD calculations. This chapter presents an overview of the surveyed machines, and a detailed analysis of the various architectural and technology approaches taken. Particular emphasis is placed on the feasibility of a Teraflops capability following the paths proposed by various developers.

NASA Technical Reports Server

A Global Communication Optimization Technique Based on Data-Flow Analysis and Linear Algebra

Authors: Banerjee P.; Choudhary Alok; Kandemir Mahmut; Ramanujam J.
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1998

Reducing communication overhead is extremely important in distributed-memory message-passing architectures. In this paper, we present a technique to improve communication that considers data access patterns of the entire program. Our approach is based on a combination of traditional data-flow analysis and a linear algebra framework, and works on structured programs with conditional statements and nested loops but without arbitrary goto statements. The distinctive features of the solution are the accuracy in keeping communication set information, support for general alignments and distributions (including block-cyclic distributions), and the ability to simulate some of the previous approaches with suitable modifications. We also show how optimizations such as message vectorization, message coalescing, and redundancy elimination are supported by our framework. Experimental results on several benchmarks show that our technique is effective in reducing the number of messages (an average of 32% reduction), the volume of the data communicated (an average of 37% reduction), and the execution time (an average of 26% reduction).

Syracuse University Research Facility and Collaborative Environment

Highly parallel computation

Authors: Denning Peter J.; Tichy Walter F.

Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines, and current research focuses on which architectures are best suited to particular classes of problems. The architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.

NASA Technical Reports Server

Performance measurement and analysis of PC based cluster server using SET of Architecture and modeling a scalable High performance cluster

Author: Mehta Mihir J.
Publication date: 01/04/2006

Not available

Etheses - A Saurashtra University Library Service

Data-parallel programming on MIMD computers

Authors: Anderson Ray J.; Hatcher Philip J.; Jones Robert R.; Lapadula Anthony J.; Quinn Michael J.; Seevers Bradley K.
Publication venue: Oregon State University. Department of Computer Science

We are convinced that the combination of data-parallel languages and MIMD hardware can make an important contribution to high-speed computing. The data-parallel paradigm is a natural way to solve a large number of problems arising in science and engineering. Data-parallel programs are easier to design, implement, and debug than programs written in a MIMD language. For its part, the existence of multiple control units on MIMD computers can allow loosely synchronous programs to execute more efficiently than they would on a SIMD architecture. In this paper we provide empirical evidence to support these assertions. We describe the implementation of two compilers for the data-parallel programming language C*. One compiler generates code for the NCUBE 3200 hypercube multicomputer; the other generates code for the Sequent Balance 21000 multiprocessor. We have compiled and executed a suite of C* programs on these two systems, and we present the execution times and speedups achieved by these programs.

ScholarsArchive@OSU