Search CORE

20 research outputs found

A Linked-Cell Domain Decomposition Method for Molecular Dynamics Simulation on a Scalable Multiprocessor

Author
Publication venue: 'Hindawi Limited'
Publication date: 01/01/1992
Field of study

Developing a Benchmark for Evaluating the Performance of Parallel Computers

Author: Williamson D. Ladd
Publication venue: DigitalCommons@USU
Publication date: 20/04/1994
Field of study

This paper discusses the development of a portable suite of benchmarking programs for parallel computers. Comparative measurement of the performance of parallel computing systems has been limited because of the great diversity of architectures and of processor interconnection schemes. One solution is to translate benchmark codes into a consistent and portable parallel language. This paper reports on progress in developing such a portable suite of benchmarks. An extensive introduction to parallel computing is included as an appendix, to provide a thorough understanding of the factors complicating development of the performance suite. Key to the development was the use of p4, a library of tools developed at Argonne National Laboratory. The benchmark codes were translated successfully using p4 and were run on a variety of parallel machines. Conclusions and suggestions for future work are given

DigitalCommons@USU

Compile-time optimization of near-neighbor communication for scalable shared-memory multiprocessors

Author: Abraham Santosh G.
Hudak David E.
Publication venue: 'Elsevier BV'
Publication date: 01/08/1992
Field of study

Scalable shared-memory multiprocessor systems are typically NUMA (nonuniform memory access) machines, where the exploitation of the memory hierarchy is critical to achieving high performance. Iterative data parallel loops with near-neighbor communication account for many important numerical applications. In such loops, the communication of partial results stresses the memory system performance. In this paper, we develop data placement schemes that minimize communication time where the near-neighbor interaction is determined by a stencil. Under a given loop partition, our compile-time algorithm partitions global data into four classes for each processor, with each class requiring specific consistency maintenance requirements. The ADAPT (Automatic Data Allocation and Partitioning Tool) system was implemented to automatically partition parallel code segments for the BBN TC2000, a scalable shared-memory multiprocessor. ADAPT caches global arrays and maintains data consistency in software through instructions that flush data from private caches. Restructuring of a fluid flow code segment by ADAPT improved performance by a factor of more than 3 on the BBN TC2000. Features in current generation pipelined processors with multiple functional units permit the overlap of memory accesses with computation. Our experiments on the BBN TC2000 show that the degree of overlap is limited by architectural parameters, such as the number of CPU registers.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/30342/1/0000744.pd

Deep Blue Documents at the University of Michigan

The Parallel C Preprocessor

Author
Publication venue: 'Hindawi Limited'
Publication date: 01/01/1992
Field of study

Crossref

Towards the Teraflop CFD

Author: Schreiber Robert
Simon Horst D.
Publication venue
Publication date
Field of study

We are surveying current projects in the area of parallel supercomputers. The machines considered here will become commercially available in the 1990 - 1992 time frame. All are suitable for exploring the critical issues in applying parallel processors to large scale scientific computations, in particular CFD calculations. This chapter presents an overview of the surveyed machines, and a detailed analysis of the various architectural and technology approaches taken. Particular emphasis is placed on the feasibility of a Teraflops capability following the paths proposed by various developers

NASA Technical Reports Server

A Feature Taxonomy and Survey of Synchronization Primitive Implementations

Author: Glew Andy
Hwu Wen-mei
Publication venue: Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/02/1991
Field of study