112 research outputs found
The NAS Parallel Benchmarks 2.1 Results
We present performance results for version 2.1 of the NAS Parallel Benchmarks (NPB) on the following architectures: IBM SP2/66 MHz; SGI Power Challenge Array/90 MHz; Cray Research T3D; and Intel Paragon. The NAS Parallel Benchmarks are a widely recognized suite of benchmarks originally designed to compare the performance of highly parallel computers with that of traditional supercomputers.
Computers for Lattice Field Theories
Parallel computers dedicated to lattice field theories are reviewed with
emphasis on the three recent projects, the Teraflops project in the US, the
CP-PACS project in Japan and the 0.5-Teraflops project in the US. Some new
commercial parallel computers are also discussed. Recent development of
semiconductor technologies is briefly surveyed in relation to possible
approaches toward Teraflops computers.
Comment: 15 pages with 16 PS figures, review presented at Lattice 93, LaTeX
(espcrc2.sty required)
LAPSES: A Recipe for High-Performance Adaptive Router Design
Earlier research has shown that adaptive routing can help improve network performance. However, it has not received adequate attention in commercial routers, mainly because of the additional hardware complexity and the perceived cost and performance degradation that may result from this complexity. These concerns can be mitigated if one can design a cost-effective router that supports adaptive routing. This paper proposes a three-step recipe (Look-Ahead routing, intelligent Path Selection, and an Economic Storage implementation, called the LAPSES approach) for cost-effective, high-performance pipelined adaptive router design. The first step, look-ahead routing, removes a pipeline stage from the router by making table lookup and arbitration concurrent. Next, three new traffic-sensitive path selection heuristics (LRU, LFU and MAX-CREDIT) are proposed to select one of the available alternate paths. Finally, two techniques for reducing the routing table size of the adaptive router are presented. These are called meta-table routing and economical storage. The proposed economical storage needs a routing table with only 9 and 27 entries for two- and three-dimensional meshes, respectively. All these design ideas are evaluated on a (16 × 16) mesh network via simulation. A fully adaptive algorithm and various traffic patterns are used to examine the performance benefits. Performance results show that the look-ahead design as well as the path selection heuristics boost network performance, while the economical storage approach turns out to be an ideal choice in comparison to the full-table and meta-table options. We believe the router resulting from these three design enhancements can make adaptive routing a viable choice for interconnects.
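The table sizes quoted in the abstract (9 entries for a 2D mesh, 27 for a 3D mesh) equal 3^n for n dimensions. One way to arrive at such a table, sketched below, is to index it by the sign of the per-dimension offset between the current node and the destination. This indexing scheme, and the names `sign`, `build_table` and `lookup`, are assumptions made for illustration; the abstract does not describe the actual table layout.

```python
from itertools import product


def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def build_table(dims):
    """Build a routing table keyed by sign vectors (assumed layout).

    Each key is a tuple with one entry per dimension, drawn from
    {-1, 0, +1}. The value lists the productive output ports as
    (dimension, direction) pairs for minimal routing. An empty list
    means the packet has arrived and is delivered locally.
    """
    table = {}
    for signs in product((-1, 0, 1), repeat=dims):
        table[signs] = [(d, s) for d, s in enumerate(signs) if s != 0]
    return table


def lookup(table, current, dest):
    """Return the candidate output ports from current toward dest."""
    key = tuple(sign(t - c) for c, t in zip(current, dest))
    return table[key]


if __name__ == "__main__":
    for dims in (2, 3):
        print(dims, "dimensions:", len(build_table(dims)), "entries")
    print(lookup(build_table(2), (0, 0), (3, 1)))
```

When `lookup` returns more than one port, a path selection heuristic such as those named in the abstract would choose among them; that step is left out here because the abstract does not define the heuristics.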
Practical Parallel Algorithms for Personalized Communication and Integer Sorting
A fundamental challenge for parallel computing is to obtain
high-level, architecture independent, algorithms which efficiently
execute on general-purpose parallel machines. With the emergence of
message passing standards such as MPI, it has become easier to design
efficient and portable parallel algorithms by making use of these
communication primitives. While existing primitives allow an
assortment of collective communication routines, they do not handle an
important communication event when most or all processors have
non-uniformly sized personalized messages to exchange with each other.
We focus in this paper on the h-relation personalized communication
whose efficient implementation will allow high performance
implementations of a large class of algorithms. While most previous
h-relation algorithms use randomization, this paper presents a new
deterministic approach for h-relation personalized communication. As
an application, we present an efficient algorithm for stable integer
sorting.
The algorithms presented in this paper have been coded in
Split-C and run on a variety of platforms, including the Thinking
Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific
CS-2, and the Intel Paragon. Our experimental results are consistent
with the theoretical analysis and illustrate the scalability and
efficiency of our algorithms across different platforms. In fact, they
seem to outperform all similar algorithms known to the authors on
these platforms.
(Also cross-referenced as UMIACS-TR-95-101.)
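To make the two terms in this abstract concrete, the sequential sketch below computes the h of a communication pattern (taken here, as an assumption, to be the largest number of elements any processor sends or receives) and performs one stable counting-sort pass over integer keys. It is not the deterministic parallel algorithm of the paper, which the abstract does not spell out; the function names and the message-size matrix are hypothetical.

```python
def h_of_relation(sizes):
    """Return h for a personalized communication pattern.

    sizes[i][j] is the number of elements processor i sends to
    processor j. h is the maximum over all processors of the total
    sent and the total received (assumed definition).
    """
    p = len(sizes)
    max_sent = max(sum(row) for row in sizes)
    max_received = max(sum(sizes[i][j] for i in range(p)) for j in range(p))
    return max(max_sent, max_received)


def stable_counting_sort(items, key, num_keys):
    """Sort items by an integer key in range(num_keys), preserving
    the input order of items that share a key."""
    counts = [0] * (num_keys + 1)
    for item in items:
        counts[key(item) + 1] += 1
    # After the prefix sum, counts[k] is the first output slot for key k.
    for k in range(num_keys):
        counts[k + 1] += counts[k]
    out = [None] * len(items)
    for item in items:
        k = key(item)
        out[counts[k]] = item
        counts[k] += 1
    return out


if __name__ == "__main__":
    sizes = [[0, 3, 1], [2, 0, 0], [5, 1, 0]]
    print("h =", h_of_relation(sizes))
    records = [(1, "a"), (0, "b"), (1, "c"), (0, "d")]
    print(stable_counting_sort(records, key=lambda r: r[0], num_keys=2))
```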
A Hybrid Decomposition Parallel Implementation of the Car-Parrinello Method
We have developed a flexible hybrid decomposition parallel implementation of
the first-principles molecular dynamics algorithm of Car and Parrinello. The
code allows the problem to be decomposed either spatially, over the electronic
orbitals, or any combination of the two. Performance statistics for 32, 64, 128
and 512 Si atom runs on the Touchstone Delta and Intel Paragon parallel
supercomputers and comparison with the performance of an optimized code running
the smaller systems on the Cray Y-MP and C90 are presented.
Comment: Accepted by Computer Physics Communications, latex, 34 pages without
figures, 15 figures available in PostScript form via WWW at
http://www-theory.chem.washington.edu/~wiggs/hyb_figures.htm
Peripheral twists for torus topologies with arbitrary aspect ratio
A torus is a common topology used in supercomputer networks. Asymmetric tori suffer from resource usage imbalance, which translates to reduced performance. Twisted tori employ a twist in the peripheral links of one or more dimensions to improve the topological parameters and overall performance of asymmetric networks. 2D and 3D twisted tori with aspect ratios 2:1 and 2:1:1 have been studied in detail. However, commercial machines do not necessarily employ those aspect ratios. In this work we present an early study of the effect of peripheral link twisting in multidimensional twisted tori with arbitrary aspect ratios. We observe that, in the general case, it is impossible to find a specific twist that minimizes all the interesting topological parameters of the network. We also introduce a requirement for the use of several twists in multidimensional tori with adaptive routing.
Postprint (author’s final draft)
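The effect of a peripheral twist on one topological parameter, the diameter, can be checked with a small breadth-first search. The sketch below assumes a 2D torus in which the wraparound links of the Y dimension are shifted by `twist` positions in X; this construction and the names `neighbors` and `diameter` are illustrative assumptions, not the paper's definitions.

```python
from collections import deque


def neighbors(x, y, width, height, twist):
    """Yield the four neighbours of (x, y) in a width x height torus.

    X wraparound links are ordinary. Y wraparound links are shifted
    by `twist` columns (assumed twist construction); twist=0 gives a
    standard torus.
    """
    yield (x + 1) % width, y
    yield (x - 1) % width, y
    if y + 1 < height:
        yield x, y + 1
    else:
        yield (x + twist) % width, 0
    if y - 1 >= 0:
        yield x, y - 1
    else:
        yield (x - twist) % width, height - 1


def diameter(width, height, twist=0):
    """Return the largest shortest-path distance between any two nodes."""
    worst = 0
    for sx in range(width):
        for sy in range(height):
            dist = {(sx, sy): 0}
            queue = deque([(sx, sy)])
            while queue:
                node = queue.popleft()
                for nxt in neighbors(*node, width, height, twist):
                    if nxt not in dist:
                        dist[nxt] = dist[node] + 1
                        queue.append(nxt)
            worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    for twist in range(8):
        print("8x4 torus, twist", twist, "diameter", diameter(8, 4, twist))
```

Sweeping `twist` for other widths and heights gives a quick way to explore aspect ratios beyond 2:1, although the diameter is only one of the parameters the abstract refers to.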