112 research outputs found

    The NAS Parallel Benchmarks 2.1 Results

    Get PDF
    We present performance results for version 2.1 of the NAS Parallel Benchmarks (NPB) on the following architectures: IBM SP2/66 MHz; SGI Power Challenge Array/90 MHz; Cray Research T3D; and Intel Paragon. The NAS Parallel Benchmarks are a widely-recognized suite of benchmarks originally designed to compare the performance of highly parallel computers with that of traditional supercomputers

    Computers for Lattice Field Theories

    Full text link
    Parallel computers dedicated to lattice field theories are reviewed with emphasis on the three recent projects, the Teraflops project in the US, the CP-PACS project in Japan and the 0.5-Teraflops project in the US. Some new commercial parallel computers are also discussed. Recent development of semiconductor technologies is briefly surveyed in relation to possible approaches toward Teraflops computers.Comment: 15 pages with 16 PS figures, review presented at Lattice 93, LaTeX (espcrc2.sty required

    LAPSES: A Recipe for High-Performance Adaptive Router Design

    Get PDF
    Earlier research has shown that adaptive routing can help in improving network performance. However, it has not received adequate attention in commercial routers mainly due to the additional hardware complexity, and the perceived cost and performance degradation that may result from this complexity. These concerns can be mitigated if one can design a cost-effective router that can support adaptive routing. This paper proposes a three step recipe — Look-Ahead routing, intelligent Path Selection, and an Economic Storage implementation, called the LAPSES approach — for cost-effective high performance pipelined adaptive router design. The first step, look-ahead routing, reduces a pipeline stage in the router by making table lookup and arbitration concurrent. Next, three new traffic-sensitive path selection heuristics (LRU, LFU and MAX-CREDIT) are proposed to select one of the available alternate paths. Finally, two techniques for reducing routing table size of the adaptive router are presented. These are called meta-table routing and economical storage. The proposed economical storage needs a routing table with only 9 and 27 entries for two and three dimensional meshes, respectively. All these design ideas are evaluated on a (16 16) mesh network via simulation. A fully adaptive algorithm and various traffic patterns are used to examine the performance benefits. Performance results show that the look-ahead design as well as the path selection heuristics boost network performance, while the economical storage approach turns out to be an ideal choice in comparison to full-table and meta-table options. We believe the router resulting from these three design enhancements can make adaptive routing a viable choice for interconnects.

    Message‐passing performance of various computers

    Get PDF

    Message-passing performance of various computers

    Get PDF

    Practical Parallel Algorithms for Personalized Communication and Integer Sorting

    Get PDF
    A fundamental challenge for parallel computing is to obtain high-level, architecture independent, algorithms which efficiently execute on general-purpose parallel machines. With the emergence of message passing standards such as MPI, it has become easier to design efficient and portable parallel algorithms by making use of these communication primitives. While existing primitives allow an assortment of collective communication routines, they do not handle an important communication event when most or all processors have non-uniformly sized personalized messages to exchange with each other. We focus in this paper on the h-relation personalized communication whose efficient implementation will allow high performance implementations of a large class of algorithms. While most previous h-relation algorithms use randomization, this paper presents a new deterministic approach for h-relation personalized communication. As an application, we present an efficient algorithm for stable integer sorting. The algorithms presented in this paper have been coded in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, and the Intel Paragon. Our experimental results are consistent with the theoretical analysis and illustrate the scalability and efficiency of our algorithms across different platforms. In fact, they seem to outperform all similar algorithms known to the authors on these platforms. (Also cross-referenced as UMIACS-TR-95-101.

    A Hybrid Decomposition Parallel Implementation of the Car-Parrinello Method

    Full text link
    We have developed a flexible hybrid decomposition parallel implementation of the first-principles molecular dynamics algorithm of Car and Parrinello. The code allows the problem to be decomposed either spatially, over the electronic orbitals, or any combination of the two. Performance statistics for 32, 64, 128 and 512 Si atom runs on the Touchstone Delta and Intel Paragon parallel supercomputers and comparison with the performance of an optimized code running the smaller systems on the Cray Y-MP and C90 are presented.Comment: Accepted by Computer Physics Communications, latex, 34 pages without figures, 15 figures available in PostScript form via WWW at http://www-theory.chem.washington.edu/~wiggs/hyb_figures.htm

    Peripheral twists for torus topologies with arbitrary aspect ratio

    Get PDF
    A torus is a common topology used in supercomputer networks. Asymmetric Tori suffer from resource usage imbalance, which translates to reduced performance. Twisted Tori employ a twist in the peripheral links of one or more dimensions to improve the topological parameters and overall performance of asymmetric networks. 2D and 3D twisted tori with aspect ratios 2:1 and 2:1:1 have been studied in detail. However, commercial machines do not necessarily employ those aspects ratios. In this work we present an early study of the effect of peripheral link twisting in multidimensional twisted tori with arbitrary aspect ratios. We observe that, in the general case, it is impossible to find a specific twist that minimizes all the interesting topological parameters of the network. We also introduce a requirement for the use of several twists in multidimensional torus with adaptive routing.Postprint (author’s final draft
    corecore