112 research outputs found

The NAS Parallel Benchmarks 2.1 Results

Author: Saphir William
Woo Alex
Yarrow Maurice

We present performance results for version 2.1 of the NAS Parallel Benchmarks (NPB) on the following architectures: IBM SP2/66 MHz, SGI Power Challenge Array/90 MHz, Cray Research T3D, and Intel Paragon. The NAS Parallel Benchmarks are a widely recognized suite of benchmarks originally designed to compare the performance of highly parallel computers with that of traditional supercomputers.

NASA Technical Reports Server

Computers for Lattice Field Theories

Author: Iwasaki Y.
Publication venue: 'Elsevier BV'
Publication date: 01/01/1994

Parallel computers dedicated to lattice field theories are reviewed with emphasis on three recent projects: the Teraflops project in the US, the CP-PACS project in Japan, and the 0.5-Teraflops project in the US. Some new commercial parallel computers are also discussed. Recent developments in semiconductor technologies are briefly surveyed in relation to possible approaches toward Teraflops computers. Comment: 15 pages with 16 PS figures, review presented at Lattice 93, LaTeX (espcrc2.sty required)

arXiv.org e-Print Archive

CERN Document Server

LAPSES: A Recipe for High-Performance Adaptive Router Design

Author: Anand Sivasubramaniam
Aniruddha S. Vaidya
Chita R. Das
Publication date: 01/01/1999

Earlier research has shown that adaptive routing can help improve network performance. However, it has not received adequate attention in commercial routers, mainly because of the additional hardware complexity and the perceived cost and performance degradation that may result from this complexity. These concerns can be mitigated if one can design a cost-effective router that supports adaptive routing. This paper proposes a three-step recipe (Look-Ahead routing, intelligent Path Selection, and an Economic Storage implementation, called the LAPSES approach) for cost-effective, high-performance pipelined adaptive router design. The first step, look-ahead routing, eliminates a pipeline stage in the router by making table lookup and arbitration concurrent. Next, three new traffic-sensitive path selection heuristics (LRU, LFU and MAX-CREDIT) are proposed to select one of the available alternate paths. Finally, two techniques for reducing the routing table size of the adaptive router are presented. These are called meta-table routing and economical storage. The proposed economical storage needs a routing table with only 9 and 27 entries for two- and three-dimensional meshes, respectively. All these design ideas are evaluated on a (16 × 16) mesh network via simulation. A fully adaptive algorithm and various traffic patterns are used to examine the performance benefits. Performance results show that the look-ahead design as well as the path selection heuristics boost network performance, while the economical storage approach turns out to be an ideal choice in comparison to the full-table and meta-table options. We believe the router resulting from these three design enhancements can make adaptive routing a viable choice for interconnects.

CiteSeerX

Crossref

Message‐passing performance of various computers

Author: Jack J. Dongarra
Tom Dunigan
Publication venue: 'Wiley'
Publication date: 01/01/2002

Crossref

Message-passing performance of various computers

Author: Jack J. Dongarra
Tom Dunigan
Publication venue: 'Wiley'
Publication date: 01/01/2005

Crossref

Practical Parallel Algorithms for Personalized Communication and Integer Sorting

Author: Bader David A.
Helman David R.
JaJa Joseph
Publication date: 15/10/1998

A fundamental challenge for parallel computing is to obtain high-level, architecture-independent algorithms that execute efficiently on general-purpose parallel machines. With the emergence of message-passing standards such as MPI, it has become easier to design efficient and portable parallel algorithms by making use of these communication primitives. While existing primitives allow an assortment of collective communication routines, they do not handle an important communication event in which most or all processors have non-uniformly sized personalized messages to exchange with each other. We focus in this paper on the h-relation personalized communication, whose efficient implementation will allow high-performance implementations of a large class of algorithms. While most previous h-relation algorithms use randomization, this paper presents a new deterministic approach for h-relation personalized communication. As an application, we present an efficient algorithm for stable integer sorting. The algorithms presented in this paper have been coded in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, and the Intel Paragon. Our experimental results are consistent with the theoretical analysis and illustrate the scalability and efficiency of our algorithms across different platforms. In fact, they seem to outperform all similar algorithms known to the authors on these platforms. (Also cross-referenced as UMIACS-TR-95-101.)

Digital Repository at the University of Maryland

A Hybrid Decomposition Parallel Implementation of the Car-Parrinello Method

Author: Andersen
Andreoni
Angelopoulos
Bachelet
Ballone
Brocks
Brommer
Car
Clarke
Gupta
Hannes Jónsson
Hohenberg
Hohl
Hoover
James Wiggs
King-Smith
Kleinman
Kohn
Littlefield
Marinescu
Nelson
Nosé
Payne
Ryckaert
Troullier
Wiggs
Williams
Štich
Publication venue: 'Elsevier BV'
Publication date: 14/11/1994

We have developed a flexible hybrid decomposition parallel implementation of the first-principles molecular dynamics algorithm of Car and Parrinello. The code allows the problem to be decomposed spatially, over the electronic orbitals, or by any combination of the two. We present performance statistics for 32, 64, 128 and 512 Si atom runs on the Touchstone Delta and Intel Paragon parallel supercomputers, and compare them with the performance of an optimized code running the smaller systems on the Cray Y-MP and C90. Comment: Accepted by Computer Physics Communications, LaTeX, 34 pages without figures, 15 figures available in PostScript form via WWW at http://www-theory.chem.washington.edu/~wiggs/hyb_figures.htm

arXiv.org e-Print Archive

Crossref

Peripheral twists for torus topologies with arbitrary aspect ratio

Author: Beivide Palacio Julio Ramón
Martínez Carmen
Moreto Planas Miquel
Vallejo Gutiérrez Enrique
Publication date: 01/01/2011

A torus is a common topology used in supercomputer networks. Asymmetric tori suffer from resource usage imbalance, which translates to reduced performance. Twisted tori employ a twist in the peripheral links of one or more dimensions to improve the topological parameters and overall performance of asymmetric networks. 2D and 3D twisted tori with aspect ratios 2:1 and 2:1:1 have been studied in detail. However, commercial machines do not necessarily employ those aspect ratios. In this work we present an early study of the effect of peripheral link twisting in multidimensional twisted tori with arbitrary aspect ratios. We observe that, in the general case, it is impossible to find a specific twist that minimizes all the interesting topological parameters of the network. We also introduce a requirement for the use of several twists in multidimensional tori with adaptive routing. Postprint (author’s final draft)

UPCommons. Portal del coneixement obert de la UPC