A Hybrid Decomposition Parallel Implementation of the Car-Parrinello Method

Author: Andersen
Andreoni
Angelopoulos
Bachelet
Ballone
Brocks
Brommer
Brommer
Car
Car
Car
Clarke
Gupta
Hannes Jónsson
Hohenberg
Hohl
Hohl
Hoover
James Wiggs
King-Smith
Kleinman
Kohn
Littlefield
Marinescu
Nelson
Nosé
Payne
Ryckaert
Troullier
Wiggs
Williams
Štich
Štich
Štich
Publication venue: 'Elsevier BV'
Publication date: 14/11/1994

We have developed a flexible hybrid decomposition parallel implementation of the first-principles molecular dynamics algorithm of Car and Parrinello. The code allows the problem to be decomposed either spatially, over the electronic orbitals, or any combination of the two. Performance statistics for 32, 64, 128 and 512 Si atom runs on the Touchstone Delta and Intel Paragon parallel supercomputers, and a comparison with the performance of an optimized code running the smaller systems on the Cray Y-MP and C90, are presented. Comment: Accepted by Computer Physics Communications, LaTeX, 34 pages without figures, 15 figures available in PostScript form via WWW at http://www-theory.chem.washington.edu/~wiggs/hyb_figures.htm

arXiv.org e-Print Archive

Crossref

Alternating-Direction Line-Relaxation Methods on Multicomputers

Author: Hofhaus Jörn
Van de Velde Eric
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/1996

We study the multicomputer performance of a three-dimensional Navier–Stokes solver based on alternating-direction line-relaxation methods. We compare several multicomputer implementations, each of which combines a particular line-relaxation method and a particular distributed block-tridiagonal solver. In our experiments, the problem size was determined by resolution requirements of the application. As a result, the granularity of the computations of our study is finer than is customary in the performance analysis of concurrent block-tridiagonal solvers. Our best results were obtained with a modified half-Gauss–Seidel line-relaxation method implemented by means of a new iterative block-tridiagonal solver that is developed here. Most computations were performed on the Intel Touchstone Delta, but we also used the Intel Paragon XP/S, the Parsytec SC-256, and the Fujitsu S-600 for comparison.
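
For orientation only, here is a minimal serial sketch of the scalar Thomas algorithm, the textbook direct solver for a single tridiagonal line. It is not the distributed block-tridiagonal solver or the iterative solver the abstract describes; the function name `thomas` and the argument layout are assumptions made for illustration.

```python
def thomas(a, b, c, d):
    """Solve a scalar tridiagonal system (illustrative sketch).

    a: sub-diagonal (a[0] is ignored), b: main diagonal,
    c: super-diagonal (c[-1] is ignored), d: right-hand side.
    Assumes no pivoting is needed (e.g. a diagonally dominant matrix).
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination sweep along the line.
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution sweep in the opposite direction.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

Both sweeps are inherently sequential along the line, which is one reason distributing such solvers across processors is a nontrivial performance question.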

Caltech Authors

Publikationsserver der RWTH Aachen University

Parallel algorithms for simulating continuous time Markov chains

Author: Heidelberger Philip
Nicol David M.

We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors
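
As background, the following is a minimal serial sketch of uniformization itself, not of any of the five parallel methods compared in the paper. It assumes a generator matrix `Q` given as nested lists with at least one nonzero exit rate; the names `uniformized_chain` and `simulate` are hypothetical.

```python
import random


def uniformized_chain(Q):
    """Return (lam, P), where P = I + Q / lam is the uniformized jump matrix."""
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate >= every exit rate
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    return lam, P


def simulate(Q, state, t_end, rng):
    """Simulate the chain up to time t_end; return a list of (time, state) changes."""
    lam, P = uniformized_chain(Q)
    t = 0.0
    path = [(0.0, state)]
    while True:
        # Candidate event times form a Poisson process of rate lam,
        # independent of the current state.
        t += rng.expovariate(lam)
        if t > t_end:
            return path
        nxt = rng.choices(range(len(Q)), weights=P[state])[0]
        if nxt != state:  # self-loops are pseudo-events and change nothing
            state = nxt
            path.append((t, state))
```

Because the candidate event times do not depend on the state, they can be generated ahead of time, which is the property that makes uniformization attractive as a synchronization basis.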

NASA Technical Reports Server

NASA high performance computing and communications program

Author: Holcomb Lee
Hunter Paul
Smith Paul

The National Aeronautics and Space Administration's HPCC program is part of a new Presidential initiative aimed at producing a 1000-fold increase in supercomputing speed and a 100-fold improvement in available communications capability by 1997. As more advanced technologies are developed under the HPCC program, they will be used to solve NASA's 'Grand Challenge' problems, which include improving the design and simulation of advanced aerospace vehicles, allowing people at remote locations to communicate more effectively and share information, increasing scientists' abilities to model the Earth's climate and forecast global environmental trends, and improving the development of advanced spacecraft. NASA's HPCC program is organized into three projects which are unique to the agency's mission: the Computational Aerosciences (CAS) project, the Earth and Space Sciences (ESS) project, and the Remote Exploration and Experimentation (REE) project. An additional project, the Basic Research and Human Resources (BRHR) project, exists to promote long-term research in computer science and engineering and to increase the pool of trained personnel in a variety of scientific disciplines. This document presents an overview of the objectives and organization of these projects as well as summaries of individual research and development programs within each project.

NASA Technical Reports Server

Nonlinear structural response using adaptive dynamic relaxation on a massively-parallel-processing system

Author: Knight Norman F., Jr.
Oakley David R.

A parallel adaptive dynamic relaxation (ADR) algorithm has been developed for nonlinear structural analysis. This algorithm has minimal memory requirements, is easily parallelizable and scalable to many processors, and is generally very reliable and efficient for highly nonlinear problems. Performance evaluations on single-processor computers have shown that the ADR algorithm is reliable and highly vectorizable, and that it is competitive with direct solution methods for the highly nonlinear problems considered. The present algorithm is implemented on the 512-processor Intel Touchstone DELTA system at Caltech, and it is designed to minimize the extent and frequency of interprocessor communication. The algorithm has been used to solve for the nonlinear static response of two- and three-dimensional hyperelastic systems involving contact. Impressive relative speedups have been achieved and demonstrate the high scalability of the ADR algorithm. For the class of problems addressed, the ADR algorithm represents a very promising approach for parallel-vector processing.

NASA Technical Reports Server

Runtime Support for In-Core and Out-of-Core Data-Parallel Programs

Author: Thakur Rajeev
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1995

Distributed memory parallel computers or distributed computer systems are widely recognized as the only cost-effective means of achieving teraflops performance in the near future. However, the fact remains that they are difficult to program and advances in software for these machines have not kept pace with advances in hardware. This thesis addresses several issues in providing runtime support for in-core as well as out-of-core programs on distributed memory parallel computers. This runtime support can be directly used in application programs for greater efficiency, portability and ease of programming. It can also be used together with a compiler to translate programs written in a high-level data-parallel language like High Performance Fortran (HPF) to node programs for distributed memory machines. In distributed memory programs, it is often necessary to change the distribution of arrays during program execution. This thesis presents efficient and portable algorithms for runtime array redistribution. The algorithms have been implemented on the Intel Touchstone Delta and are found to scale well with the number of processors and array size. This thesis also presents algorithms for all-to-all collective communication on fat-tree and two-dimensional mesh interconnection topologies. The performance of these algorithms on the CM-5 and Touchstone Delta is studied extensively. A model for estimating the time taken by these algorithms on the basis of system parameters is developed and validated by comparing with experimental results. A number of applications deal with very large data sets which cannot fit in main memory, and hence have to be stored in files on disks, resulting in out-of-core programs. This thesis also describes the design and implementation of efficient runtime support for out-of-core computations. Several optimizations for accessing out-of-core data are presented. An extended Two-Phase Method is proposed for accessing sections of out-of-core arrays efficiently. This method uses collective I/O and the I/O workload is divided among processors dynamically, depending on the access requests. Performance results obtained using this runtime support for out-of-core programs on the Touchstone Delta are presented.
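
To make the array redistribution problem concrete, here is a small sketch that computes which elements each processor must send when a one-dimensional array changes from a block to a cyclic distribution. It is a naive element-by-element plan, not the thesis's algorithms; the function names are hypothetical, and the block size of ceil(n/p) is an assumption about how the block distribution is defined.

```python
def block_owner(i, n, p):
    """Owner of global index i under a block distribution (block size ceil(n/p))."""
    block = -(-n // p)  # ceiling division
    return i // block


def cyclic_owner(i, p):
    """Owner of global index i under a cyclic distribution."""
    return i % p


def redistribution_plan(n, p):
    """plan[src][dst] lists the global indices src must send to dst."""
    plan = [[[] for _ in range(p)] for _ in range(p)]
    for i in range(n):
        plan[block_owner(i, n, p)][cyclic_owner(i, p)].append(i)
    return plan
```

For example, `redistribution_plan(8, 2)` shows that each of the two processors keeps half of its block and sends the other half. A practical runtime would derive these index sets in closed form rather than looping over every element.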

CiteSeerX

Syracuse University Research Facility and Collaborative Environment

Nonlinear explicit transient finite element analysis on the Intel Delta

Author: Gupta S.
Plaskacz E. J.
Ramirez M. R.
Publication venue: Argonne National Laboratory
Publication date: 01/03/1993

Many large scale finite element problems are intractable on current generation production supercomputers. High-performance computer architectures offer effective avenues to bridge the gap between computational needs and the power of computational hardware. The biggest challenge lies in the substitution of the key algorithms in an application program with redesigned algorithms which exploit the new architectures and use better or more appropriate numerical techniques. A methodology for implementing nonlinear finite element analysis on a homogeneous distributed processing network is discussed. The method can also be extended to heterogeneous networks comprised of different machine architectures provided that they have a mutual communication interface. This unique feature has greatly facilitated the port of the code to the 8-node Intel Touchstone Gamma and then the 512-node Intel Touchstone Delta. The domain is decomposed serially in a preprocessor. Separate input files are written for each subdomain. These files are read in by local copies of the program executable operating in parallel. Communication between processors is addressed utilizing asynchronous and synchronous message passing. The basic kernel of message passing is the internal force exchange which is analogous to the computed interactions between sections of physical bodies in static stress analysis. Benchmarks for the Intel Delta are presented. Performance exceeding 1 gigaflop was attained. Results for two large-scale finite element meshes are presented

UNT Digital Library

Complete Exchange on a Wormhole Routed Mesh

Author: Choudhary Alok
Fox Geoffrey C.
Thakur Rajeev
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1993

The complete exchange (or all-to-all personalized) communication pattern occurs frequently in many important parallel computing applications. We discuss several algorithms to perform complete exchange on a two-dimensional mesh-connected computer with wormhole routing. We propose algorithms for both power-of-two and non-power-of-two meshes as well as an algorithm which works for any arbitrary mesh. We have developed analytical models to estimate the performance of the algorithms on the basis of system parameters. These models take into account the effects of link contention and other characteristics of the communication system. Performance results on the Intel Touchstone Delta are presented and analyzed.
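
The communication pattern itself can be illustrated with a short serial simulation. The sketch below uses the generic pairwise-exchange schedule, in which process i swaps blocks with process i XOR step; this is a standard textbook schedule for power-of-two process counts, not necessarily one of the paper's mesh algorithms, and it ignores link contention entirely. The function name and the `send` layout are assumptions.

```python
def complete_exchange(send):
    """Simulate all-to-all personalized exchange.

    send[i][j] is the block process i holds for process j.
    Returns recv, where recv[j][i] is the block process j received from i.
    """
    p = len(send)
    if p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two process count")
    recv = [[None] * p for _ in range(p)]
    for i in range(p):
        recv[i][i] = send[i][i]  # each process keeps its own block
    for step in range(1, p):
        # In one step, every process exchanges with exactly one partner.
        for i in range(p):
            partner = i ^ step
            recv[partner][i] = send[i][partner]
    return recv
```

After p - 1 steps the result is the transpose of the input: `recv[j][i] == send[i][j]` for every pair of processes.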

Syracuse University Research Facility and Collaborative Environment

Stetson Collegiate, Vol. 27, No. 4, October 30, 1914

Author: Stetson University
Publication venue: John B. Stetson University
Publication date: 30/10/1914

Stetson University student newspaper.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)