HPCCP/CAS Workshop Proceedings 1998
This publication is a collection of extended abstracts of presentations given at the HPCCP/CAS (High Performance Computing and Communications Program/Computational Aerosciences Project) Workshop held on August 24-26, 1998, at NASA Ames Research Center, Moffett Field, California. The objective of the Workshop was to bring together the aerospace high performance computing community, consisting of airframe and propulsion companies, independent software vendors, university researchers, and government scientists and engineers. The Workshop was sponsored by the HPCCP Office at NASA Ames Research Center. The Workshop consisted of over 40 presentations, including an overview of NASA's High Performance Computing and Communications Program and the Computational Aerosciences Project; ten sessions of papers representative of the high performance computing research conducted within the Program by the aerospace industry, academia, NASA, and other government laboratories; two panel sessions; and a special presentation by Mr. James Bailey
The 1998 Center for Simulation of Dynamic Response in Materials Annual Technical Report
Introduction:
This annual report describes research accomplishments for FY 98 of the Center for Simulation
of Dynamic Response of Materials. The Center is constructing a virtual shock physics facility
in which the full three-dimensional response of a variety of target materials can be computed
for a wide range of compressive, tensional, and shear loadings, including those produced by
detonation of energetic materials. The goals are to facilitate computation of a variety of
experiments in which strong shock and detonation waves are made to impinge on targets
consisting of various combinations of materials, compute the subsequent dynamic response
of the target materials, and validate these computations against experimental data.
The 1999 Center for Simulation of Dynamic Response in Materials Annual Technical Report
Introduction:
This annual report describes research accomplishments for FY 99 of the Center
for Simulation of Dynamic Response of Materials. The Center is constructing a
virtual shock physics facility in which the full three-dimensional response of a
variety of target materials can be computed for a wide range of compressive,
tensional, and shear loadings, including those produced by detonation of energetic
materials. The goals are to facilitate computation of a variety of experiments
in which strong shock and detonation waves are made to impinge on targets
consisting of various combinations of materials, compute the subsequent
dynamic response of the target materials, and validate these computations against
experimental data.
RIACS
The Research Institute for Advanced Computer Science (RIACS) was established by the Universities Space Research Association (USRA) at the NASA Ames Research Center (ARC) on June 6, 1983. RIACS is privately operated by USRA, a consortium of universities that serves as a bridge between NASA and the academic community. Under a five-year co-operative agreement with NASA, research at RIACS is focused on areas that are strategically enabling to the Ames Research Center's role as NASA's Center of Excellence for Information Technology. The primary mission of RIACS is to carry out research and development in computer science. This work is devoted in the main to tasks that are strategically enabling with respect to NASA's bold mission in space exploration and aeronautics. There are three foci for this work: (1) Automated Reasoning, (2) Human-Centered Computing, and (3) High Performance Computing and Networking. RIACS has the additional goal of broadening the base of researchers in these areas of importance to the nation's space and aeronautics enterprises. Through its visiting scientist program, RIACS facilitates the participation of university-based researchers, including both faculty and students, in the research activities of NASA and RIACS. RIACS researchers work in close collaboration with NASA computer scientists on projects such as the Remote Agent Experiment on the Deep Space One mission and Super-Resolution Surface Modeling.
Predictive analysis and optimisation of pipelined wavefront applications using reusable analytic models
Pipelined wavefront computations are a ubiquitous class of high performance parallel algorithms
used for the solution of many scientific and engineering applications. In order to aid
the design and optimisation of these applications, and to ensure that during procurement platforms
are chosen best suited to these codes, there has been considerable research in analysing
and evaluating their operational performance.
Wavefront codes exhibit complex computation, communication, and synchronisation patterns,
and as a result there exists a large variety of such codes and possible optimisations. The
problem is compounded by each new generation of high performance computing system,
which has often introduced a previously unexplored architectural trait, requiring previous
performance models to be rewritten and reevaluated.
In this thesis, we address the performance modelling and optimisation of this class of
application, as a whole. This differs from previous studies in which bespoke models are applied
to specific applications. The analytic performance models are generalised and reusable,
and we demonstrate their application to the predictive analysis and optimisation of pipelined
wavefront computations running on modern high performance computing systems.
The performance model is based on the LogGP parameterisation, and uses a small
number of input parameters to specify the particular behaviour of most wavefront codes. The
new parameters and model equations capture the key structural and behavioural differences
among different wavefront application codes, providing a succinct summary of the operations
for each application and insights into alternative wavefront application design.
The models are applied to three industry-strength wavefront codes and are validated
on several systems including a Cray XT3/XT4 and an InfiniBand commodity cluster. Model
predictions show high quantitative accuracy (less than 20% error) for all high performance
configurations and excellent qualitative accuracy.
The thesis presents applications, projections and insights for optimisations using the
model, which show the utility of reusable analytic models for performance engineering of
high performance computing codes. In particular, we demonstrate the use of the model for:
(1) evaluating application configuration and resulting performance; (2) evaluating hardware
platform issues including platform sizing and configuration; (3) exploring hardware platform design
alternatives and system procurement; and (4) considering possible code and algorithmic
optimisations.
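
As a rough illustration of the LogGP parameterisation named in this abstract, the sketch below computes the standard LogGP end-to-end time for a single message and plugs it into a deliberately crude pipeline estimate. The class `LogGP` and the functions `message_time` and `sweep_time` are hypothetical names chosen for this example. The sweep formula (pipeline fill across a `px` by `py` processor grid plus one step per tile) is a simplifying assumption and is not the model developed in the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGP:
    L: float  # network latency
    o: float  # CPU overhead per send or per receive
    g: float  # minimum gap between consecutive messages (unused below)
    G: float  # gap per byte for long messages
    P: int    # number of processors


def message_time(p: LogGP, nbytes: int) -> float:
    """End-to-end LogGP time for one message: send overhead,
    per-byte gap for the remaining bytes, latency, receive overhead."""
    if nbytes < 1:
        raise ValueError("nbytes must be at least 1")
    return p.o + (nbytes - 1) * p.G + p.L + p.o


def sweep_time(p: LogGP, px: int, py: int, tiles: int,
               work_per_tile: float, nbytes: int) -> float:
    """Crude estimate for one sweep (an assumption of this sketch):
    the wavefront needs (px - 1) + (py - 1) steps to reach the far
    corner, which then processes `tiles` tiles, one per step."""
    fill_steps = (px - 1) + (py - 1)
    step = work_per_tile + message_time(p, nbytes)
    return (fill_steps + tiles) * step
```

Varying `px`, `py`, or the tile count in such a function is the kind of "what if" question an analytic model makes cheap to ask, although realistic models account for many effects ignored here.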
ATCOM: Automatically tuned collective communication system for SMP clusters.
Conventional implementations of collective communications are based on point-to-point communications, and their optimizations have focused on the efficiency of those communication algorithms. However, point-to-point communications are not the optimal choice for modern computing clusters of SMPs due to their two-level communication structure. In recent years, a few research efforts have investigated efficient collective communications for SMP clusters. This dissertation is focused on platform-independent algorithms and implementations in this area. There are two main approaches to implementing efficient collective communications for clusters of SMPs: using shared memory operations for intra-node communications, and overlapping inter-node/intra-node communications. The former fully utilizes the hardware-based shared memory of an SMP, and the latter takes advantage of the inherent hierarchy of the communications within a cluster of SMPs. Previous studies focused on clusters of SMPs from certain vendors, and the previously proposed methods are not portable to other systems. Because the performance optimization issue is very complicated and the development process is very time consuming, self-tuning, platform-independent implementations are highly desirable. As proven in this dissertation, such an implementation can significantly outperform the other point-to-point based portable implementations and some platform-specific implementations. The dissertation describes in detail the architecture of the platform-independent implementation. There are four system components: shared memory-based collective communications, overlapping mechanisms for inter-node and intra-node communications, a prediction-based tuning module, and a micro-benchmark based tuning module. Each component is carefully designed with the goal of automatic tuning in mind.
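
The two-level structure described in this abstract can be sketched as a broadcast plan that separates inter-node sends from intra-node copies. This is only an illustration under stated assumptions: the function name `hierarchical_broadcast_plan` is hypothetical, the inter-node phase is a flat send from the root to one leader per node, and nothing here reproduces ATCOM's actual algorithms or its tuning modules.

```python
def hierarchical_broadcast_plan(node_of_rank, root):
    """Plan a two-level broadcast.

    node_of_rank[r] is the node that hosts rank r.
    Returns (inter, intra): lists of (source, destination) pairs.
    `inter` pairs cross node boundaries; `intra` pairs stay on one
    node, where a shared-memory copy could replace a message.
    """
    # Group ranks by the node they live on.
    ranks_on = {}
    for rank, node in enumerate(node_of_rank):
        ranks_on.setdefault(node, []).append(rank)

    root_node = node_of_rank[root]
    # One leader per node; the root leads its own node.
    leader = {
        node: (root if node == root_node else ranks[0])
        for node, ranks in ranks_on.items()
    }

    # Phase 1 (assumed flat): root sends to every other node's leader.
    inter = [(root, lead) for node, lead in leader.items()
             if node != root_node]
    # Phase 2: each leader passes the data to its node-local ranks.
    intra = [(leader[node], r)
             for node, ranks in ranks_on.items()
             for r in ranks if r != leader[node]]
    return inter, intra
```

For two nodes with four ranks each, this plan crosses the network once and performs six node-local copies, whereas a flat broadcast from the root would send four messages across the network.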
Scalability of preconditioners as a strategy for parallel computation of compressible fluid flow
Parallel implementations of a Newton-Krylov-Schwarz algorithm are used to solve a model problem representing low Mach number compressible fluid flow over a backward-facing step. The Mach number is specifically selected to result in a numerically "stiff" matrix problem, based on an implicit finite volume discretization of the compressible 2D Navier-Stokes/energy equations using primitive variables. Newton's method is used to linearize the discrete system, and a preconditioned Krylov projection technique is used to solve the resulting linear system. Domain decomposition enables the development of a global preconditioner via the parallel construction of contributions derived from subdomains. Formation of the global preconditioner is based upon additive and multiplicative Schwarz algorithms, with and without subdomain overlap. The degree of parallelism of this technique is further enhanced with the use of a matrix-free approximation for the Jacobian used in the Krylov technique (in this case, GMRES(k)). Of paramount interest to this study is the implementation and optimization of these techniques on parallel shared-memory hardware, namely the Cray C90 and SGI Challenge architectures. These architectures were chosen as representative and commonly available to researchers interested in the solution of problems of this type. The Newton-Krylov-Schwarz solution technique is increasingly being investigated for computational fluid dynamics (CFD) applications due to the advantages of full coupling of all variables and equations, rapid non-linear convergence, and moderate memory requirements. A parallel version of this method that scales effectively on the above architectures would be extremely attractive to practitioners, resulting in efficient, cost-effective, parallel solutions exhibiting the benefits of the solution technique.
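
The matrix-free Jacobian approximation mentioned in this abstract is commonly realised as a forward finite difference of the nonlinear residual, J(u) v ≈ (F(u + εv) − F(u)) / ε, so the Krylov solver never needs the Jacobian matrix itself. The sketch below shows that idea in plain Python. The function names, the toy two-equation `residual`, and the step-size heuristic for ε are assumptions made for this example, not details taken from the paper.

```python
import math
import sys


def jacobian_vector_product(F, u, v):
    """Approximate J(u) @ v by a forward difference of the residual F.

    F maps a list of floats to a list of floats. The step size eps is
    a common heuristic (an assumption here): sqrt(machine epsilon),
    scaled up by the size of u and down by the size of v.
    """
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v == 0.0:
        return [0.0] * len(F(u))
    norm_u = math.sqrt(sum(x * x for x in u))
    eps = math.sqrt(sys.float_info.epsilon) * (1.0 + norm_u) / norm_v
    shifted = [ui + eps * vi for ui, vi in zip(u, v)]
    return [(fp - f0) / eps for fp, f0 in zip(F(shifted), F(u))]


def residual(u):
    """Toy nonlinear system used only for illustration."""
    return [u[0] ** 2 + u[1] - 3.0, u[0] + u[1] ** 2 - 5.0]
```

For the toy system the exact Jacobian at u = (1, 2) is [[2, 1], [1, 4]], so the product with v = (1, 0) should be close to (2, 1). Each such product costs one extra residual evaluation, which is why the approach pairs naturally with GMRES.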