Summary of research in applied mathematics, numerical analysis, and computer sciences
The major categories of current ICASE research programs addressed include: numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; control and parameter identification problems, with emphasis on effective numerical methods; computational problems in engineering and the physical sciences, particularly fluid dynamics, acoustics, and structural analysis; and computer systems and software, especially vector and parallel computers.
Nodal Discontinuous Galerkin Methods on Graphics Processors
Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. Lately, another property of DG has been growing in importance: The majority of a DG operator is applied in an element-local way, with weak penalty-based element-to-element coupling. The resulting locality in memory access is one of the factors that enables DG to run on off-the-shelf, massively parallel graphics processors (GPUs). In addition, DG's high-order nature lets it require fewer data points per represented wavelength and hence fewer memory accesses, in exchange for higher arithmetic intensity. Both of these factors work significantly in favor of a GPU implementation of DG. Using a single US$400 Nvidia GTX 280 GPU, we accelerate a solver for Maxwell's equations on a general 3D unstructured grid by a factor of 40 to 60 relative to a serial computation on a current-generation CPU. In many cases, our algorithms exhibit full use of the device's available memory bandwidth. Example computations achieve and surpass 200 gigaflops/s of net application-level floating point work. In this article, we describe and derive the techniques used to reach this level of performance. In addition, we present comprehensive data on the accuracy and runtime behavior of the method.
Comment: 33 pages, 12 figures, 4 tables
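To make the element-local structure described in this abstract concrete, here is a minimal illustrative sketch, not the authors' code; the sizes K and Np and the matrix D are hypothetical. It shows how the bulk of a nodal DG operator reduces to one small dense matrix applied independently to every element's nodal values, which is the property that maps well to GPUs:

    import numpy as np

    K, Np = 1000, 20                         # hypothetical: number of elements, nodes per element
    rng = np.random.default_rng(0)
    D = rng.random((Np, Np))                 # local differentiation matrix, shared by all elements
    u = rng.random((K, Np))                  # nodal values, one row per element

    # The element-local part of the operator is a small dense product:
    # row k of du equals D @ u[k]. On a GPU, each element (or block of
    # elements) maps to a thread block, and D is reused heavily, which
    # is what yields high arithmetic intensity per memory access.
    du = u @ D.T                             # shape (K, Np)

    # Element-to-element coupling enters only through penalty terms on
    # shared faces, touching just the surface nodes of each element
    # (not shown in this sketch).
    print(du.shape)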
[Activity of Institute for Computer Applications in Science and Engineering]
This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, fluid mechanics, and computer science
Asynchronous and Multiprecision Linear Solvers - Scalable and Fault-Tolerant Numerics for Energy Efficient High Performance Computing
Asynchronous methods minimize idle times by removing synchronization barriers and therefore allow computer systems to be used efficiently. Their inherent tolerance of communication latency also improves fault tolerance. Because asynchronous methods additionally enable use of the power- and energy-saving mechanisms provided by the hardware, they are suitable candidates for the highly parallel, heterogeneous hardware platforms expected in the near future.
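As a concrete illustration of the multiprecision theme, here is a generic sketch of mixed-precision iterative refinement, not the project's actual solver; the function name and problem sizes are hypothetical. The expensive solve runs in cheap float32 while residuals are accumulated in float64, recovering high-precision accuracy at low-precision cost:

    import numpy as np

    def mixed_precision_solve(A, b, iters=10):
        # The expensive O(n^3) work happens in float32; a production code
        # would factorize A32 once (e.g. LU) and reuse the factors rather
        # than re-solving as done here for brevity.
        A32 = A.astype(np.float32)
        x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
        for _ in range(iters):
            r = b - A @ x                    # residual in full float64
            d = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
            x += d                           # high-precision correction
        return x

    rng = np.random.default_rng(0)
    A = rng.random((100, 100)) + 100 * np.eye(100)   # well-conditioned test matrix
    b = rng.random(100)
    x = mixed_precision_solve(A, b)
    print(np.linalg.norm(A @ x - b))         # residual near float64 round-off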
The Sixth Copper Mountain Conference on Multigrid Methods, part 1
The Sixth Copper Mountain Conference on Multigrid Methods was held on 4-9 Apr. 1993 at Copper Mountain, CO. This book is a collection of many of the papers presented at the conference and as such represents the conference proceedings. NASA LaRC graciously provided printing of this document so that all of the papers could be presented in a single forum. Each paper was reviewed by a member of the conference organizing committee under the coordination of the editors. The multigrid discipline continues to expand and mature, as is evident from these proceedings. The vibrancy of this field is amply expressed in these important papers, and the collection clearly shows the field's rapid trend toward further diversity and depth.
Synthetic presentation of iterative asynchronous parallel algorithms
Iterative asynchronous parallel methods are now attracting renewed interest in the High Performance Computing (HPC) community, particularly in the context of massive parallelism. These methods avoid deadlock phenomena and, unlike synchronous methods, do not require rigorous load balancing. They are of great interest when many synchronizations between processors occur, which for iterative methods is the case when convergence is slow. Indeed, in synchronous parallel iterative methods, processors must wait for results computed by other processors in order to respect the task-sequence graph that defines the logic of the algorithm; waiting for results emitted by concurrent processors thus causes idle time on the waiting processors. Asynchronous parallel iterative methods were introduced to overcome this drawback, first for the solution of large-scale linear systems and then also for large, highly nonlinear algebraic systems, where the solution may be subject to constraints. Methods of this kind have been widely studied by many authors worldwide.

The purpose of this presentation is to cover, as broadly and pedagogically as possible, asynchronous parallel iterative methods as well as the issues related to their implementation and to their application to the many problems arising in High Performance Computing. We therefore try, as far as possible, to present the underlying concepts that allow a good understanding of these methods while avoiding overly rigorous mathematical formalism; references to the main pioneering works are also given. After a general introduction, we present the basic concepts used to model asynchronous parallel iterative methods, which include synchronous methods as a particular case. We then present algorithmic extensions of these methods: asynchronous sub-domain methods, asynchronous multisplitting methods, and asynchronous parallel methods with flexible communications. In each case an analysis of the behavior of these methods is presented; the first kind of analysis yields an estimate of the asymptotic rate of convergence. The difficult problem of the stopping test for asynchronous parallel iterations is also studied, both through computer-science considerations and through numerical aspects related to the mathematical analysis of the behavior of these parallel iterative methods.

Parallel asynchronous methods have been implemented on various architectures, and we present the main principles that made it possible to code them. These methods have been used to solve several kinds of mathematical problems, and we list the main applications treated. Finally, we try to specify in which cases, and on which types of architecture, these methods are efficient and interesting to use.
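A minimal sketch of the asynchronous iteration model described above, illustrative only and not taken from the presentation; the matrix, block partition, and sweep count are hypothetical. Each thread repeatedly relaxes its own block of unknowns using whatever values of the other blocks are currently visible in shared memory, with no barrier between sweeps:

    import threading
    import numpy as np

    n, n_threads, sweeps = 400, 4, 500       # hypothetical sizes
    rng = np.random.default_rng(0)
    A = rng.random((n, n)) + n * np.eye(n)   # strict diagonal dominance guarantees
    b = rng.random(n)                        # convergence even under asynchronism
    x = np.zeros(n)                          # shared iterate, updated without barriers
    blocks = np.array_split(np.arange(n), n_threads)

    def worker(idx):
        for _ in range(sweeps):
            for i in idx:
                # Relax component i using whatever values of x are
                # currently visible (possibly stale ones written by other
                # threads). No thread ever waits for another: this is the
                # essence of the asynchronous iteration model.
                x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]

    threads = [threading.Thread(target=worker, args=(blk,)) for blk in blocks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(np.linalg.norm(A @ x - b))         # small residual despite no synchronization

With a strictly diagonally dominant matrix, the componentwise relaxation contracts in the maximum norm, which is the classical condition under which asynchronous iterations of this kind can be shown to converge regardless of update order.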
Seventh Copper Mountain Conference on Multigrid Methods
The Seventh Copper Mountain Conference on Multigrid Methods was held on 2-7 Apr. 1995 at Copper Mountain, Colorado. This book is a collection of many of the papers presented at the conference and so represents the conference proceedings. NASA Langley graciously provided printing of this document so that all of the papers could be presented in a single forum. Each paper was reviewed by a member of the conference organizing committee under the coordination of the editors. The multigrid discipline continues to expand and mature, as is evident from these proceedings. The vibrancy of this field is amply expressed in these important papers, and the collection shows the field's rapid trend toward further diversity and depth.