A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.
Index Transformation Algorithms in a Linear Algebra Framework
We present a linear algebraic formulation for a class of index transformations such as Gray code encoding and decoding, matrix transpose, bit reversal, vector reversal, shuffles, and other index or dimension permutations. This formulation unifies and simplifies these transformations and can be used to derive algorithms for hypercube multiprocessors. We show how all the widely known properties of Gray codes, and some not so well-known properties as well, can be derived using this framework. Using this framework, we relate hypercube communication algorithms to Gauss-Jordan elimination on a matrix of 0's and 1's.
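The Gray code encoding and decoding mentioned above can be illustrated with the standard bit-level identities (a sketch of the well-known binary-reflected Gray code, not the paper's matrix formulation):

```python
def gray_encode(n: int) -> int:
    # Binary-reflected Gray code: XOR the integer with its right shift.
    return n ^ (n >> 1)

def gray_decode(g: int) -> int:
    # Invert by taking the cumulative XOR of all right shifts (prefix XOR).
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Adjacent codewords differ in exactly one bit, the property that makes
# Gray codes useful for hypercube embeddings:
codes = [gray_encode(i) for i in range(8)]
# codes == [0, 1, 3, 2, 6, 7, 5, 4]
```

The single-bit-change property is what lets consecutive indices map to neighboring hypercube nodes.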
Multi-Dimensional Astrophysical Structural and Dynamical Analysis I. Development of a Nonlinear Finite Element Approach
A new field of numerical astrophysics is introduced which addresses the
solution of large, multidimensional structural or slowly-evolving problems
(rotating stars, interacting binaries, thick advective accretion disks, four
dimensional spacetimes, etc.). The technique employed is the Finite Element
Method (FEM), commonly used to solve engineering structural problems. The
approach developed herein has the following key features:
1. The computational mesh can extend into the time dimension, as well as
space, perhaps only a few cells, or throughout spacetime.
2. Virtually all equations describing the astrophysics of continuous media,
including the field equations, can be written in a compact form similar to that
routinely solved by most engineering finite element codes.
3. The transformations that occur naturally in the four-dimensional FEM
possess both coordinate and boost features, such that
(a) although the computational mesh may have a complex, non-analytic,
curvilinear structure, the physical equations still can be written in a simple
coordinate system independent of the mesh geometry.
(b) if the mesh has a complex flow velocity with respect to coordinate space,
the transformations will form the proper arbitrary Lagrangian- Eulerian
advective derivatives automatically.
4. The complex difference equations on the arbitrary curvilinear grid are
generated automatically from encoded differential equations.
This first paper concentrates on developing a robust and widely-applicable
set of techniques using the nonlinear FEM and presents some examples.
Comment: 28 pages, 9 figures; added integral boundary conditions, allowing very rapidly-rotating stars; accepted for publication in Ap.
Trivariate polynomial approximation on Lissajous curves
We study Lissajous curves in the 3-cube that generate algebraic cubature
formulas on a special family of rank-1 Chebyshev lattices. These formulas are
used to construct trivariate hyperinterpolation polynomials via a single 1-d
Fast Chebyshev Transform (by the Chebfun package), and to compute discrete
extremal sets of Fekete and Leja type for trivariate polynomial interpolation.
Applications could arise in the framework of Lissajous sampling for MPI
(Magnetic Particle Imaging).
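A Lissajous curve in the 3-cube can be sampled along its parameter as sketched below; the frequency triple (1, 2, 3) and the equispaced parameter grid are illustrative choices, not the paper's specific rank-1 Chebyshev lattice family:

```python
import math

def lissajous_3d(t, freqs=(1, 2, 3)):
    # A point on a Lissajous curve in the 3-cube [-1, 1]^3.
    a, b, c = freqs
    return (math.cos(a * t), math.cos(b * t), math.cos(c * t))

def sample_curve(n, freqs=(1, 2, 3)):
    # n equispaced parameter values on [0, 2*pi): candidate nodes for a
    # cubature or hyperinterpolation rule supported on the curve.
    return [lissajous_3d(2 * math.pi * k / n, freqs) for k in range(n)]
```

Because each coordinate is a cosine of the parameter, the sampled nodes lie on Chebyshev-type point sets along each axis.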
Efficient Generation of Correctness Certificates for the Abstract Domain of Polyhedra
Polyhedra form an established abstract domain for inferring runtime
properties of programs using abstract interpretation. Computations on them need
to be certified for the whole static analysis results to be trusted. In this
work, we look at how far we can get down the road of a posteriori verification
to lower the overhead of certification of the abstract domain of polyhedra. We
demonstrate methods for making the cost of inclusion certificate generation
negligible. From a performance point of view, our single-representation,
constraints-based implementation is competitive with state-of-the-art
implementations.
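An inclusion certificate for polyhedra can be checked a posteriori via Farkas' lemma: a nonnegative combination of the constraints of P = {x : Ax <= b} that yields the target constraint cx <= d proves P ⊆ {x : cx <= d}. The sketch below (dense lists, a hypothetical `check_inclusion_certificate` helper) shows only the shape of such a check, not the paper's certified implementation:

```python
def check_inclusion_certificate(A, b, c, d, lam, tol=1e-9):
    """Check lam >= 0, lam^T A == c, and lam^T b <= d: by Farkas' lemma
    this certifies {x : Ax <= b} is included in {x : cx <= d}."""
    if any(l < -tol for l in lam):
        return False
    m, n = len(A), len(A[0])
    # lam^T A must reproduce the target constraint's coefficients.
    for j in range(n):
        if abs(sum(lam[i] * A[i][j] for i in range(m)) - c[j]) > tol:
            return False
    # The combined right-hand side must not exceed the target bound.
    return sum(lam[i] * b[i] for i in range(m)) <= d + tol
```

Checking a certificate like this is linear in the size of the constraint system, which is why a posteriori verification can be much cheaper than certifying the computation that produced it.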
Communication and Matrix Computations on Large Message Passing Systems
This paper is concerned with the consequences for matrix computations
of having a rather large number of general purpose processors, say
ten or twenty thousand, connected in a network in such a way that a
processor can communicate only with its immediate neighbors. Certain
communication tasks associated with most matrix algorithms are
defined and formulas developed for the time required to perform them
under several communication regimes. The results are compared with
the times for nominal floating point operations. The results
suggest that it is possible to use a large number of processors to
solve matrix problems at a relatively fine granularity, provided fine
grain communication is available.
Additional figures are available via ftp from thales.cs.umd.edu in
the directory pub/reports.
(Also cross-referenced as UMIACS-TR-88-81.)
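The per-task time formulas developed in papers of this kind are typically built on a linear latency-plus-bandwidth cost model. The sketch below uses that standard model with illustrative constants and hypothetical function names; it is not the paper's actual formulas:

```python
def transfer_time(n_words, alpha=50e-6, beta=1e-6):
    # Linear cost model for one message: alpha is per-message startup
    # (latency), beta is per-word transfer time. Constants illustrative.
    return alpha + beta * n_words

def nearest_neighbor_broadcast(n_words, diameter, alpha=50e-6, beta=1e-6):
    # Store-and-forward broadcast when a processor can talk only to its
    # immediate neighbors: the message is relayed hop by hop along a path
    # of length `diameter`, paying the full message cost at each hop.
    return diameter * transfer_time(n_words, alpha, beta)
```

Comparing such communication times against floating point operation times is what determines the granularity at which a large processor network remains efficient.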
Hypercube matrix computation task
A major objective of the Hypercube Matrix Computation effort at the Jet Propulsion Laboratory (JPL) is to investigate the applicability of a parallel computing architecture to the solution of large-scale electromagnetic scattering problems. Three scattering analysis codes are being implemented and assessed on a JPL/California Institute of Technology (Caltech) Mark 3 Hypercube. The codes, which utilize different underlying algorithms, provide a means of evaluating the general applicability of this parallel architecture. The three analysis codes being implemented are a frequency domain method of moments code, a time domain finite difference code, and a frequency domain finite elements code. These analysis capabilities are being integrated into an electromagnetics interactive analysis workstation which can serve as a design tool for the construction of antennas and other radiating or scattering structures. The first two years of work on the Hypercube Matrix Computation effort are summarized, including both new developments and results as well as work previously reported in the Hypercube Matrix Computation Task: Final Report for 1986 to 1987 (JPL Publication 87-18).
A Computational Model for Tensor Core Units
To respond to the need of efficient training and inference of deep neural
networks, a plethora of domain-specific hardware architectures have been
introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A
common feature of these architectures is a hardware circuit for efficiently
computing a dense matrix multiplication of a given small size. In order to
broaden the class of algorithms that exploit these systems, we propose a
computational model, named the TCU model, that captures the ability to natively
multiply small matrices. We then use the TCU model for designing fast
algorithms for several problems, including matrix operations (dense and sparse
multiplication, Gaussian Elimination), graph algorithms (transitive closure,
all pairs shortest distances), Discrete Fourier Transform, stencil
computations, integer multiplication, and polynomial evaluation. We finally
highlight a relation between the TCU model and the external memory model.
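The TCU model's core primitive, natively multiplying small matrices, can be sketched as a tiled matrix multiplication in which a tile-multiply routine stands in for the hardware circuit. All names below are hypothetical and the code assumes the tile size s divides n; it illustrates the model's structure, not the paper's algorithms:

```python
def tile_mm(A, B):
    # Stand-in for the hardware s x s multiply circuit (the TCU primitive).
    s = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(s))
             for j in range(s)] for i in range(s)]

def tcu_matmul(A, B, s):
    # n x n product expressed entirely as s x s tile multiplications,
    # the only multiplication the model charges as a native operation.
    # Assumes s divides n.
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for I in range(0, n, s):
        for J in range(0, n, s):
            for K in range(0, n, s):
                At = [row[K:K + s] for row in A[I:I + s]]
                Bt = [row[J:J + s] for row in B[K:K + s]]
                Ct = tile_mm(At, Bt)
                for i in range(s):
                    for j in range(s):
                        C[I + i][J + j] += Ct[i][j]
    return C
```

Counting the calls to the tile primitive, rather than scalar operations, is what distinguishes cost analysis in such a model from the standard RAM accounting.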
Scalable Task-Based Algorithm for Multiplication of Block-Rank-Sparse Matrices
A task-based formulation of Scalable Universal Matrix Multiplication
Algorithm (SUMMA), a popular algorithm for matrix multiplication (MM), is
applied to the multiplication of hierarchy-free, rank-structured matrices that
appear in the domain of quantum chemistry (QC). The novel features of our
formulation are: (1) concurrent scheduling of multiple SUMMA iterations, and
(2) fine-grained task-based composition. These features make it tolerant of the
load imbalance due to the irregular matrix structure and eliminate all
artifactual sources of global synchronization. Scalability of iterative
computation of square-root inverse of block-rank-sparse QC matrices is
demonstrated; for full-rank (dense) matrices the performance of our SUMMA
formulation usually exceeds that of the state-of-the-art dense MM
implementations (ScaLAPACK and Cyclops Tensor Framework).
Comment: 8 pages, 6 figures, accepted to IA3 2015. arXiv admin note: text overlap with arXiv:1504.0504
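SUMMA's iteration structure, which the abstract's task-based formulation builds on, can be sketched sequentially: each step broadcasts a column panel of A and a row panel of B and applies a local rank-nb (outer-product) update. The sketch below elides all communication and process decomposition, so it shows only the loop structure, not the paper's concurrent, task-based scheme:

```python
def summa_sequential(A, B, nb):
    # One SUMMA step per panel: step K would broadcast A[:, K:K+nb] and
    # B[K:K+nb, :], after which each process accumulates a local
    # rank-nb update. Here all updates are applied in place, in order.
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for K in range(0, n, nb):
        hi = min(K + nb, n)
        for i in range(n):
            for j in range(n):
                C[i][j] += sum(A[i][k] * B[k][j] for k in range(K, hi))
    return C
```

Because each panel step is independent apart from the shared accumulation into C, multiple such iterations can be scheduled concurrently, which is the overlap the task-based formulation exploits.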