76 research outputs found

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.

    Multi-Dimensional Astrophysical Structural and Dynamical Analysis I. Development of a Nonlinear Finite Element Approach

    A new field of numerical astrophysics is introduced which addresses the solution of large, multidimensional structural or slowly-evolving problems (rotating stars, interacting binaries, thick advective accretion disks, four-dimensional spacetimes, etc.). The technique employed is the Finite Element Method (FEM), commonly used to solve engineering structural problems. The approach developed herein has the following key features: 1. The computational mesh can extend into the time dimension, as well as space, perhaps only a few cells, or throughout spacetime. 2. Virtually all equations describing the astrophysics of continuous media, including the field equations, can be written in a compact form similar to that routinely solved by most engineering finite element codes. 3. The transformations that occur naturally in the four-dimensional FEM possess both coordinate and boost features, such that (a) although the computational mesh may have a complex, non-analytic, curvilinear structure, the physical equations still can be written in a simple coordinate system independent of the mesh geometry; (b) if the mesh has a complex flow velocity with respect to coordinate space, the transformations will form the proper arbitrary Lagrangian-Eulerian advective derivatives automatically. 4. The complex difference equations on the arbitrary curvilinear grid are generated automatically from encoded differential equations. This first paper concentrates on developing a robust and widely-applicable set of techniques using the nonlinear FEM and presents some examples. Comment: 28 pages, 9 figures; added integral boundary conditions, allowing very rapidly-rotating stars; accepted for publication in Ap.

    Trivariate polynomial approximation on Lissajous curves

    We study Lissajous curves in the 3-cube that generate algebraic cubature formulas on a special family of rank-1 Chebyshev lattices. These formulas are used to construct trivariate hyperinterpolation polynomials via a single 1-d Fast Chebyshev Transform (by the Chebfun package), and to compute discrete extremal sets of Fekete and Leja type for trivariate polynomial interpolation. Applications could arise in the framework of Lissajous sampling for MPI (Magnetic Particle Imaging).

    Efficient Generation of Correctness Certificates for the Abstract Domain of Polyhedra

    Polyhedra form an established abstract domain for inferring runtime properties of programs using abstract interpretation. Computations on them need to be certified for the results of the whole static analysis to be trusted. In this work, we look at how far we can get down the road of a posteriori verification to lower the overhead of certification of the abstract domain of polyhedra. We demonstrate methods for making the cost of inclusion certificate generation negligible. From a performance point of view, our single-representation, constraints-based implementation compares with state-of-the-art implementations.
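
To make the idea of an inclusion certificate concrete, here is a minimal sketch of an a posteriori checker. It assumes, and this is an assumption not stated in the abstract, that a certificate takes the common Farkas form: nonnegative multipliers `lam` showing that a target constraint `c . x <= d` follows from the constraints `A x <= b` of a polyhedron. The function name and argument layout are hypothetical.

```python
from fractions import Fraction


def check_inclusion_certificate(A, b, c, d, lam):
    """Return True if lam proves {x : A x <= b} lies inside {x : c . x <= d}.

    Assumed (Farkas-style) certificate conditions:
      lam >= 0,  sum_i lam[i] * A[i] == c,  sum_i lam[i] * b[i] <= d.
    Exact rational arithmetic avoids floating-point doubts in the checker.
    """
    lam = [Fraction(v) for v in lam]
    if len(lam) != len(A) or any(v < 0 for v in lam):
        return False
    # Linear combination of the constraint rows with the multipliers.
    combined = [sum(lam[i] * A[i][j] for i in range(len(A)))
                for j in range(len(c))]
    # Same combination applied to the right-hand sides.
    bound = sum(lam[i] * b[i] for i in range(len(b)))
    return combined == [Fraction(v) for v in c] and bound <= d


# Polyhedron: x <= 1 and y <= 2. Target constraint: x + y <= 3.
A = [[1, 0], [0, 1]]
b = [1, 2]
print(check_inclusion_certificate(A, b, [1, 1], 3, [1, 1]))
```

The checker only verifies a certificate that some untrusted solver produced, which is the point of a posteriori verification: the checking step is a handful of multiplications and comparisons. A rejected certificate does not prove that inclusion fails, only that this particular witness is invalid.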

    Communication and Matrix Computations on Large Message Passing Systems

    This paper is concerned with the consequences for matrix computations of having a rather large number of general purpose processors, say ten or twenty thousand, connected in a network in such a way that a processor can communicate only with its immediate neighbors. Certain communication tasks associated with most matrix algorithms are defined and formulas developed for the time required to perform them under several communication regimes. The results are compared with the times for a nominal $n^3$ floating point operations. The results suggest that it is possible to use a large number of processors to solve matrix problems at a relatively fine granularity, provided fine grain communication is available. Additional figures are available at ftp thales.cs.umd.edu in the directory pub/reports. (Also cross-referenced as UMIACS-TR-88-81)

    Hypercube matrix computation task

    A major objective of the Hypercube Matrix Computation effort at the Jet Propulsion Laboratory (JPL) is to investigate the applicability of a parallel computing architecture to the solution of large-scale electromagnetic scattering problems. Three scattering analysis codes are being implemented and assessed on a JPL/California Institute of Technology (Caltech) Mark 3 Hypercube. The codes, which utilize different underlying algorithms, give a means of evaluating the general applicability of this parallel architecture. The three analysis codes being implemented are a frequency domain method of moments code, a time domain finite difference code, and a frequency domain finite elements code. These analysis capabilities are being integrated into an electromagnetics interactive analysis workstation which can serve as a design tool for the construction of antennas and other radiating or scattering structures. The first two years of work on the Hypercube Matrix Computation effort are summarized. The summary includes both new developments and results as well as work previously reported in the Hypercube Matrix Computation Task: Final Report for 1986 to 1987 (JPL Publication 87-18).

    A Computational Model for Tensor Core Units

    To respond to the need for efficient training and inference of deep neural networks, a plethora of domain-specific hardware architectures have been introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A common feature of these architectures is a hardware circuit for efficiently computing a dense matrix multiplication of a given small size. In order to broaden the class of algorithms that exploit these systems, we propose a computational model, named the TCU model, that captures the ability to natively multiply small matrices. We then use the TCU model for designing fast algorithms for several problems, including matrix operations (dense and sparse multiplication, Gaussian Elimination), graph algorithms (transitive closure, all pairs shortest distances), Discrete Fourier Transform, stencil computations, integer multiplication, and polynomial evaluation. We finally highlight a relation between the TCU model and the external memory model.
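
The core idea, building a large dense multiplication out of a fixed-size native tile multiplication, can be sketched as follows. This is only an illustration under simplifying assumptions: `tile_multiply` is a hypothetical stand-in for the hardware primitive, the tile side `s` is a free parameter, and the paper's actual cost parameters and algorithms are not reproduced here.

```python
def tile_multiply(X, Y):
    """Hypothetical stand-in for the hardware unit: multiply two s x s tiles."""
    s = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(s)) for j in range(s)]
            for i in range(s)]


def get_tile(M, r, c, s):
    """Copy the s x s tile of M whose top-left corner is (r, c)."""
    return [row[c:c + s] for row in M[r:r + s]]


def blocked_matmul(A, B, s):
    """Multiply n x n matrices using only s x s tile products.

    Returns the product and the number of calls to the tile primitive.
    Assumes n is a multiple of s.
    """
    n = len(A)
    if n % s != 0:
        raise ValueError("matrix size must be a multiple of the tile size")
    C = [[0] * n for _ in range(n)]
    calls = 0
    for bi in range(0, n, s):
        for bj in range(0, n, s):
            for bk in range(0, n, s):
                P = tile_multiply(get_tile(A, bi, bk, s), get_tile(B, bk, bj, s))
                calls += 1
                # Accumulate the tile product into the output block.
                for i in range(s):
                    for j in range(s):
                        C[bi + i][bj + j] += P[i][j]
    return C, calls
```

With this schoolbook blocking, an n x n product costs (n/s)^3 calls to the primitive, so the count of native tile multiplications becomes the natural unit of cost for the algorithm.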

    Scalable Task-Based Algorithm for Multiplication of Block-Rank-Sparse Matrices

    A task-based formulation of the Scalable Universal Matrix Multiplication Algorithm (SUMMA), a popular algorithm for matrix multiplication (MM), is applied to the multiplication of hierarchy-free, rank-structured matrices that appear in the domain of quantum chemistry (QC). The novel features of our formulation are: (1) concurrent scheduling of multiple SUMMA iterations, and (2) fine-grained task-based composition. These features make it tolerant of the load imbalance due to the irregular matrix structure and eliminate all artifactual sources of global synchronization. Scalability of iterative computation of square-root inverse of block-rank-sparse QC matrices is demonstrated; for full-rank (dense) matrices the performance of our SUMMA formulation usually exceeds that of the state-of-the-art dense MM implementations (ScaLAPACK and Cyclops Tensor Framework). Comment: 8 pages, 6 figures, accepted to IA3 2015. arXiv admin note: text overlap with arXiv:1504.0504
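
For readers unfamiliar with SUMMA, the following serial simulation sketches its iteration structure on a p x p grid of processes. It is plain dense SUMMA, not the authors' task-based or rank-sparse formulation; the function names and the nested-list block layout are assumptions made for illustration. The optional `order` argument shows why iterations can be scheduled out of order: each one only adds an independent contribution to C.

```python
def matmul(X, Y):
    """Dense product of two small blocks stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def add_into(C, P):
    """Accumulate block P into block C in place."""
    for crow, prow in zip(C, P):
        for j, v in enumerate(prow):
            crow[j] += v


def summa(A_blocks, B_blocks, p, s, order=None):
    """Serially simulate SUMMA with s x s blocks on a p x p process grid.

    A_blocks[i][k] is the block owned by process (i, k); likewise B_blocks.
    Iteration k "broadcasts" block column k of A along process rows and
    block row k of B along process columns, then every process (i, j)
    updates its local C block.
    """
    C_blocks = [[[[0] * s for _ in range(s)] for _ in range(p)]
                for _ in range(p)]
    for k in (order if order is not None else range(p)):
        a_panel = [A_blocks[i][k] for i in range(p)]  # row-wise broadcast
        b_panel = [B_blocks[k][j] for j in range(p)]  # column-wise broadcast
        for i in range(p):
            for j in range(p):
                add_into(C_blocks[i][j], matmul(a_panel[i], b_panel[j]))
    return C_blocks
```

Because the per-iteration updates commute, running the iterations as `order=[1, 0]` gives the same C blocks as the default order, which is the property that lets a task-based runtime overlap several iterations instead of synchronizing after each one.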