    Highly parallel sparse Cholesky factorization

    Several fine-grained parallel algorithms were developed and compared for computing the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed-memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special-purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data-parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and used to analyze the algorithms.
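
    To illustrate the "dense kernel inside a sparse driver" idea, the sketch below (our own minimal serial Python, not the authors' Connection Machine code) shows the unblocked dense Cholesky factorization that plays the role of the key subroutine; the paper's sparse algorithm applies a massively parallel version of such a kernel to several independent dense blocks of the matrix at once on a 2-D processor grid.

```python
import numpy as np

def dense_cholesky(A):
    """Unblocked dense Cholesky: returns lower-triangular L with A = L @ L.T,
    for a symmetric positive definite A. In the paper's scheme a parallel
    version of this dense kernel is the key subroutine, applied to several
    independent dense submatrices of the sparse matrix simultaneously."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        # Diagonal entry: subtract contributions of already-computed columns.
        L[j, j] = np.sqrt(A[j, j] - L[j, :j] @ L[j, :j])
        # Update the column below the diagonal.
        L[j + 1:, j] = (A[j + 1:, j] - L[j + 1:, :j] @ L[j, :j]) / L[j, j]
    return L
```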

    Solution of partial differential equations on vector and parallel computers

    The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed, and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
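
    As a concrete example of the explicit, fully vectorizable iterative methods such a survey covers, here is a minimal Python sketch (ours, not from the review) of one Jacobi sweep for the 2-D Poisson equation: every interior point is updated independently from the previous iterate, which is exactly the structure vector and SIMD machines exploit.

```python
import numpy as np

def jacobi_sweep(u, f, h):
    """One Jacobi sweep for -laplace(u) = f on a square grid with spacing h,
    using the standard 5-point stencil. All interior updates are independent,
    so the sweep vectorizes (or parallelizes) trivially."""
    u_new = u.copy()
    u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                                + u[1:-1, :-2] + u[1:-1, 2:]
                                + h * h * f[1:-1, 1:-1])
    return u_new
```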

    Group implicit concurrent algorithms in nonlinear structural dynamics

    During the 1970s and 1980s, considerable effort was devoted to developing efficient and reliable time-stepping procedures for transient structural analysis. Mathematically, the equations governing this type of problem are generally stiff, i.e., they exhibit a wide spectrum in the linear range. The algorithms best suited to such applications are those which accurately integrate the low-frequency content of the response without requiring resolution of the high-frequency modes. This means that the algorithms must be unconditionally stable, which in turn rules out explicit integration. The most exciting development in this area in recent years has been the advent of parallel computers with multiprocessing capabilities. This work is therefore mainly concerned with the development of parallel algorithms for structural dynamics. A primary objective is to devise unconditionally stable and accurate time-stepping procedures which lend themselves to efficient implementation on concurrent machines. Some features of the new computer architectures are summarized, and a brief survey of current efforts in the area is presented. A new class of concurrent procedures, Group Implicit (GI) algorithms, is introduced and analyzed. Numerical simulation shows that GI algorithms hold considerable promise for application on coarse-grain as well as medium-grain parallel computers.
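
    For reference, the serial building block the abstract alludes to is an unconditionally stable implicit integrator such as the Newmark average-acceleration (trapezoidal) rule. The Python sketch below shows one such step for a linear system M u'' + K u = f; it is the standard textbook method, not the paper's Group Implicit scheme, which runs integrators of this kind concurrently on groups of subdomains.

```python
import numpy as np

def newmark_step(M, K, u, v, a, f_next, dt, beta=0.25, gamma=0.5):
    """One Newmark step for M u'' + K u = f. The defaults beta=1/4, gamma=1/2
    give the unconditionally stable average-acceleration (trapezoidal) rule,
    the kind of integrator the abstract requires for stiff structural systems."""
    # Predictors from the known state (u, v, a) at time t_n.
    u_pred = u + dt * v + dt**2 * (0.5 - beta) * a
    v_pred = v + dt * (1.0 - gamma) * a
    # Implicit solve for the new acceleration at t_{n+1}.
    a_next = np.linalg.solve(M + beta * dt**2 * K, f_next - K @ u_pred)
    # Correctors.
    u_next = u_pred + beta * dt**2 * a_next
    v_next = v_pred + gamma * dt * a_next
    return u_next, v_next, a_next
```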

    Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications

    In this paper we identify a new class of sparse, near-quadratic random Boolean matrices that have full row rank over F_2 = {0,1} with high probability and can be transformed into echelon form in almost linear time by a simple version of Gauss elimination. The random matrix with dimensions (1-epsilon)n x n is generated as follows: in each row, choose a block of length L = O((log n)/epsilon) at a random position. The entries outside the block are 0; the entries inside the block are given by fair coin tosses. Sorting the rows according to the positions of the blocks transforms the matrix into a kind of band matrix, on which, as it turns out, Gauss elimination works very efficiently with high probability. For the proof, the effects of Gauss elimination are interpreted as a ("coin-flipping") variant of Robin Hood hashing, whose behaviour can be captured by a simple Markov model from queuing theory. Bounds on the expected construction time, as well as the high success probability, follow from results in this area; they readily extend to larger finite fields in place of F_2. By employing hashing, this matrix family leads to a new implementation of a retrieval data structure, which represents an arbitrary function f: S -> {0,1} for some set S of m = (1-epsilon)n keys. It requires m/(1-epsilon) bits of space, construction takes O(m/epsilon^2) expected time on a word RAM, and queries take O(1/epsilon) time and access only one contiguous segment of O((log m)/epsilon) bits in the representation (O(1/epsilon) consecutive words on a word RAM). The method is readily implemented, highly practical, and competitive with state-of-the-art methods. In a more theoretical variant, which works only for unrealistically large S, we can even achieve construction time O(m/epsilon) and query time O(1), accessing O(1) contiguous memory words per query. By well-established methods, the retrieval data structure leads to efficient constructions of (static) perfect hash functions and (static) Bloom filters with almost optimal space and very local storage access patterns for queries.
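
    A minimal Python sketch of the matrix family and the elimination pass, under our own reading of the abstract (the block length L = log2(n)/epsilon is an illustrative choice for the O((log n)/epsilon) bound, and the dense row-reduction below ignores the short-segment optimizations that make the real algorithm near-linear):

```python
import numpy as np

def random_block_matrix(n, eps, rng):
    """(1-eps)*n x n random matrix over F_2: each row is zero outside one
    randomly placed block of length L whose entries are fair coin tosses.
    Rows are sorted by block position, producing the band structure on which
    Gauss elimination succeeds in near-linear time with high probability."""
    L = max(1, int(np.log2(n) / eps))   # illustrative constant for O(log(n)/eps)
    m = int((1 - eps) * n)
    starts = rng.integers(0, n - L + 1, size=m)
    A = np.zeros((m, n), dtype=np.uint8)
    for i, s in enumerate(starts):
        A[i, s:s + L] = rng.integers(0, 2, size=L)
    return A[np.argsort(starts, kind="stable")]

def echelon_f2(A):
    """Bring A into echelon form over F_2 using XOR row updates; on the sorted
    band matrix each update only touches a short row segment."""
    A = A.copy()
    m, n = A.shape
    r = 0
    for c in range(n):
        piv = next((i for i in range(r, m) if A[i, c]), None)
        if piv is None:
            continue
        A[[r, piv]] = A[[piv, r]]       # swap pivot row into place
        for i in range(r + 1, m):
            if A[i, c]:
                A[i, :] ^= A[r, :]      # eliminate below the pivot
        r += 1
    return A

# Example: E = echelon_f2(random_block_matrix(1000, 0.1, np.random.default_rng(0)))
```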

    Design of a sparse vector processor


    Computing with functions in spherical and polar geometries I. The sphere

    A collection of algorithms is described for numerically computing with smooth functions defined on the unit sphere. Functions are approximated to essentially machine precision by using a structure-preserving iterative variant of Gaussian elimination together with the double Fourier sphere method. We show that this procedure allows for stable differentiation, reduces the oversampling of functions near the poles, and converges for certain analytic functions. Operations such as function evaluation, differentiation, and integration are particularly efficient and can be computed by essentially one-dimensional algorithms. A highlight is an optimal-complexity direct solver for Poisson's equation on the sphere using a spectral method. Without parallelization, we solve Poisson's equation with 100 million degrees of freedom in one minute on a standard laptop. Numerical results are presented throughout. In a companion paper (part II) we extend the ideas presented here to computing with functions on the disk.
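
    The heart of the double Fourier sphere (DFS) method is a periodic "doubling" of the function so that standard bivariate FFTs apply. The Python sketch below is our simplified reconstruction of that doubling step only; it omits the paper's structure-preserving Gaussian elimination, which compresses the doubled representation.

```python
import numpy as np

def dfs_coefficients(f, n):
    """Double Fourier sphere doubling: sample f(lam, th) with longitude
    lam in [-pi, pi) and colatitude th in [0, pi], extend to th in [0, 2*pi)
    by the glide reflection f(lam + pi, 2*pi - th), and take a plain 2-D FFT.
    The doubled sample grid is 2*pi-periodic in both variables."""
    lam = np.linspace(-np.pi, np.pi, 2 * n, endpoint=False)
    th = np.linspace(0.0, np.pi, n + 1)
    F = f(lam[None, :], th[:, None])            # (n+1) x 2n samples, th = 0..pi
    # Glide reflection fills th in (pi, 2*pi): flip in th, shift lam by pi.
    refl = np.roll(F[1:-1][::-1], n, axis=1)
    doubled = np.vstack([F, refl])              # 2n x 2n doubly periodic grid
    return np.fft.fft2(doubled) / doubled.size  # bivariate Fourier coefficients
```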

    Approximate and Incomplete Factorizations

    In this chapter, we give a brief overview of a particular class of preconditioners known as incomplete factorizations. They can be thought of as approximating the exact LU factorization of a given matrix A (e.g. computed via Gaussian elimination) by disallowing certain fill-ins. As opposed to other PDE-based preconditioners such as multigrid and domain decomposition, this class of preconditioners is primarily algebraic in nature and can in principle be applied to any sparse matrix. When applied to PDE problems, they are usually not optimal in the sense that the condition number of the preconditioned system will grow as the mesh size h is reduced, although usually at a slower rate than for the unpreconditioned system. On the other hand, they are often quite robust with respect to other, more algebraic features of the problem such as rough and anisotropic coefficients and strong convection terms. We will describe the basic ILU and modified ILU (MILU) preconditioners. Then we will briefly review several variants: more fill-ins, relaxed ILU, shifted ILU, ILQ, as well as block and multilevel variants. We will also touch on a related class of approximate factorization methods which arise more directly from approximating a partial differential operator by a product of simpler operators. Finally, we will discuss parallelization aspects, including re-ordering, series expansion and domain decomposition techniques. Generally, this class of preconditioner does not possess a high degree of parallelism in its original form. Re-ordering and approximation by truncating certain series expansions increase the parallelism, but usually at the cost of a deterioration in convergence rate. Domain decomposition offers a compromise.
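
    As a quick illustration of how such preconditioners are used in practice, here is a hedged sketch with SciPy's spilu (SuperLU's threshold-based incomplete LU, a relative of the fill-controlled ILU variants discussed above; the drop tolerance and fill factor below are illustrative choices, not values from the chapter):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Model problem: 2-D Poisson matrix from the 5-point stencil on an n x n grid.
n = 50
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
I = sp.identity(n)
A = (sp.kron(I, T) + sp.kron(T, I)).tocsc()
b = np.ones(A.shape[0])

# Incomplete LU factorization: drop_tol and fill_factor control how much
# fill-in is allowed, trading factorization cost for preconditioner quality.
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
M = spla.LinearOperator(A.shape, ilu.solve)

# Apply the incomplete factors as a preconditioner for a Krylov solver.
x, info = spla.gmres(A, b, M=M)
print("GMRES converged" if info == 0 else f"GMRES info = {info}")
```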

    Parallel Computers and Complex Systems

    We present an overview of the state of the art and future trends in high-performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex problems in computational science. The use of high-performance parallel computers can help improve our understanding of complex systems, and the converse is also true --- we can apply techniques used for the study of complex systems to improve our understanding of parallel computing. We consider parallel computing as the mapping of one complex system --- typically a model of the world --- into another complex system --- the parallel computer. We study static, dynamic, spatial and temporal properties of both the complex systems and the map between them. The result is a better understanding of which computer architectures are good for which problems, and of software structure, automatic partitioning of data, and the performance of parallel machines.