Numerical solution for the minimum norm solution to the first kind integral equation with a special kernel and efficient implementations of the Cholesky factorization algorithm on the vector and parallel supercomputers
Part I. Let K: L_2[a,b] → L_2[c,d] be a bounded linear operator defined by (Kf)(x) = ∫_a^b k(x,y) f(y) dy, where k ∈ L_2([c,d]×[a,b]) and f ∈ L_2[a,b]. Define k_x by k_x(y) = k(x,y). Assume K has the properties that (a) k_x ∈ L_2[a,b] for all x ∈ [c,d] and (b) Kf = 0 a.e. implies (Kf)(x) = 0 for all x ∈ [c,d]. Then it is shown that the minimum norm solution f_0 to the Fredholm integral equation of the first kind Kf = g is the L_2-norm limit of linear combinations of the k_x's. Next, it is shown how to choose constants c_1, c_2, ..., c_n to minimize || f_0 − Σ_{j=1}^n c_j k_{x_j} ||_2 for n fixed points x_1, x_2, ..., x_n without knowing what f_0 is. Perturbation results and some characteristics of this approximate solution f_n = Σ_{j=1}^n c_j k_{x_j} for f_0 are presented. This paper also contains a numerical method for choosing the n points x_1, x_2, ..., x_n at which || f_0 − Σ_{j=1}^n c_j k_{x_j} ||_2 is minimized for a fixed number n. Lastly, numerical results for different types of examples are provided to evaluate this numerical method.

Part II. First, a blocked Cholesky factorization algorithm using non-standard level-2 BLAS (Basic Linear Algebra Subprograms) and three blocked Cholesky algorithms using standard level-2 and level-3 BLAS are developed on the Hitachi Data Systems (HDS) AS/XL V60, and their performance is compared with that of the existing unblocked algorithm. The blocked algorithm using non-standard level-2 BLAS performs best of all the algorithms considered on the HDS computer, but the non-standard level-2 BLAS were optimized for, and performed well on, only the HDS computer.
For this reason, a blocked algorithm using standard BLAS that gives near-optimal performance on all of the HDS AS/EX V60, the IBM 3090E, and the Cray 2, X-MP, and Y-MP is found, and its performance is compared with the vendor-supplied Cholesky routine (when available) on each computer. Since the IBM ESSL vector library does not have an optimized DSYRK, DSYRK was optimized for the IBM 3090E before all the algorithms were tested. Next, five parallel Cholesky factorization algorithms, each of which uses standard BLAS, are presented. The parallel performance of these algorithms is measured on each of the Cray 2, Cray X-MP/48, Cray Y-MP/832, and the IBM 3090-600E and 3090-600J. For the IBM 3090 computers, the parallel performance of these algorithms is also compared with that of a vendor-optimized Cholesky factorization from ESSL.
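The structure of a blocked Cholesky factorization built from standard BLAS operations can be sketched as follows (a minimal Python/NumPy illustration of the right-looking variant; NumPy calls stand in for the POTRF, TRSM, and SYRK kernels, and the block size and variable names are illustrative assumptions, not the thesis's actual implementation):

```python
import numpy as np

def blocked_cholesky(A, nb=32):
    """Right-looking blocked Cholesky (lower-triangular factor L, A = L L^T).

    Each iteration mirrors the level-3 BLAS building blocks:
    POTRF on the diagonal block, TRSM on the panel below it,
    and SYRK (symmetric rank-nb update) on the trailing submatrix.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(0, n, nb):
        b = min(nb, n - k)
        # POTRF: factor the b-by-b diagonal block (unblocked)
        A[k:k+b, k:k+b] = np.linalg.cholesky(A[k:k+b, k:k+b])
        L11 = A[k:k+b, k:k+b]
        if k + b < n:
            # TRSM: panel update  A21 <- A21 * L11^{-T}
            A[k+b:, k:k+b] = np.linalg.solve(L11, A[k+b:, k:k+b].T).T
            L21 = A[k+b:, k:k+b]
            # SYRK: trailing update  A22 <- A22 - L21 L21^T
            A[k+b:, k+b:] -= L21 @ L21.T
    return np.tril(A)
```

The point of the blocking is that almost all arithmetic lands in the matrix-matrix (level-3) operations, which vectorize and parallelize far better than the column-at-a-time unblocked algorithm.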
Solution of the Skyrme-Hartree-Fock-Bogolyubov equations in the Cartesian deformed harmonic-oscillator basis. (VII) HFODD (v2.49t): a new version of the program
We describe the new version (v2.49t) of the code HFODD which solves the
nuclear Skyrme Hartree-Fock (HF) or Skyrme Hartree-Fock-Bogolyubov (HFB)
problem by using the Cartesian deformed harmonic-oscillator basis. In the new
version, we have implemented the following physics features: (i) the isospin
mixing and projection, (ii) the finite temperature formalism for the HFB and
HF+BCS methods, (iii) the Lipkin translational energy correction method, (iv)
the calculation of the shell correction. A number of specific numerical methods
have also been implemented in order to deal with large-scale multi-constraint
calculations and hardware limitations: (i) the two-basis method for the HFB
method, (ii) the Augmented Lagrangian Method (ALM) for multi-constraint
calculations, (iii) the linear constraint method based on the approximation of
the RPA matrix for multi-constraint calculations, (iv) an interface with the
axial and parity-conserving Skyrme-HFB code HFBTHO, (v) the mixing of the HF or
HFB matrix elements instead of the HF fields. Special care has been taken to
make the code usable on massively parallel leadership-class computers. For this
purpose, the following features are now available with this version: (i) the
Message Passing Interface (MPI) framework, (ii) scalable input data routines,
(iii) multi-threading via OpenMP pragmas, (iv) parallel diagonalization of the
HFB matrix in the simplex breaking case using the ScaLAPACK library. Finally,
several minor errors of the previously published version were corrected.

Comment: Accepted for publication in Computer Physics Communications. Program
files re-submitted to the Comp. Phys. Comm. Program Library after correction of
several minor bugs.
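The Augmented Lagrangian Method mentioned in point (ii) can be illustrated generically (a toy Python sketch of ALM for a single equality constraint, with an illustrative objective, step size, and penalty parameter; this is not HFODD's actual multi-constraint implementation):

```python
import numpy as np

def alm_minimize(grad_f, g, grad_g, x0, n_outer=20, mu=10.0, lr=0.02, n_inner=500):
    """Augmented Lagrangian Method (ALM) for  min f(x)  subject to  g(x) = 0.

    Inner loop: gradient descent on the augmented Lagrangian
        L_A(x) = f(x) + lam * g(x) + (mu / 2) * g(x)**2
    Outer loop: first-order multiplier update  lam <- lam + mu * g(x).
    """
    x, lam = np.asarray(x0, dtype=float), 0.0
    for _ in range(n_outer):
        for _ in range(n_inner):
            gx = g(x)
            # gradient of L_A:  grad f + (lam + mu * g(x)) * grad g
            x = x - lr * (grad_f(x) + (lam + mu * gx) * grad_g(x))
        lam += mu * g(x)  # multiplier update drives g(x) -> 0
    return x, lam
```

For example, minimizing ||x − b||^2 subject to sum(x) = 1 recovers the orthogonal projection of b onto that hyperplane; the advantage over a pure penalty method is that the constraint is satisfied without driving mu to infinity.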
High performance with high accuracy laboratory
In order to obtain high performance with high accuracy in the solution of scientific computational problems, a computational tool called the High Performance with High Accuracy Laboratory has been developed. In this paper we first describe high performance, and then high accuracy and interval mathematics. After that, the tool is described, including the two environments in which it has been developed: the Cray supercomputer vector environment and the parallel environment based on Transputers. The description covers the modules, the basic interval library, the high accuracy arithmetic kernel, and the interval applied modules, especially the selint.p library. Finally, some comments about the performance are given.
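The core idea of interval mathematics can be shown in a minimal Python sketch (an illustrative toy, unrelated to the actual selint.p library): every operation widens its result outward by one ulp in each direction, so the exact real result is guaranteed to lie inside the computed interval.

```python
import math

class Interval:
    """Minimal closed-interval arithmetic with outward rounding.

    math.nextafter nudges each endpoint one ulp outward, a simple
    stand-in for directed (round-down / round-up) floating-point modes.
    """
    def __init__(self, lo, hi=None):
        self.lo = float(lo)
        self.hi = float(hi if hi is not None else lo)

    def __add__(self, other):
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        # with signed endpoints, the extremes are among the four products
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(p), -math.inf),
                        math.nextafter(max(p), math.inf))

    def __contains__(self, x):
        return self.lo <= x <= self.hi

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"
```

A real interval library adds division, elementary functions, and true directed rounding, but the enclosure guarantee shown here is the essential property.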
Algorithms in Lattice QCD
The enormous computing resources that large-scale simulations in Lattice QCD
require will continue to test the limits of even the largest supercomputers for
the foreseeable future. The efficiency of such simulations will therefore concern
practitioners of lattice QCD for some time to come.
I begin with an introduction to those aspects of lattice QCD essential to the
remainder of the thesis, and follow with a description of the Wilson fermion
matrix M, an object which is central to my theme.
The principal bottleneck in Lattice QCD simulations is the solution of linear
systems involving M, and this topic is treated in depth. I compare some of the
more popular iterative methods, including Minimal Residual, Conjugate Gradient
on the Normal Equation, Bi-Conjugate Gradient, QMR, BiCGSTAB and
BiCGSTAB2, and then turn to a study of block algorithms, a special class of iterative
solvers for systems with multiple right-hand sides. Included in this study
are two block algorithms which had not previously been applied to lattice QCD.
The next chapters are concerned with a generalised Hybrid Monte Carlo algorithm
(GHMC) for QCD simulations involving dynamical quarks. I focus squarely
on the efficient and robust implementation of GHMC, and describe some tricks
to improve its performance. A limited set of results from HMC simulations at
various parameter values is presented.
A treatment of the non-hermitian Lanczos method and its application to the
eigenvalue problem for M rounds off the theme of large-scale matrix computations.
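Of the solvers compared above, Conjugate Gradient on the Normal Equation (CGNR) is the easiest to sketch (Python/NumPy, with a small well-conditioned dense matrix standing in for the sparse Wilson matrix M; sizes and tolerances are illustrative):

```python
import numpy as np

def cgnr(A, b, tol=1e-10, maxiter=1000):
    """Conjugate Gradient on the Normal Equations: solves A x = b for
    general nonsingular A by applying CG to A^H A x = A^H b, without
    ever forming A^H A explicitly (only products with A and A^H)."""
    x = np.zeros_like(b)
    r = b - A @ x                 # residual of the original system
    z = A.conj().T @ r            # residual of the normal equations
    p = z.copy()
    zz = np.vdot(z, z).real
    for _ in range(maxiter):
        Ap = A @ p
        alpha = zz / np.vdot(Ap, Ap).real
        x = x + alpha * p
        r = r - alpha * Ap
        z = A.conj().T @ r
        zz_new = np.vdot(z, z).real
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break                 # relative residual small enough
        p = z + (zz_new / zz) * p
        zz = zz_new
    return x
```

The price of this approach, visible in the algorithm, is two matrix-vector products per iteration and a squared condition number, which is why the Bi-Conjugate Gradient family is often preferred for the Wilson matrix.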
The Reverse Cuthill-McKee Algorithm in Distributed-Memory
Ordering vertices of a graph is key to minimizing fill-in and data structure
size in sparse direct solvers, maximizing locality in iterative solvers, and
improving performance in graph algorithms. Except for naturally parallelizable
ordering methods such as nested dissection, many important ordering methods
have not been efficiently mapped to distributed-memory architectures. In this
paper, we present the first-ever distributed-memory implementation of the
reverse Cuthill-McKee (RCM) algorithm for reducing the profile of a sparse
matrix. Our parallelization uses a two-dimensional sparse matrix decomposition.
We achieve high performance by decomposing the problem into a small number of
primitives and utilizing optimized implementations of these primitives. Our
implementation shows strong scaling up to 1024 cores for smaller matrices and
up to 4096 cores for larger matrices.
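For reference, the serial textbook version of the algorithm being parallelized is short (a Python sketch of classic RCM on an adjacency-list graph; the paper's contribution is the distributed-memory formulation, which this deliberately does not attempt, and the min-degree start node is a common simplification of the pseudo-peripheral-node heuristic):

```python
from collections import deque

def rcm_order(adj):
    """Serial Reverse Cuthill-McKee ordering.

    BFS from a low-degree start vertex, visiting each vertex's
    neighbors in increasing-degree order, then reverse the whole
    ordering to reduce the profile/bandwidth of the matrix.
    """
    n = len(adj)
    deg = [len(adj[v]) for v in range(n)]
    visited = [False] * n
    order = []
    # iterate over candidate starts by degree to cover all components
    for start in sorted(range(n), key=lambda v: deg[v]):
        if visited[start]:
            continue
        visited[start] = True
        q = deque([start])
        while q:
            v = q.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda u: deg[u]):
                if not visited[w]:
                    visited[w] = True
                    q.append(w)
    return order[::-1]  # the "reverse" in RCM
```

The BFS-with-sorted-neighbors structure is exactly what makes a distributed implementation hard: the traversal is inherently level-synchronous and the sorting is per-vertex, which is why the paper decomposes it into a small set of parallel matrix primitives instead.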
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest in scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.
Evaluating linear recursive filters on clusters of workstations
The aim of this paper is to show that the recently developed high-performance algorithm for solving linear recurrence systems with constant coefficients, together with the new BLAS-based algorithm for narrow-banded triangular Toeplitz matrix-vector multiplication, allows linear recursive filters to be evaluated efficiently, even on clusters of workstations. The results of experiments performed on a cluster of twelve Linux workstations are also presented. The performance of the algorithm is comparable with that of two processors of a Cray SV-1 for this kind of recursive problem.
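The connection between a recursive filter and a triangular Toeplitz system can be shown on the first-order case (a toy Python sketch; the dense solve stands in for the paper's banded BLAS-based algorithm, and the coefficients are illustrative):

```python
import numpy as np

def recursive_filter(x, a):
    """First-order linear recursive (IIR) filter: y[i] = x[i] + a * y[i-1]."""
    y = np.empty(len(x), dtype=float)
    acc = 0.0
    for i, xi in enumerate(x):
        acc = xi + a * acc
        y[i] = acc
    return y

def filter_as_toeplitz_solve(x, a):
    """The same recurrence written as a linear system L y = x, where L is
    lower bidiagonal Toeplitz (1 on the diagonal, -a on the subdiagonal).
    This reformulation is what lets blocked, BLAS-style kernels replace
    the inherently sequential loop above."""
    n = len(x)
    L = np.eye(n) - a * np.eye(n, k=-1)
    return np.linalg.solve(L, np.asarray(x, dtype=float))
```

Higher-order filters give wider bands in L, and blocking the triangular solve is what exposes the matrix-matrix work that runs well on vector processors and clusters.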