266 research outputs found

    Numerical solution for the minimum norm solution to the first kind integral equation with a special kernel and efficient implementations of the Cholesky factorization algorithm on the vector and parallel supercomputers

    Part I. Let $K: L_2[a,b] \to L_2[c,d]$ be a bounded linear operator defined by $(Kf)(x) = \int_a^b k(x,y) f(y)\,dy$, where $k \in L_2([c,d] \times [a,b])$ and $f \in L_2[a,b]$. Define $k_x$ by $k_x(y) = k(x,y)$. Assume $K$ has the property that (a) $k_x \in L_2[a,b]$ for all $x \in [c,d]$ and (b) $Kf = 0$ a.e. implies $(Kf)(x) = 0$ for all $x \in [c,d]$. Then it is shown that the minimum norm solution $f_0$ to the Fredholm integral equation of the first kind $Kf = g$ is the $L_2$-norm limit of linear combinations of the $k_x$'s. Next, it is shown how to choose constants $c_1, c_2, \ldots, c_n$ to minimize $\| f_0 - \sum_{j=1}^n c_j k_{x_j} \|_2$ for $n$ fixed points $x_1, x_2, \ldots, x_n$ without knowing what $f_0$ is. Perturbation results and some characteristics of this approximate solution $f_n = \sum_{j=1}^n c_j k_{x_j}$ to $f_0$ are presented. This part also contains a numerical method for choosing the $n$ points $x_1, x_2, \ldots, x_n$ at which $\| f_0 - \sum_{j=1}^n c_j k_{x_j} \|_2$ is minimized for a fixed number $n$. Lastly, numerical results for several types of examples are provided to evaluate this numerical method.

    Part II. First, a blocked Cholesky factorization algorithm using non-standard level-2 BLAS (Basic Linear Algebra Subprograms) and three blocked Cholesky algorithms using standard level-2 and level-3 BLAS are developed on the Hitachi Data Systems (HDS) AS/XL V60, and their performance is compared with that of the existing unblocked algorithm. The blocked algorithm using non-standard level-2 BLAS performs best of all the algorithms considered on the HDS computer, but the non-standard level-2 BLAS were optimized for, and performed well on, only the HDS computer. For this reason, a blocked algorithm that uses standard BLAS and gives near-optimal performance on all of the HDS AS/XL V60, the IBM 3090E, and the Cray 2, X-MP, and Y-MP is developed, and its performance is compared with the vendor-supplied Cholesky routine (when available) on each computer. Since the IBM ESSL vector library does not have an optimized DSYRK, one was optimized for the IBM 3090E before all algorithms were tested.

    Next, five parallel Cholesky factorization algorithms, each of which uses standard BLAS, are presented. The parallel performance of these algorithms is measured on each of the Cray-2, Cray X-MP/48, Cray Y-MP/832, and the IBM 3090-600E and 3090-600J. For the IBM 3090 computers, the parallel performance of these algorithms is also compared with a vendor-optimized Cholesky factorization from ESSL.
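The key identity behind Part I is that $\langle f_0, k_x \rangle = (Kf_0)(x) = g(x)$, so the optimal coefficients $c_j$ solve a Gram system built entirely from the kernel and the data, with $f_0$ never appearing. A minimal NumPy sketch on a made-up example kernel $k(x,y) = \min(x,y)$ on $[0,1]$ (the kernel, grid, and test points below are illustrative, not from the thesis):

```python
import numpy as np

# Illustrative kernel on [a,b] = [c,d] = [0,1]: k(x,y) = min(x,y),
# so (Kf)(x) = integral_0^1 min(x,y) f(y) dy.
def k(x, y):
    return np.minimum(x, y)

ys = np.linspace(0.0, 1.0, 4001)   # quadrature grid for L2 inner products

def inner(s, t):
    # <k_s, k_t> in L2[0,1] by the trapezoidal rule
    vals = k(s, ys) * k(t, ys)
    return float(np.sum(vals[:-1] + vals[1:]) * 0.5 * (ys[1] - ys[0]))

# Gram system G c = g: G[i,j] = <k_{x_i}, k_{x_j}>, g[i] = g(x_i).
# Because <f0, k_x> = (K f0)(x) = g(x), c is computable without knowing f0.
xs = np.array([0.25, 0.5, 0.75])
G = np.array([[inner(s, t) for t in xs] for s in xs])

# For a check, take f0(y) = 1, so g(x) = integral_0^1 min(x,y) dy = x - x^2/2.
g = xs - xs**2 / 2
c = np.linalg.solve(G, g)

def f_n(y):
    # the approximate solution f_n = sum_j c_j k_{x_j}
    return sum(cj * k(xj, y) for cj, xj in zip(c, xs))
```

The normal-equations structure is the point: enlarging the set of points $x_j$ only enlarges the Gram matrix, and the fit to the unknown $f_0$ improves monotonically in the $L_2$ norm.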
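The level-3 BLAS structure of the blocked Cholesky factorizations studied in Part II can be sketched compactly. This right-looking variant mirrors the POTRF/TRSM/SYRK call pattern of an optimized implementation, but it is an illustrative NumPy translation, not the thesis code:

```python
import numpy as np

def blocked_cholesky(A, nb=64):
    """Right-looking blocked Cholesky (lower-triangular factor).

    Each block step mirrors the level-3 BLAS calls of a tuned library:
      * POTRF: factor the nb-by-nb diagonal block,
      * TRSM : triangular solve for the panel below it,
      * SYRK : symmetric rank-nb update of the trailing matrix.
    """
    A = A.copy()
    n = A.shape[0]
    for j in range(0, n, nb):
        b = min(nb, n - j)
        # "POTRF" on the diagonal block
        A[j:j+b, j:j+b] = np.linalg.cholesky(A[j:j+b, j:j+b])
        if j + b < n:
            L11 = A[j:j+b, j:j+b]
            # "TRSM": L21 = A21 * L11^{-T}
            A[j+b:, j:j+b] = np.linalg.solve(L11, A[j+b:, j:j+b].T).T
            L21 = A[j+b:, j:j+b]
            # "SYRK": A22 <- A22 - L21 * L21^T
            A[j+b:, j+b:] -= L21 @ L21.T
    return np.tril(A)
```

The block size `nb` trades off level-3 BLAS efficiency in the SYRK update against the serial cost of the diagonal POTRF, which is why the abstract's comparison across machines hinges on how well each vendor's DSYRK performs.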

    Solution of the Skyrme-Hartree-Fock-Bogolyubov equations in the Cartesian deformed harmonic-oscillator basis. (VII) HFODD (v2.49t): a new version of the program

    We describe the new version (v2.49t) of the code HFODD, which solves the nuclear Skyrme Hartree-Fock (HF) or Skyrme Hartree-Fock-Bogolyubov (HFB) problem using the Cartesian deformed harmonic-oscillator basis. In the new version, we have implemented the following physics features: (i) isospin mixing and projection, (ii) the finite-temperature formalism for the HFB and HF+BCS methods, (iii) the Lipkin translational energy correction method, (iv) the calculation of the shell correction. A number of specific numerical methods have also been implemented in order to deal with large-scale multi-constraint calculations and hardware limitations: (i) the two-basis method for the HFB method, (ii) the Augmented Lagrangian Method (ALM) for multi-constraint calculations, (iii) the linear constraint method based on the approximation of the RPA matrix for multi-constraint calculations, (iv) an interface with the axial and parity-conserving Skyrme-HFB code HFBTHO, (v) the mixing of the HF or HFB matrix elements instead of the HF fields. Special care has been taken to support running the code on massively parallel leadership-class computers. For this purpose, the following features are now available in this version: (i) the Message Passing Interface (MPI) framework, (ii) scalable input data routines, (iii) multi-threading via OpenMP pragmas, (iv) parallel diagonalization of the HFB matrix in the simplex-breaking case using the ScaLAPACK library. Finally, several minor errors of the previously published version were corrected.
    Comment: Accepted for publication in Computer Physics Communications. Program files re-submitted to the Comp. Phys. Comm. Program Library after correction of several minor bugs.
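The Augmented Lagrangian Method of item (ii) can be illustrated on a toy equality-constrained minimization. The sketch below is generic: the objective, constraint, step sizes, and iteration counts are invented for illustration and have nothing to do with the HFODD implementation, which constrains multipole moments of the nuclear density:

```python
import numpy as np

def augmented_lagrangian(f_grad, c, c_grad, x0, lam0=0.0, mu=10.0,
                         n_outer=20, n_inner=200, lr=0.01):
    """Minimize f(x) subject to c(x) = 0 via the augmented Lagrangian
    L(x) = f(x) + lam*c(x) + (mu/2)*c(x)**2.

    Each outer step approximately minimizes L in x (plain gradient
    descent keeps the sketch self-contained), then updates the
    multiplier: lam <- lam + mu*c(x)."""
    x, lam = np.asarray(x0, dtype=float).copy(), lam0
    for _ in range(n_outer):
        for _ in range(n_inner):
            # grad_x L = grad f + (lam + mu*c) * grad c
            g = f_grad(x) + (lam + mu * c(x)) * c_grad(x)
            x -= lr * g
        lam += mu * c(x)
    return x, lam

# Toy problem: minimize ||x||^2 subject to x0 + x1 - 1 = 0.
# Exact solution: x = (0.5, 0.5), multiplier lam = -1.
f_grad = lambda x: 2.0 * x
c = lambda x: x[0] + x[1] - 1.0
c_grad = lambda x: np.array([1.0, 1.0])
x_opt, lam_opt = augmented_lagrangian(f_grad, c, c_grad, x0=[0.0, 0.0])
```

The advantage over a pure quadratic penalty, and the reason it suits multi-constraint HFB calculations, is that the multiplier update lets the constraint be satisfied tightly without driving the penalty parameter `mu` to ill-conditioning.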

    High performance with high accuracy laboratory

    In order to obtain high performance with high accuracy in the solution of scientific computational problems, a computational tool, called the High Performance with High Accuracy Laboratory, has been developed. In this paper we first describe high performance, and then high accuracy and interval mathematics. After that, the tool is described, including the two environments in which it has been developed: the Cray supercomputer vector environment and the parallel environment based on Transputers. The description covers the modules, the basic interval library, the high-accuracy arithmetic kernel, and the interval applied modules, especially the selint.p library. Finally, some comments on performance are given.
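The flavor of the interval arithmetic behind such a basic interval library can be sketched in a few lines. This is an illustrative toy, not the laboratory's kernel: production interval libraries switch the hardware rounding mode per bound, whereas here a one-ulp outward inflation via `math.nextafter` stands in for directed rounding:

```python
import math

def _down(x):
    # round the lower bound outward (toward -inf) by one ulp
    return math.nextafter(x, -math.inf)

def _up(x):
    # round the upper bound outward (toward +inf) by one ulp
    return math.nextafter(x, math.inf)

class Interval:
    """Minimal interval arithmetic: every operation returns an interval
    guaranteed to enclose the exact real result."""
    def __init__(self, lo, hi=None):
        hi = lo if hi is None else hi
        assert lo <= hi
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(_down(self.lo + other.lo), _up(self.hi + other.hi))

    def __mul__(self, other):
        # the product interval is spanned by the four endpoint products
        ps = [self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi]
        return Interval(_down(min(ps)), _up(max(ps)))

    def __contains__(self, x):
        return self.lo <= x <= self.hi
```

The enclosure property is what makes such a tool a "high accuracy" laboratory: a computed interval is a mathematical guarantee on the exact result, not merely an estimate.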

    Algorithms in Lattice QCD

    The enormous computing resources that large-scale simulations in Lattice QCD require will continue to test the limits of even the largest supercomputers into the foreseeable future. The efficiency of such simulations will therefore concern practitioners of lattice QCD for some time to come. I begin with an introduction to those aspects of lattice QCD essential to the remainder of the thesis, and follow with a description of the Wilson fermion matrix M, an object which is central to my theme. The principal bottleneck in Lattice QCD simulations is the solution of linear systems involving M, and this topic is treated in depth. I compare some of the more popular iterative methods, including Minimal Residual, Conjugate Gradient on the Normal Equation, Bi-Conjugate Gradient, QMR, BiCGSTAB, and BiCGSTAB2, and then turn to a study of block algorithms, a special class of iterative solvers for systems with multiple right-hand sides. Included in this study are two block algorithms which had not previously been applied to lattice QCD. The next chapters are concerned with a generalised Hybrid Monte Carlo algorithm (GHMC) for QCD simulations involving dynamical quarks. I focus squarely on the efficient and robust implementation of GHMC, and describe some tricks to improve its performance. A limited set of results from HMC simulations at various parameter values is presented. A treatment of the non-hermitian Lanczos method and its application to the eigenvalue problem for M rounds off the theme of large-scale matrix computations.
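Of the solvers compared, BiCGSTAB is representative of the Krylov methods suited to the nonsymmetric Wilson matrix. A textbook NumPy sketch for a generic nonsymmetric system follows; the dense test matrix is invented for illustration and bears no relation to the lattice operator:

```python
import numpy as np

def bicgstab(A, b, tol=1e-10, maxiter=500):
    """Textbook BiCGSTAB (van der Vorst) for a nonsymmetric system Ax = b,
    starting from x = 0, with no preconditioning."""
    x = np.zeros_like(b)
    r = b - A @ x
    r_hat = r.copy()                 # fixed shadow residual
    rho = alpha = omega = 1.0
    v = p = np.zeros_like(b)
    for _ in range(maxiter):
        rho_new = r_hat @ r
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p
        alpha = rho_new / (r_hat @ v)
        s = r - alpha * v            # intermediate residual
        t = A @ s
        omega = (t @ s) / (t @ t)    # local minimization step
        x = x + alpha * p + omega * s
        r = s - omega * t
        rho = rho_new
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x
```

Two matrix-vector products per iteration and short recurrences (no growing Krylov basis to store) are what make this family attractive when M is large and applied matrix-free, as on the lattice.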

    The Reverse Cuthill-McKee Algorithm in Distributed-Memory

    Ordering the vertices of a graph is key to minimizing fill-in and data-structure size in sparse direct solvers, maximizing locality in iterative solvers, and improving performance in graph algorithms. Except for naturally parallelizable ordering methods such as nested dissection, many important ordering methods have not been efficiently mapped to distributed-memory architectures. In this paper, we present the first-ever distributed-memory implementation of the reverse Cuthill-McKee (RCM) algorithm for reducing the profile of a sparse matrix. Our parallelization uses a two-dimensional sparse matrix decomposition. We achieve high performance by decomposing the problem into a small number of primitives and utilizing optimized implementations of these primitives. Our implementation shows strong scaling up to 1024 cores for smaller matrices and up to 4096 cores for larger matrices.
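For reference, the classic sequential Cuthill-McKee procedure that the paper parallelizes can be sketched as follows. This is the textbook serial version on an adjacency-list graph; the paper's distributed-memory formulation via 2D-decomposed sparse matrix primitives is substantially different:

```python
from collections import deque

def reverse_cuthill_mckee(adj):
    """Serial RCM on a graph given as {vertex: set(neighbors)}.

    Breadth-first search from a minimum-degree vertex, visiting each
    vertex's unvisited neighbors in order of increasing degree, then
    reverse the resulting ordering.  One BFS per connected component."""
    order, visited = [], set()
    # try start vertices in order of increasing degree
    for start in sorted(adj, key=lambda v: len(adj[v])):
        if start in visited:
            continue
        visited.add(start)
        q = deque([start])
        while q:
            v = q.popleft()
            order.append(v)
            for w in sorted(adj[v] - visited, key=lambda u: len(adj[u])):
                visited.add(w)
                q.append(w)
    return order[::-1]
```

On a path graph the ordering simply runs along the path, which is why RCM drives the nonzeros of a sparse matrix toward a narrow band around the diagonal.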

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest in scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.

    Evaluating linear recursive filters on clusters of workstations

    The aim of this paper is to show that the recently developed high-performance algorithm for solving linear recurrence systems with constant coefficients, together with the new BLAS-based algorithm for narrow-banded triangular Toeplitz matrix-vector multiplication, makes it possible to evaluate linear recursive filters efficiently, even on clusters of workstations. The results of experiments performed on a cluster of twelve Linux workstations are also presented. On this kind of recursive problem, the performance of the algorithm is comparable with that of two processors of a Cray SV-1.
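The equivalence underlying the approach is that a constant-coefficient linear recurrence is exactly a unit-lower-triangular, narrow-banded Toeplitz system. A small illustrative sketch (plain NumPy, not the paper's blocked BLAS-based implementation):

```python
import numpy as np

def linear_filter(a, f):
    """Evaluate the order-m recurrence
        x_i = f_i + sum_{j=1..m} a[j-1] * x_{i-j},   x_i = 0 for i < 0,
    which is equivalent to solving L x = f, where L is unit lower
    triangular and banded Toeplitz with -a[j-1] on subdiagonal j."""
    m, n = len(a), len(f)
    x = np.zeros(n)
    for i in range(n):
        x[i] = f[i] + sum(a[j] * x[i - 1 - j] for j in range(min(m, i)))
    return x
```

This sequential loop is inherently serial; recasting it as a banded triangular Toeplitz solve is what opens the door to the BLAS-based, cluster-parallel evaluation the paper describes.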