300 research outputs found

    Multi-partitioning for ADI-schemes on message passing architectures

    A kind of discrete-operator splitting called Alternating Direction Implicit (ADI) has been found to be useful in simulating fluid flow problems. In particular, it is being used to study the effects of hot exhaust jets from high performance aircraft on landing surfaces. Decomposition techniques that minimize load imbalance and message-passing frequency are described. Three strategies that are investigated for implementing the NAS Scalar Penta-diagonal Parallel Benchmark (SP) are transposition, pipelined Gaussian elimination, and multipartitioning. The multipartitioning strategy, which was used on Ethernet, was found to be the most efficient, although it was considered only a moderate success because of Ethernet's limited communication properties. The efficiency derived largely from the coarse granularity of the strategy, which reduced latencies and allowed overlap of communication and computation.
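
As a rough illustration of the multipartitioning idea named in this abstract, the sketch below assigns the tiles of a p x p decomposition to p processors along wrapped diagonals, so that every processor owns exactly one tile in each row and each column and stays busy during a line sweep in either direction. This is a simplified 2D assumption for illustration only; the function names are hypothetical and the abstract does not specify the mapping actually used for SP.

```python
def multipartition_owner(i, j, p):
    """Owner of tile (i, j) in a p x p tiling under a wrapped-diagonal assignment.

    Illustrative assumption: the paper's actual mapping is not given in the abstract.
    """
    return (j - i) % p


def owner_grid(p):
    """Return the p x p table of tile owners."""
    return [[multipartition_owner(i, j, p) for j in range(p)] for i in range(p)]


if __name__ == "__main__":
    # Each row and each column lists every processor exactly once.
    for row in owner_grid(4):
        print(row)
```

Because each row and column of tiles contains every processor once, a sweep along either grid direction can proceed tile by tile with all processors active at each stage.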

    Algebraic, Block and Multiplicative Preconditioners based on Fast Tridiagonal Solves on GPUs

    This thesis contributes to the field of sparse linear algebra, graph applications, and preconditioners for Krylov iterative solvers of sparse linear equation systems, by providing a (block) tridiagonal solver library, a generalized sparse matrix-vector implementation, a linear forest extraction, and a multiplicative preconditioner based on tridiagonal solves. The tridiagonal library, which supports (scaled) partial pivoting, outperforms cuSPARSE's tridiagonal solver by a factor of five while completely utilizing the available GPU memory bandwidth. For the performance-optimized solving of multiple right-hand sides, the explicit factorization of the tridiagonal matrix can be computed. The extraction of a weighted linear forest (union of disjoint paths) from a general graph is used to build algebraic (block) tridiagonal preconditioners and deploys the generalized sparse matrix-vector implementation of this thesis for preconditioner construction. During linear forest extraction, a new parallel bidirectional scan pattern, which can operate on doubly linked list structures, identifies the path ID and the position of a vertex. The algebraic preconditioner construction is also used to build more advanced preconditioners, which contain multiple tridiagonal factors, based on generalized ILU factorizations. Additionally, other preconditioners based on tridiagonal factors are presented and evaluated in comparison to ILU and ILU incomplete sparse approximate inverse preconditioners (ILU-ISAI) for the solution of large sparse linear equation systems from the Sparse Matrix Collection. For all presented problems of this thesis, an efficient parallel algorithm and its CUDA implementation for single GPU systems is provided.
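
For readers unfamiliar with tridiagonal solves, the following is a minimal serial sketch of the classic Thomas algorithm. It is not the thesis's GPU method: it omits pivoting (which the thesis library supports) and any parallelism, and the function name and argument layout are assumptions made for illustration.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    All four lists have length n; lower[0] and upper[n-1] are ignored.
    Assumes the matrix needs no pivoting (e.g. it is diagonally dominant).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

The forward sweep carries a dependency from row to row, which is why GPU solvers generally restructure the computation rather than run this loop directly.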

    A conservative overlap method for multi-block parallelization of compact finite-volume schemes

    A conservative approach for MPI-based parallelization of tridiagonal compact schemes is developed in the context of multi-block finite-volume methods. For each block, an enlarged linear system is solved by overlapping a certain number of neighbour cells from adjacent sub-domains. The values at block-to-block boundary faces are evaluated by a high-order centered approximation formula. Unlike previous methods, conservation is retained by properly re-computing the common interface value between two neighbouring blocks. Numerical tests show that parallelization artifacts decrease significantly as the number of overlapping cells is increased, at some expense of parallel efficiency. A reasonable trade-off between accuracy and performance is discussed in the paper with reference to both the spectral properties of the method and the results of fully turbulent numerical simulations.

    HPCCP/CAS Workshop Proceedings 1998

    This publication is a collection of extended abstracts of presentations given at the HPCCP/CAS (High Performance Computing and Communications Program/Computational Aerosciences Project) Workshop held on August 24-26, 1998, at NASA Ames Research Center, Moffett Field, California. The objective of the Workshop was to bring together the aerospace high performance computing community, consisting of airframe and propulsion companies, independent software vendors, university researchers, and government scientists and engineers. The Workshop was sponsored by the HPCCP Office at NASA Ames Research Center. The Workshop consisted of over 40 presentations, including an overview of NASA's High Performance Computing and Communications Program and the Computational Aerosciences Project; ten sessions of papers representative of the high performance computing research conducted within the Program by the aerospace industry, academia, NASA, and other government laboratories; two panel sessions; and a special presentation by Mr. James Bailey.

    The NAS parallel benchmarks

    A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification: all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

    An investigation of the airflow in mushroom growing structures, the development of an improved, three-dimensional solution technique for fluid flow and its evaluation for the modelling of mushroom growing structures

    This thesis is an examination of the airflows in mushroom growing rooms. An experimental investigation of the nature of the flows in Irish tunnels showed them to be of low magnitude at the crop but controllable in principle for single layer growing. It was found that stratification of the airflow in growing tunnels could cause severe reductions in cropping surface airspeed and the operation of the heating system was identified as the main source of this. An alternative air distribution system was shown to have the potential to overcome the effects of heating. Airflow for three level growing systems in tunnels was found to be non-uniform and the use of wall-mounted deflecting plates was shown to have the potential to correct this. The provision of air flow solutions for the wide range of new growing systems would be difficult using empirical methods alone and therefore a modelling approach was sought to complement and aid the experimental work. The initial modelling work was carried out in two dimensions with TEACH-T code (SIMPLE flow solver) to calculate the turbulent flow. The code was extended to three dimensions because it was not possible to model usefully in a two-dimensional approximation. Convergence times for the SIMPLE solver were found to be excessively long. Trial applications of multi-level acceleration produced approximately 15% savings in computational effort so a new solver was investigated. The CELS (Coupled Equation Line Solver) method had been reported as superior to SIMPLE in two dimensions and already has a multi-level technique to accelerate convergence, i.e. Additive Correction Multigrid (ACM). CELS was first applied in two dimensions in order to test its usefulness with the turbulence model in the equation set. Improvements in the time to convergence, relative to SIMPLE, justified its extension to three dimensions. 
    The Additive Correction Multigrid technique also produced significant improvements and this was extended to three dimensions. CELS3D is essentially a plane solver applied to a three-dimensional grid and a number of procedures for its application were investigated. All produced savings relative to the SIMPLE solver. The QUICK differencing scheme was incorporated in the TEACH-based code and CELS3D was tested with various geometries and values of the Reynolds number. The best results gave a 79% reduction in the time to convergence of the solver. The ACM technique in three dimensions was investigated but no useful savings in computational effort were made. In the application to mushroom growing structures, the principles of the application of CELS3D to flows around obstructions in the flow domain were examined and the difficulties identified. A solution was found but its implementation proved impractical for all but the simplest cases.

    The Twenty-First NASTRAN (R) Users' Colloquium

    This publication contains the proceedings of the Twenty-First NASTRAN Users' Colloquium held in Tampa, FL, April 26 through April 30, 1993. It provides some comprehensive general papers on the application of finite elements in engineering, comparisons with other approaches, unique applications, pre- and postprocessing with other auxiliary programs, and new methods of analysis with NASTRAN.

    Gate-Level Masking of Streamlined NTRU Prime Decapsulation in Hardware

    Streamlined NTRU Prime is a lattice-based Key Encapsulation Mechanism (KEM) that is, together with X25519, currently the default algorithm in OpenSSH 9. Being based on lattice assumptions, it is assumed to be secure also against attackers with access to large-scale quantum computers. While Post-Quantum Cryptography (PQC) schemes have been subject to extensive research in recent years, challenges remain with respect to protection mechanisms against attackers that have additional side-channel information such as the power consumption of a device processing secret data. As a countermeasure to such attacks, masking has been shown to be a promising and effective approach. For public-key schemes, including any recent PQC schemes, usually a mixture of Boolean and arithmetic approaches is applied on an algorithmic level. Our generic hardware implementation of Streamlined NTRU Prime decapsulation, however, follows an idea that until now was assumed to be only applicable to symmetric cryptography: gate-level masking. There, a hardware design that consists of logic gates is transformed into a secure implementation by replacing each gate with a composably secure gadget that operates on uniform random shares of secret values. In our work, we show the feasibility of applying this approach also to PQC schemes and present the first Public-Key Cryptography (PKC) – pre- and post-quantum – implementation masked at gate level considering several trade-offs and design choices. We synthesize our implementation both for Artix-7 Field-Programmable Gate Arrays (FPGAs) and 45 nm Application-Specific Integrated Circuits (ASICs), yielding practically feasible results regarding area, randomness demand and latency. Finally, we also analyze the applicability of our concept to Kyber, which will be standardized by the National Institute of Standards and Technology (NIST).
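
To make the notion of a gate replaced by a gadget operating on random shares more concrete, here is a minimal functional sketch of first-order Boolean masking with two shares, using an ISW-style AND gadget. The abstract does not say which gadgets the authors use, so this choice and all function names are assumptions. The code only demonstrates functional correctness; Python software of this kind provides no actual side-channel protection.

```python
import secrets


def share(bit):
    """Split a secret bit into two Boolean shares with bit == s0 ^ s1."""
    s0 = secrets.randbits(1)
    return (s0, s0 ^ bit)


def unshare(shares):
    """Recombine two Boolean shares into the secret bit."""
    return shares[0] ^ shares[1]


def masked_xor(a, b):
    """XOR is linear, so it is computed share by share without fresh randomness."""
    return (a[0] ^ b[0], a[1] ^ b[1])


def masked_and(a, b):
    """ISW-style first-order AND gadget (illustrative assumption, not the paper's gadget).

    A fresh random bit r hides the cross terms a0&b1 and a1&b0.
    """
    r = secrets.randbits(1)
    c0 = (a[0] & b[0]) ^ r
    c1 = (a[1] & b[1]) ^ ((r ^ (a[0] & b[1])) ^ (a[1] & b[0]))
    return (c0, c1)
```

A gate-level transformation would apply such a substitution to every AND and XOR gate of a netlist, which is why the randomness demand reported in the abstract grows with the number of nonlinear gates.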

    Summary of research in progress at ICASE

    This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, fluid mechanics, and computer science during the period October 1, 1992 through March 31, 1993.