88,709 research outputs found

    A performance analysis of the PASLIB version 2.1X SEND and RECV routines on the finite element machine

    Get PDF
    The Finite Element Machine is an experimental array processor designed to support research in parallel algorithms and architectures. This report presents a case study of communications using the SENDa and RECV system software routines on the Finite Element Machine, followed by a discussion of the effect of I/O performance on the efficiency of parallel algorithms

    Destination-directed, packet-switched architecture for a geostationary communications satellite network

    Get PDF
    A major goal of the Digital Systems Technology Branch at the NASA Lewis Research Center is to identify and develop critical digital components and technologies that either enable new commercial missions or significantly enhance the performance, cost efficiency, and/or reliability of existing and planned space communications systems. NASA envisions a need for low-data-rate, interactive, direct-to-the-user communications services for data, voice, facsimile, and video conferencing. The network would provide enhanced very-small-aperture terminal (VSAT) communications services and be capable of handling data rates of 64 kbps through 2.048 Mbps in 64-kbps increments. Efforts have concentrated heavily on the space segment; however, the ground segment has been considered concurrently to ensure cost efficiency and realistic operational constraints. The focus of current space segment developments is a flexible, high-throughput, fault-tolerant onboard information-switching processor (ISP) for a geostationary satellite communications network. The Digital Systems Technology Branch is investigating both circuit and packet architectures for the ISP. Destination-directed, packet-switched architectures for geostationary communications satellites are addressed

    Billion-atom Synchronous Parallel Kinetic Monte Carlo Simulations of Critical 3D Ising Systems

    Full text link
    An extension of the synchronous parallel kinetic Monte Carlo (pkMC) algorithm developed by Martinez {\it et al} [{\it J.\ Comp.\ Phys.} {\bf 227} (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors time clocks current in a global sense. Boundary conflicts are rigorously solved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of the serial method, which confirms the statistical validity of the method. We have assessed the parallel efficiency of the method and find that our algorithm scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations

    Inner product computation for sparse iterative solvers on\ud distributed supercomputer

    Get PDF
    Recent years have witnessed that iterative Krylov methods without re-designing are not suitable for distribute supercomputers because of intensive global communications. It is well accepted that re-engineering Krylov methods for prescribed computer architecture is necessary and important to achieve higher performance and scalability. The paper focuses on simple and practical ways to re-organize Krylov methods and improve their performance for current heterogeneous distributed supercomputers. In construct with most of current software development of Krylov methods which usually focuses on efficient matrix vector multiplications, the paper focuses on the way to compute inner products on supercomputers and explains why inner product computation on current heterogeneous distributed supercomputers is crucial for scalable Krylov methods. Communication complexity analysis shows that how the inner product computation can be the bottleneck of performance of (inner) product-type iterative solvers on distributed supercomputers due to global communications. Principles of reducing such global communications are discussed. The importance of minimizing communications is demonstrated by experiments using up to 900 processors. The experiments were carried on a Dawning 5000A, one of the fastest and earliest heterogeneous supercomputers in the world. Both the analysis and experiments indicates that inner product computation is very likely to be the most challenging kernel for inner product-based iterative solvers to achieve exascale

    A Many-Core Overlay for High-Performance Embedded Computing on FPGAs

    Get PDF
    In this work, we propose a configurable many-core overlay for high-performance embedded computing. The size of internal memory, supported operations and number of ports can be configured independently for each core of the overlay. The overlay was evaluated with matrix multiplication, LU decomposition and Fast-Fourier Transform (FFT) on a ZYNQ-7020 FPGA platform. The results show that using a system-level many-core overlay avoids complex hardware design and still provides good performance results.Comment: Presented at First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423
    • …
    corecore