A performance analysis of the PASLIB version 2.1X SEND and RECV routines on the finite element machine

Author: Knott J. D.

The Finite Element Machine is an experimental array processor designed to support research in parallel algorithms and architectures. This report presents a case study of communications using the SEND and RECV system software routines on the Finite Element Machine, followed by a discussion of the effect of I/O performance on the efficiency of parallel algorithms.

NASA Technical Reports Server

Destination-directed, packet-switched architecture for a geostationary communications satellite network

Author: Bobinsky Eric A.
Ivancic William D.
Kim Heechul
Quintana Jorge A.
Shalkhauser Mary Jo
Soni Nitin J.
Vanderaar Mark
Wager Paul

A major goal of the Digital Systems Technology Branch at the NASA Lewis Research Center is to identify and develop critical digital components and technologies that either enable new commercial missions or significantly enhance the performance, cost efficiency, and/or reliability of existing and planned space communications systems. NASA envisions a need for low-data-rate, interactive, direct-to-the-user communications services for data, voice, facsimile, and video conferencing. The network would provide enhanced very-small-aperture terminal (VSAT) communications services and be capable of handling data rates of 64 kbps through 2.048 Mbps in 64-kbps increments. Efforts have concentrated heavily on the space segment; however, the ground segment has been considered concurrently to ensure cost efficiency and realistic operational constraints. The focus of current space segment developments is a flexible, high-throughput, fault-tolerant onboard information-switching processor (ISP) for a geostationary satellite communications network. The Digital Systems Technology Branch is investigating both circuit and packet architectures for the ISP. Destination-directed, packet-switched architectures for geostationary communications satellites are addressed.
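
As a quick check of the rate range quoted in the abstract, the short sketch below enumerates the data rates from 64 kbps to 2.048 Mbps in 64-kbps steps. The variable names are illustrative only and do not come from the report.

```python
# Enumerate the supported data rates stated in the abstract:
# 64 kbps through 2.048 Mbps (2048 kbps) in 64-kbps increments.
STEP_KBPS = 64    # increment stated in the abstract
MAX_KBPS = 2048   # 2.048 Mbps expressed in kbps

rates_kbps = list(range(STEP_KBPS, MAX_KBPS + 1, STEP_KBPS))

print(len(rates_kbps))                  # number of distinct rates
print(rates_kbps[0], rates_kbps[-1])    # lowest and highest rate in kbps
```

Under this reading, the range covers 32 distinct rates, since 2048 / 64 = 32.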

NASA Technical Reports Server

Computer-aided programming for multiprocessing systems

Author: Gajski Daniel D.
Wu Min-You
Publication venue: eScholarship, University of California
Publication date: 30/06/1988

As both the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. This report discusses parallel models of computation and tools for computer-aided programming (CAP). Program development tools are necessary since programmers are not able to develop complex parallel programs efficiently. In particular, a CAP tool, named Hypertool, is described here. It performs scheduling and handles the communication primitive insertion automatically so that many errors are eliminated. It also generates the performance estimates and other program quality measures to help programmers in improving their algorithms and programs. Experiments have shown that up to a 300% performance improvement can be achieved by computer-aided programming.

eScholarship - University of California

Billion-atom Synchronous Parallel Kinetic Monte Carlo Simulations of Critical 3D Ising Systems

Author: Appel
Baillie
Berche
Bortz
Chatterjee
Chatterjee
Cox
Culler
E. Martínez
Eick
Fichthorn
Glauber
Grassberger
Hanusse
Hasenbusch
Heermann
Ivaneyko
J. Marian
Kalos
Kara
Lubachevsky
Martinez
Matz
Merrick
Nandipati
Newman
Novotny
Oliveira
Oppelstrup
P.R. Monasterio
Parisi
Shim
Shim
Snyder
Stauffer
Voter
Zheng
Publication venue: Elsevier BV
Publication date: 25/05/2010

An extension of the synchronous parallel kinetic Monte Carlo (pkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are rigorously solved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of the serial method, which confirms the statistical validity of the method. We have assessed the parallel efficiency of the method and find that our algorithm scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.
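
The chessboard idea mentioned in the abstract can be illustrated with a small 2D toy, sketched below under stated assumptions: the lattice is periodic, it is tiled by square blocks of side `sub`, and blocks are colored in a repeating 2x2 pattern so that only one color is active at a time. The function names and the 2D setting are hypothetical simplifications; the paper itself treats 3D systems, and this is not its implementation.

```python
from itertools import combinations

def color(x, y, sub):
    """Color (0..3) of site (x, y) when sub x sub blocks are colored in a 2x2 pattern."""
    return (x // sub) % 2 + 2 * ((y // sub) % 2)

def periodic_gap(a, b, size):
    """Distance between coordinates a and b on a ring of the given size."""
    d = abs(a - b)
    return min(d, size - d)

def min_gap(size, sub, c):
    """Smallest Chebyshev distance between sites of color c lying in different blocks."""
    sites = [(x, y) for x in range(size) for y in range(size)
             if color(x, y, sub) == c]
    gaps = [
        max(periodic_gap(x1, x2, size), periodic_gap(y1, y2, size))
        for (x1, y1), (x2, y2) in combinations(sites, 2)
        if (x1 // sub, y1 // sub) != (x2 // sub, y2 // sub)
    ]
    return min(gaps)

# On an 8 x 8 periodic lattice with 2 x 2 blocks, same-colored blocks are
# always separated by one full block of another color.
print(min_gap(8, 2, 0))  # 3
```

In this toy, sites updated concurrently in different blocks of the same color are at least `sub + 1` lattice spacings apart, which is why short-range interactions across block boundaries cannot collide while that color is active.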

arXiv.org e-Print Archive

Crossref

UNT Digital Library

Inner product computation for sparse iterative solvers on distributed supercomputer

Author: Gu T. -X.
Liu X. -P.
Zhu S. -X.
Publication date: 01/01/2012

Recent years have shown that iterative Krylov methods, without redesign, are not suitable for distributed supercomputers because of intensive global communications. It is well accepted that re-engineering Krylov methods for a prescribed computer architecture is necessary and important to achieve higher performance and scalability. The paper focuses on simple and practical ways to re-organize Krylov methods and improve their performance for current heterogeneous distributed supercomputers. In contrast with most current software development of Krylov methods, which usually focuses on efficient matrix-vector multiplications, the paper focuses on the way to compute inner products on supercomputers and explains why inner product computation on current heterogeneous distributed supercomputers is crucial for scalable Krylov methods. Communication complexity analysis shows how the inner product computation can be the bottleneck of performance of (inner) product-type iterative solvers on distributed supercomputers due to global communications. Principles of reducing such global communications are discussed. The importance of minimizing communications is demonstrated by experiments using up to 900 processors. The experiments were carried out on a Dawning 5000A, one of the fastest and earliest heterogeneous supercomputers in the world. Both the analysis and experiments indicate that inner product computation is very likely to be the most challenging kernel for inner product-based iterative solvers to achieve exascale.
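
To make the communication cost concrete, the sketch below simulates a distributed inner product in plain Python and counts global reductions. It contrasts two separate reductions with a single fused reduction that carries both partial sums. Fusing reductions is one commonly used way to cut global communication and is an assumption here, not a confirmed description of the paper's principles; `FakeComm` and the helper names are hypothetical stand-ins, not an MPI API.

```python
class FakeComm:
    """Hypothetical stand-in for a communicator that counts global reductions."""

    def __init__(self):
        self.reductions = 0

    def allreduce(self, partials):
        # partials holds one equal-length list of local sums per simulated process.
        self.reductions += 1
        return [sum(column) for column in zip(*partials)]

def local_dot(u, v):
    """Inner product of the locally owned slices."""
    return sum(a * b for a, b in zip(u, v))

def split(vec, nproc):
    """Distribute a vector across nproc simulated processes in contiguous chunks."""
    chunk = (len(vec) + nproc - 1) // nproc
    return [vec[i * chunk:(i + 1) * chunk] for i in range(nproc)]

def dots_separate(comm, r_parts, w_parts):
    """Compute (r, r) and (r, w) with one global reduction each."""
    rr = comm.allreduce([[local_dot(r, r)] for r in r_parts])[0]
    rw = comm.allreduce([[local_dot(r, w)] for r, w in zip(r_parts, w_parts)])[0]
    return rr, rw

def dots_fused(comm, r_parts, w_parts):
    """Compute (r, r) and (r, w) with a single global reduction of length 2."""
    rr, rw = comm.allreduce(
        [[local_dot(r, r), local_dot(r, w)] for r, w in zip(r_parts, w_parts)]
    )
    return rr, rw

r_parts = split([1, 2, 3, 4], 2)
w_parts = split([4, 3, 2, 1], 2)

comm_a, comm_b = FakeComm(), FakeComm()
print(dots_separate(comm_a, r_parts, w_parts), comm_a.reductions)  # (30, 20) 2
print(dots_fused(comm_b, r_parts, w_parts), comm_b.reductions)     # (30, 20) 1
```

Both variants return the same values, but the fused version needs half as many global synchronizations per pair of inner products, which matters when reduction latency dominates at large processor counts.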

Oxford University Research Archive

A Many-Core Overlay for High-Performance Embedded Computing on FPGAs

Author: Neto Horácio
Véstias Mário
Publication date: 21/08/2014

In this work, we propose a configurable many-core overlay for high-performance embedded computing. The size of internal memory, supported operations and number of ports can be configured independently for each core of the overlay. The overlay was evaluated with matrix multiplication, LU decomposition and fast Fourier transform (FFT) on a ZYNQ-7020 FPGA platform. The results show that using a system-level many-core overlay avoids complex hardware design and still provides good performance results. Comment: Presented at First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423).

arXiv.org e-Print Archive

Repositório Científico do Instituto Politécnico de Lisboa