45,208 research outputs found

CCL: a portable and tunable collective communication library for scalable parallel computers

Author: Alex Ho
Ching-tien Ho
Jehoshua Bruck
Marc Snir
Pablo Elustondo
Robert Cypher
Shlomo Kipnis
Vasanth Bala
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1995

A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model
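The operations named in the abstract can be illustrated with a small single-machine simulation. This is only a sketch of the usual meaning of these collectives, not CCL's actual interface: the list `values` stands in for one buffer per process, the function names are hypothetical, "concatenate" is assumed to give every process the full gathered list, and "shift" is assumed to be circular.

```python
from functools import reduce as fold
import operator

# values[i] models the buffer held by process i (an assumption for illustration).

def broadcast(values, root):
    """Every process ends up with the root's value."""
    return [values[root] for _ in values]

def reduce_to(values, op, root):
    """Only the root holds the combined result; other processes hold None."""
    total = fold(op, values)
    return [total if rank == root else None for rank in range(len(values))]

def gather(values, root):
    """Only the root holds the list of all contributions."""
    return [list(values) if rank == root else None for rank in range(len(values))]

def concatenate(values):
    """Assumed semantics: every process receives the full gathered list."""
    return [list(values) for _ in values]

def shift(values, k):
    """Assumed circular shift: process i receives the value of process i - k."""
    n = len(values)
    return [values[(i - k) % n] for i in range(n)]

if __name__ == "__main__":
    data = [10, 20, 30, 40]
    print(broadcast(data, root=2))
    print(reduce_to(data, operator.add, root=0))
    print(shift(data, 1))
```

A real library would move data between separate processes; the simulation only shows which process holds what after each operation.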

CiteSeerX

Caltech Authors

Components and Interfaces of a Process Management System for Parallel Programs

Author: Butler Ralph
Gropp William
Lusk Ewing
Publication date: 01/01/2001

Parallel jobs are different from sequential jobs and require a different type of process management. We present here a process management system for parallel programs such as those written using MPI. A primary goal of the system, which we call MPD (for multipurpose daemon), is to be scalable. By this we mean that startup of interactive parallel jobs comprising thousands of processes is quick, that signals can be quickly delivered to processes, and that stdin, stdout, and stderr are managed intuitively. Our primary target is parallel machines made up of clusters of SMPs, but the system is also useful in more tightly integrated environments. We describe how MPD enables much faster startup and better runtime management of parallel jobs. We show how close control of stdio can support the easy implementation of a number of convenient system utilities, even a parallel debugger. We describe a simple but general interface that can be used to separate any process manager from a parallel library, which we use to keep MPD separate from MPICH. Comment: 12 pages, Workshop on Clusters and Computational Grids for Scientific Computing, Sept. 24-27, 2000, Le Chateau de Faverges de la Tour, Franc

arXiv.org e-Print Archive

CiteSeerX

UNT Digital Library

Meeting Real-Time Constraint of Spectrum Management in TV Black-Space Access

Author: Zhao Zhongyuan
Publication date: 01/12/2017

The TV set feedback feature standardized in the next-generation TV system, ATSC 3.0, would enable opportunistic access to active TV channels in future Cognitive Radio Networks. This new dynamic spectrum access approach is called black-space access, as it complements current TV white space, which refers to inactive TV channels. TV black-space access can significantly increase the spectrum available to Cognitive Radio Networks in populated urban markets, where spectrum shortage is most severe and TV white space is very limited. However, to enable TV black-space access, a secondary user has to evacuate a TV channel in a timely manner when a TV user comes in. This strict real-time constraint is a unique challenge for the spectrum management infrastructure of Cognitive Radio Networks. In this paper, the real-time performance of spectrum management with regard to the degree of centralization of the infrastructure is modeled and tested. Based on collected empirical network latency and database response time, we analyze the average evacuation time under four structures of spectrum management infrastructure: full distribution, city-wide centralization, nationwide centralization, and semi-national centralization. The results show that nationwide centralization may not meet the real-time requirement, while semi-national centralization that uses multiple co-located independent spectrum managers can achieve real-time performance while keeping most of the operational advantages of a fully centralized structure. Comment: 9 pages, 7 figures, Technical Repor

arXiv.org e-Print Archive

DigitalCommons@University of Nebraska

High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP

Author: Akoglu Ali
Benkrid Khaled
Ling Cheng
Liu Ying
Song Yang
Tian Xiang
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2012

This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high performance efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM’s Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm, with general purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on performance per watt criterion and perform better than all other platforms on performance per dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both performance per watt and performance per dollar criteria. In general, in order to outperform other technologies on performance per dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs
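For readers unfamiliar with the Smith-Waterman algorithm that the study accelerates, the following is a minimal reference sketch of its scoring recurrence in plain Python. It is not the paper's implementation: the scoring values (`match`, `mismatch`, `gap`) and the linear gap penalty are assumptions chosen for illustration, and only the best local alignment score is computed, without traceback.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of sequences a and b.

    Uses the standard recurrence with a linear gap penalty; the scoring
    parameters are illustrative defaults, not values from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Each cell depends only on its upper, left, and upper-left neighbours, so cells on the same anti-diagonal can be computed independently, which is the kind of regular parallelism that hardware accelerators can exploit.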

Crossref

Directory of Open Access Journals

Edinburgh Research Explorer

Feasibility study of an Integrated Program for Aerospace vehicle Design (IPAD). Volume 4: IPAD system design

Author: Anderson L. O.
Carpenter L. C.
Goldfarb W.
Hansen S. D.
Kawaguchi A. S.
Redhed D. D.

The computing system design of IPAD is described and the requirements which form the basis for the system design are discussed. The system is presented in terms of a functional design description and technical design specifications. The functional design specifications give the detailed description of the system design using top-down structured programming methodology. Human behavioral characteristics, which specify the system design at the user interface, security considerations, and standards for system design, implementation, and maintenance are also part of the technical design specifications. Detailed specifications of the two most common computing system types in use by the major aerospace companies which could support the IPAD system design are presented. The report of a study to investigate migration of IPAD software between the two candidate 3rd generation host computing systems and from these systems to a 4th generation system is included

NASA Technical Reports Server