730 research outputs found

Adaptive fault-tolerant routing in hypercube multicomputers

Author: Chen Ming-Syan
Shin Kang G.

A connected hypercube with faulty links and/or nodes is called an injured hypercube. To enable any non-faulty node to communicate with any other non-faulty node in an injured hypercube, the information on component failures has to be made available to non-faulty nodes so as to route messages around the faulty components. A distributed adaptive fault-tolerant routing scheme is proposed for an injured hypercube in which each node is required to know only the condition of its own links. Despite its simplicity, this scheme is shown to be capable of routing messages successfully in an injured hypercube as long as the number of faulty components is less than n. Moreover, it is proved that this scheme routes messages via shortest paths with a rather high probability and the expected length of a resulting path is very close to that of a shortest path. Since the assumption that the number of faulty components is less than n in an n-dimensional hypercube might limit the usefulness of the above scheme, a routing scheme is introduced based on depth-first search which works in the presence of an arbitrary number of faulty components. Due to the insufficient information on faulty components, the paths chosen by the above scheme may not always be the shortest. To guarantee that all messages be routed via shortest paths, it is proposed that every node be equipped with more information than that on its own links. The effects of this additional information on routing efficiency are analyzed, and the additional information to be kept at each node for the shortest path routing is determined. Several examples and remarks are also given to illustrate the results.
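The depth-first idea mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm, only a simplified illustration under stated assumptions: nodes are labelled with n-bit integers, the function name `route_dfs` and the fault-set parameters are hypothetical, and the set of visited nodes is assumed to travel with the message. At each hop the current node looks only at the state of its own links, trying first the dimensions that bring the message closer to the destination.

```python
from typing import FrozenSet, List, Optional, Set

Link = FrozenSet[int]  # an undirected link, stored as frozenset({u, v})


def route_dfs(n: int, src: int, dst: int,
              faulty_links: Set[Link],
              faulty_nodes: Set[int]) -> Optional[List[int]]:
    """Illustrative depth-first routing in an n-dimensional hypercube.

    Returns the list of nodes on the route found, or None if dst
    cannot be reached from src. Hypothetical helper, not taken from
    the paper.
    """
    def usable(u: int, v: int) -> bool:
        # Local knowledge only: node u sees whether its link to v works.
        # A faulty neighbour is treated like a faulty link.
        return v not in faulty_nodes and frozenset((u, v)) not in faulty_links

    visited = {src}   # assumed to be carried along with the message
    path = [src]      # current depth-first stack
    while path:
        u = path[-1]
        if u == dst:
            return path
        diff = u ^ dst  # bits in which u and dst still differ
        # Preferred dimensions (reduce the Hamming distance) come first,
        # spare dimensions (detours) afterwards.
        dims = sorted(range(n), key=lambda d: 0 if (diff >> d) & 1 else 1)
        for d in dims:
            v = u ^ (1 << d)
            if v not in visited and usable(u, v):
                visited.add(v)
                path.append(v)
                break
        else:
            path.pop()  # dead end: backtrack one hop
    return None
```

Because detours through spare dimensions and backtracking are allowed, the route found this way is not guaranteed to be a shortest one, which matches the limitation the abstract attributes to routing with local information only.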

NASA Technical Reports Server

Hyperswitch communication network

Author: Peterson J.
Pniel M.
Upchurch E.

The Hyperswitch Communication Network (HCN) is a large-scale parallel computer prototype being developed at JPL. Commercial versions of the HCN computer are planned. The HCN computer being designed is a message-passing multiple instruction multiple data (MIMD) computer, and offers many advantages in price-performance ratio, reliability and availability, and manufacturing over traditional uniprocessors and bus-based multiprocessors. The design of the HCN operating system is a uniquely flexible environment that combines both parallel processing and distributed processing. This programming paradigm can achieve a balance among the following competing factors: performance in processing and communications, user friendliness, and fault tolerance. The prototype is being designed to accommodate a maximum of 64 state-of-the-art microprocessors. The HCN is classified as a distributed supercomputer. The HCN system is described, and the performance/cost analysis and other competing factors within the system design are reviewed.

NASA Technical Reports Server

On hardware for generating routes in Kautz digraphs

Author: Boer F. de
Havinga Paul J.M.
Jansen Pierre G.
Molenkamp Bert
Smit Gerard J.M.
Publication venue: North-Holland
Publication date: 01/01/1991

In this paper we present a hardware implementation of an algorithm for generating node disjoint routes in a Kautz network. Kautz networks are based on a family of digraphs described by W.H. Kautz [Kautz 68]. A Kautz network with in-degree and out-degree d has N = d^k + d^{k-1} nodes (for any cardinals d, k > 0). The diameter is at most k, the degree is fixed and independent of the network size. Moreover, it is fault-tolerant, the connectivity is d and the mapping of standard computation graphs such as a linear array, a ring and a tree on a Kautz network is straightforward. The network has a simple routing mechanism, even when nodes or links are faulty. Imase et al. [Imase 86] showed the existence of d node disjoint paths between any pair of vertices. In Smit et al. [Smit 91] an algorithm is described that generates d node disjoint routes between two arbitrary nodes in the network. In this paper we present a simple and fast hardware implementation of this algorithm. It can be realized with standard components (Field Programmable Gate Arrays).
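The node count and diameter stated in the abstract above can be checked on small cases with a short sketch. It uses the standard word-based construction of a Kautz digraph, which is an assumption here since the abstract does not spell out a construction: nodes are words of length k over d + 1 symbols with no two equal adjacent symbols, and each node points to the words obtained by dropping its first symbol and appending a symbol different from its last. The function names `kautz_digraph` and `diameter` are hypothetical. The sketch does not generate the node disjoint routes themselves.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Tuple

Node = Tuple[int, ...]


def kautz_digraph(d: int, k: int) -> Dict[Node, List[Node]]:
    """Adjacency lists of the Kautz digraph with degree d and word length k."""
    alphabet = range(d + 1)
    # Nodes: length-k words in which adjacent symbols always differ.
    nodes = [w for w in product(alphabet, repeat=k)
             if all(a != b for a, b in zip(w, w[1:]))]
    # Edges: shift left by one symbol, append any symbol != the last one.
    return {w: [w[1:] + (y,) for y in alphabet if y != w[-1]] for w in nodes}


def diameter(graph: Dict[Node, List[Node]]) -> int:
    """Longest shortest-path distance, via breadth-first search from each node."""
    worst = 0
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Every node must be reachable for the diameter to be defined.
        assert len(dist) == len(graph)
        worst = max(worst, max(dist.values()))
    return worst


g = kautz_digraph(2, 3)
print(len(g))       # number of nodes, to compare with d^k + d^{k-1}
print(diameter(g))  # to compare with the bound k
```

Under this construction the count is (d + 1) d^{k-1}, which equals d^k + d^{k-1}, so the printed node count can be compared directly with the formula in the abstract.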

University of Twente Research Information

Integration of tools for the Design and Assessment of High-Performance, Highly Reliable Computing Systems (DAHPHRS), phase 1

Author: Baker R.
Frank G.
Gray G.
Scheper C.
Yalamanchili S.

Systems for Space Defense Initiative (SDI) space applications typically require both high performance and very high reliability. These requirements present the systems engineer evaluating such systems with the extremely difficult problem of conducting performance and reliability trade-offs over large design spaces. A controlled development process supported by appropriate automated tools must be used to assure that the system will meet design objectives. This report describes an investigation of methods, tools, and techniques necessary to support performance and reliability modeling for SDI systems development. Models of the JPL Hypercubes, the Encore Multimax, and the C.S. Draper Lab Fault-Tolerant Parallel Processor (FTPP) parallel-computing architectures using candidate SDI weapons-to-target assignment algorithms as workloads were built and analyzed as a means of identifying the necessary system models, how the models interact, and what experiments and analyses should be performed. As a result of this effort, weaknesses in the existing methods and tools were revealed and capabilities that will be required for both individual tools and an integrated toolset were identified.

NASA Technical Reports Server

Evaluation of fault-tolerant parallel-processor architectures over long space missions

Author: Johnson Sally C.

The impact of a five-year space mission environment on fault-tolerant parallel processor architectures is examined. The target application is a Strategic Defense Initiative (SDI) satellite requiring 256 parallel processors to provide the computation throughput. The reliability requirements are that the system still be operational after five years with 0.99 probability and that the probability of system failure during one-half hour of full operation be less than 10^{-7}. The fault tolerance features an architecture must possess to meet these reliability requirements are presented, many potential architectures are briefly evaluated, and one candidate architecture, the Charles Stark Draper Laboratory's Fault-Tolerant Parallel Processor (FTPP), is evaluated in detail. A methodology for designing a preliminary system configuration to meet the reliability and performance requirements of the mission is then presented and demonstrated by designing an FTPP configuration.

NASA Technical Reports Server