10,931 research outputs found

    Evaluation of fault-tolerant parallel-processor architectures over long space missions

    Get PDF
    The impact of a five year space mission environment on fault-tolerant parallel processor architectures is examined. The target application is a Strategic Defense Initiative (SDI) satellite requiring 256 parallel processors to provide the computation throughput. The reliability requirements are that the system still be operational after five years with .99 probability and that the probability of system failure during one-half hour of full operation be less than 10(-7). The fault tolerance features an architecture must possess to meet these reliability requirements are presented, many potential architectures are briefly evaluated, and one candidate architecture, the Charles Stark Draper Laboratory's Fault-Tolerant Parallel Processor (FTPP) is evaluated in detail. A methodology for designing a preliminary system configuration to meet the reliability and performance requirements of the mission is then presented and demonstrated by designing an FTPP configuration

    Reconfiguration for Fault Tolerance and Performance Analysis

    Get PDF
    Architecture reconfiguration, the ability of a system to alter the active interconnection among modules, has a history of different purposes and strategies. Its purposes develop from the relatively simple desire to formalize procedures that all processes have in common to reconfiguration for the improvement of fault-tolerance, to reconfiguration for performance enhancement, either through the simple maximizing of system use or by sophisticated notions of wedding topology to the specific needs of a given process. Strategies range from straightforward redundancy by means of an identical backup system to intricate structures employing multistage interconnection networks. The present discussion surveys the more important contributions to developments in reconfigurable architecture. The strategy here is in a sense to approach the field from an historical perspective, with the goal of developing a more coherent theory of reconfiguration. First, the Turing and von Neumann machines are discussed from the perspective of system reconfiguration, and it is seen that this early important theoretical work contains little that anticipates reconfiguration. Then some early developments in reconfiguration are analyzed, including the work of Estrin and associates on the fixed plus variable restructurable computer system, the attempt to theorize about configurable computers by Miller and Cocke, and the work of Reddi and Feustel on their restructable computer system. The discussion then focuses on the most sustained systems for fault tolerance and performance enhancement that have been proposed. An attempt will be made to define fault tolerance and to investigate some of the strategies used to achieve it. By investigating four different systems, the Tandern computer, the C.vmp system, the Extra Stage Cube, and the Gamma network, the move from dynamic redundancy to reconfiguration is observed. Then reconfiguration for performance enhancement is discussed. A survey of some proposals is attempted, then the discussion focuses on the most sustained systems that have been proposed: PASM, the DC architecture, the Star local network, and the NYU Ultracomputer. The discussion is organized around a comparison of control, scheduling, communication, and network topology. Finally, comparisons are drawn between fault tolerance and performance enhancement, in order to clarify the notion of reconfiguration and to reveal the common ground of fault tolerance and performance enhancement as well as the areas in which they diverge. An attempt is made in the conclusion to derive from this survey and analysis some observations on the nature of reconfiguration, as well as some remarks on necessary further areas of research

    Fault tolerant clos network

    Get PDF
    Multistage interconnection networks, or MINs, provide paths between functional modules in multiprocessor systems. The MINs are usually segmented into several stages. Each stage connects inputs to appropriate links of the next stage so that the cumulative effect of all the stages satisfies input-output connection requirements. This thesis deals with a fault tolerant Clos network. The fault tolerance technique involves addition of extra switches per stage to compensate for any switch failure The reliability analysis of both ordinary and fault tolerant Clos networks is presented. The optimal number of extra switches required to get the best reliability results has been analyzed

    Communication and control in an integrated manufacturing system

    Get PDF
    Typically, components in a manufacturing system are all centrally controlled. Due to possible communication bottlenecking, unreliability, and inflexibility caused by using a centralized controller, a new concept of system integration called an Integrated Multi-Robot System (IMRS) was developed. The IMRS can be viewed as a distributed real time system. Some of the current research issues being examined to extend the framework of the IMRS to meet its performance goals are presented. These issues include the use of communication coprocessors to enhance performance, the distribution of tasks and the methods of providing fault tolerance in the IMRS. An application example of real time collision detection, as it relates to the IMRS concept, is also presented and discussed

    Fault tolerant architectures for integrated aircraft electronics systems, task 2

    Get PDF
    The architectural basis for an advanced fault tolerant on-board computer to succeed the current generation of fault tolerant computers is examined. The network error tolerant system architecture is studied with particular attention to intercluster configurations and communication protocols, and to refined reliability estimates. The diagnosis of faults, so that appropriate choices for reconfiguration can be made is discussed. The analysis relates particularly to the recognition of transient faults in a system with tasks at many levels of priority. The demand driven data-flow architecture, which appears to have possible application in fault tolerant systems is described and work investigating the feasibility of automatic generation of aircraft flight control programs from abstract specifications is reported

    Power analysis with variable traffic loads for next generation interconnection networks

    Get PDF
    Power consumption is the most important factor for the consideration of next generation supercomputers. In addition, the requirement of power usages can be even scaled up to more than 300MW (which is nearly equal to the one nuclear power plant) with the conventional networks. On the other hand, hierarchical interconnection networks can be a possible solution to those issues. 3D-TTN is a hierarchical interconnection network where lowest level is configured as the 3Dtorus network, following the 2Dtorus network at the higher-level networks. The main focus for this paper is the power analysis with variable traffic load along with the fault tolerance, cost, packing density and message traffic density of 3D-TTN comparing against various other networks. In our early research, 3D-TTN has achieved near about 21% better diameter performance, 12% better average distance performance and eventually required about 32.48% less router power usage at the lowest level than the 5Dtorus network for 1% traffic load. This paper shows the power comparison with the router and link power rather than considering the router power only. Our analysis shows that 3DTTN will require about 39.96% less router and link power than the 5Dtorus network for 10% traffic. With 30% traffic load, 3DTTN will require about 38.42% less power than the 5Dtorus network for the on-chip network. Even considering some topological parameters, 3D-TTN could also achieve some desirable performance by comparing other networks

    Fault-tolerant interconnection networks for multiprocessor systems

    Get PDF
    Interconnection networks represent the backbone of multiprocessor systems. A failure in the network, therefore, could seriously degrade the system performance. For this reason, fault tolerance has been regarded as a major consideration in interconnection network design. This thesis presents two novel techniques to provide fault tolerance capabilities to three major networks: the Baseline network, the Benes network and the Clos network. First, the Simple Fault Tolerance Technique (SFT) is presented. The SFT technique is in fact the result of merging two widely known interconnection mechanisms: a normal interconnection network and a shared bus. This technique is most suitable for networks with small switches, such as the Baseline network and the Benes network. For the Clos network, whose switches may be large for the SFT, another technique is developed to produce the Fault-Tolerant Clos (FTC) network. In the FTC, one switch is added to each stage. The two techniques are described and thoroughly analyzed

    Fault-tolerant onboard digital information switching and routing for communications satellites

    Get PDF
    The NASA Lewis Research Center is developing an information-switching processor for future meshed very-small-aperture terminal (VSAT) communications satellites. The information-switching processor will switch and route baseband user data onboard the VSAT satellite to connect thousands of Earth terminals. Fault tolerance is a critical issue in developing information-switching processor circuitry that will provide and maintain reliable communications services. In parallel with the conceptual development of the meshed VSAT satellite network architecture, NASA designed and built a simple test bed for developing and demonstrating baseband switch architectures and fault-tolerance techniques. The meshed VSAT architecture and the switching demonstration test bed are described, and the initial switching architecture and the fault-tolerance techniques that were developed and tested are discussed

    Designing application software in wide area network settings

    Get PDF
    Progress in methodologies for developing robust local area network software has not been matched by similar results for wide area settings. The design of application software spanning multiple local area environments is examined. For important classes of applications, simple design techniques are presented that yield fault tolerant wide area programs. An implementation of these techniques as a set of tools for use within the ISIS system is described
    corecore