556 research outputs found
System configuration and executive requirements specifications for reusable shuttle and space station/base
System configuration and executive requirements specifications for reusable shuttle and space station/bas
Recommended from our members
Fault tolerance in super-scalar and VLIW processors
In this paper, we present a method for utilizing the spare capacity in super-scalar and very long instruction word (VLIW) processors to tolerate functional unit failures. Unlike previous work that was primarily interested in detection of transient faults, we are concerned with more permanent and/or intermittent faults which necessitate processor reconfiguration. Our method utilizes the VLIW compiler or the superscalar scheduler to insert redundant operations whenever idle functional units exist. The results of these redundant operations are used to detect and diagnose functional unit failures. For super-scalar processors, the scheduler can then utilize this information to ensure that operations are performed only on non-faulty units. In VLIW processors, this is equivalent to recompiling the code to run on the remaining non-faulty functional units. Since in certain applications, recompilation may not be possible, we consider two alternative reconfiguration strategies for VLIW processors. These strategies sacrifice storage space and execution time, respectively, in order to reconfigure without recompiling. We present Markov models that describe the behavior of processors using these different approaches and we evaluate their reliabilities. The results show that, while super-scalar and VLIW with recompilation provide the highest reliability, all proposed strategies significantly increase reliability over that of an unprotected processor
C-MOS array design techniques: SUMC multiprocessor system study
The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Evaluated are three previous systems studies for a space ultrareliable modular computer multiprocessing system, and a new multiprocessing system is proposed that is flexibly configured with up to four central processors, four 1/0 processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through 1/0 processors, and multiport access to multiple and shared memory units
Design of a fault tolerant airborne digital computer. Volume 1: Architecture
This volume is concerned with the architecture of a fault tolerant digital computer for an advanced commercial aircraft. All of the computations of the aircraft, including those presently carried out by analogue techniques, are to be carried out in this digital computer. Among the important qualities of the computer are the following: (1) The capacity is to be matched to the aircraft environment. (2) The reliability is to be selectively matched to the criticality and deadline requirements of each of the computations. (3) The system is to be readily expandable. contractible, and (4) The design is to appropriate to post 1975 technology. Three candidate architectures are discussed and assessed in terms of the above qualities. Of the three candidates, a newly conceived architecture, Software Implemented Fault Tolerance (SIFT), provides the best match to the above qualities. In addition SIFT is particularly simple and believable. The other candidates, Bus Checker System (BUCS), also newly conceived in this project, and the Hopkins multiprocessor are potentially more efficient than SIFT in the use of redundancy, but otherwise are not as attractive
Techniques for the realization of ultrareliable spaceborne computers Interim scientific report
Error-free ultrareliable spaceborne computer
Advanced software techniques for space shuttle data management systems Final report
Airborne/spaceborn computer design and techniques for space shuttle data management system
DeSyRe: on-Demand System Reliability
The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints
A fault-tolerant multiprocessor architecture for aircraft, volume 1
A fault-tolerant multiprocessor architecture is reported. This architecture, together with a comprehensive information system architecture, has important potential for future aircraft applications. A preliminary definition and assessment of a suitable multiprocessor architecture for such applications is developed
Development and analysis of the Software Implemented Fault-Tolerance (SIFT) computer
SIFT (Software Implemented Fault Tolerance) is an experimental, fault-tolerant computer system designed to meet the extreme reliability requirements for safety-critical functions in advanced aircraft. Errors are masked by performing a majority voting operation over the results of identical computations, and faulty processors are removed from service by reassigning computations to the nonfaulty processors. This scheme has been implemented in a special architecture using a set of standard Bendix BDX930 processors, augmented by a special asynchronous-broadcast communication interface that provides direct, processor to processor communication among all processors. Fault isolation is accomplished in hardware; all other fault-tolerance functions, together with scheduling and synchronization are implemented exclusively by executive system software. The system reliability is predicted by a Markov model. Mathematical consistency of the system software with respect to the reliability model has been partially verified, using recently developed tools for machine-aided proof of program correctness
Constraint Based System-Level Diagnosis of Multiprocessors
Massively parallel multiprocessors induce new requirements for system-level fault diagnosis, like handling a huge number of processing elements in an inhomogeneous system. Traditional diagnostic models (like PMC, BGM, etc.) are insufficient to fulfill all of these requirements. This paper presents a novel modelling technique, based on a special area of artificial intelligence (AI) methods: constraint satisfaction (CS). The constraint based approach is able to handle functional faults in a similar way to the Russel-Kime model. Moreover, it can use multiple-valued logic to deal with system components having multiple fault modes. The resolution of the produced models can be adjusted to fit the actual diagnostic goal. Consequently, constrint based methods are applicable to a much wider range of multiprocessor architectures than earlier models. The basic problem of system-level diagnosis, syndrome decoding, can be easily transformed into a constraint satisfaction problem (CSP). Thus, the diagnosis algorithm can be derived from the related constraint solving algorithm. Different abstraction leveles can be used for the various diagnosis resolutions, employing the same methodology. As examples, two algorithms are described in the paper; both of them is intended for the Parsytec GCel massively parallel system. The centralized method uses a more elaborate system model, and provides detailed diagnostic information, suitable for off-line evaluation. The distributed method makes fast decisions for reconfiguration control, using a simplified model. Keywords system-level self-diagnosis, massively parallel computing systems, constraint satisfaction, diagnostic models, centralized and distributed diagnostic algorithms
- …