797 research outputs found
Recommended from our members
Reliability and fault tolerance modelling of multiprocessor systems
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Reliability evaluation by analytic modelling constitute an important issue of designing a reliable multiprocessor system. In this thesis, a model for reliability and fault tolerance analysis of the interconnection network is presented, based on graph theory. Reliability and fault tolerance are considered as deterministic and probabilistic measures of connectivity.
Exact techniques for reliability evaluation fail for large multiprocessor systems because of the enormous computational resources required. Therefore, approximation techniques have to be used. Three approaches are proposed, the first by simplifying the symbolic expression of reliability; the
other two by applying a hierarchical decomposition to the system. All these
methods give results close to those obtained by exact techniques.Consejo Nacional de Ciencia y Tecnologia" (National Council for Science and Technology of Mexico) and "Instituto de Investigaciones Electricas" (Institute for Electrical Research
Analysis and design of algorithm-based fault-tolerant systems
An important consideration in the design of high performance multiprocessor systems is to ensure the correctness of the results computed in the presence of transient and intermittent failures. Concurrent error detection and correction have been applied to such systems in order to achieve reliability. Algorithm Based Fault Tolerance (ABFT) was suggested as a cost-effective concurrent error detection scheme. The research was motivated by the complexity involved in the analysis and design of ABFT systems. To that end, a matrix-based model was developed and, based on that, algorithms for both the design and analysis of ABFT systems are formulated. These algorithms are less complex than the existing ones. In order to reduce the complexity further, a hierarchical approach is developed for the analysis of large systems
A fault-tolerant multiprocessor architecture for aircraft, volume 1
A fault-tolerant multiprocessor architecture is reported. This architecture, together with a comprehensive information system architecture, has important potential for future aircraft applications. A preliminary definition and assessment of a suitable multiprocessor architecture for such applications is developed
Experimental analysis of computer system dependability
This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance
Fault-tolerant computer study
A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed
A formalism for describing and simulating systems with interacting components.
This thesis addresses the problem of descriptive complexity presented by systems involving a high number of interacting components. It investigates the evaluation measure of performability and its application to such systems. A new description and simulation language, ICE and it's application to performability modelling is presented. ICE (Interacting ComponEnts) is based upon an earlier description language which was first proposed for defining reliability problems. ICE is declarative in style and has a limited number of keywords. The ethos in the development of the language has been to provide an intuitive formalism with a powerful descriptive space. The full syntax of the language is presented with discussion as to its philosophy. The implementation of a discrete event simulator using an ICE interface is described, with use being made of examples to illustrate the functionality of the code and the semantics of the language. Random numbers are used to provide the required stochastic behaviour within the simulator. The behaviour of an industry standard generator within the simulator and different methods of number allocation are shown. A new generator is proposed that is a development of a fast hardware shift register generator and is demonstrated to possess good statistical properties and operational speed. For the purpose of providing a rigorous description of the language and clarification of its semantics, a computational model is developed using the formalism of extended coloured Petri nets. This model also gives an indication of the language's descriptive power relative to that of a recognised and well developed technique. Some recognised temporal and structural problems of system event modelling are identified. and ICE solutions given. The growing research area of ATM communication networks is introduced and a sophisticated top down model of an ATM switch presented. This model is simulated and interesting results are given. A generic ICE framework for performability modelling is developed and demonstrated. This is considered as a positive contribution to the general field of performability research
Scheduling Techniques for Operating Systems for Medical and IoT Devices: A Review
Software and Hardware synthesis are the major subtasks in the implementation of hardware/software systems. Increasing trend is to build SoCs/NoC/Embedded System for Implantable Medical Devices (IMD) and Internet of Things (IoT) devices, which includes multiple Microprocessors and Signal Processors, allowing designing complex hardware and software systems, yet flexible with respect to the delivered performance and executed application. An important technique, which affect the macroscopic system implementation characteristics is the scheduling of hardware operations, program instructions and software processes. This paper presents a survey of the various scheduling strategies in process scheduling. Process Scheduling has to take into account the real-time constraints. Processes are characterized by their timing constraints, periodicity, precedence and data dependency, pre-emptivity, priority etc. The affect of these characteristics on scheduling decisions has been described in this paper
Cost and benefits design optimization model for fault tolerant flight control systems
Requirements and specifications for a method of optimizing the design of fault-tolerant flight control systems are provided. Algorithms that could be used for developing new and modifying existing computer programs are also provided, with recommendations for follow-on work
Study of fault-tolerant software technology
Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance
Recommended from our members
Runtime asynchronous fault tolerance via speculation
Transient faults are emerging as a critical reliability concern in modern microprocessors. Redundant hardware solutions are commonly deployed to detect transient faults, but they are less flexible and cost-effective than software solutions. However, software solutions are rendered impractical because of high performance overheads. To address this problem, this paper presents Runtime Asynchronous Fault Tolerance via Speculation (RAFT), the fastest transient fault detection technique known to date. Serving as a layer between the application and the underlying platform, RAFT automatically generates two symmetric program instances from a program binary. It detects transient faults in a non-invasive way and exploits high-confidence value speculation to achieve low runtime overhead. Evaluation on a commodity multicore system demonstrates that RAFT delivers a geomean performance overhead of 2.83% on a set of 30 SPEC CPU benchmarks and STAMP benchmarks. Compared with existing transient fault detection techniques, RAFT exhibits the best performance and fault coverage, without requiring any change to the hardware or the software applications
- …