50 research outputs found

    Measurement-Based Analysis of Multiple Errors and Near-Coincident Fault Discovery in a Shared Memory Multiprocessor

    Get PDF
    Coordinated Science Laboratory was formerly known as Control Systems LaboratoryNational Aeronautics and Space Administration / NASA NAG-1-61

    Experimental analysis of computer system dependability

    Get PDF
    This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance

    A study of the relationship between the performance and dependability of a fault-tolerant computer

    Get PDF
    This thesis studies the relationship by creating a tool (FTAPE) that integrates a high stress workload generator with fault injection and by using the tool to evaluate system performance under error conditions. The workloads are comprised of processes which are formed from atomic components that represent CPU, memory, and I/O activity. The fault injector is software-implemented and is capable of injecting any memory addressable location, including special registers and caches. This tool has been used to study a Tandem Integrity S2 Computer. Workloads with varying numbers of processes and varying compositions of CPU, memory, and I/O activity are first characterized in terms of performance. Then faults are injected into these workloads. The results show that as the number of concurrent processes increases, the mean fault latency initially increases due to increased contention for the CPU. However, for even higher numbers of processes (less than 3 processes), the mean latency decreases because long latency faults are paged out before they can be activated

    The exploitation of parallelism on shared memory multiprocessors

    Get PDF
    PhD ThesisWith the arrival of many general purpose shared memory multiple processor (multiprocessor) computers into the commercial arena during the mid-1980's, a rift has opened between the raw processing power offered by the emerging hardware and the relative inability of its operating software to effectively deliver this power to potential users. This rift stems from the fact that, currently, no computational model with the capability to elegantly express parallel activity is mature enough to be universally accepted, and used as the basis for programming languages to exploit the parallelism that multiprocessors offer. To add to this, there is a lack of software tools to assist programmers in the processes of designing and debugging parallel programs. Although much research has been done in the field of programming languages, no undisputed candidate for the most appropriate language for programming shared memory multiprocessors has yet been found. This thesis examines why this state of affairs has arisen and proposes programming language constructs, together with a programming methodology and environment, to close the ever widening hardware to software gap. The novel programming constructs described in this thesis are intended for use in imperative languages even though they make use of the synchronisation inherent in the dataflow model by using the semantics of single assignment when operating on shared data, so giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared value are developed to permit the concise expression of these paradigms.The Science and Engineering Research Council

    Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis

    Get PDF
    Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions

    Fast algorithm for real-time rings reconstruction

    Get PDF
    The GAP project is dedicated to study the application of GPU in several contexts in which real-time response is important to take decisions. The definition of real-time depends on the application under study, ranging from answer time of μs up to several hours in case of very computing intensive task. During this conference we presented our work in low level triggers [1] [2] and high level triggers [3] in high energy physics experiments, and specific application for nuclear magnetic resonance (NMR) [4] [5] and cone-beam CT [6]. Apart from the study of dedicated solution to decrease the latency due to data transport and preparation, the computing algorithms play an essential role in any GPU application. In this contribution, we show an original algorithm developed for triggers application, to accelerate the ring reconstruction in RICH detector when it is not possible to have seeds for reconstruction from external trackers

    COBE's search for structure in the Big Bang

    Get PDF
    The launch of Cosmic Background Explorer (COBE) and the definition of Earth Observing System (EOS) are two of the major events at NASA-Goddard. The three experiments contained in COBE (Differential Microwave Radiometer (DMR), Far Infrared Absolute Spectrophotometer (FIRAS), and Diffuse Infrared Background Experiment (DIRBE)) are very important in measuring the big bang. DMR measures the isotropy of the cosmic background (direction of the radiation). FIRAS looks at the spectrum over the whole sky, searching for deviations, and DIRBE operates in the infrared part of the spectrum gathering evidence of the earliest galaxy formation. By special techniques, the radiation coming from the solar system will be distinguished from that of extragalactic origin. Unique graphics will be used to represent the temperature of the emitting material. A cosmic event will be modeled of such importance that it will affect cosmological theory for generations to come. EOS will monitor changes in the Earth's geophysics during a whole solar color cycle

    Resilience of an embedded architecture using hardware redundancy

    Get PDF
    In the last decade the dominance of the general computing systems market has being replaced by embedded systems with billions of units manufactured every year. Embedded systems appear in contexts where continuous operation is of utmost importance and failure can be profound. Nowadays, radiation poses a serious threat to the reliable operation of safety-critical systems. Fault avoidance techniques, such as radiation hardening, have been commonly used in space applications. However, these components are expensive, lag behind commercial components with regards to performance and do not provide 100% fault elimination. Without fault tolerant mechanisms, many of these faults can become errors at the application or system level, which in turn, can result in catastrophic failures. In this work we study the concepts of fault tolerance and dependability and extend these concepts providing our own definition of resilience. We analyse the physics of radiation-induced faults, the damage mechanisms of particles and the process that leads to computing failures. We provide extensive taxonomies of 1) existing fault tolerant techniques and of 2) the effects of radiation in state-of-the-art electronics, analysing and comparing their characteristics. We propose a detailed model of faults and provide a classification of the different types of faults at various levels. We introduce an algorithm of fault tolerance and define the system states and actions necessary to implement it. We introduce novel hardware and system software techniques that provide a more efficient combination of reliability, performance and power consumption than existing techniques. We propose a new element of the system called syndrome that is the core of a resilient architecture whose software and hardware can adapt to reliable and unreliable environments. We implement a software simulator and disassembler and introduce a testing framework in combination with ERA’s assembler and commercial hardware simulators

    Aeronautical engineering: A continuing bibliography with indexes (supplement 195)

    Get PDF
    This bibliography lists 389 reports, articles and other documents introduced into the NASA scientific and technical information system in December 1985
    corecore