1,573 research outputs found
Recommended from our members
Error-efficient computing systems
This survey explores the theory and practice of techniques to make computing systems faster or more energy-efficient by allowing them to make controlled errors. In the same way that systems which only use as much energy as necessary are referred to as being energy-efficient, you can think of the class of systems addressed by this survey as being error-efficient: They only prevent as many errors as they need to. The definition of what constitutes an error varies across the parts of a system. And the errors which are acceptable depend on the application at hand. In computing systems, making errors, when behaving correctly would be too expensive, can conserve resources. The resources conserved may be time: By making some errors, systems may be faster. The resource may also be energy: A system may use less power from its batteries or from the electrical grid by only avoiding certain errors while tolerating benign errors that are associated with reduced power consumption. The resource in question may be an even more abstract quantity such as consistency of ordering of the outputs of a system. This survey is for anyone interested in an end-to-end view of one set of techniques that address the theory and practice of making computing systems more efficient by trading errors for improved efficiency
Predicting Cost/Reliability/Maintainability of Advanced General Aviation Avionics Equipment
A methodology is provided for assisting NASA in estimating the cost, reliability, and maintenance (CRM) requirements for general avionics equipment operating in the 1980's. Practical problems of predicting these factors are examined. The usefulness and short comings of different approaches for modeling coast and reliability estimates are discussed together with special problems caused by the lack of historical data on the cost of maintaining general aviation avionics. Suggestions are offered on how NASA might proceed in assessing cost reliability CRM implications in the absence of reliable generalized predictive models
Resilience of an embedded architecture using hardware redundancy
In the last decade the dominance of the general computing systems market has being replaced by embedded systems with billions of units manufactured every year. Embedded systems appear in contexts where continuous operation is of utmost importance and failure can be profound.
Nowadays, radiation poses a serious threat to the reliable operation of safety-critical systems. Fault avoidance techniques, such as radiation hardening, have been commonly used in space applications. However, these components are expensive, lag behind commercial components with regards to performance and do not provide 100% fault elimination. Without fault tolerant mechanisms, many of these faults can become errors at the application or system level, which in turn, can result in catastrophic failures.
In this work we study the concepts of fault tolerance and dependability and
extend these concepts providing our own definition of resilience. We analyse the physics of radiation-induced faults, the damage mechanisms of particles and the process that leads to computing failures. We provide extensive taxonomies of 1) existing fault tolerant techniques and of 2) the effects of radiation in state-of-the-art electronics, analysing and comparing their characteristics. We propose a detailed model of faults and provide a classification of the different types of faults at various levels. We introduce an algorithm of fault tolerance and define the system states and actions necessary to implement it. We introduce novel hardware and system software techniques that provide a more efficient combination of reliability, performance and power consumption than existing techniques. We propose a new element of the system called syndrome that is the core of a resilient architecture whose software and hardware can adapt to reliable and unreliable environments. We implement a software simulator and disassembler and introduce a testing framework in combination with ERA’s assembler and commercial hardware simulators
Analysis of spacecraft anomalies
The anomalies from 316 spacecraft covering the entire U.S. space program were analyzed to determine if there were any experimental or technological programs which could be implemented to remove the anomalies from future space activity. Thirty specific categories of anomalies were found to cover nearly 85 percent of all observed anomalies. Thirteen experiments were defined to deal with 17 of these categories; nine additional experiments were identified to deal with other classes of observed and anticipated anomalies. Preliminary analyses indicate that all 22 experimental programs are both technically feasible and economically viable
Testability and redundancy techniques for improved yield and reliability of CMOS VLSI circuits
The research presented in this thesis is concerned with the design of fault-tolerant integrated circuits as a contribution to the design of fault-tolerant systems. The economical manufacture of very large area ICs will necessitate the incorporation of fault-tolerance features which are routinely employed in current high density dynamic random access memories. Furthermore, the growing use of ICs in safety-critical applications and/or hostile environments in addition to the prospect of single-chip systems will mandate the use of fault-tolerance for improved reliability. A fault-tolerant IC must be able to detect and correct all possible faults that may affect its operation. The ability of a chip to detect its own faults is not only necessary for fault-tolerance, but it is also regarded as the ultimate solution to the problem of testing. Off-line periodic testing is selected for this research because it achieves better coverage of physical faults and it requires less extra hardware than on-line error detection techniques. Tests for CMOS stuck-open faults are shown to detect all other faults. Simple test sequence generation procedures for the detection of all faults are derived. The test sequences generated by these procedures produce a trivial output, thereby, greatly simplifying the task of test response analysis. A further advantage of the proposed test generation procedures is that they do not require the enumeration of faults. The implementation of built-in self-test is considered and it is shown that the hardware overhead is comparable to that associated with pseudo-random and pseudo-exhaustive techniques while achieving a much higher fault coverage through-the use of the proposed test generation procedures. The consideration of the problem of testing the test circuitry led to the conclusion that complete test coverage may be achieved if separate chips cooperate in testing each other's untested parts. An alternative approach towards complete test coverage would be to design the test circuitry so that it is as distributed as possible and so that it is tested as it performs its function. Fault correction relies on the provision of spare units and a means of reconfiguring the circuit so that the faulty units are discarded. This raises the question of what is the optimum size of a unit? A mathematical model, linking yield and reliability is therefore developed to answer such a question and also to study the effects of such parameters as the amount of redundancy, the size of the additional circuitry required for testing and reconfiguration, and the effect of periodic testing on reliability. The stringent requirement on the size of the reconfiguration logic is illustrated by the application of the model to a typical example. Another important result concerns the effect of periodic testing on reliability. It is shown that periodic off-line testing can achieve approximately the same level of reliability as on-line testing, even when the time between tests is many hundreds of hours
Fault-Tolerant Computing: An Overview
Coordinated Science Laboratory was formerly known as Control Systems LaboratoryNASA / NAG-1-613Semiconductor Research Corporation / 90-DP-109Joint Services Electronics Program / N00014-90-J-127
Fault-tolerant computer study
A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed
Neuromorphic Computing with Resistive Switching Devices.
Resistive switches, commonly referred to as resistive memory (RRAM) devices and modeled as memristors, are an emerging nanoscale technology that can revolutionize data storage and computing approaches. Enabled by the advancement of nanoscale semiconductor fabrication and detailed understanding of the physical and chemical processes occurring at the atomic scale, resistive switches offer high speed, low-power, and extremely dense nonvolatile data storage. Further, the analog capabilities of resistive switching devices enables neuromorphic computing approaches which can achieve massively parallel computation with a power and area budget that is orders of magnitude lower than today’s conventional, digital approaches.
This dissertation presents the investigation of tungsten oxide based resistive switching devices for use in neuromorphic computing applications. Device structure, fabrication, and integration are described and physical models are developed to describe the behavior of the devices. These models are used to develop array-scale simulations in support of neuromorphic computing approaches. Several signal processing algorithms are adapted for acceleration using arrays of resistive switches. Both simulation and experimental results are reported. Finally, guiding principles and proposals for future work are discussed.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/116743/1/sheridp_1.pd
- …