527 research outputs found
A study of the selection of microcomputer architectures to automate planetary spacecraft power systems
Performance and reliability models of alternate microcomputer architectures as a methodology for optimizing system design were examined. A methodology for selecting an optimum microcomputer architecture for autonomous operation of planetary spacecraft power systems was developed. Various microcomputer system architectures are analyzed to determine their application to spacecraft power systems. It is suggested that no standardization formula or common set of guidelines exists which provides an optimum configuration for a given set of specifications
Arithmetic on a Distributed-Memory Quantum Multicomputer
We evaluate the performance of quantum arithmetic algorithms run on a
distributed quantum computer (a quantum multicomputer). We vary the node
capacity and I/O capabilities, and the network topology. The tradeoff of
choosing between gates executed remotely, through ``teleported gates'' on
entangled pairs of qubits (telegate), versus exchanging the relevant qubits via
quantum teleportation, then executing the algorithm using local gates
(teledata), is examined. We show that the teledata approach performs better,
and that carry-ripple adders perform well when the teleportation block is
decomposed so that the key quantum operations can be parallelized. A node size
of only a few logical qubits performs adequately provided that the nodes have
two transceiver qubits. A linear network topology performs acceptably for a
broad range of system sizes and performance parameters. We therefore recommend
pursuing small, high-I/O bandwidth nodes and a simple network. Such a machine
will run Shor's algorithm for factoring large numbers efficiently.Comment: 24 pages, 10 figures, ACM transactions format. Extended version of
Int. Symp. on Comp. Architecture (ISCA) paper; v2, correct one circuit error,
numerous small changes for clarity, add reference
New Fault Tolerant Multicast Routing Techniques to Enhance Distributed-Memory Systems Performance
Distributed-memory systems are a key to achieve high performance computing and the most favorable architectures used in advanced research problems. Mesh connected multicomputer are one of the most popular architectures that have been implemented in many distributed-memory systems. These systems must support communication operations efficiently to achieve good performance. The wormhole switching technique has been widely used in design of distributed-memory systems in which the packet is divided into small flits. Also, the multicast communication has been widely used in distributed-memory systems which is one source node sends the same message to several destination nodes. Fault tolerance refers to the ability of the system to operate correctly in the presence of faults. Development of fault tolerant multicast routing algorithms in 2D mesh networks is an important issue. This dissertation presents, new fault tolerant multicast routing algorithms for distributed-memory systems performance using wormhole routed 2D mesh. These algorithms are described for fault tolerant routing in 2D mesh networks, but it can also be extended to other topologies. These algorithms are a combination of a unicast-based multicast algorithm and tree-based multicast algorithms. These algorithms works effectively for the most commonly encountered faults in mesh networks, f-rings, f-chains and concave fault regions. It is shown that the proposed routing algorithms are effective even in the presence of a large number of fault regions and large size of fault region. These algorithms are proved to be deadlock-free. Also, the problem of fault regions overlap is solved. Four essential performance metrics in mesh networks will be considered and calculated; also these algorithms are a limited-global-information-based multicasting which is a compromise of local-information-based approach and global-information-based approach. Data mining is used to validate the results and to enlarge the sample. The proposed new multicast routing techniques are used to enhance the performance of distributed-memory systems. Simulation results are presented to demonstrate the efficiency of the proposed algorithms
Experimental analysis of computer system dependability
This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance
Performance modeling of fault-tolerant circuit-switched communication networks
Circuit switching (CS) has been suggested as an efficient switching method for supporting simultaneous communications (such as data, voice, and images) across parallel systems due to its ability to preserve both communication performance and fault-tolerant demands in such systems. In this paper we present an efficient scheme to capture the mean message latency in 2D torus with CS in the presence of faulty components. We have also conducted extensive simulation experiments, the results of which are used to validate the analytical mode
A Performance Prediction Model for a Fault-Tolerant Computer During Recovery and Restoration
The modeling and design of a fault-tolerant multiprocessor system is addressed in this dissertation. In particular, the behavior of the system during recovery and restoration after a fault has occurred is investigated. Given that a multicomputer system is designed using the Algorithm to Architecture To Mapping Model (ATAMM) model, and that a fault (death of a computing resource) occurs during its normal steady-state operation, a model is presented as a viable research tool for predicting the performance bounds of the system during its recovery and restoration phases. Furthermore, the bounds of the performance behavior of the system during this transient mode can be assessed. These bounds include: time recover from the fault (trec), time to restore the system (tres} and whether there is a permanent delay in the system\u27s Time Between Input and Output (TBIO) after the system has reached a steady state. An implementation of an ATAMM based computer was developed with the Generic VHSIC Spaceborne Computer (GVSC) as the target system. A simulation of the GVSC was also written based on the code used in ATAMM Multicomputer Operating System (AMOS). The simulation is in turn used to validate the new model in the usefulness and accuracy in tracking the propagation of the delay through the system and predicting the behavior in the transient state of recovery and restoration. The model is validated as an accurate method to predict the transient behavior of an ATAMM based multicomputer during recovery and restoration
A Performance Prediction Model for a Fault-Tolerant Computer During Recovery and Restoration
The modeling and design of a fault-tolerant multiprocessor system is addressed. In particular, the behavior of the system during recovery and restoration after a fault has occurred is investigated. Given that a multicomputer system is designed using the Algorithm to Architecture to Mapping Model (ATAMM), and that a fault (death of a computing resource) occurs during its normal steady-state operation, a model is presented as a viable research tool for predicting the performance bounds of the system during its recovery and restoration phases. Furthermore, the bounds of the performance behavior of the system during this transient mode can be assessed. These bounds include: time to recover from the fault (t(sub rec)), time to restore the system (t(sub rec)) and whether there is a permanent delay in the system's Time Between Input and Output (TBIO) after the system has reached a steady state. An implementation of an ATAMM based computer was developed with the Generic VHSIC Spaceborne Computer (GVSC) as the target system. A simulation of the GVSC was also written based on the code used in ATAMM Multicomputer Operating System (AMOS). The simulation is in turn used to validate the new model in the usefulness and accuracy in tracking the propagation of the delay through the system and predicting the behavior in the transient state of recovery and restoration. The model is validated as an accurate method to predict the transient behavior of an ATAMM based multicomputer during recovery and restoration
Middleware Fault Tolerance Support for the BOSS Embedded Operating System
Critical embedded systems need a dependable operating system and application. Despite all efforts to prevent and remove faults in system development, residual software faults usually persist. Therefore, critical systems need some sort of fault tolerance to deal with these faults and also with hardware faults at operation time.
This work proposes fault-tolerant support mechanisms for the BOSS embedded operating system, based on the application of proven fault tolerance strategies by middleware control software which transparently delivers the added functionality to the application software. Special attention is taken to complexity control and resource constraints, targeting the needs of the embedded market.Fundação para a Ciência e a Tecnologia (FCT
Submicron Systems Architecture Project : Semiannual Technical Report
The Mosaic C is an experimental fine-grain multicomputer
based on single-chip nodes. The Mosaic C chip includes 64KB of fast dynamic RAM,
processor, packet interface, ROM for bootstrap and self-test, and a two-dimensional selftimed
router. The chip architecture provides low-overhead and low-latency handling of
message packets, and high memory and network bandwidth. Sixty-four Mosaic chips are
packaged by tape-automated bonding (TAB) in an 8 x 8 array on circuit boards that can, in
turn, be arrayed in two dimensions to build arbitrarily large machines. These 8 x 8 boards are
now in prototype production under a subcontract with Hewlett-Packard. We are planning
to construct a 16K-node Mosaic C system from 256 of these boards. The suite of Mosaic
C hardware also includes host-interface boards and high-speed communication cables. The
hardware developments and activities of the past eight months are described in section 2.1.
The programming system that we are developing for the Mosaic C is based on the
same message-passing, reactive-process, computational model that we have used with earlier
multicomputers, but the model is implemented for the Mosaic in a way that supports finegrain
concurrency. A process executes only in response to receiving a message, and may in
execution send messages, create new processes, and modify its persistent variables before
it either exits or becomes dormant in preparation for receiving another message. These
computations are expressed in an object-oriented programming notation, a derivative of
C++ called C+-. The computational model and the C+- programming notation are
described in section 2.2. The Mosaic C runtime system, which is written in C+-, provides
automatic process placement and highly distributed management of system resources. The
Mosaic C runtime system is described in section 2.3
- …