Gate Delay Fault Test Generation for Non-Scan Circuits
This article presents a technique for extending delay fault test pattern generation to synchronous sequential circuits without making use of scan techniques. The technique relies on the coupling of TDgen, a robust combinational test pattern generator for delay faults, and SEMILET, a sequential test pattern generator for several static fault models. The approach uses a forward propagation-backward justification technique: test pattern generation starts at the fault location; after successful "local" test generation, fault effect propagation is performed, and finally a synchronising sequence to the required state is computed. The algorithm is complete for a robust gate delay fault model, which means that for every testable fault a test will be generated, given sufficient time. Experimental results for the ISCAS'89 benchmarks are presented in this paper.
Integration of tools for the Design and Assessment of High-Performance, Highly Reliable Computing Systems (DAHPHRS), phase 1
Systems for Strategic Defense Initiative (SDI) space applications typically require both high performance and very high reliability. These requirements present the systems engineer evaluating such systems with the extremely difficult problem of conducting performance and reliability trade-offs over large design spaces. A controlled development process supported by appropriate automated tools must be used to assure that the system will meet design objectives. This report describes an investigation of methods, tools, and techniques necessary to support performance and reliability modeling for SDI systems development. Models of the JPL Hypercubes, the Encore Multimax, and the C.S. Draper Lab Fault-Tolerant Parallel Processor (FTPP) parallel-computing architectures, using candidate SDI weapons-to-target assignment algorithms as workloads, were built and analyzed as a means of identifying the necessary system models, how the models interact, and what experiments and analyses should be performed. As a result of this effort, weaknesses in the existing methods and tools were revealed, and capabilities that will be required for both individual tools and an integrated toolset were identified.
On-Line Dependability Enhancement of Multiprocessor SoCs by Resource Management
This paper describes a new approach towards dependable design of homogeneous multi-processor SoCs in an example satellite-navigation application. First, the NoC dependability is functionally verified via embedded software. Then the Xentium processor tiles are periodically verified via on-line self-testing techniques, using a new IIP Dependability Manager. Based on the Dependability Manager results, faulty tiles are electronically excluded and replaced by fault-free spare tiles via on-line resource management. This integrated approach enables fast electronic fault detection/diagnosis and repair, and hence high system availability. The dependability application runs in parallel with the actual application, resulting in a very dependable system. All parts have been verified by simulation.
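The exclude-and-replace loop described in this abstract can be sketched in a few lines. This is an illustrative toy only: the real Dependability Manager, Xentium tiles, and NoC are abstracted away, and the tile names, the `faulty` fault-injection set, and all method names are assumptions made for the demo.

```python
class TileManager:
    """Periodically self-tests active tiles; a failing tile is
    electronically excluded and replaced by a fault-free spare."""

    def __init__(self, active, spares, faulty=()):
        self.active = list(active)   # tiles currently running the application
        self.spares = list(spares)   # fault-free spare tiles
        self.faulty = set(faulty)    # injected faults (stands in for real tests)

    def self_test(self, tile):
        # Placeholder for the on-line self-test of a single tile.
        return tile not in self.faulty

    def periodic_check(self):
        # In the real system this runs in parallel with the application.
        for i, tile in enumerate(self.active):
            if not self.self_test(tile):
                if not self.spares:
                    raise RuntimeError(f"tile {tile} is faulty and no spare is left")
                self.active[i] = self.spares.pop(0)   # exclude and replace

mgr = TileManager(active=["tile0", "tile1"], spares=["tile2"], faulty={"tile1"})
mgr.periodic_check()
print(mgr.active)  # ['tile0', 'tile2']
```

The spare pool is consumed monotonically here; a real resource manager could also re-admit tiles that pass a later retest.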
System Description for a Scalable, Fault-Tolerant, Distributed Garbage Collector
We describe an efficient and fault-tolerant algorithm for distributed cyclic garbage collection. The algorithm imposes few requirements on the local machines and allows for flexibility in the choice of the local collector and distributed acyclic garbage collector to use with it. We have emphasized reducing the number and size of network messages without sacrificing the promptness of collection throughout the algorithm. Our proposed collector is a variant of back tracing that avoids extensive synchronization between machines. We have added an explicit forward tracing stage to the standard back tracing stage and designed a tuned heuristic to reduce the total amount of work done by the collector. Of particular note is the development of fault-tolerant cooperation between traces and a heuristic that aggressively reduces the set of suspect objects. Comment: 47 pages, LaTeX.
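The forward-then-back tracing idea can be illustrated on a single-machine object graph. This is a centralized toy, not the paper's distributed, fault-tolerant algorithm: a forward trace collects the suspect set reachable from a candidate, and a back trace along incoming references decides liveness (a suspect is garbage iff no root can reach it backwards). The graph encoding and function names are assumptions for the sketch.

```python
def suspects(graph, start):
    """Forward trace: all objects reachable from the suspect candidate."""
    seen, stack = set(), [start]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(graph.get(obj, ()))
    return seen

def is_garbage(graph, roots, start):
    """Back trace: follow incoming references from the suspect;
    it is live iff the trace reaches a root."""
    incoming = {}
    for src, dsts in graph.items():
        for dst in dsts:
            incoming.setdefault(dst, set()).add(src)
    seen, stack = set(), [start]
    while stack:
        obj = stack.pop()
        if obj in roots:
            return False          # reachable from a root: live
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(incoming.get(obj, ()))
    return True                   # no root reaches it: cyclic garbage

# 'a' and 'b' form an unreachable cycle; 'c' is held by root 'r'.
graph = {"r": ["c"], "a": ["b"], "b": ["a"], "c": []}
print(is_garbage(graph, {"r"}, "a"))  # True
```

In the distributed setting each machine would run these traces over its local heap and exchange messages at remote references, which is where the paper's fault-tolerant cooperation comes in.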
Unattended network operations technology assessment study. Technical support for defining advanced satellite systems concepts
The results of an unattended network operations technology assessment study for the Space Exploration Initiative (SEI) are summarized. The scope of the work included: (1) identifying possible enhancements due to the proposed Mars communications network; (2) identifying network operations on Mars; (3) performing a technology assessment of possible supporting technologies based on current and future approaches to network operations; and (4) developing a plan for the testing and development of these technologies. The most important results obtained are as follows: (1) addition of a third Mars Relay Satellite (MRS) and MRS cross-link capabilities will enhance the network's fault tolerance through improved connectivity; (2) network functions can be divided into the six basic ISO network functional groups; (3) distributed artificial intelligence technologies will augment more traditional network management technologies to form the technological infrastructure of a virtually unattended network; and (4) a great effort is required to bring the current network technology levels for manned space communications up to the level needed for an automated, fault-tolerant Mars communications network.
Reliable Communication in a Dynamic Network in the Presence of Byzantine Faults
We consider the following problem: two nodes want to reliably communicate in a dynamic multihop network where some nodes have been compromised and may exhibit totally arbitrary and unpredictable behavior. These nodes are called Byzantine. We consider the two cases where cryptography is available and where it is not. We prove the necessary and sufficient condition (that is, the weakest possible condition) to ensure reliable communication in this context. Our proof is constructive, as we provide Byzantine-resilient algorithms for reliable communication that are optimal with respect to our impossibility results. In a second part, we investigate the impact of our conditions in three case studies: participants interacting in a conference, robots moving on a grid, and agents in the subway. Our simulations indicate a clear benefit of using our algorithms for reliable communication in these contexts.
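A standard building block behind cryptography-free Byzantine-resilient communication is voting over node-disjoint paths: each Byzantine node can corrupt at most one disjoint path, so the genuine value must outnumber any forgery. The sketch below shows only this classic static argument, not the paper's exact condition for dynamic networks; the function name and parameters are my own.

```python
from collections import Counter

def decide(path_values, max_byzantine):
    """Accept a value only if it arrives identically over strictly more
    than `max_byzantine` node-disjoint paths; otherwise keep waiting."""
    for value, count in Counter(path_values).items():
        if count > max_byzantine:
            return value
    return None  # not enough agreement yet

# Copies received over four node-disjoint paths, at most one Byzantine node:
print(decide(["hello", "hello", "forged", "hello"], max_byzantine=1))  # hello
```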
Overlay Protection Against Link Failures Using Network Coding
This paper introduces a network coding-based protection scheme against single
and multiple link failures. The proposed strategy ensures that in a connection,
each node receives two copies of the same data unit: one copy on the working
circuit, and a second copy that can be extracted from linear combinations of
data units transmitted on a shared protection path. This guarantees
instantaneous recovery of data units upon the failure of a working circuit. The
strategy can be implemented at an overlay layer, which makes its deployment
simple and scalable. While the proposed strategy is similar in spirit to the
work of Kamal '07 & '10, there are significant differences. In particular, it
provides protection against multiple link failures. The new scheme is simpler,
less expensive, and does not require the synchronization required by the
original scheme. The sharing of the protection circuit by a number of
connections is the key to the reduction of the cost of protection. The paper
also conducts a comparison of the cost of the proposed scheme to the 1+1 and
shared backup path protection (SBPP) strategies, and establishes the benefits
of our strategy.Comment: 14 pages, 10 figures, accepted by IEEE/ACM Transactions on Networkin
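The simplest instance of "a second copy extracted from linear combinations on a shared protection path" is coding over GF(2), i.e. bitwise XOR. The sketch below shows only that recovery principle, not the paper's overlay protocol; the function names and sample data units are assumptions.

```python
from functools import reduce
import operator

def protection_unit(data_units):
    """Linear combination over GF(2): bitwise XOR of all data units
    carried on the shared protection path."""
    return reduce(operator.xor, data_units)

def recover(protection, surviving_units):
    """XOR the surviving units out of the combination to extract the
    data unit lost to a working-circuit failure."""
    return reduce(operator.xor, surviving_units, protection)

units = [0b1010, 0b0110, 0b1111]   # data units of three sharing connections
p = protection_unit(units)
# Connection 1's working circuit fails; its copy is rebuilt from p:
print(bin(recover(p, [units[0], units[2]])))  # 0b110 == units[1]
```

One shared combination protects a single failure per group; tolerating multiple simultaneous failures, as the paper does, requires additional independent combinations.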
Simple and Optimal Randomized Fault-Tolerant Rumor Spreading
We revisit the classic problem of spreading a piece of information in a group
of fully connected processors. By suitably adding a small dose of
randomness to the protocol of Gasienic and Pelc (1996), we derive for the first
time protocols that (i) use a linear number of messages, (ii) are correct even
when an arbitrary number of adversarially chosen processors does not
participate in the process, and (iii) with high probability have the
asymptotically optimal runtime of when at least an arbitrarily
small constant fraction of the processors are working. In addition, our
protocols do not require that the system is synchronized nor that all
processors are simultaneously woken up at time zero, they are fully based on
push-operations, and they do not need an a priori estimate on the number of
failed nodes.
Our protocols thus overcome the typical disadvantages of the two known
approaches, algorithms based on random gossip (typically needing a large number
of messages due to their unorganized nature) and algorithms based on fair
workload splitting (which are either not {time-efficient} or require intricate
preprocessing steps plus synchronization).Comment: This is the author-generated version of a paper which is to appear in
Distributed Computing, Springer, DOI: 10.1007/s00446-014-0238-z It is
available online from
http://link.springer.com/article/10.1007/s00446-014-0238-z This version
contains some new results (Section 6
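For contrast with the paper's contribution, here is the fully random push protocol it improves upon: every informed working node pushes the rumor to one uniformly random node per round. This baseline is fault-tolerant and fast but uses Θ(n log n) messages, whereas the paper achieves a linear number; the sketch below does not reproduce that. Function and parameter names are my own.

```python
import random

def push_rounds(n, failed=frozenset(), seed=0):
    """Synchronous push rumor spreading among n processors; nodes in
    `failed` never participate. Returns rounds until every working
    node is informed."""
    rng = random.Random(seed)
    working = set(range(n)) - set(failed)
    informed = {min(working)}        # one working node starts with the rumor
    rounds = 0
    while not working <= informed:
        rounds += 1
        # Snapshot: nodes informed this round push only from the next round on.
        for _ in range(len(informed & working)):
            informed.add(rng.randrange(n))
    return rounds

print(push_rounds(1024))  # with high probability Theta(log n) rounds
```

Because the informed set can at most double per round, at least log2 of the number of working nodes rounds are always needed, which is why O(log n) is the benchmark the paper's protocols match.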