Both von Neumann's NAND multiplexing, based on a massive duplication of imperfect devices and randomized imperfect interconnects, and reconfigurable architectures have been investigated as means of integrating highly unreliable nanometre-scale devices. In this paper we review these two techniques and present a defect- and fault-tolerant architecture in which von Neumann's NAND multiplexing is combined with a massively reconfigurable architecture. The system performance of this architecture is evaluated by studying its reliability, i.e. the probability of system survival. Our evaluation shows that the suggested architecture can tolerate a device error rate of up to 10^-2 at the cost of multiple redundant components; the structure is efficiently robust against both permanent and transient faults for an ultra-large integration of highly unreliable nanometre-scale devices.
Introduction
Nanometre-scale electronics has advanced greatly in recent years. Besides the development of various single nanoelectronic devices, some research has advanced to the logic circuit level, for example single-electron tunnelling (SET) technology [1, 2], carbon nanotubes [3], semiconductor nanowires [4] and chemically assembled electronic nanocomputers (CAENs) [5, 6]. The very small size of nanometre-scale devices makes it possible to build a trillion (10^12) devices in a square centimetre [7]. However, for such a densely integrated circuit to perform a useful computation, it has to deal with the inaccuracies and instabilities introduced by the fabrication processes and by the tiny devices themselves. Permanent faults may emerge during the manufacturing process, while transient ones may occur spontaneously during a computer's lifetime. Future nanoelectronic architectures will have to tolerate an extremely large number of defects and faults; the design of fault-tolerant architectures for the ultra-large integration of highly unreliable nanometre-scale devices is therefore inevitable.
In 1952, von Neumann initiated the study of using redundant components to synthesize reliable systems from unreliable components, namely the multiplexing technique [8]. He demonstrated theoretically that, with an extremely high degree of redundancy, an integration of unreliable logic units could be made reliable. In his construction, von Neumann considered two basic logic circuits, majority voting and NAND logic, and assumed that they are not completely reliable, i.e. each fails with a constant probability. By letting a bundle of unreliable gates function as a single, ideally reliable one, von Neumann proved that if the failure probabilities of the gates are sufficiently small and the failures are statistically independent, computations can be carried out reliably with high probability. However, the construction requires a large number of redundant components, which was seen as the major shortcoming of this method.
In 1965 the work by von Neumann and his contemporaries on fault-tolerant logic was generalized by Pierce into a theory termed interwoven redundant logic [9]. In 1977 Dobrushin and Ortyukov improved von Neumann's result [10], showing that logarithmic redundancy is sufficient for any Boolean function [10] and, at least for certain Boolean functions, necessary [11]. In the 1980s Pippenger proved that a variety of Boolean functions can be computed reliably by noisy networks requiring only constant multiplicative redundancy [12, 13]. It has thus been demonstrated theoretically that the multiplexing technique can work effectively with a practically acceptable redundancy overhead. More recently, von Neumann's NAND multiplexing has been studied as an effective fault-tolerant technique for protection against the increasing transient faults in nanoelectronic circuits [14, 15], while it is believed to be less efficient against manufacturing defects or permanent faults.
A reconfigurable architecture is a computer architecture which can be configured or programmed after fabrication to implement desired computations: faulty components are detected during testing and excluded during reconfiguration. Reconfigurable architectures have likewise been investigated as a solution for integrating highly unreliable nanometre-scale devices, in particular as defect-tolerant architectures against manufacturing errors. Teramac [16], built in HP laboratories, is such an extremely defect-tolerant reconfigurable machine. The basic components in Teramac are programmable switches (memory) and redundant interconnections; the high communication bandwidth is critical for both parallel computation and defect tolerance. With about 10% of its logic cells and 3% of its total resources defective, Teramac could still operate 100 times faster than a high-end single-processor workstation for some of its configurations.

The embryonics architecture [17] is inspired by the biological growth and operation of living beings. It is based on four hierarchical levels: a molecule (a multiplexer-based element of a programmable circuit), a cell (a small processor with an associated memory), an organism (an application-specific multiprocessor system) and the population of identical organisms. Each cell contains complete sets of instructions, the genomes, which make each cell universal and potentially apt for self-repair and self-replication. The objective of developing highly robust integrated circuits capable of self-repair and self-replication makes the embryonics architecture a potential paradigm for future nanometre-scale computation systems.
In this paper we seek fault-tolerant architectures for unreliable nanoelectronic devices by extending the study of NAND multiplexing to a rather low degree of redundancy and embedding it in a massively reconfigurable architecture. The system performance of the architecture is evaluated by studying its reliability, defined as the probability of system survival. Our evaluation shows that the suggested system is efficiently robust against both permanent and transient faults for an ultra-large integration of highly unreliable nanometre-scale devices.
The paper is organized as follows. In section 2 von Neumann's NAND multiplexing technique is briefly reviewed and developed. Section 3 gives the reliability analysis of reconfigurable architectures. In section 4 we present the implementation of a defect- and fault-tolerant architecture based on NAND multiplexing and reconfigurable architectures. Section 5 concludes the paper.
The NAND multiplexing technique

von Neumann's theory
Consider a NAND gate. Replace each input of the NAND gate, as well as its output, by a bundle of N lines, and duplicate the NAND gate N times, as shown in figure 1. The rectangle U performs a 'random permutation' of the input signals, in the sense that each signal from the first input bundle is randomly paired with a signal from the second input bundle to form the input pair of one of the duplicated NAND gates. Let X be the number of lines in the first input bundle that are stimulated (logic TRUE or '1'); consequently, N − X lines are not stimulated (logic FALSE or '0'). Let Y and Z be the corresponding numbers for the second input bundle and for the output bundle, respectively.
Assume that each NAND gate fails with a constant probability ε, and that a faulty gate inverts its output, i.e. it acts as an AND gate (a von Neumann fault).
Define x̄ = X/N, ȳ = Y/N and z̄ = Z/N; clearly (x̄, ȳ, z̄) are the relative levels of excitation of the two input bundles and of the output bundle, respectively. The question is then: what is the distribution of the stochastic variable z̄ for given x̄ and ȳ? For large N, von Neumann concluded that z̄ is approximately normally distributed [8].
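To make the construction concrete, the following Monte Carlo sketch (not part of von Neumann's original analysis; the function name and all parameter values are ours, chosen for illustration) simulates a single multiplexing unit of figure 1 under the von Neumann fault model:

```python
import random

def nand_unit(x_bar, y_bar, N=1000, eps=0.01, rng=random):
    """Simulate one NAND multiplexing unit (figure 1).

    x_bar, y_bar: fractions of stimulated lines in the two input bundles.
    N:            bundle size (number of duplicated NAND gates).
    eps:          probability that a gate suffers a von Neumann fault
                  (its output is inverted).
    Returns z_bar, the fraction of stimulated output lines.
    """
    # Input bundles with exactly round(x_bar*N) and round(y_bar*N) stimulated lines.
    first = [1] * round(x_bar * N) + [0] * (N - round(x_bar * N))
    second = [1] * round(y_bar * N) + [0] * (N - round(y_bar * N))
    # The unit U: randomly permute one bundle to pair up the inputs.
    rng.shuffle(second)
    z = 0
    for a, b in zip(first, second):
        out = 1 - (a & b)            # ideal NAND
        if rng.random() < eps:       # von Neumann fault: inverted output
            out = 1 - out
        z += out
    return z / N

# With both inputs nearly fully stimulated the output should be mostly
# non-stimulated; repeated runs show the spread of z_bar around its mean.
print([round(nand_unit(0.9, 0.9), 3) for _ in range(5)])
```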
Error distributions in a multiplexing unit
The NAND multiplexing unit is constructed as in figure 1. Von Neumann's theory applies only when the bundle size N is large; such a large N, however, is unrealistic in practice because of the huge amount of redundancy it entails. In this section we therefore study the error distributions in a multiplexing unit with a fairly low degree of redundancy.
Let us consider a single NAND gate in the multiplexing scheme. We again assume that x̄N and ȳN input lines are stimulated. If the two inputs are independent, the probability that the output of a fault-free NAND gate is non-stimulated (which happens exactly when both inputs are stimulated) is r̄ = x̄ȳ. If each NAND gate has a probability ε of making a von Neumann error, the probability of its output being non-stimulated is

$$\bar r_v = (1-\varepsilon)\,\bar x\bar y + \varepsilon(1-\bar x\bar y) = \bar x\bar y + \varepsilon(1-2\bar x\bar y). \tag{1}$$

For the more common fault models, stuck-at-0 and stuck-at-1, the probabilities become

$$\bar r_0 = (1-\varepsilon)\,\bar x\bar y + \varepsilon \tag{2}$$

$$\bar r_1 = (1-\varepsilon)\,\bar x\bar y. \tag{3}$$
For each single NAND gate, then, the probability of the output being non-stimulated (event 0) is r̄, with r̄ ∈ {r̄_v, r̄_0, r̄_1} depending on the fault model, and the probability of it being stimulated (event 1) is 1 − r̄. If the N NAND gates function independently, the probability of exactly k outputs being non-stimulated is given by the binomial distribution

$$P(k) = \binom{N}{k}\,\bar r^{\,k}(1-\bar r)^{N-k}. \tag{4}$$
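A minimal numerical sketch of formulae (1)–(4) follows; the function names and parameter values are ours, chosen for illustration:

```python
from math import comb

def r_bar(x, y, eps, model="von_neumann"):
    """Probability of a non-stimulated output, formulae (1)-(3)."""
    if model == "von_neumann":   # faulty gate inverts its output
        return (1 - eps) * x * y + eps * (1 - x * y)
    if model == "stuck_at_0":    # faulty gate output forced to 0
        return (1 - eps) * x * y + eps
    if model == "stuck_at_1":    # faulty gate output forced to 1
        return (1 - eps) * x * y
    raise ValueError(model)

def binom_pmf(k, N, r):
    """Formula (4): probability of exactly k non-stimulated outputs."""
    return comb(N, k) * r**k * (1 - r)**(N - k)

# Example: N = 3, both inputs fully stimulated, 1% von Neumann error rate.
r = r_bar(1.0, 1.0, 0.01)
print([round(binom_pmf(k, 3, r), 5) for k in range(4)])
```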
[Figure 2. The multi-stage NAND multiplexing system.]
If both inputs of the NAND gates are expected to be in the stimulated state, the non-stimulated outputs are considered the reliable ones. If the faulty devices in the multiplexing circuits are independent and uniformly distributed, formula (4) can readily be used to calculate the output reliability. This is reasonable when the dominant faults are transient. For manufacturing defects and permanent faults, however, the binomial distribution model is not sufficient to describe the actual manufacturing imperfections: the device components are not statistically independent but correlated, since defects tend to cluster on a chip [18]. Formula (4) is therefore not appropriate for this reliability calculation. (Although it is not yet clear what future nanocomputers will be based on and how they will be built, it may be helpful to learn from present manufacturing processes.)
Variability of the manufacturing defects can be modelled with a continuous probability density function f(r) of the estimated component reliability r. Compounding formula (4) with respect to this distribution function results in

$$P(k) = \binom{N}{k}\int_0^1 r^k (1-r)^{N-k} f(r)\,\mathrm{d}r. \tag{5}$$
The success of the approach depends on finding appropriate parameters for the formula. Here we follow Stapper's beta distribution model [19], in which f(r) is a beta density with mean r̄ and clustering parameter μ; compounding then gives

$$P(k) = \binom{N}{k}\,\frac{B\!\left(k+\mu\bar r,\; N-k+\mu(1-\bar r)\right)}{B\!\left(\mu\bar r,\; \mu(1-\bar r)\right)} \tag{6}$$

where B(·, ·) is the beta function, μ is a variable parameter and r̄ is the average or expected single-output reliability. The formula calculates the probability that exactly k out of N identical NAND gates give reliable outputs. The parameter μ measures the amount of fault clustering: small values of μ indicate high levels of clustering, and as μ approaches infinity the formula reduces to the independent-fault case of formula (4).
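The following sketch implements formula (6) as a beta-binomial distribution, assuming the beta parameters a = μr̄ and b = μ(1 − r̄) (which give mean r̄ and clustering parameter μ); the log-gamma form is used for numerical stability, and all parameter values are illustrative:

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """log B(a, b) computed via log-gamma for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def stapper_pmf(k, N, r_mean, mu):
    """Formula (6): beta-binomial probability that exactly k of N outputs
    are reliable, assuming beta parameters a = mu*r_mean, b = mu*(1-r_mean)."""
    a, b = mu * r_mean, mu * (1 - r_mean)
    return comb(N, k) * exp(log_beta(k + a, N - k + b) - log_beta(a, b))

# Small mu -> strong clustering (mass pushed towards the extremes);
# very large mu -> approaches the independent binomial of formula (4).
for mu in (1, 5, 20, 1e6):
    print(mu, [round(stapper_pmf(k, 3, 0.99, mu), 5) for k in range(4)])
```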
Error distributions in a multi-stage system
If the outputs of a NAND multiplexing unit are fed, in duplicate, as the two inputs of the succeeding one, a multi-stage system can be built as depicted in figure 2. In such a system the number of stimulated (or non-stimulated) outputs of each NAND multiplexing stage is a stochastic variable; it evolves as a Markov process (chain), because the outputs of one stage are completely determined by the inputs and the device error distribution of that stage. A Markov chain is characterized by an initial probability distribution and its transition probabilities. If k_{l−1} of the N incoming lines are stimulated for both inputs of the lth unit and each NAND gate has a fixed probability ε of making an error, then, according to formula (6), the probability of having k_l non-stimulated outputs given the k_{l−1} stimulated inputs is

$$P(k_l \mid k_{l-1}) = \binom{N}{k_l}\,\frac{B\!\left(k_l+\mu\bar r(k_{l-1}),\; N-k_l+\mu\bigl(1-\bar r(k_{l-1})\bigr)\right)}{B\!\left(\mu\bar r(k_{l-1}),\; \mu\bigl(1-\bar r(k_{l-1})\bigr)\right)} \tag{7}$$
where r̄(k_{l−1}) is given by equation (1), (2) or (3) with x̄ = ȳ = k_{l−1}/N; for the von Neumann fault model, for instance, r̄(k_{l−1}) = (1 − ε)(k_{l−1}/N)² + ε(1 − (k_{l−1}/N)²).
If we are interested in the outputs that give faulty signals, then the probability of having k′_l stimulated outputs, where k′_l = N − k_l, is given by

$$P(k'_l \mid k_{l-1}) = P(k_l = N - k'_l \mid k_{l-1}). \tag{8}$$
Noting the stochastic nature of k_{l−1}, the probability of k′_l outputs being stimulated, over all cases, is obtained by

$$P(k'_l) = \sum_{k_{l-1}=0}^{N} P(k'_l \mid k_{l-1})\,P(k_{l-1}). \tag{9}$$
Formula (9) is inductive in the sense that, given an initial probability distribution and the conditional probabilities, the output probability at any stage can be obtained. For the Markov chain an (N + 1) × (N + 1) transition probability matrix Ψ, whose elements are ψ_{ij} = P(k′_l = j | k_{l−1} = i) as given by formula (8), can be formed:

$$\Psi = \begin{pmatrix} \psi_{00} & \psi_{01} & \cdots & \psi_{0N}\\ \psi_{10} & \psi_{11} & \cdots & \psi_{1N}\\ \vdots & \vdots & \ddots & \vdots\\ \psi_{N0} & \psi_{N1} & \cdots & \psi_{NN} \end{pmatrix} \tag{10}$$

so that the conditional probabilities for every pair (k_l, k_{l−1}) are included. Since the transition probability matrix Ψ is the same for every stage, i.e. independent of l, this is a homogeneous Markov chain. With this transition probability matrix and a fixed input distribution

$$\mathbf p = (p_0, p_1, \ldots, p_N), \tag{11}$$

where p_i is the probability of i inputs being stimulated, the stimulated output distribution of a NAND multiplexing system with n stages is

$$\mathbf p^{(n)} = \mathbf p\,\Psi^{n}. \tag{12}$$

When n gets large, Ψ^n approaches a constant matrix π, i.e.

$$\lim_{n\to\infty}\Psi^{n} = \pi. \tag{13}$$
This indicates that, as n becomes extremely large, the system output distribution will become stable and independent of the number of multiplexing stages.
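A compact numerical sketch of this Markov chain is given below (Python/numpy). It builds Ψ from formulae (1), (6) and (7) for the von Neumann fault model, under the beta parametrization assumed above, and iterates an error-free input distribution through n stages. The values N = 3, ε = 10^-2, μ = 5 and n = 11 mirror the parameters used in section 4, but the sketch is illustrative rather than a reproduction of the paper's tables:

```python
import numpy as np
from math import comb, exp, lgamma

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binom_pmf(k, N, r, mu):
    """Beta-binomial pmf of formula (6); assumes 0 < r < 1,
    with beta parameters a = mu*r, b = mu*(1-r)."""
    a, b = mu * r, mu * (1 - r)
    return comb(N, k) * exp(log_beta(k + a, N - k + b) - log_beta(a, b))

def transition_matrix(N, eps, mu):
    """Psi of formula (10) over stimulated-line counts: entry (i, j) is the
    probability of j stimulated outputs given i stimulated lines on both
    input bundles (von Neumann fault model, formula (1))."""
    psi = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        x = i / N                                    # fraction of stimulated inputs
        r = (1 - eps) * x * x + eps * (1 - x * x)    # P(output non-stimulated)
        for j in range(N + 1):
            psi[i, j] = beta_binom_pmf(N - j, N, r, mu)   # j stimulated = N-j non-stim.
    return psi

N, eps, mu, n = 3, 1e-2, 5, 11
psi = transition_matrix(N, eps, mu)
p0 = np.zeros(N + 1)
p0[N] = 1.0                                   # all input lines stimulated, error free
p_out = p0 @ np.linalg.matrix_power(psi, n)   # formula (12)
# After an odd number of stages the expected output is non-stimulated,
# so the stimulated output lines are the erroneous ones.
print("P(#stimulated outputs = k) after", n, "stages:", np.round(p_out, 4))
```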
Reliability analysis of reconfigurable architectures
The key idea behind reconfigurable architectures is that manufacturing defects can be detected, located and then avoided. The reconfigurable computer concept is greatly assisted by the use of field programmable gate arrays (FPGAs) [20]. Fundamentally, an FPGA contains a regular array of logic units, called configurable logic blocks (CLBs). Each CLB can communicate with its neighbours, and the CLBs are further grouped into blocks, then clusters of blocks. The CLBs can be individually reprogrammed, so that a wide variety of logic or memory structures can be mapped onto the array; when part (or all) of a CLB is not working, the defective components are easy to locate and to exclude from the working ones. The Teramac machine [16], a successful example of the reconfiguration concept, uses 864 identical FPGA chips, of which 75% (647) are partially defective. The first task of Teramac after it was built was to run self-diagnostic software, by which the defects were detected and located and a defect database was generated. By reading the database, user applications are mapped onto good resources. Teramac 'has been successfully configured into a number of parallel architectures and used for extremely demanding computations'.

In processor arrays, the basic logic circuit blocks are referred to as processing elements (PEs), which are sometimes associated with local memories. In very large chips, reliability can be improved by adding spare PEs to the design: the more spares are added, the higher the resulting reliability. Instead of trying to achieve complete fault tolerance, defined as survival of a number of faults equal to the number of spares, most research aims at optimizing the probability of survival, defined as the percentage of fault configurations that can be successfully overcome by the reconfiguration approach [21].

Reconfiguration approaches may be local or global. In local approaches, arrays are divided into subarrays; spare elements are added to each individual subarray and reconfiguration is performed internally to each subarray. In global approaches, a set of spare elements is added to the whole array (usually as spare rows and columns along its edges). Global approaches usually involve far more complex reconfiguration algorithms than local ones [21].
For simplicity, we refer to logic blocks, clusters or PEs as modules, and assume that all modules in the array are identical, so that any spare module can substitute for any failed one, provided a sufficient number of interconnection paths exists. If an array contains n identical modules, of which r are spares, then at least n − r must be fault free for proper operation. We define R_{mn} as the probability of exactly m out of the n modules being fault free; the reliability of the array is then given by

$$R = \sum_{m=n-r}^{n} R_{mn}. \tag{14}$$
If each module has the same failure rate, i.e. the same reliability R_0, and the modules are statistically independent, the number of fault-free modules m follows the binomial distribution

$$R_{mn} = \binom{n}{m}\,R_0^{\,m}(1-R_0)^{\,n-m}. \tag{15}$$
Once again, the defective modules in an array are not uniformly distributed but correlated, so the binomial distribution formula (15) is not sufficient for reliability evaluation. Stapper's model can be used to improve the reliability calculation for correlated modules [19]:

$$R_{mn} = \binom{n}{m}\,\frac{B\!\left(m+\mu\bar R_0,\; n-m+\mu(1-\bar R_0)\right)}{B\!\left(\mu\bar R_0,\; \mu(1-\bar R_0)\right)} \tag{16}$$
where μ is a variable parameter indicating the amount of fault clustering and R̄_0 is the average or expected single-module reliability.
The formula calculates the probability that exactly m out of n identical modules operate correctly. It can be applied to the reliability analysis of parallel processors with redundancy and fault-tolerant very large scale integration (VLSI) systems.
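A short sketch of the array-reliability calculation of formulae (14) and (16), under the same beta parametrization assumed in section 2; the module counts and reliabilities below are illustrative, not taken from the paper's figures:

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def stapper_Rmn(m, n, R0, mu):
    """Formula (16): probability that exactly m of n correlated modules work,
    assuming beta parameters a = mu*R0, b = mu*(1 - R0)."""
    a, b = mu * R0, mu * (1 - R0)
    return comb(n, m) * exp(log_beta(m + a, n - m + b) - log_beta(a, b))

def array_reliability(n, r, R0, mu):
    """Formula (14): the array survives if at least n - r modules work."""
    return sum(stapper_Rmn(m, n, R0, mu) for m in range(n - r, n + 1))

# Example: an array of 40 modules of which 8 are spares (32 needed),
# single-module reliability 0.95, clustering parameter mu = 20.
print(round(array_reliability(40, 8, 0.95, 20), 4))
```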
The defect- and fault-tolerant architecture
The basic circuits implemented with NAND multiplexing
Within a digital computer, the bulk of the logic gates is spent on memory and caches. The processor itself is made from a number of functional units, each of which can be separated into function blocks. Let us assume that a function block at the most refined level evaluates its inputs and produces a stable output within one clock cycle. Within such a function block many logic circuits may be cascaded; however, to avoid timing problems (hazards), the number of cascaded circuits, and hence the number of possible paths from inputs to outputs through the various logic circuits, is usually kept within bounds, so that the path lengths are similar. Such function blocks are found everywhere in the processor and in memory. The function blocks or processors can be composed of arithmetic and logic units (ALUs), look-up tables (memories) or simply multiplexers. The fact that a NAND gate is a universal logic device makes it possible to apply the NAND multiplexing technique to any logic operation, although the multiplexing scheme can easily be extended to other specific logic gates. In this section we make an abstraction of such a function block and assume, in order to enable a statistical analysis, that it is made entirely of stages of parallel NAND gates.
If the processors are implemented with NAND multiplexing, the resulting structure is a NAND multiplexing system with a redundancy factor of N, as all components are duplicated N times. The performance of a multiplexing system can be evaluated by investigating the probability that the number of faulty outputs is or is not beyond a threshold level: outputs with fewer errors than this threshold are considered reliable, and the rest unreliable. The threshold level, together with N, may affect the maximum tolerable value of the device failure rate ε. Since we are concerned with the minimum redundancy required to achieve fault tolerance, we take N = 3 and a threshold of 1/3, i.e. an output bundle is considered reliable if fewer than one in three of its lines carry an erroneous signal. With a redundancy as small as N = 3, the random interconnections between logic layers can be replaced in practical implementations by systematic ones with specific routes; systematic interconnections are likely to be even superior to random ones in terms of error correction [9]. Since NAND multiplexing uses redundant components to mask the effect of defective ones (no error detection is needed), the system is capable of tolerating both permanent and transient faults upon their occurrence.
If each processor has a logical depth of 11, which is sufficient for general computation tasks, the reliability of the one-bit NAND multiplexing output after 11 stages can be studied for various device error rates ε using the NAND multiplexing theory above. With perfectly fault-free inputs, the probability distributions of output errors (the unreliability distributions) are computed against the error rate of individual NAND gates for μ = 1, 2, 5, 20 and infinity; the results are shown in tables 1, 2 and 3 for the different fault models.
From the tables we can see that the von Neumann fault model causes the largest degradation in system performance, while the influence of μ, i.e. the amount of fault clustering, is insignificant. Since we are interested in the maximum device error rate that can be tolerated in general, we take ε = 10^-2 and μ = 5; the reliability of the one-bit output is then obtained from table 1 as R_0 = 0.9012. In the following section we take R_0 = 0.9012 as the average reliability of a one-bit multiplexing circuit.
The reconfigurable structures on bit, processor and cluster levels
We further assume that each processor has a 32-bit processing capacity. A 32-bit processor without redundant circuits is reliable only if all of its output bits are reliable. Instead of building exactly a 32-bit circuit, we build into the processor some redundant processing circuits, so that spares can be configured to replace defective ones (figure 3(c)). If each one-bit circuit has a similar structure, the reliability of the 32-bit processor with redundant circuits can be evaluated against the number of spare bit circuits, using formulae (14) and (16), as plotted in figure 4. The left-most data point in the figure indicates the reliability of a processor with no redundancy. The effect of the variability parameter μ here is rather significant, and the improvement in reliability obtained by using redundancy is evident, in particular when μ is large. Assuming that errors are not strongly correlated at the processor and higher levels, we take μ = 20 for the further evaluations. A processor with 16 redundant bit circuits then has a reliability of 0.9927.
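As a cross-check, the array-reliability sketch from section 3 can be applied to the 32-bit processor: n = 48 one-bit circuits of which 16 are spares, single-bit reliability R̄_0 = 0.9012 and μ = 20. Under the beta parametrization we assumed for formula (16), this should come out close to the 0.9927 quoted above:

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def stapper_Rmn(m, n, R0, mu):
    """Formula (16) with assumed beta parameters a = mu*R0, b = mu*(1 - R0)."""
    a, b = mu * R0, mu * (1 - R0)
    return comb(n, m) * exp(log_beta(m + a, n - m + b) - log_beta(a, b))

# Formula (14): a 32-bit processor with 16 spare bit circuits survives
# if at least 32 of its 48 one-bit circuits are fault free.
R_proc = sum(stapper_Rmn(m, 48, 0.9012, 20) for m in range(32, 49))
print(round(R_proc, 4))   # expected to be close to the 0.9927 in the text
```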
The development of nanotechnology will eventually make it possible to realize extremely large scales of integration, of the order of 10^12 devices per chip. If on such a chip each processor uses about 10^6 devices (logic, memory, communications etc), the number of processors on the chip will be about 10^6 (2^10 × 2^10). Instead of being connected globally, the processors can be assembled into 1024 (2^10) processing clusters, each containing 1024 (2^10) processors and computing tasks independently; the clusters in turn compose the chip. Both clusters and processors can be connected in two-dimensional (32 × 32) arrays in which some columns are redundant, as in figures 3(a) and (b). The reconfiguration strategy can therefore be implemented at both the cluster and the processor level.
Similarly, the performance of a cluster (or of the chip) with redundant processors (or clusters) can be evaluated using formulae (14) and (16). The reliability of a cluster against the number of spare columns is plotted in figure 5, with μ = 20; the left-most point indicates the reliability of a cluster with no spares. By using four columns of processors as spares, the reliability of the cluster is raised from 0.5589 to 0.9959; i.e. a cluster with 128 (4 × 32) redundant processors has a reliability of 0.9959. Further, the reliability of a chip with 1024 clusters is plotted against the number of spare columns of clusters, with μ = 20, in figure 6. The reliability of the chip is greatly improved by using redundant components: if 128 of the 1024 total clusters (four out of 32 columns) are used as spares, the reliability of the chip is better than 99% (a failure rate of 0.2%), provided that faulty components can be effectively substituted by spare ones.

[Table 1. Output unreliabilities of a one-bit NAND multiplexing circuit (von Neumann fault).]
[Table 2. Output unreliabilities of a one-bit NAND multiplexing circuit (stuck-at-0 fault).]
[Table 3. Output unreliabilities of a one-bit NAND multiplexing circuit (stuck-at-1 fault).]

In summary, we have discussed the set-up of a massively parallel fault-tolerant computer architecture. The NAND multiplexing technique is implemented in the fundamental circuits, and reconfigurable structures are mapped onto the bit, processor and cluster levels. Containing up to 10^12 devices, the conceived chip can have about 10^6 medium-sized processors and tolerate a device error rate of up to 10^-2, which is generally unacceptable for any current VLSI system. Redundant components are used at various levels and, as our evaluation shows, they are critical for the survival of the architecture. In contrast with [15], where plain NAND multiplexing was used to recover from transient errors, resulting in massive redundancy, we now accept a higher error rate at the lowest level with considerably less redundancy, and compensate for this by means of hierarchical reconfigurability. This leads to an acceptable failure rate for transient errors for the entire system (online error detection might be needed), and simultaneously provides protection against permanent defects; the error detection problem remains open for further research. The system is expected to have a total redundancy factor, i.e. the multiplexing factor of 3 multiplied by the fractions of spare bit circuits, processors and clusters, of less than 10. This indicates that future nanochips with 10^12 devices might work at an acceptable reliability level while providing about 10^11 effective devices.
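Finally, a rough illustrative check of the redundancy-factor claim, assuming the factor is simply the product of the multiplexing redundancy and the spare fractions at each level (an assumption of ours; interconnect and control overheads are not modelled):

```python
# Illustrative arithmetic only; overheads other than logic spares are ignored.
multiplexing = 3          # N = 3 NAND multiplexing
bit_level    = 48 / 32    # 16 spare bit circuits per 32-bit processor
proc_level   = 36 / 32    # 4 spare columns per 32-column processor array
clus_level   = 36 / 32    # 4 spare columns per 32-column cluster array

total = multiplexing * bit_level * proc_level * clus_level
print(round(total, 2))    # ~5.7, consistent with a total redundancy factor < 10
```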
Conclusions
Due to imperfections in the manufacturing process, the continued shrinking of electronic devices will inevitably introduce a growing number of defects, and it will also make these devices more sensitive to external influences such as cosmic radiation, electromagnetic interference and thermal fluctuations. It is therefore likely that the emerging nanometre-scale devices will eventually suffer from more errors than classical silicon devices in large scale integrated circuits. In order to make future systems based on nanometre-scale devices reliable, the design of fault-tolerant architectures will be necessary; this paper can be seen as part of the endeavour devoted to that goal. We have presented a defect- and fault-tolerant architecture in which von Neumann's NAND multiplexing is implemented in the basic circuits and reconfigurable structures are mapped onto the overall system. The system is expected to work at an acceptable reliability level at the expense of multiple redundant components. The architecture is potentially effective in protecting against both permanent defects and transient faults in systems based on unreliable nanometre-scale devices.
