As current silicon-based techniques fast approach their practical limits, the investigation of nanoscale electronics, devices and system architectures becomes a central research priority. It is expected that nanoarchitectures will confront devices and interconnections with high inherent defect rates, which motivates the search for new architectural paradigms.
INTRODUCTION
As current silicon-based techniques fast approach their practical limits, developing nanoscale electronics, devices and system architectures becomes more important [13]. For instance, because of fundamental limitations at the nanoscale, the past approach of using global interconnections and assuming error-free computing may no longer be possible, presenting new design challenges to computer engineers. It is likely that nanoscale computing will be dominated by communication, where processing is based on redundant and adaptive pathways of error-prone connections. This framework will drive computer architecture in the direction of locally-connected, self-organizing hardware meshes that merge processing and memory [1]. In this section we provide a brief overview of various nanodevices currently being developed, their fault characteristics in the context of large networks of devices, and some proposed nanoarchitectures that take these characteristics into account in their design.
Nanodevices
A wide spectrum of nanoscale devices are being developed based on atomic and molecular phenomena. One promising structure is the carbon nanotube (CNT). Carbon nanotubes are excellent conductors and can provide wires for device interconnection. At the same time, CNT structures for diodes and field effect transistors have been demonstrated [20], [14]. CNTs can form a crosspoint switching structure as shown in Figure 1. In this device, an attractive electric field between the tubes causes the tube on the right to bend and come in contact with the tube underneath. Molecular forces between the tubes maintain the contact for an extended period of time, and thus provide a mechanism for memory [16]. A major issue is the physical placement of CNTs at the nanoscale. One promising approach is to use the self-organizing properties of an alumina substrate to fabricate arrays of CNTs.

Quantum dot cellular automata (QCA) are based on the local interaction of quantum dots arranged in cells that enforce consistent mutual states. Logic operations can be achieved by encoding the function into spatial patterns of the cells. Information can also be propagated through chains of QCA devices in a shift register mechanism to provide interconnection between logic blocks. A major drawback of current QCA devices is that they require temperatures well below 1 Kelvin for successful operation.
Single electron devices have also been proposed and demonstrated. These devices operate on the principle of Coulomb forces when small, isolated electrodes are occupied by an electron. The presence of the electron prevents tunneling of additional electrons and thus forms the basis for memory and switching [12]. At present these devices must operate at sub-1 Kelvin temperatures.
However, if electrode sizes on the order of 1 nm can be achieved, then operation at room temperature becomes feasible.
Perhaps the most practical near-term devices are silicon nanowires, which can be fabricated to widths on the order of 10 nm. The wires exhibit semiconductor properties, so wires in contact can form PN junctions. In addition, crossing nanowires can operate as a FET device [9], making it feasible to fabricate classic silicon circuits.
Anticipated characteristics
To date, the fabrication of nanocircuits has been limited to a few devices intended to demonstrate simple logic or memory operations. There are no actual data measuring the characteristics of large networks of devices. However, it is possible to posit two likely characteristics that will have to be confronted in the development of computational architectures that use these devices.
1. A high and dynamic failure process: It can be expected that a significant fraction of the devices and their interconnections will fail. These failures will occur both during fabrication and at a steady rate after fabrication, thus precluding a single test-and-repair strategy.
2. Operation near the thermal limit: As device sizes shrink, the energy difference between logic states will approach the thermal limit. Thus, the very nature of computation will have to be probabilistic, reflecting the uncertainty inherent in thermodynamics.
The first characteristic is a simple extrapolation of current device fabrication experience. The smaller the device dimensions become, the more phenomena can interfere with correct operation.
It seems likely that architectures will have to cope with device and connection failure rates of 10% or more. At the same time, the nature of connections and devices will be based on mechanisms that can easily mutate over time, such as chemical reactions or fusing and bridging of connections.

The second conclusion can be arrived at in several ways. Perhaps the most direct is to consider the evolution of current CMOS technology towards smaller and smaller device sizes, perhaps using silicon nanowire devices. Current processor chips have about 100 million transistors and dissipate over 100 Watts. It can be assumed that this power level is already near the practical limit in terms of battery life and dangerous external temperatures. The natural evolutionary forces that drive the number of gates per chip and the clock rate upward will decrease logic transition energies to within a few orders of magnitude of $k_B T$, where $k_B$ is Boltzmann's constant, $1.38 \times 10^{-23}$ J/K, and $T$ is the absolute temperature in Kelvin; at room temperature, $k_B T = 0.026$ electron-volts. Another approach to the same conclusion is the study of the ultimate limit on the energy required for computation, starting with the paradox of Maxwell's demon [2] and ultimately clarified by the development of information theory. It can be shown that the energy cost of computation cannot be reduced below $(\ln 2) k_B T$
per bit. This basic result is derived from the necessary increase in randomness as information is lost during computation. For example, the input to an AND gate has four possible states while the output has only two. The evolution of device technology will relentlessly drive towards this limit, requiring approaches that can confront randomness in the logic state.
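As a quick check on these numbers, the short sketch below (ours, in Python; the physical constants are standard values we supply, not figures from the paper) evaluates the $(\ln 2) k_B T$ bound at room temperature:

```python
import math

# Hypothetical worked example: the minimum energy (ln 2) * k_B * T per bit.
K_B = 1.380649e-23            # Boltzmann's constant, J/K
T = 300.0                     # room temperature, K
E_J = math.log(2) * K_B * T
E_EV = E_J / 1.602176634e-19  # convert joules to electron-volts
print(f"{E_J:.3e} J = {E_EV:.4f} eV per bit")  # about 2.9e-21 J = 0.018 eV
```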
Architectural approaches
It will be necessary to evolve new architectural concepts in order to cope with the high level of device failure and the randomness of logic states anticipated by the reasoning just proposed. Current architectural studies aimed at nanoscale systems are focused primarily on the first issue, device failure.
There are two basic approaches being proposed to deal with significant device failure rates: testing and routing around failures; and designing with redundant logic in the form of error correction. Illustrative of the first approach are the architectures of DeHon [5] and Goldstein and Budiu [7]. They propose designing in extra circuit elements that can be used to supplement failed devices and connections. A major issue is the testability of the network and the ability to confront continuous failures over the life of the device.
The second approach was suggested in the pioneering work of von Neumann [18], where he used majority logic gates as a primitive building block and randomizing networks to prevent clusters of failures from overwhelming the fault tolerance of the majority logic. The architecture proposed in [8] is based on this approach.
One architectural approach that can provide continuous adaptation to errors is based on neural network structures [6]. The synaptic weights of the neural network are implemented using multiple connection paths, and the summation is provided by conventional CMOS differential amplifier nodes. The connections are adaptively configured using single electron switching devices. It is proposed that useful computation could arise by training the network with a series of required input-output pairs.
The approach taken in this paper has some similarity to the architecture proposed in [6], but is based on the Markov random field (MRF). The MRF provides a formal probabilistic framework so that computation can be directly embedded in a network with immunity to both device and connection failures. Since logic states are computed probabilistically, the computation is also robust to the logic signal fluctuations that will arise as operation approaches the thermal limit of computation.

This paper is organized as follows. In Section 2 we provide a brief overview of Markov random fields and the Gibbs distribution. Section 3 gives examples of how logic may be expressed using the MRF model and how this provides fault tolerance to both structural and signal-based errors. Section 4 concludes the paper.
OUR PROPOSED APPROACH

The Markov Random Network
The basis for our architectural approach is the Markov random network, a network embodiment of the mathematical concept of the Markov random field (MRF) [10]. The Markov random field defines a set of random variables, $X = \{X_1, X_2, \ldots, X_n\}$. Each variable $X_i$ can take on various values, e.g. state labels. Associated with each variable $X_i$ is a neighborhood, $N_i$, which is a set of variables from $\{X - X_i\}$. Simply put, the probability of a given variable depends only on a (typically small) neighborhood of other variables. In our model, the variables represent the states of nodes in a network. The arcs or edges in the network convey the conditional probabilities with respect to the neighboring nodes.
The definition of the MRF is as follows:

$$P(X) > 0 \qquad \text{(positivity)}$$
$$P(X_i \mid \{X - X_i\}) = P(X_i \mid N_i) \qquad \text{(Markovianity)}$$
The conditional probability of a node state in terms of its neighborhood can be formulated in terms of cliques. A neighborhood and two clique examples are shown in Figure 3. It can be shown, due to the Hammersley-Clifford theorem [3], that

$$P(x_i) = \frac{1}{Z} \exp\left(-\frac{1}{k_B T} \sum_{c \in C} U_c(x_c)\right) \qquad (3)$$

This form for the probability in Equation 3 is called the Gibbs distribution. The normalizing constant $Z$ is called the partition function and ensures that $P$ is in the range $[0, 1]$. The set $C$ is the set of cliques for a given node $i$. The function $U_c$ is called the clique energy function and depends only on the nodes in the clique. Note that the probability of states depends on the ratio of the clique energy of the MRF to the thermal energy $k_B T$. For instance, the probabilities are uniform at high values of $k_B T$ and become sharply peaked at low values of $k_B T$. This mimics the annealing behavior of physical systems. This Gibbs formulation of the Markov random field is an attractive representation for computation, since the physical interpretation of the probabilities in terms of the entropy of computation is likely to find ready interpretation in the physical device characteristics.
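To make the temperature dependence concrete, the following sketch (ours, in Python; the state labels and clique energy values are illustrative assumptions, not the paper's) evaluates a Gibbs distribution over the states of a single clique:

```python
import math

def gibbs(energies, kT):
    """Return P(state) = exp(-U/kT) / Z for each state's clique energy U."""
    weights = [math.exp(-u / kT) for u in energies]
    Z = sum(weights)                    # partition function
    return [w / Z for w in weights]

# Hypothetical clique energies: the first two states are "valid" (lower energy).
U = [-1.0, -1.0, 0.0, 0.0]

for kT in (10.0, 1.0, 0.1):
    print(kT, [round(p, 3) for p in gibbs(U, kT)])
# At high kT the distribution is nearly uniform; at low kT it peaks sharply
# on the minimum-energy states, mimicking annealing in physical systems.
```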
Markov Random Field Computation
The previous discussion has suggested an approach to embedding a Markov random network in CNT circuits. We now briefly describe the nature of computation in the MRF framework. The general algorithm for finding individual site labels that maximize the probability of the overall network is called belief propagation (BP) [21], and provides an efficient means of solving inference problems by propagating marginal probabilities through the network. There are three essential probability functions:

Joint probability: $p(x_0, x_1, \ldots, x_{n-1})$
Marginal probability: $p(x_i) = \sum_{\{x_j,\, j \neq i\}} p(x_0, x_1, \ldots, x_{n-1})$
Conditional probability: $p(x_i \mid N_i)$
We can classify the nodes in the network into those that have defined label probabilities, and those whose values must be determined by the propagation algorithm. The first node type corresponds to a computational input whose value is constrained by the problem setup. Such nodes are called observable nodes. The other nodes are called hidden nodes. In a logic circuit, we can think of the inputs and outputs as observable nodes and the others as hidden nodes.
The basic idea of belief propagation is that the probability of state labels at a given node in the network can be determined by marginalizing (summing) over the joint probabilities for the node state, given just the probabilities for site labels in the Markov neighborhood. This marginalization establishes the label probabilities for the next propagation step. It can be shown that this propagation algorithm will converge to the maximum probability site label assignment for the entire network, provided there are no loops [21]. This incremental algorithm has computational complexity on the order of the number of nodes in the network, with a weighting term proportional to the size of the neighborhood. In the case of loops, the marginalization must be done combinatorially over a region of the network that bounds the loops in order to guarantee maximum probability solutions. That is, one would partition the network into a loop-free network of blocks which internally contain loops. However, it has been demonstrated that the belief propagation algorithm usually converges to the maximum probability state even in the presence of loops [21]. We will give examples of the belief propagation process in a later section.
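As an illustration of the marginalization step, the sketch below (our construction, in Python) pushes a marginal along a loop-free chain of binary nodes. The inverter-style pairwise clique energy $2x_a x_b - x_a - x_b$ is taken from the inverter example developed later in the paper, while the input probabilities and temperature are assumed values:

```python
import math

def propagate(p_in, U, kT):
    """One BP step on a chain: p(x_next) = sum_a p(x_prev=a) * p(x_next|x_prev=a),
    where p(x_next|x_prev) is a Gibbs distribution over the pair clique."""
    p_out = [0.0, 0.0]
    for a in (0, 1):
        w = [math.exp(-U[a][b] / kT) for b in (0, 1)]
        Z = sum(w)
        for b in (0, 1):
            p_out[b] += p_in[a] * w[b] / Z
    return p_out

# Inverter-style pairwise clique energy: U(a, b) = 2ab - a - b.
U_inv = [[2 * a * b - a - b for b in (0, 1)] for a in (0, 1)]

p = [0.1, 0.9]      # observable input node: p(x0 = 1) = 0.9
for step in range(3):
    p = propagate(p, U_inv, kT=0.1)
    print(step, [round(q, 3) for q in p])
# The dominant label flips at each step, as a chain of inverters should.
```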
DESIGN EXAMPLES
So far, the type of computation to be performed by the network has not been specified. The MRF is a completely general computational framework, and in principle any type of computation could be mapped onto the model. In order to concretely illustrate the operation of the model, we will use combinatorial logic as an example. The programming of the MRF is straightforward in this case, and will permit some analysis of the fault tolerance of the architecture.
Combinatorial logic can be implemented using a simple, yet powerful, form for the clique energy, called the auto-model. For cliques up to order three, the energy function is given by:

$$U_c(x) = \kappa + \sum_{i \in C} \alpha_i x_i + \sum_{i,j \in C} \beta_{ij} x_i x_j + \sum_{i,j,k \in C} \gamma_{ijk} x_i x_j x_k \qquad (4)$$

The constants $\alpha_i$, $\beta_{ij}$ and $\gamma_{ijk}$ are called interaction coefficients, and the constant $\kappa$ acts as an energy offset. This form for $U_c(x)$ has been used in many MRF applications including image segmentation, texture classification and object recognition [4], [17].
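The sketch below (ours, in Python) evaluates this auto-model for a third-order clique. The XOR coefficients in the example are our reconstruction, obtained by expanding the negative sum over valid-state minterms described in the following subsection, not values quoted from the paper:

```python
def auto_model(x, kappa, alpha, beta, gamma):
    """Auto-model clique energy of Equation 4 for a binary state vector x."""
    n = len(x)
    u = kappa
    u += sum(alpha[i] * x[i] for i in range(n))
    u += sum(beta[(i, j)] * x[i] * x[j]
             for i in range(n) for j in range(i + 1, n))
    u += sum(gamma[(i, j, k)] * x[i] * x[j] * x[k]
             for i in range(n) for j in range(i + 1, n) for k in range(j + 1, n))
    return u

# XOR (x2 = x0 XOR x1): expanding -[sum of valid-state minterms] gives
# kappa = -1, alpha_i = 1, beta_ij = -2, gamma_012 = 4 (derived values).
kappa, alpha = -1.0, [1.0, 1.0, 1.0]
beta = {(0, 1): -2.0, (0, 2): -2.0, (1, 2): -2.0}
gamma = {(0, 1, 2): 4.0}

for s in range(8):
    x = [(s >> i) & 1 for i in range(3)]
    print(x, auto_model(x, kappa, alpha, beta, gamma))
# The four valid XOR states evaluate to -1; all invalid states evaluate to 0.
```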
There are two aspects of fault tolerance that must be considered. Our implementation does not distinguish between devices and connections; instead, we distinguish structure-based and signal-based faults. Nanoscale devices contain a large number of defects or structural errors, which fluctuate on time scales comparable to the computation cycle. These errors result in variation in the clique energy coefficients of the auto-model in Equation 4. The signal-based type of error is directly accounted for in the probability maximization process inherent in the MRF processing. Next, we will illustrate the behavior of the model for each type of error.

Figure 4: The logic compatibility function for an exclusive-or gate with all possible states.
Structure-based Errors
The effect of structure-based errors, or errors on the coefficients in the clique energy, is illustrated using an XOR example. The logic variables are treated as real-valued algebraic quantities, and logic operations are transformed into arithmetic operations. Additionally, it is desired that valid input/output states should have lower clique energies than invalid states. Thus, the clique energy expression is obtained by a negative sum over the minterms of the valid states:

$$U(x_0, x_1, x_2) = -[(1-x_0)(1-x_1)(1-x_2) + (1-x_0)x_1 x_2 + x_0(1-x_1)x_2 + x_0 x_1(1-x_2)] \qquad (6)$$

Table 1: The inequalities that must hold among the energy coefficients for successful gate operation.
For our example, the set of inequalities that must hold is given in Table 1. Here we relate a valid state to all possible invalid states.
For example, for the valid state $(x_0, x_1, x_2) = 000$ the clique energy in Eqn. 6 must evaluate to a lower energy than that of every possible invalid state. These relations are given in the 16 inequalities listed in Table 1. The inequalities can be solved using a proposed algorithm similar to Gaussian elimination, in which a variable that appears with opposite signs in two inequalities can be eliminated. Applying this procedure to the inequalities in Table 1, constraints on the clique coefficients such as

$$2G > E$$

are obtained. The constraints should be viewed as being driven by the coefficient $G$, which can take on any positive value. A selected value for $G$ then bounds the remaining coefficients.
The bounds are linear, and so the constraints form a polytope in the space of energy coefficients. This concept is illustrated in Figure 5, where a projection onto the $(D, A, B)$ subspace is depicted. In general, the polytope will be a cone whose cross-section increases linearly with the highest-order clique coefficient. The nominal values for the coefficients in Figure 5 are $A_0 = D_0$ and $B_0 = D_0$.
We assume a fixed error rate in the connections that leads to a coefficient error proportional to the coefficient's value. For example, for some error rate $\alpha$, if coefficient $D$ deviates from its nominal value as $D' = D_0 \pm \epsilon$, then $\epsilon = \alpha D_0$. The inequality relating $2D > A$ requires that

$$2D_0(1 - \alpha) > A_0(1 + \alpha)$$
when the worst-case condition is used. Thus, $\alpha$ can be as large as $1/3$ without causing a failure of the inequality. The constraint on the $D$ coefficient also permits $\alpha < 1/3$. Similar conditions arise from considering the remaining constraints. Thus for the XOR circuit, up to one third of the connections can be bad and the correct logic state will still be achieved.
From this example, we observe that complex logic can be decomposed into simple designs by exploiting properties embedded in a circuit. In general, the highest-order clique coefficient can be increased until the lowest-order coefficient has sufficient connection redundancy to be guaranteed to attain the average error rate.
This policy guards against catastrophic failure, where a few bad connections affect a large percentage of the coefficient values. The conical structure of the constraint surface ensures that this strategy is always possible.
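A short sketch (ours, in Python) checks the worst-case tolerance bound derived above; the nominal coefficient values reflect the $A_0 = D_0$ assumption from Figure 5:

```python
def inequality_holds(alpha, D0=1.0, A0=1.0):
    """Worst case for 2D > A: D shrinks to D0*(1-alpha), A grows to A0*(1+alpha)."""
    return 2 * D0 * (1 - alpha) > A0 * (1 + alpha)

for alpha in (0.1, 0.2, 0.3, 0.4):
    print(alpha, inequality_holds(alpha))
# The inequality holds for alpha below 1/3 and fails beyond it,
# matching the one-third connection-error tolerance derived above.
```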
Errors in Signal
State probability
The use of a probabilistic approach to logic has the advantage that the process is inherently fault tolerant to errors in the values of logic state variables. The behavior of a simple inverter circuit will be used to illustrate this aspect of the Markov random network approach.
The Gibbs distribution for an inverter is given by:

$$p(x_0, x_1) = \frac{1}{Z} \exp\left(-\frac{2x_0 x_1 - x_0 - x_1}{k_B T}\right)$$

where $x_0$ is the input and $x_1$ is the output of the inverter; $2x_0 x_1 - x_0 - x_1$ is the clique energy, or auto-model, of an inverter. The partition function $Z$ normalizes the expression as required for a probability. Suppose the input, $x_0$, takes on values from $\{0, 1\}$. The dependence on the input $x_0$ can be marginalized away by summing over its possible values, i.e.,

$$p(x_1) = \sum_{x_0 \in \{0, 1\}} \frac{1}{Z} \exp\left(-\frac{2x_0 x_1 - x_0 - x_1}{k_B T}\right) \qquad (7)$$

In the marginalization it is assumed that the input to the inverter is equally likely to be a zero or a one and that the inverter has exact clique energy weights. These assumptions are somewhat idealized, since in practice the inverter will have variable clique coefficients and the input will range over a continuous set of values near zero or one, according to the distribution of signal noise and device error.
The marginalized inverter output distribution function is plotted in Figure 6 for various values of $k_B T$. When the unit logic energy is ten times the thermal energy, at room temperature the unit logic energy in some physical realization of the Markov network would be 0.26 electron-volts.

Figure 6: The probability of an inverter output for different values of $k_B T$.
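The following sketch (ours, in Python) reproduces this marginalization numerically, treating the output $x_1$ as a real-valued logic variable on $[0, 1]$ as the paper does; the grid resolution and $k_B T$ values are our choices:

```python
import math

def inverter_marginal(kT, p_in=(0.5, 0.5), steps=5):
    """Marginal of the inverter output x1 after summing out the binary input x0,
    using the clique energy U = 2*x0*x1 - x0 - x1 from Equation 7."""
    xs = [i / (steps - 1) for i in range(steps)]
    dens = [sum(p_in[x0] * math.exp(-(2 * x0 * x1 - x0 - x1) / kT)
                for x0 in (0, 1)) for x1 in xs]
    Z = sum(dens)            # discrete stand-in for the partition function
    return xs, [d / Z for d in dens]

for kT in (0.1, 0.25, 1.0):
    xs, p = inverter_marginal(kT)
    print(kT, [round(q, 3) for q in p])
# At low kT the probability mass concentrates at the logic levels 0 and 1;
# with a biased input, e.g. p_in=(0.3, 0.7), the peak at x1 = 0 dominates,
# mirroring Figures 6 and 7.
```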
In actual operation of a logic circuit, the input states would not be equally likely, but would have a higher probability of being in a given state, as required by deterministic behavior. For example, suppose the input to the inverter has $p(1) = 0.7$, $p(0) = 0.3$; then the Gibbs distribution of Figure 6 becomes that shown in Figure 7. As the computing entropy increases, this probability margin asymptotically approaches zero. Based on the Maxwell's demon discussion in [2], the minimum energy required for a bit operation is $k_B T \ln 2$.

Figure 7: The probability of an inverter output for different values of $k_B T$ when the input is one with probability 0.7.
Similarly, a nano logic device will by necessity operate with logic energies within a few tens of $k_B T$ in order to achieve the expected reduction in power afforded by the small scale of nanodevices. For finite temperatures, the policy of choosing the output state with the highest probability always yields the correct logic operation. However, it can be expected that errors will result if $|p(x) - 0.5|$ is small, since any physical realization of the Markov network will have significant fluctuation of the logic levels.
Logic processing
In order to consider the error behavior of more complex circuits, it is necessary to describe the processing of logic signals through the Markov random network. This process is carried out by the chaining of conditional probabilities in belief propagation [21].
As explained in Section 2.2, the probability of logic variables can be determined by summing (marginalizing) over the set of possible values of the clique neighborhood states, except for the variable in question. What remains is the probability for the single variable. This probability can be propagated to the next node in the network and used for the next summation. An example of this basic algorithm has already been given in the case of Equation 7. The Gibbs joint and conditional probability distributions for a NAND gate follow from the clique energy $2x_a x_b x_c - x_a x_b - x_c$, obtained, as before, from the negative sum over the valid-state minterms:

$$p(x_a, x_b, x_c) = \frac{1}{Z} \exp\left(-\frac{2x_a x_b x_c - x_a x_b - x_c}{k_B T}\right)$$

where $x_a$, $x_b$ are the inputs and $x_c$ is the output. Assuming independent inputs with $p(x_a) = p(x_b) = 0.5$, we can obtain the probability of a one at the output by marginalizing over all input combinations. This probability distribution is shown in Figure 8. Note that the NAND gate is asymmetrical in its probability distribution: for a uniform distribution of inputs, the probability of a one output state is three times that of a zero state. This should be expected, since only one input combination produces a zero output. However, this asymmetry is detrimental to logic processing, as shown in Figure 9.
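A brief sketch (ours, in Python) carries out this marginalization and exhibits the asymmetry; the NAND clique energy is the reconstruction given above, and the temperature is an assumed value:

```python
import math

def nand_output(kT, pa=0.5, pb=0.5):
    """Return (p(xc=0), p(xc=1)) for a NAND gate, marginalizing over
    independent inputs with P(xa=1) = pa and P(xb=1) = pb."""
    w = [0.0, 0.0]
    for xa in (0, 1):
        for xb in (0, 1):
            p_in = (pa if xa else 1 - pa) * (pb if xb else 1 - pb)
            for xc in (0, 1):
                U = 2 * xa * xb * xc - xa * xb - xc
                w[xc] += p_in * math.exp(-U / kT)
    Z = w[0] + w[1]
    return w[0] / Z, w[1] / Z

print(nand_output(kT=0.1))                  # uniform inputs: roughly (0.25, 0.75)
print(nand_output(kT=0.1, pa=0.7, pb=0.7))  # near the p = 0.7 margin threshold
print(nand_output(kT=0.1, pa=0.9, pb=0.9))  # strong one-inputs: output favors 0
```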
Note that it is necessary to have $p(x_a) = p(x_b) > 0.7$ in order to achieve any logic margin. This margin is reduced at higher input probabilities as the entropy increases. This result shows that logic structures should be as symmetrical as possible in order to operate close to the thermal limit of $k_B T \ln 2$.

Figure 9: The probability $p(x_c = 0)$ of a zero output state for a NAND gate, as a function of the input state probabilities $p(x_a) = p(x_b)$.
Equivalent logic
To further illustrate the approach, consider the equivalent logic circuits shown in Figure 10. These circuits have identical logic functions but differ in configuration. The marginal distribution of the output state, $p(x_6)$, is shown in Figure 11. Circuit b) has a perfectly symmetrical output distribution and thus can be expected to have significantly better failure tolerance than circuit a), even though they have the same logic function. This example demonstrates that the Markov framework can be used to optimize circuit configurations for the best signal reliability. However, it should be noted that this treatment assumed perfect device operation. A complete analysis would include both device error and signal error by integrating the device analysis from Section 3.1. Efforts are underway to carry out this integration.

Figure 10: a) The logic function $x_6 = x_2 \wedge (x_0 \vee x_1)$. b) An equivalent logic circuit.

Figure 11: The marginal output distributions for the circuits in Figure 10. Note that circuit a) is highly asymmetrical while circuit b) is perfectly symmetrical.
CONCLUSION
In this paper, we propose a probabilistic design methodology for nanoscale computer architecture. The main reason for selecting the Markov random network, the belief propagation algorithm, and the Gibbs energy distribution as the basis for our nanoarchitectural approach is that their operation does not depend on perfect devices or perfect connections. In operation, the Markov network is updated by iteratively changing the states of nodes and propagating those changes through the network. Ultimately the network converges to a stable set of state probabilities reflecting the required result. A computational result is determined by selecting those visible output states that maximize the probability of their respective nodes. Successful operation only requires that the energy of correct states is lower than the energy of errors.
