The reliability of logical operations is indispensable for the reliable operation of computational systems. Since the down-sizing of micro-fabrication generates non-negligible noise in these systems, a new approach for designing noise-immune gates is required. In this paper, we demonstrate that noise-immune gates can be designed by combining Bayesian inference theory with the idea of computation over a noisy signal. To reveal their practical advantages, the performance of these gates is evaluated in comparison with a stochastic resonance-based gate proposed previously. This approach for computation is also demonstrated to be better than a conventional one that conducts information transmission and computation separately.
shown to have several advantageous properties over the gates based on the previously proposed logical SR (LSR). In addition, when the noise level is sufficiently high, a scheme that performs computation over noisy channels (Fig. 1(A)) can be better than a conventional one in which information transmission and computation are conducted separately (Fig. 1(B)). Finally, the generality and possible extensions of this approach are discussed.
Let $x_1(t) \in \{0, 1\}$ and $x_2(t) \in \{0, 1\}$ be two noiseless logical inputs to a logic gate at time $t$. The logical input $x(t) = (x_1(t), x_2(t))^T \in \{0, 1\}^2$ is generally implemented by a physical state such as a voltage, $U_i(t) = \mu_i(x(t)) \in \mathbb{R}$ for $i \in \{1, 2\}$. If noise in the physical inputs $U(t) := (U_1(t), U_2(t))^T \in \mathbb{R}^2$ is negligible, a logical operation over $x(t)$, e.g., the AND operation $x_1(t) x_2(t)$, can be implemented by two-state switching dynamics in which the state flips only when both $U_1(t)$ and $U_2(t)$ exceed certain threshold values. However, when the noise in $U(t)$ is sufficiently strong, such dynamics lead to erroneous switching driven by the noise.
The influence of noise in $U_i(t)$ is abstractly modeled in this work by white Gaussian noise as
$$U_i(t)\,dt = \mu_i(x(t))\,dt + \sigma_i\,dW^i_t, \quad i \in \{1, 2\}.$$
Here, $W^i_t$ is a one-dimensional Wiener process that represents noise with intensity $\sigma_i > 0$. Furthermore, we also assume that $U_i(t)$ depends only on $x_i(t)$, i.e., $\mu_i(x(t)) = \mu_i(x_i(t))$. When the signal-to-noise ratio (SNR) $\Delta\mu_i/\sigma_i$ is not sufficiently high, where $\Delta\mu_i := |\mu_i(1) - \mu_i(0)|$, simple threshold-based switching fails to return the correct output of, for example, the AND operation, because the noisy $U_1(t)$ and $U_2(t)$ can exceed the thresholds even when $x_1(t) = 1$ and $x_2(t) = 1$ do not both hold. This fact illustrates that simple threshold-based switching does not suffice to implement a reliable logical operation under noise. To overcome this problem, we need a dynamical implementation of logical operations that is more reliable than the simple switching dynamics.
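The failure of threshold-based switching can be illustrated numerically. The following is a minimal sketch, not the paper's simulation: it assumes the symmetric levels $\mu(1) = \mu/2$ and $\mu(0) = -\mu/2$, a threshold at 0, and that each sample is the input averaged over a window `dt`, so that its noise standard deviation is `sigma / sqrt(dt)`. The function name and all parameter values are hypothetical.

```python
import math
import random


def threshold_and_error(mu=1.0, sigma=1.0, dt=0.01, n_samples=20000, seed=0):
    """Estimate how often a threshold-based AND gate gives the wrong output.

    Assumptions (illustrative only): mu(1) = +mu/2, mu(0) = -mu/2,
    threshold at 0, and window-averaged noise with s.d. sigma / sqrt(dt).
    """
    rng = random.Random(seed)
    noise_sd = sigma / math.sqrt(dt)
    errors = 0
    for _ in range(n_samples):
        # Draw the two noiseless logical inputs uniformly at random.
        x1, x2 = rng.randint(0, 1), rng.randint(0, 1)
        # Noisy physical inputs U_1 and U_2.
        u1 = (x1 - 0.5) * mu + rng.gauss(0.0, noise_sd)
        u2 = (x2 - 0.5) * mu + rng.gauss(0.0, noise_sd)
        # The gate fires only when both inputs exceed the threshold.
        out = int(u1 > 0.0 and u2 > 0.0)
        errors += int(out != (x1 & x2))
    return errors / n_samples


if __name__ == "__main__":
    for s in (0.0, 0.05, 0.2, 1.0):
        print(f"sigma = {s:4.2f}  error fraction = {threshold_and_error(sigma=s):.3f}")
```

With `sigma = 0` the gate makes no errors, and the error fraction grows as the noise intensity increases, which is the behaviour the text describes.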
To theoretically derive such an implementation, in this work, we reformulate the logical operation under noise as a statistical inference of partial information. In conventional statistical inference and logic operations, we infer all hidden states of $x(t)$ from $U(t)$ as $z(t) = (z_1(t), z_2(t))$, where $z(t)$ is the inferred version of $x(t)$. After inference of the transmitted information, the functions of $x(t)$ are calculated with $z(t)$ under noiseless conditions (Fig. 1(B)). However, the approach here differs significantly from the usual statistical inference in that the main purpose is the inference of partial information of $x(t)$, i.e., a function of $x(t)$, rather than the entire information of $x(t)$, because a logical operation constitutes a reduction of the information on $x(t)$ that $U(t)$ possesses. This property enables us to conduct the necessary computation over the noisy signal $U(t)$ before inference, as shown in Fig. 1(A). For example, $I(t) = U_1(t) + U_2(t)$ conveys sufficient information to obtain the AND, NAND, OR, NOR, and XOR operations. This operation over noisy $U$ substantially reduces the complexity of the operations needed after inference to calculate the desired output of a gate from the inferred states of $x(t)$. Therefore, the concept of computation over noisy signals (channels) is suitable for implementing a reliable circuit by combining unreliable and reliable components effectively, and may also be relevant for biological systems.
To demonstrate this idea, in this paper, we consider only $I(t) = U_1(t) + U_2(t)$ as computation over noisy $U(t)$, although this approach is applicable to more general situations. From the definition of $U_i(t)$ and the properties of the Wiener process, $I(t)$ can be simplified as
$$I(t)\,dt = \nu(x(t))\,dt + \sigma_0\,dW_t, \qquad \nu(x(t)) := \mu_1(x_1(t)) + \mu_2(x_2(t)).$$
The noise intensity $\sigma_0$ depends on the physical implementation of the gate. In the worst case, where $U_1$ and $U_2$ are added just before the inference computation, the noise in both $U_1$ and $U_2$ contributes to $\sigma_0$ as $\sigma_0^2 = \sigma_1^2 + \sigma_2^2$. In contrast, in the best case, only the noise of the single channel that transmits $I(t)$ contributes to the noise as $\sigma^2$. In addition, we also assume that $\mu_1(x) = \mu_2(x) = \mu(x)$ because of the symmetry of logic gates with respect to the exchange of the two inputs. Thus, for sufficiently small $\Delta t > 0$ and fixed $x(t)$, the probability distribution for $I(t)$ can be represented as
$$P(I(t) \mid x(t)) = \mathcal{N}\left(I(t);\, \nu(x(t)),\, \sigma_0^2/\Delta t\right),$$
where $\mathcal{N}(\cdot\,;\, \nu, \sigma^2)$ is the normal distribution, the mean and variance of which are $\nu$ and $\sigma^2$, respectively. By its definition, $\nu(x(t))$ is $2\mu(1)$, $\mu(1) + \mu(0)$, or $2\mu(0)$. Thus, $I(t)$ can discriminate only three groups among the four possible states of $x(t)$. We denote these three states by $\chi_i$: $\chi_1 = (0, 0)$, $\chi_2 = (0, 1)$ or $(1, 0)$, and $\chi_3 = (1, 1)$. Furthermore, without loss of generality, we assume that $\mu(1) = \mu/2$ and $\mu(0) = -\mu/2$, so that $\nu$ takes the values $-\mu$, $0$, and $\mu$ for $\chi_1$, $\chi_2$, and $\chi_3$, respectively. Now, the inference of $x(t)$ from the noisy input $I(t)$ is reduced to the problem of determining which of the $\chi_i$ contains $x(t)$. If we infer whether $x(t)$ is in $\chi_3$ or not, then the inference is equivalent to the AND operation, because $x(t) \in \chi_3$ only when $x(t) = (1, 1)$. Similarly, inferring whether $x(t)$ is in $\chi_1$ or in $\chi_2$ yields the NOR and XOR operations, respectively.
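The correspondence between the summed input, the three states $\chi_i$, and the gates can be tabulated directly. The snippet below is only an illustrative sketch: the helper names `nu`, `chi_index`, and `gate_output` and the value `MU = 1.0` are hypothetical, and it treats the noiseless case.

```python
from itertools import product

MU = 1.0  # hypothetical signal amplitude: mu(1) = MU/2, mu(0) = -MU/2


def nu(x1, x2, mu=MU):
    """Noiseless mean of I(t) = U_1(t) + U_2(t) for logical inputs (x1, x2)."""
    return (x1 - 0.5) * mu + (x2 - 0.5) * mu


def chi_index(x1, x2):
    """Return 1 for (0,0), 2 for (0,1) or (1,0), and 3 for (1,1)."""
    return x1 + x2 + 1


# Each gate is the indicator of one of the three states.
GATES = {"NOR": 1, "XOR": 2, "AND": 3}


def gate_output(name, x1, x2):
    """Noiseless output of the named gate as an indicator of chi_i."""
    return int(chi_index(x1, x2) == GATES[name])


if __name__ == "__main__":
    for x1, x2 in product((0, 1), repeat=2):
        outputs = {g: gate_output(g, x1, x2) for g in GATES}
        print((x1, x2), "nu =", nu(x1, x2), "chi =", chi_index(x1, x2), outputs)
```

The inputs $(0, 1)$ and $(1, 0)$ give the same mean and therefore fall into the same state, which is why the sum suffices for symmetric gates but cannot recover $x(t)$ itself.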
The statistically optimal inference of $x(t)$ is derived by using sequential Bayesian inference as in [8]. Let $z_i(t) := P_t(x(t) = \chi_i \mid I(0:t))$ be the posterior probability that $x(t) = \chi_i$ given the history of $I(t')$ from time $t' = 0$ to $t' = t$. By following the formula of Bayesian inference [11], we have
$$z_i(t') = \frac{P(I(t') \mid \chi_i) \sum_j P_T(t', \chi_i \mid t, \chi_j)\, z_j(t)}{\sum_k P(I(t') \mid \chi_k) \sum_j P_T(t', \chi_k \mid t, \chi_j)\, z_j(t)},$$
where $t' = t + \Delta t$, and $P_T(t', \chi_i \mid t, \chi_j)$ is the transition probability that $x(t')$ becomes $\chi_i$ when $x(t) = \chi_j$. Then, we have
.
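The sequential Bayesian update can be sketched in discrete time as a predict-then-correct step. This is a generic hidden-Markov filter under stated assumptions, not the paper's continuous-time Eq. (1): the function names, the transition matrix, and all parameter values (`mu`, `sigma0`, `dt`, `eps`) are hypothetical, and the observation is taken to be Gaussian with standard deviation `sigma0 / sqrt(dt)`.

```python
import math
import random


def bayes_step(z, obs, trans, means, sd):
    """One sequential Bayesian update of the posterior z over chi_1..chi_3.

    trans[i][j] is the probability of moving from state j to state i in one
    step. The Gaussian likelihood is evaluated in the log domain; its
    normalizing constant cancels in the ratio and is therefore omitted.
    """
    n = len(z)
    predicted = [sum(trans[i][j] * z[j] for j in range(n)) for i in range(n)]
    log_lik = [-0.5 * ((obs - means[i]) / sd) ** 2 for i in range(n)]
    shift = max(log_lik)  # avoids underflow for outlying observations
    weighted = [math.exp(l - shift) * p for l, p in zip(log_lik, predicted)]
    total = sum(weighted)
    return [w / total for w in weighted]


def run_filter(true_state=2, mu=1.0, sigma0=0.2, dt=0.01, steps=2000,
               eps=0.001, seed=1):
    """Filter a noisy input whose hidden state is held fixed at true_state.

    true_state is a 0-based index (2 means chi_3, i.e. x = (1, 1)).
    eps is an assumed per-step switching probability.
    """
    rng = random.Random(seed)
    means = [-mu, 0.0, mu]  # nu for chi_1, chi_2, chi_3
    sd = sigma0 / math.sqrt(dt)
    trans = [
        [1 - 2 * eps, eps, 0.0],
        [2 * eps, 1 - 2 * eps, 2 * eps],
        [0.0, eps, 1 - 2 * eps],
    ]
    z = [1 / 3, 1 / 3, 1 / 3]
    history = []
    for _ in range(steps):
        obs = means[true_state] + rng.gauss(0.0, sd)
        z = bayes_step(z, obs, trans, means, sd)
        history.append(z)
    return history


if __name__ == "__main__":
    z_nor, z_xor, z_and = run_filter()[-1]
    print(f"NOR = {z_nor:.3f}  XOR = {z_xor:.3f}  AND = {z_and:.3f}")
```

Thresholding the third component at 1/2 then gives an AND output, in the same way that the three posterior components are read out as gates in the text.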
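For simplicity, we assume that $P_T(t', \chi_i \mid t, \chi_j)$ is time-homogeneous and can be represented for sufficiently small $\Delta t$ by $P_T(t', \chi_i \mid t, \chi_j) = \Delta t \times r_{i|j}$ for $i \neq j$ and $P_T(t', \chi_i \mid t, \chi_i) = 1 - \Delta t \times r_{i|i}$, where $r_{i|j}$ is the instantaneous transition rate from $\chi_j$ to $\chi_i$ and $r_{i|i} = \sum_{k \neq i} r_{k|i}$ holds. If the dynamics of both $x_1(t)$ and $x_2(t)$ follow a two-state Markov process whose transition rates from 0 to 1 and from 1 to 0 are $r_{on}$ and $r_{off}$, respectively, then we have
$$r_{2|1} = 2 r_{on}, \quad r_{1|2} = r_{off}, \quad r_{3|2} = r_{on}, \quad r_{2|3} = 2 r_{off}, \quad r_{3|1} = r_{1|3} = 0.$$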
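The transition rates between the three lumped states can be checked by enumerating the four microstates of $(x_1, x_2)$. The sketch below assumes that the two inputs switch independently, which the text does not state explicitly; the function name `lumped_rates` is hypothetical.

```python
def lumped_rates(r_on, r_off):
    """Return r[i][j], the rate from chi_(j+1) to chi_(i+1), for i != j.

    Assumes x_1 and x_2 flip independently with rates r_on (0 -> 1) and
    r_off (1 -> 0). States are grouped by the number of inputs equal to 1.
    """
    microstates = [(0, 0), (0, 1), (1, 0), (1, 1)]
    r = [[0.0] * 3 for _ in range(3)]
    counts = [0, 0, 0]
    for s in microstates:
        src = s[0] + s[1]  # 0 -> chi_1, 1 -> chi_2, 2 -> chi_3
        counts[src] += 1
        for k in (0, 1):
            # Flip input k and accumulate the corresponding rate.
            t = list(s)
            t[k] = 1 - t[k]
            dst = t[0] + t[1]
            r[dst][src] += r_on if s[k] == 0 else r_off
    # Both microstates of chi_2 have identical outgoing rates, so averaging
    # over the microstates of each source group gives the lumped rates.
    for j in range(3):
        for i in range(3):
            r[i][j] /= counts[j]
    return r


if __name__ == "__main__":
    for row in lumped_rates(r_on=0.3, r_off=0.7):
        print(row)
```

Under this independence assumption, leaving $\chi_1$ or $\chi_3$ happens at twice the single-input rate because either input may flip, whereas direct transitions between $\chi_1$ and $\chi_3$ have rate zero.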
By taking the limit as $\Delta t \to 0$, we obtain a set of three-dimensional equations with quadratic nonlinearity as
where $z(t) = (z_1(t), z_2(t), z_3(t))^T$ and
Finally, by solving the above equation with respect to z(t), we have
where $\overset{\circ}{=}$ indicates that the integrals with respect to $dW_t$ are interpreted as Stratonovich integrals [12] (see supplementary material for the detailed derivation). The three outputs, $z_1(t)$, $z_2(t)$, and $z_3(t)$, correspond to Bayesian NOR, XOR, and AND gates, respectively, and therefore, this system can compute these operations simultaneously. By an appropriate transformation of variables, e.g., taking the complement $1 - z_i(t)$, we can also implement the OR, XNOR, and NAND operations. Equation (1) contains $\sigma_0$ and $\mu$ as the system's parameters, indicating that these parameters must be tuned to coincide with those of the input $I(t)$ in order to conduct statistically optimal logical operations. However, the gate parameters may not be accurately adjusted in a real situation. In order to analyze the influence of a parameter mismatch, we introduce $\mu^*_0$ and $\sigma^*_0$ to specifically represent the system's parameters $\mu$ and $\sigma_0$ in Eq. (1), and therefore, Eq. (1) is statistically optimal only when $\mu^*_0 = \mu$ and $\sigma^*_0 = \sigma_0$. Figures 1(C) and S1(A) demonstrate that the Bayesian gates can conduct logical operations under a very noisy condition.
We compare the performance of the Bayesian gates with that of LSR, proposed recently as a noise-immune implementation of logical gates [4, 5]. LSR is defined by the following stochastic differential equation with a double-well potential:
where $g(y) = y$ when $y^*_l \le y \le y^*_u$, $g(y) = y^*_l$ when $y < y^*_l$, and $g(y) = y^*_u$ when $y > y^*_u$. An optimal LSR NOR gate can be implemented by setting $(y^*_l, y^*_u) = (-0.5, 1.3)$ as in [4]. In order to evaluate the performance of the Bayesian and LSR NOR gates, we use an error rate (ER) defined as $E :=$
, where $a(t)$ is either $z_i(t)$ or $y(t)$, and $1[a > a_{th}]$ returns 1 when $a > a_{th}$ and 0 otherwise. For $a(t) = z_i(t)$, we choose $a_{th} = 1/2$, whereas we choose $a_{th} = 0$ for $a(t) = y(t)$ as in [4]. The performance of LSR is shown to be optimal when $0.6 < \sigma_s < 0.8$, as in Fig. 2(A). Under this optimal noise intensity for LSR, the ERs of LSR and the Bayesian gates (BGs) are comparable, indicating that the performance of LSR is close to the statistical optimum (Figs. 1(C) and 2(A)).
As shown in Figs. 2(A), S1(B), and S1(C), however, the performance of LSR quickly degrades if the noise intensity of the input, $\sigma_0$, deviates from its optimal value, whereas the BG can still conduct a reliable logical operation within a wider range of noise intensity (Fig. 2(B)). Furthermore, the ER of the BG does not increase if $\sigma_0$ is less than the gate parameter, $\sigma^*_0$. This property of the BG means that its performance is determined by the worst noise level that $\sigma^*_0$ specifies. As long as the actual noise intensity $\sigma_0$ is less than this expected worst level $\sigma^*_0$, the gate operates robustly at the cost of a fixed lower bound of the ER. Since information on the actual noise level within a system may not always be available before designing gates, the BG has a practical advantage over the LSR gate.
In general, the error stems from either erroneous switching for constant $x(t)$ or delay in the switching of $z(t)$ when $x(t)$ changes. Since the total ER can be attributed only to the delay of switching when no noise exists in the input, i.e., $\sigma_0 = 0$, we can approximately dissect the total ER $E$ into the errors from the delay of switching at the change of $x(t)$, $E_D = \lim_{\sigma_0 \to 0} E$ (Fig. 2(C)), and those from the erroneous switching for constant $x(t)$, $E_E = E - E_D$ (Fig. 2(D)). As clearly seen in Fig. 2(D), $E_E$ hardly changes if $\sigma^*_0$ (expected noise intensity) is larger than $\sigma_0$ (actual noise intensity), whereas $E_D$ increases. Thus, the cost of choosing $\sigma^*_0$ larger than $\sigma_0$ is the delay of switching, which limits the speed of the gate. However, a $\sigma^*_0$ larger than $\sigma_0$ works as a margin for systematic variation of $\sigma_0$, because $E_E$ increases little provided that $\sigma_0$ is less than $\sigma^*_0$. The same result is obtained for the Bayesian XOR and AND gates (Fig. S2). Thus, the variation in $\sigma_0$ can be compensated for at the cost of slow switching, clearly indicating a trade-off between the computational speed and the reliability of computation [13].
Information transmission and computation are usually separated in conventional computational architectures, in which computation is conducted under virtually noiseless conditions (Fig. 1(B)). This usual computation without noise is expected to perform better than the BGs, which conduct computation and transmission simultaneously in noisy conditions. However, it requires two independent channels for input transmission (Fig. 1(B)), whereas the BG combines them along the transmission path for computation (Fig. 1(A)). This observation suggests that the BG may outperform the usual computation if a noisy channel is effectively exploited for computation with small $q$. In order to clarify this condition, we calculated $E(q)$ of the Bayesian NOR gate as a function of $q \in [0, 1]$, and that of the usual computation, $E_U$, to obtain a performance ratio defined as $\eta := E(q)/E_U$. Figure 3(A) shows that the operation of the BG can be more efficient than ($\eta < 1$) or comparable to ($\eta < 10^{0.1} \approx 1.26$) the usual computation when $q$ is sufficiently small. In addition, the range of $q$ within which the BG operates better expands as the noise intensity $\sigma$ increases. The same result is obtained for other gates (Fig. S3). This result indicates that the operation of the BG can be efficient when the noise in the channel is large, whereas the usual computation is better when the channel noise is very small, meaning that computation over a noisy channel may be practical when the noise cannot be made small.
Since our approach is based on the general theory of inference, not on a specific physical implementation, it potentially has more extensions and applications than those demonstrated here. First, we can choose arithmetic operations other than addition over the noisy signals $U_1(t)$ and $U_2(t)$, which lead to different noise characteristics and gate properties. For example, subtraction may in principle lead to more reliable gates than addition by canceling out the correlated noise in $U_1$ and $U_2$. Second, we can easily design noise-immune gates with more than two inputs to conduct more complicated logical operations at the cost of the complexity of individual gates.
[Figure caption fragment] White curves represent the contours of $\eta$, and the thick white curve corresponds to $\eta = 1$. The gate parameter is set to be optimal as $\sigma^*_0 = \sigma_0$. The other parameters are the same as those in Fig. 1(A).
