Abstract-This paper studies fault tolerance in switching reconfigurable nano-crossbar arrays. Both permanent and transient faults are taken into account by independently assigning stuck-open and stuck-closed fault probabilities to crosspoints. In the presence of permanent faults, a fast and accurate heuristic algorithm is proposed that uses the techniques of index sorting, backtracking, and row matching. The algorithm's effectiveness is demonstrated on standard benchmark circuits in terms of runtime, success rate, and accuracy. In the presence of transient faults, tolerance analysis is performed by formally and recursively determining tolerable fault positions. In this way, we are able to specify the fault tolerance performance of nano-crossbars without relying on randomly generated faults, an approach that is relatively costly because the number of fault distributions in a crossbar grows exponentially with the crossbar size.
I. INTRODUCTION
NANO-CROSSBAR arrays have emerged as a strong candidate technology to replace CMOS in the near future [2], [3]. They are regular and dense structures, fabricated by exploiting self-assembly as opposed to relying purely on conventional, relatively costly lithography-based CMOS fabrication techniques [4], [5]. Currently, nano-crossbar arrays are fabricated such that each crosspoint can be used as a conventional electronic component such as a diode, an FET, or a switch [6], [7]. This is a unique opportunity that allows us to integrate well-developed conventional circuit design techniques into nano-crossbar arrays. However, as expected, the integration comes with challenges, and fault/defect tolerance is one of the most significant. Fault rates are much higher for nano-crossbars than for conventional CMOS [8], [9]. Therefore, developing efficient fault tolerance techniques for nano-crossbars is a must and the main motivation of this paper. In this paper, we examine reconfigurable crossbar arrays by considering randomly occurring stuck-open and stuck-closed crosspoint faults. This is illustrated in Fig. 1. Our fault tolerance approach is based on the assumption that a crossbar input can be used for multiple crossbar outputs (broadcasting allowed), which fits Boolean logic applications. In contrast, especially for memory applications, a crossbar input is strictly used for only one output, which necessitates different fault tolerance approaches [10], [11].
Manuscript received February 2, 2016; revised June 1, 2016 and August 5, 2016; accepted August 13, 2016. Date of publication August 25, 2016; date of current version April 19, 2017. This work was supported in part by the EU-H2020-RISE Project NANOxCOMP under Grant 691178, and in part by the TUBITAK-Career Project under Grant 113E760. A preliminary version of this paper appeared in [1]. This paper was recommended by Associate Editor S. Pasricha.
O. Tunali is with the Department of Nanoscience and Nanoengineering, Istanbul Technical University, Istanbul 34469, Turkey (e-mail: onur.tunali@itu.edu.tr).
M. Altun is with the Department of Electronics and Communication Engineering, Istanbul Technical University, Istanbul 34469, Turkey (e-mail: altunmus@itu.edu.tr).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2016.2602804
We propose distinct approaches for permanent and transient faults because of their different natures, as shown in Table I. In the presence of permanent faults, tolerance is achieved by mapping target Boolean functions onto a defective crossbar using crossbar row and column permutations. This is an NP-complete problem [12]. In the worst case, implementing a target function with an N × M crossbar requires trying N!M! permutations, so the computing time quickly grows to intractable levels with the crossbar size. To tackle this problem, several approaches have been proposed in the literature; they can be classified into two main categories: 1) defect-unaware and 2) defect-aware approaches.
Defect-unaware algorithms aim to find the largest possible k × k defect-free subcrossbar from a defective N × N crossbar where k ≤ N [13] - [15]. Detailed yield analysis of these algorithms shows a common shortcoming: the algorithms are inefficient for high fault rates because the obtained k values are much smaller than N [15]. When N = 250 and the fault rate is 15%, which is a reasonable value for nano-arrays, the fastest algorithms find k values only as high as 30 [15]. This means that only about 1% of the crossbar (30 × 30 out of 250 × 250 crosspoints) can be used. In this regard, defect-aware algorithms perform much more satisfactorily [16] - [18]. A valid mapping is generally found using row and column sizes 1.5 times larger than the optimal sizes. Note that for a specific target function, the larger the crossbar, the easier it is to find a valid mapping because the solution space grows. Therefore, it is challenging, as well as desirable for area considerations, to find a mapping with optimal-size crossbars. We achieve this with our heuristic defect-aware algorithm.
Defect-aware algorithms that use graph-based heuristics transform the mapping problem into a graph isomorphism problem [16], [19], [20]. An initial input assignment is made to prune the permutation space. However, in case of an unfavorable assignment, the number of reconfigurations needed to find a valid mapping increases drastically. Additionally, the runtime quickly grows beyond practical limits, especially for large-scale target functions. Other algorithms based on integer linear programming also suffer from runtime inefficiency for large-scale functions [18], [21]. Apart from these methods, a considerably fast memetic algorithm has been proposed to tackle this problem [22]. Its drawback is that the starting conditions affect the results significantly. As an example, experimental results presented in [22] show runtime differences as large as 25 times for target functions of the same size. Our proposed algorithm works considerably faster than the algorithms in the literature, with nearly steady runtime values for target functions of the same size. To our knowledge, no other algorithm is able to find a valid mapping for large benchmarks such as "table5" and "t481" with up to 15% fault rates. Additionally, the proposed algorithm shows 99% accuracy with respect to the results of an exhaustive search algorithm.
Our algorithm performs sorting to avoid disadvantageous initial assignments and to reduce unnecessary reconfigurations. For this purpose, matrix and index representations of target functions and defective crossbars are obtained. Sorted matrices are matched using 1-D array matchings, which allows the mapping problem to be solved with mere multiplication operations. Backtracking is also performed to improve accuracy.
Although permanent fault tolerance of nano-crossbar arrays has been thoroughly studied in the literature, transient faults have not received adequate attention. Redundancy-based approaches have been proposed to tolerate transient faults by exploiting techniques including majority voting, hardening, and fault masking [17], [23] - [26]. For these studies, the main goal is to find an efficient method of adding extra redundancies to correct/detect single or multiple faults while optimizing the area overhead. In this paper, we do not aim to correct faults; instead, we aim to determine tolerable fault positions in advance without increasing area. We adopt a formal approach instead of randomly generating faults and checking whether the faults ruin the crossbar functionality. We determine equivalent logic functions of a target function that denote the positions of tolerable faulty switches. We show that iff faults occur on these positions, the crossbar still implements the correct function. In other words, we show that it is possible to tolerate transient faults without adding extra redundancies. In this way, we are able to specify fault tolerance performance without relying on a Monte Carlo simulation, which is relatively costly because the number of fault distributions in a crossbar grows exponentially with the crossbar size.
Our method can be used in the above-mentioned studies to manipulate redundancies using the obtained tolerable fault positions. Additionally, the obtained equivalent Boolean functions can be used generally for logic equivalence problems.
The organization of this paper is as follows. In Section II, we present the proposed fault tolerance algorithm for permanent faults. In Section III, we explain transient faults, their reliability analysis, and eventually a performance calculation method. In Section IV, we present experimental results and elaborate on them. In Section V, we discuss our contributions and future work.
A. Definitions
In this section, we explain key concepts used throughout this paper for both permanent and transient faults.
Definition 1: Consider k independent Boolean variables, $x_1, x_2, \ldots, x_k$. Boolean literals are Boolean variables and their complements, i.e., $x_1, \bar{x}_1, x_2, \bar{x}_2, \ldots, x_k, \bar{x}_k$.
Definition 2: A product (P) is an AND of literals, e.g., $P = x_1 x_3 x_4$. A sum-of-products (SOP) expression is an OR of products.
Definition 3: A prime implicant (PI) of a Boolean function f is a product that implies f such that removing any literal from the product results in a new product that does not imply f.
Definition 5: A sum is an OR of literals. A product-of-sums (POS) expression is an AND of sums.
Definition 6: Function matrix (FM) is a representation of a Boolean function in SOP form such that the function's literals and products are appointed to the matrix columns and rows, respectively. If a literal occurs in a product, it is denoted with +1; otherwise −1 is assigned. Fig. 2(a) shows an example of an FM.
Definition 7: Crossbar matrix (CM) is a representation of a crossbar array such that functional switches of crossbars are denoted with 0; defective stuck-closed and stuck-open switches are denoted with +1 and −1, respectively. Fig. 2(b) shows an example of a CM by considering stuck-closed and stuck-open faults.
Definition 8: Logic inclusion ratio (IR) is defined as a ratio of the number of +1s, corresponding to used switches, to the total number of elements, +1s and −1s, in an FM. As an example, consider the FM in Fig. 2(a) . Here, the number of +1s or the number of used switches is 6, so IR = 6/15.
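Definitions 6 and 8 can be illustrated with a few lines of Python. The sketch below is only an illustration: the target function, the helper names `function_matrix` and `inclusion_ratio`, and the list-of-lists matrix layout are assumptions made here, not taken from the figures.

```python
def function_matrix(products, literals):
    """Build an FM: one row per product, one column per literal.

    An entry is +1 if the literal occurs in the product, otherwise -1.
    """
    return [[+1 if lit in prod else -1 for lit in literals] for prod in products]


def inclusion_ratio(fm):
    """Logic inclusion ratio: number of +1 entries over all entries."""
    total = sum(len(row) for row in fm)
    used = sum(1 for row in fm for value in row if value == +1)
    return used / total


# Hypothetical target function f = x1 x2 x3 + x1 x4 + x2 x4 (illustration only).
literals = ["x1", "x2", "x3", "x4"]
products = [{"x1", "x2", "x3"}, {"x1", "x4"}, {"x2", "x4"}]

fm = function_matrix(products, literals)
for row in fm:
    print(row)
print("IR =", inclusion_ratio(fm))  # 7 used switches out of 12
```

A CM could be stored in the same layout, with 0 for functional switches and +1 or −1 for stuck-closed or stuck-open switches.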
II. PERMANENT FAULT TOLERANCE
We aim to find a valid mapping, namely a correct assignment of literals and products of a target function to inputs and outputs of a given crossbar having permanent faults. Positions of the faults are known prior to mapping and are represented by a CM. We consider randomly distributed stuck-closed and stuck-open faults at crosspoint switches; wire breakdowns and bridging faults are not considered in this paper.
In case of having a defect-free crossbar, every assignment produces a valid mapping. Fig. 3(a) shows two different assignments resulting in valid mappings for a target function f. However, finding a valid mapping for a defective crossbar requires trials of different assignments. This is illustrated in Fig. 3(b). While the assignment in the upper part produces an incorrect mapping since $x_1$ of $P_1$ is positioned on a stuck-open fault, the assignment in the lower part is correct, resulting in a valid mapping. The main purpose of our algorithm is to find a correct assignment or a valid mapping; a formal problem definition is given as follows.
Problem Definition: Consider different assignments of literals (xs) to inputs and products (Ps) to outputs. An input array I[x i , . . . , x j ] and an output array O[P i , . . . , P j ] are defined such that ith elements of the arrays are the assigned literal and product to the ith crossbar input and output, respectively. The proposed algorithm yields input and output arrays that establish a valid mapping or a correct assignment. As an example, the correct assignment in the lower part of Fig. 3(a) has
Our algorithm fundamentally uses index representations of function and crossbar matrices as well as row/column permutations and matchings. These concepts are explained as follows. 
A. Preliminaries
1) Row Index: The number of +1, 0, or −1 valued elements in a matrix row. For example, the row represented by $P_1$ in Fig. 4 has a row index of 3 for a chosen value of +1.
2) Column Index: The number of +1, 0, or −1 valued elements in a matrix column. For example, the column represented by $x_1$ in Fig. 4 has a column index of 1 for a chosen value of −1.
3) Row Index Set: A set of all row indices of a matrix for a chosen value of +1, 0, or −1. In Fig. 4, rows represented by $P_1$, $P_2$, and $P_3$ have row indices of 1, 2, and 2, respectively, for a chosen value of −1. So its row index set is $I_{R,F} = \{1, 2, 2\}$, where R stands for row and F stands for function.
Fig. 5. Hadamard product of row matrices represented by $P_1$ and $O_1$. The resulting matrix has no negative element; there is a valid matching.
4) Column Index Set: A set of all column indices of a matrix for a chosen value of +1, 0, or −1. In Fig. 4, columns represented by $x_1$, $x_2$, $x_3$, and $x_4$ have column indices of 2, 2, 1, and 2, respectively, for a chosen value of +1. So its column index set is $I_{C,F} = \{2, 2, 1, 2\}$, where C stands for column and F stands for function.
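The index definitions above translate directly into counting code. The sketch below uses a hypothetical 3 × 4 matrix that was chosen here only because it reproduces the index values quoted for Fig. 4; it is an assumption, not a copy of the figure, and the function names are illustrative.

```python
def row_index(matrix, r, value):
    """Number of entries equal to `value` (+1, 0, or -1) in row r."""
    return sum(1 for v in matrix[r] if v == value)


def column_index(matrix, c, value):
    """Number of entries equal to `value` (+1, 0, or -1) in column c."""
    return sum(1 for row in matrix if row[c] == value)


def row_index_set(matrix, value):
    """Row indices of all rows, listed from top to bottom."""
    return [row_index(matrix, r, value) for r in range(len(matrix))]


def column_index_set(matrix, value):
    """Column indices of all columns, listed from left to right."""
    return [column_index(matrix, c, value) for c in range(len(matrix[0]))]


# Hypothetical FM with rows P1..P3 and columns x1..x4 (assumed values).
fm = [
    [+1, +1, +1, -1],
    [+1, -1, -1, +1],
    [-1, +1, -1, +1],
]

print(row_index(fm, 0, +1))       # row index of P1 for +1
print(column_index(fm, 0, -1))    # column index of x1 for -1
print(row_index_set(fm, -1))      # I_{R,F} for -1
print(column_index_set(fm, +1))   # I_{C,F} for +1
```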
5) Row/Column Permutation: In order to find a valid mapping, defective switches of a CM which are denoted as +1s (stuck-closed) and −1s (stuck-open) must be matched with +1s (used) and −1s (unused), respectively, in an FM. Here, an important property is that row and column permutations in the FM do not alter the implemented function. This is an important reconfigurability feature for fault tolerance as illustrated in Fig. 4 .
6) Row Matching With Hadamard Product: In order to match two rows from function and crossbar matrices, we use the Hadamard product, i.e., element-by-element multiplication of the two rows (similar to an inner product of vectors, but without summing the elementwise products). If there is any negative-valued element in the resulting matrix, then there is no matching; otherwise, there is a valid matching. Note that functional switches (denoted with 0) in the CM can always be matched with either +1s or −1s in the FM. However, +1s and −1s in the CM can only be matched with +1s and −1s in the FM, respectively. This is illustrated in Table II. Additionally, Fig. 5 shows an example of a valid matching between the first rows of the matrices in case of having stuck-closed and stuck-open faults.
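The row matching rule can be written as a short check. This is a minimal sketch with illustrative names (`hadamard`, `rows_match`) and made-up example rows; rows are assumed to be plain lists over {+1, 0, −1}.

```python
def hadamard(fm_row, cm_row):
    """Element-by-element product of an FM row and a CM row."""
    return [f * c for f, c in zip(fm_row, cm_row)]


def rows_match(fm_row, cm_row):
    """A matching is valid iff the Hadamard product has no negative element."""
    return all(p >= 0 for p in hadamard(fm_row, cm_row))


fm_row = [+1, +1, +1, -1]        # product uses the first three literals
cm_ok = [0, +1, 0, -1]           # stuck-closed under a used switch, stuck-open under an unused one
cm_bad = [0, -1, 0, 0]           # stuck-open under a used switch

print(hadamard(fm_row, cm_ok), rows_match(fm_row, cm_ok))
print(hadamard(fm_row, cm_bad), rows_match(fm_row, cm_bad))
```

Because a functional switch contributes 0 to the product, only faulty crosspoints can make a match fail.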
B. Proposed Algorithm
The outline of our four-step algorithm is shown in Fig. 6 .
Step 1 starts with obtaining index sets of function and crossbar matrices. Using the sets, crossbar matrices are sorted according to either stuck-closed (+1) or stuck-open (−1) faults such that rows and columns with the most defective elements are aligned to the top and the left sides, respectively. Function matrices are sorted in the same manner, as shown in Fig. 7. Using sorted matrices significantly reduces the matching workload in the next step. Note that although we treat stuck-closed and stuck-open faults separately throughout this paper, our algorithm works properly in case of having both fault types in crossbars.
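One possible reading of the sorting in step 1 is sketched below. It assumes that rows and columns are ordered by decreasing row and column index for one chosen value, and it returns the permutations so that the original row and column labels can be recovered; the function name and the tie-breaking (original order is kept for equal indices) are choices made here.

```python
def sort_by_index(matrix, value):
    """Sort rows (top) and columns (left) by decreasing count of `value`.

    Returns the sorted matrix plus the row and column orders, where
    row_order[i] is the original position of the row now at position i.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])

    def row_count(r):
        return sum(1 for v in matrix[r] if v == value)

    def col_count(c):
        return sum(1 for r in range(n_rows) if matrix[r][c] == value)

    # Python's sort is stable, so ties keep their original relative order.
    row_order = sorted(range(n_rows), key=row_count, reverse=True)
    col_order = sorted(range(n_cols), key=col_count, reverse=True)
    sorted_matrix = [[matrix[r][c] for c in col_order] for r in row_order]
    return sorted_matrix, row_order, col_order


# Hypothetical CM with stuck-closed (+1) faults only.
cm = [
    [0, 0, +1, 0],
    [+1, 0, +1, 0],
    [0, 0, 0, 0],
]
sorted_cm, row_order, col_order = sort_by_index(cm, +1)
print(sorted_cm)
print(row_order, col_order)
```

Column counts do not change under row permutations (and vice versa), so the two orders can be computed independently.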
Step 2 performs row-by-row matching between the sorted matrices, advancing from top to bottom. For the matched matrices, the number of columns is always less than or equal to the number of rows. If a function matrix or a CM does not satisfy this, it is transposed. The reason for this operation is to decrease the number of trials in step 4.
If an FM row cannot be matched with any of the unmatched CM rows, then the algorithm proceeds to step 3. Fig. 8 illustrates an example; numbers in red assigned to the CM rows represent the orders of the corresponding matched rows in the FM. Every row of the FM up to the 14th row $R_{14}$ is matched with a row in the CM. Since $R_{14}$ cannot be matched with any of the unmatched rows, backtracking starts by checking the previously matched crossbar rows from top to bottom. This results in a matching with the fourth row, followed by performing step 2 again while excluding the matched rows. Note that after backtracking, $R_2$ becomes unmatched and is to be matched with the unmatched CM rows. This avoids recursion, which would cause a significant computational load.
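Steps 2 and 3 can be sketched as a greedy row assignment with a single, non-recursive level of backtracking. The code below is an interpretation, not the authors' implementation: the function names are hypothetical, and undoing a swap when the displaced FM row cannot be re-placed is an assumption made here.

```python
def rows_match(fm_row, cm_row):
    """Valid iff the element-by-element product has no negative entry."""
    return all(f * c >= 0 for f, c in zip(fm_row, cm_row))


def match_rows(fm, cm):
    """Return assign with assign[i] = CM row matched to FM row i, or None."""
    owner = {}  # CM row index -> FM row index currently matched to it

    def first_free(fm_idx):
        # Step 2: scan the unmatched CM rows from top to bottom.
        for c in range(len(cm)):
            if c not in owner and rows_match(fm[fm_idx], cm[c]):
                return c
        return None

    for i in range(len(fm)):
        c = first_free(i)
        if c is not None:
            owner[c] = i
            continue
        # Step 3: backtracking over already matched CM rows, top to bottom.
        placed = False
        for c in range(len(cm)):
            if c in owner and rows_match(fm[i], cm[c]):
                displaced = owner[c]
                owner[c] = i
                alt = first_free(displaced)  # re-place the displaced row once
                if alt is not None:
                    owner[alt] = displaced
                    placed = True
                    break
                owner[c] = displaced  # assumption: undo and try the next row
        if not placed:
            return None
    assign = [None] * len(fm)
    for c, i in owner.items():
        assign[i] = c
    return assign


# Hypothetical 3 x 3 instance with stuck-open (-1) faults.
fm = [[+1, -1, -1], [-1, +1, -1], [+1, +1, -1]]
cm = [[0, 0, 0], [0, -1, 0], [-1, 0, 0]]
print(match_rows(fm, cm))
```

In this instance the third FM row finds no free CM row, takes over the first CM row, and the displaced first FM row is then placed on the remaining free row.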
In case backtracking does not result in a valid matching, the algorithm proceeds to step 4, which repeats step 2 (and step 3) at most permutation limit (PL) times. Here, column permutations are randomly applied. Note that step 4 is used as a contingency plan to maintain certain performance metrics, including accuracy and success rate (Psucc); the value of PL is determined accordingly. In this paper, we aim to maintain a minimum success rate of 95%. For this purpose, we randomly generate function and crossbar matrices for different crossbar sizes with a fault rate of 15%, which is an accepted upper limit for nano-crossbars [27], and an IR of 40%, which is a typical average value for benchmark functions. The results using optimal-size crossbars and crossbars 1.5 times larger than the optimal ones are given in Fig. 9(a) and (b), respectively. Both graphs clearly show a steep increase after PL exceeds 2000. It means that selecting PL considerably larger than 2000 only slightly improves the success rate of the algorithm, while it would increase the runtime significantly. We select PL = 3000 in this paper. Indeed, our algorithm proceeds to step 4 for only a very small portion of the benchmark simulations, which are thoroughly explained in Section IV.
Since permutations are performed column-wise, we expect a much stronger relation of PL with the number of columns M than with the number of rows N. The relation between PL and M can be examined with the following probability analysis. Consider function and CM rows to be matched. In case of having stuck-closed faults with a fault probability of $p_f$, the probability of having a valid matching between these rows can be found as
where $a = p_f \cdot M$ and $b = IR \cdot M$ represent expected values for the number of 1s in crossbar and function rows, respectively. Additionally, the probability of having a valid matching after performing a pairwise permutation (initially no matching) can be found as
By considering constant IR and $p_f$ values, we can comment that: 1) increasing M makes $Pr_p$ decrease; 2) decreasing $Pr_p$ reduces the effectiveness of performing a permutation; 3) PL is negatively correlated with $Pr_p$; and 4) if $Pr_p$ decreases to relatively small levels, then increasing PL would not significantly contribute to finding a valid matching, which is also verified by the results in Fig. 9.
A pseudo code of the proposed heuristic algorithm is depicted in Algorithm 1. The algorithm yields input and output arrays that establish a valid mapping or a correct assignment of a target function into a defective crossbar.
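As a rough illustration of how step 4 wraps the row matching, the sketch below retries the matching under random column permutations up to PL times. It is not Algorithm 1: sorting and backtracking are left out for brevity, `greedy_rows` is a simplified stand-in for steps 2 and 3, and all names, the seed parameter, and the return format are assumptions.

```python
import random


def rows_match(fm_row, cm_row):
    """Valid iff the element-by-element product has no negative entry."""
    return all(f * c >= 0 for f, c in zip(fm_row, cm_row))


def greedy_rows(fm, cm):
    """Simplified stand-in for steps 2 and 3: first-fit row matching."""
    used, assign = set(), []
    for row in fm:
        choice = next(
            (c for c in range(len(cm)) if c not in used and rows_match(row, cm[c])),
            None,
        )
        if choice is None:
            return None
        used.add(choice)
        assign.append(choice)
    return assign


def map_with_permutations(fm, cm, pl=3000, seed=0):
    """Try the given column order, then up to `pl` random column permutations.

    Returns (cols, assign): cols[j] is the FM column placed on crossbar
    column j, and assign[i] is the crossbar row used for FM row i.
    """
    rng = random.Random(seed)
    cols = list(range(len(fm[0])))
    for attempt in range(pl + 1):
        if attempt > 0:
            rng.shuffle(cols)
        permuted = [[row[c] for c in cols] for row in fm]
        assign = greedy_rows(permuted, cm)
        if assign is not None:
            return list(cols), assign
    return None


# Hypothetical 1 x 2 instance: the stuck-open fault must land on the unused literal.
print(map_with_permutations([[+1, -1]], [[-1, 0]]))
```

The returned `cols` list plays the role of the input array I, while `assign` is the inverse view of the output array O (product to crossbar row rather than crossbar row to product).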
C. Performance Evaluation
Our algorithm keeps the permutation in one dimension (columns) fixed while advancing through the other (rows), which reduces the number of operations for finding a valid mapping [20], [23]. Instead of using conventional 2-D matchings of matrices, our algorithm performs considerably faster 1-D matrix row matchings. Our motivation is that the mapping problem has many different solutions. Therefore, information possibly lost in a 1-D check can be easily compensated for; backtracking and repetition also serve this purpose. Here, an important factor is the relation between logic IR and fault rate. For a constant IR around 40%, a typical average value for standard benchmark functions, an increase in the fault rate, especially beyond 25%, significantly reduces the number of mapping solutions, which worsens the performance of our algorithm. For fault rates below 25%, our algorithm works satisfactorily in terms of both runtime and accuracy, surpassing related algorithms in the literature. Our algorithm's performance is also justified with the following complexity analysis and the detailed experimental results in Section IV.
Consider 
III. TRANSIENT FAULT TOLERANCE
Because transient faults are probabilistic and continuous in the time domain, they cannot be tolerated with the technique used for permanent faults, which is based on fault identification followed by reconfiguration. We suppose that target functions are implemented in irredundant sum-of-products (ISOP) form to minimize the number of used switches for cost optimization in fabrication. We analyze the fault tolerance performance of nano-crossbar arrays by considering the specifics of target functions. Fig. 10 shows an example. A given target function f in ISOP form is implemented with a fault-free crossbar shown in Fig. 10(a). When a stuck-open fault occurs on a used switch (denoted with +1) as shown in Fig. 10(b), the corresponding literal is erased from the target function and the corresponding matrix element becomes −1. In this example, since the resulting function is not equal to the original function f, the fault cannot be tolerated. When a stuck-closed fault occurs on an unused switch (denoted with −1) as shown in Fig. 10(c), the corresponding literal is added to the target function and the corresponding matrix element becomes +1. Here, the resulting function is equal to f, so the fault is tolerated.
A. Stuck-Open Faults
Stuck-open faults are tolerated iff they occur on unused switches. Faults on used switches change the implemented functions: since we use ISOP forms of target functions consisting of PIs, removing any literal from a PI by definition results in a new function. The fault tolerance performance $FT_{so}$ of an N × M crossbar can be directly calculated by using
$$FT_{so} = (1 - p_{so})^{N \cdot M \cdot IR}$$
where $p_{so}$ is the independent stuck-open fault probability of each switch and IR is the logic IR, so that N · M · IR is the number of used switches. Note that our analysis for stuck-open faults is applicable to both single-output and multioutput functions.
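Reading the statement above as "the crossbar works iff none of its N · M · IR used switches is stuck open", the stuck-open performance becomes (1 − p_so) raised to the number of used switches. This reading is an assumption made here, but it agrees with the value (1 − 0.05)^13 = 51% quoted later for a 5 × 7 crossbar with 13 used switches. A minimal sketch, with a hypothetical function name:

```python
def ft_stuck_open(n_rows, n_cols, ir, p_so):
    """Probability that no used switch is stuck open.

    Assumes independent faults and that every fault on a used switch
    (there are n_rows * n_cols * ir of them) breaks the function.
    """
    used_switches = n_rows * n_cols * ir
    return (1.0 - p_so) ** used_switches


# 5 x 7 crossbar with 13 used switches and a 5% stuck-open probability.
print(round(ft_stuck_open(5, 7, 13 / 35, 0.05), 4))
```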
B. Stuck-Closed Faults
We show that along with all stuck-closed faults occurring on used switches, faults on unused switches can also be tolerated. This is illustrated in Fig. 11 with a brief summary of our tolerance analysis method. We determine all possible positions of tolerable faults on unused switches in the crossbar. These positions, represented by added +1s in red in Fig. 11, are determined recursively. First, tolerable fault positions in single rows are determined. For the example in Fig. 11, among five rows representing five products of the target function, three of them have such positions. Therefore, there are three matrices showing tolerable fault positions. Analyzing the first matrix at the upper-left corner, we conclude that a stuck-closed fault in the first row at the right end of the crossbar can be tolerated; the resulting function $x_1 x_2 x_3 + x_1 x_2 x_5 + x_2 x_3 + x_3 x_4 + x_4 x_5$ is equal to f. The same is valid for the second and the third matrices as well. Next, we determine tolerable fault positions simultaneously occurring in all of the three rows. For the example in Fig. 11, there is no solution for this case, so we proceed to the next steps by decreasing the number of rows in which the faults are seen until there is a solution.
In order to find all possible positions of tolerable faults, we exploit logic equivalences of Boolean expressions. Consider a given target function $f = P_1 + \cdots + P_m$ in ISOP form. Stuck-closed faults on unused switches add literals to the corresponding products, which results in a new function named $f^t$. Our main purpose is finding all $f^t$s such that $f^t = f$. Two examples of $f^t$s corresponding to the top two matrices in Fig. 11 are $f^t_1 = x_1 x_2 x_3 + x_1 x_2 x_5 + x_2 x_3 + x_3 x_4 + x_4 x_5$ and $f^t_2 = x_1 x_2 + x_1 x_2 x_5 + x_1 x_2 x_3 + x_3 x_4 + x_4 x_5$. Added products of literals, shown in red, are named $P^t_i$, where i represents the corresponding product number. As an example, $f^t_1$ has $P^t_1 = x_3$; $f^t_2$ has $P^t_3 = x_1$.
A general form of $f^t$ can be represented as $f^t_{\{i,\ldots,k\}} = P_1 + \cdots + P_i P^t_i + \cdots + P_k P^t_k + \cdots + P_m$, where the subscript set $\{i,\ldots,k\}$ shows which products have added literals.
Our method for finding all $f^t_{\{i,\ldots,k\}}$ that satisfy $f^t_{\{i,\ldots,k\}} = f$ has two steps. In the first step, we determine tolerable fault positions affecting single products. We obtain all $f^t_{\{i\}}$s and the corresponding $P^t_i$s, $1 \le i \le m$, for which a necessary and sufficient condition is given in Theorem 1. In the second step, we first construct an $f^t$ such that it has all $P^t_i$s obtained in the first step. If this $f^t$ is equal to the target function f, then we are done with finding all tolerable fault positions; no further steps are necessary, as justified by Theorem 2. If the functions are not equal to each other, then we advance by decrementing the number of products affected by faults. We repeat this until the equivalence(s) are satisfied.
As a core property used in the theorems, we first present the following lemma.
Lemma 1:
Proof: It is apparent that
Theorem 1: Consider a function g i = f − P i in ISOP form (P i is excluded from f ). Iff P t i consists of negated forms of single-literal products in g i (
is a POS expression with sums having either single literal or multi literals. Single-literal sums are negated forms of single-literal products in g i (P i = 1). To eliminate multiliteral sums from P i g i (P i = 1), we can directly apply Lemma 1 with guaranteeing f = f t {i} . To prove sufficiency, we also show that each literal from P t i should correspond to a negated form of a single-literal product in g i (P i = 1). Consider a literal l i from P t i . From Lemma 1, we know that f = P i l i +g i . Since f (P i = 1) = 1, l i + g i (P i = 1) = 1. This necessitates having a product l i in g i (P i = 1) in ISOP form.
Theorem 2: If $f^t_{\{i,\ldots,k\}} = f$, then $f^t_x = f$ for every $x \subset \{i,\ldots,k\}$.
Proof: The proof is a direct corollary of Lemma 1, from which we know that we can remove any literal(s) from the $P^t_i$s without disturbing the equivalence with f.
Theorem 1 allows us to separately construct the $P^t_i$s showing tolerable fault positions for each $P_i$. Additionally, removing a literal from a $P^t_i$ does not ruin the functionality, as justified by Lemma 1; such reduced cases are also considered in our fault tolerance analysis.
Theorem 2 significantly reduces the computing load of finding tolerable fault positions. For example, if we find for a target function f that $f^t_{\{3,4,8,9\}} = f$, then all tolerable fault combinations affecting products $P_3$, $P_4$, $P_8$, and $P_9$ are known; for instance, $f^t_{\{3,8,9\}} = f$ and $f^t_{\{4,9\}} = f$.
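For small functions, the set of tolerable stuck-closed positions can be cross-checked by brute force. The sketch below is not the algebraic two-step method described above: it compares truth tables, and it only uses the fact that a tolerable fault set stays tolerable when faults are removed from it, so candidate combinations are built from tolerable single faults. The target function, the literal encoding as (variable, polarity) pairs, and all function names are assumptions made for illustration.

```python
from itertools import combinations, product


def evaluate(sop, env):
    """Evaluate an SOP; each product is a set of (variable, polarity) literals."""
    return any(all(env[var] == pol for var, pol in prod) for prod in sop)


def equivalent(sop_a, sop_b, variables):
    """Truth-table comparison of two SOP expressions."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if evaluate(sop_a, env) != evaluate(sop_b, env):
            return False
    return True


def with_faults(sop, faults):
    """Add the literal of each (row, literal) stuck-closed fault to its product."""
    faulty = [set(prod) for prod in sop]
    for row, lit in faults:
        faulty[row].add(lit)
    return faulty


def tolerable_fault_sets(sop, columns, variables):
    """All sets of stuck-closed faults on unused switches that keep f unchanged."""
    unused = [(r, lit) for r, prod in enumerate(sop) for lit in columns if lit not in prod]
    singles = [pos for pos in unused if equivalent(with_faults(sop, [pos]), sop, variables)]
    tolerable = []
    for k in range(1, len(singles) + 1):
        for combo in combinations(singles, k):
            if equivalent(with_faults(sop, combo), sop, variables):
                tolerable.append(combo)
    return tolerable


# Hypothetical ISOP: f = x1 x2 + x1 x3' + x2 x3' (not from the paper).
variables = ["x1", "x2", "x3"]
columns = [(v, pol) for v in variables for pol in (True, False)]
sop = [
    {("x1", True), ("x2", True)},
    {("x1", True), ("x3", False)},
    {("x2", True), ("x3", False)},
]
tolerable = tolerable_fault_sets(sop, columns, variables)
sizes = [len(s) for s in tolerable]
print(sizes.count(1), sizes.count(2), sizes.count(3))
```

In this example each product has one tolerable single fault and every pair of them is tolerable, but all three together are not, which is why the combined check in the second step cannot be skipped.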
We present an example to elucidate our method.
Example 1: Consider a target function in ISOP form
Step 1: We find faults affecting single products by exploiting Theorem 1. We only consider literals being member of LS
P t 4 : not a member of LS.
Step 2: We first check whether f equals to f t {1,3} having P t 1 , P t 3 . We start with P t 3 having the largest number of literals
Since $f = f^t_{\{1,3\}}$, Theorem 2 ensures that $P^t_3 = x_2$ and $P^t_3 = x_5$ also make $f = f^t_{\{1,3\}}$. Additionally, $f = f^t_{\{1,3\}} = f^t_{\{1\}} = f^t_{\{3\}}$. Note that our fault tolerance calculations consider all possible literal combinations of the $P^t$s. As a result, all tolerable stuck-closed fault positions are found.
Fault tolerance performance $FT_{sc}$ of an N × M crossbar can be calculated by using
$$FT_{sc} = \sum_{i \ge 0} C_i \, p_{sc}^{AL_i} \, (1 - p_{sc})^{Z - AL_i}$$
where $p_{sc}$ is the independent stuck-closed fault probability of each switch; $C_i$ is the number of cases tolerating i faults; $AL_i$ is the number of literals added to the function f, representing the number of faulty switches; and $Z = N \cdot M \cdot (1 - IR)$ is the number of unused switches in the crossbar. Note that $Z - AL_i$ represents the number of unused switches that remain fault-free. Note also that $C_0$ represents the fault-free condition and always $C_0 = 1$. For Example 1, N = 4, M = 7, and IR = 10/28, which results in Z = 18. Additionally, $C_1 = 3$, $C_2 = 3$, and $C_3 = 1$; suppose that $p_{sc} = 2\%$. As a result, $FT_{sc}$ is calculated as 74%.
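The numbers quoted for Example 1 can be reproduced with a short calculation. The sketch assumes that the performance is the sum, over all tolerable cases, of C_i · p_sc^(AL_i) · (1 − p_sc)^(Z − AL_i) with AL_i = i; this form is inferred here from the symbol definitions and from the quoted 74%, and the function name is hypothetical.

```python
def ft_stuck_closed(z, case_counts, p_sc):
    """Sum over tolerable cases; case_counts[i] is C_i (with C_0 = 1).

    Assumes a case with i faults adds exactly i literals (AL_i = i) and
    that the remaining z - i unused switches are fault-free.
    """
    return sum(
        c * p_sc ** i * (1.0 - p_sc) ** (z - i)
        for i, c in enumerate(case_counts)
    )


# Example 1: Z = 18, C_0..C_3 = 1, 3, 3, 1 and p_sc = 2%.
ft = ft_stuck_closed(18, [1, 3, 3, 1], 0.02)
print(round(ft, 4))
```

With these counts the sum collapses to (1 − p_sc)^15, because the three tolerable positions may each be faulty or not while the other 15 unused switches must be fault-free.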
C. Fault Tolerance for Multioutput Functions
Although we develop our method for stuck-closed faults using single-output functions, we can directly apply it to multioutput functions. We only need to modify the first step of our method, obtaining all $P^t_i$s. First, we need to obtain all $P^t_i$s for each output function separately. If a product is used by multiple outputs, then only the common $P^t_i$s for this product are used. If a product is used by a single output, then we use all of the corresponding $P^t_i$s. After obtaining the $P^t_i$s in the first step, we follow the same procedure as in the second step of our method developed for single-output functions. To elucidate our method for multioutput functions, we present an example.
Example 2: Considering target functions in ISOP form
Implementation is shown in Fig. 12 . LS of f 1 and f 2 is LS =
$P^t_4$: no single literal. Since $P_1$ and $P_2$ are common products, we should choose the common $P^t$s for these products, which are $P^t_1 = x_3$ and $P^t_2 = x_2$, so the tolerance condition is met for both functions.
Step 2: We first check whether $f_1$ equals $f^{1,t}_{\{1,2\}}$
Since $f^{1,t}_{\{1,2\}} = f_1$ and no more products are left, we stop.
We check whether $f_2$ equals $f^{2,t}_{\{1,2\}}$
Since $f^{2,t}_{\{1,2\}} = f_2$ and no more products are left, we stop.
For the above example, N = 6, M = 8, and IR = 16/48, which results in Z = 32. Additionally, $C_1 = 2$; suppose that $p_{sc} = 2\%$. As a result, $FT_{sc}$ is calculated as 54%.
D. Performance Evaluation
Our method finds all possible positions of tolerable stuck-open and stuck-closed transient faults occurring in nano-crossbars. Using our method, the transient fault tolerance performance of the crossbars can also be calculated. As opposed to methods that use randomly assigned faults on crossbars, such as Monte Carlo methods, our method purely uses algebraic equations to find fault tolerance performance. This allows us to achieve accurate results even for considerably large crossbars. Table III shows fault tolerance performances $FT_{so}$ and $FT_{sc}$ for a few benchmark functions with a fault probability of 5%.
For stuck-open faults, since it is not possible to tolerate faults occurring on used switches, the performance is directly calculated using the logic IR and the crossbar size. However, for stuck-closed faults there are some cases in which faults on unused switches are tolerated. Table III shows results derived by neglecting these cases (direct results) and by considering them via the proposed method (accurate results); the difference between the values is as high as 9%.
Our method is applicable to both single-output and multioutput functions, as justified in the previous section. Another important consideration is redundancy. Although in this paper we suppose that target functions are implemented in ISOP form to minimize the number of used switches for cost optimization in fabrication, this is not a necessary condition for applying our tolerance method. In case of having redundancy at the literal level, with literals added to products while keeping the number of products the same, our method is directly applicable to find all possible positions of tolerable faults in the crossbar. We only need to have an ISOP form of the given expression in SOP form. Indeed, adding a literal to a PI is the basis of our method for stuck-closed faults. Here, the difference comes in the calculation of fault tolerance performances $FT_{so}$ and $FT_{sc}$; the formulas given in the previous section need to be updated, which would result in an increase in $FT_{so}$ values and a decrease in $FT_{sc}$ values.
In case of having redundancy at the product level, i.e., having multiple lines/wires implementing the same product (as a PI), our method is directly applicable to stuck-open faults, including the calculation of $FT_{so}$, since removing any literal from a PI results in a new function. However, for stuck-closed faults we need modifications, especially for Theorem 1. Here, if a product $P_i$ is implemented A times, then for each of the A wires we need to calculate the $P^t_i$s by considering negated forms of products having at most A literals in $g_i(P_i = 1)$. The calculation of $FT_{sc}$ should also be changed accordingly. One can also consider redundancy at both the literal and product levels. Let us explain this with an example using different implementations with different redundancies.
Example 3: Consider a target function in ISOP form f = x_1 x_2 x_3 + x_2 x_4 x_5 + x_3 x_4 + x_3 x_5, which is the same function used in Example 1. Consider different implementations of f using different types of redundancies in Fig. 13. Fig. 13(a) shows an implementation of f with literal-level redundancy by a 4 × 7 crossbar. Assume that we have a 5% stuck-open fault rate. Tolerable cases become no fault with (1 − 0.05)^12 = 54% probability, a single fault with 2 × (1 − 0.05)^11 × 0.05^1 = 5% probability, and two faults with (1 − 0.05)^10 × 0.05^2 = 0.1% probability. At the end, FT_so = 54% + 5% + 0.1% ≈ 60%. For stuck-closed faults, we already determined the tolerable positions in Example 1 as P^t_1 = x_5, P^t_3 = x_2 x_5, and their literal combinations. It is shown in Fig. 13(a) that P^t_1 = x_5 and P^t_3 = x_5 are covered by literal redundancies, so the only tolerable fault is P^t_3 = x_2. In this case, N = 4 and M = 7, which results in Z = 16. Additionally, C_1 = 1, and suppose that p_sc = 2%. As a result, FT_sc is calculated as 73%. Fig. 13(b) shows an implementation of f with product-level redundancy by a 5 × 7 crossbar. Even though a redundant product is used, we are still working with PIs, so no literal can be erased from any product. Therefore, with a 5% stuck-open fault rate, FT_so becomes (1 − 0.05)^13 = 51%. For stuck-closed faults, it is shown in Fig. 13(b) that an extra tolerable fault x_5 comes from the product redundancy, so P^t_1 = x_5, P^t_3 = x_2 x_5, and P^t_1 = x_5. Calculating all literal combinations with N = 5 and M = 7 results in Z = 22. Additionally, C_1 = 4, C_2 = 6, C_3 = 4, and C_4 = 1. Also suppose that p_sc = 2%. As a result, FT_sc is calculated as 69%. Fig. 13(c) shows an implementation of f with literal- and product-level redundancies by a 5 × 7 crossbar. Assume that we have a 5% stuck-open fault rate. Tolerable cases become no fault with (1 − 0.05)^14 = 48% probability and a single fault with (1 − 0.05)^13 × 0.05^1 = 2% probability.
At the end, FT so = 48% + 2% = 50%. For stuck-closed faults, P t 3 = x 5 is covered by a literal redundancy, so P t 1 = x 5 , P t 1 = x 5 , and P t 3 = x 2 . In this case, N = 5 and M = 7, that results in Additionally, C 1 = 3, C 2 = 3, and C 3 = 1. Also suppose that p sc = 2%. As a result, FT sc is calculated as 69%.
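The stuck-open arithmetic of Example 3 can be reproduced with a short sketch. It is an illustration only, not the formula from the previous section: it assumes that a stuck-open fault is tolerable exactly when it falls on a redundant switch, and that any combination of such faults is tolerable. The function names and the per-case switch counts (12 used with 2 redundant, 13 used with none, 14 used with 1) are read off the example and are otherwise hypothetical.

```python
from math import comb


def ft_so_enumerated(used, redundant, p_so):
    """Sum the probabilities of all fault patterns confined to redundant switches.

    Assumption (for illustration): each of the `redundant` switches may be
    stuck-open independently without changing the implemented function,
    while a fault on any other used switch is not tolerable.
    """
    return sum(
        comb(redundant, k) * (1 - p_so) ** (used - k) * p_so ** k
        for k in range(redundant + 1)
    )


def ft_so_closed_form(used, redundant, p_so):
    """Equivalent closed form: only the non-redundant used switches must be fault-free."""
    return (1 - p_so) ** (used - redundant)


if __name__ == "__main__":
    cases = {"Fig. 13(a)": (12, 2), "Fig. 13(b)": (13, 0), "Fig. 13(c)": (14, 1)}
    for label, (used, redundant) in cases.items():
        print(label, round(ft_so_enumerated(used, redundant, 0.05), 3))
```

Under these assumptions the enumeration collapses to the closed form, because the binomial sum over the redundant switches equals one. The unrounded values are about 0.599, 0.513, and 0.513, which agree with the 60%, 51%, and 50% reported in the example up to the rounding of the individual terms.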
IV. EXPERIMENTAL RESULTS
In this section, we present experimental results for our algorithm dealing with permanent faults given in Section II. We use standard benchmark circuits to measure fault tolerance performances of nano-crossbars [28]. We mostly consider an independent fault probability/rate (Pf) of 15% for each crosspoint, which is an accepted upper limit for nano-crossbars [27]. We also try higher fault rates to test our algorithm's performance limits. Simulations are conducted in MATLAB. Crossbars with random faults are produced with MATLAB's predetermined matrix generator; only stuck-open faults are considered for consistency. All experiments are run on a 3.30 GHz Intel Core i5 CPU (only a single core is used) with 4 GB memory. All the benchmark functions used in the simulations and the source code of the proposed algorithm with supporting material are available at http://www.ecc.itu.edu.tr/images/f/f2/Fault_Tolerant_Logic_Mapping_MATLAB.zip.
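As an illustration of the fault model used in these experiments, the following sketch draws a crossbar in which every crosspoint is independently stuck-open with probability Pf. It is written in Python rather than MATLAB, and the function name, the 0/1 encoding, and the seed are assumptions made for this example, not details of the posted source code.

```python
import random


def random_stuck_open_crossbar(rows, cols, fault_rate, rng):
    """Return a rows x cols matrix: 1 = functional crosspoint, 0 = stuck-open.

    Each crosspoint fails independently with probability `fault_rate`.
    """
    return [
        [0 if rng.random() < fault_rate else 1 for _ in range(cols)]
        for _ in range(rows)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the sketch repeatable
    crossbar = random_stuck_open_crossbar(6, 8, 0.15, rng)
    for row in crossbar:
        print(row)
```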
A. Runtime, Success Rate, and Accuracy
For a given target function with a certain FM size, we consider crossbar matrices both in optimal row-column sizes and in 1.5 times larger sizes. Although optimal crossbar sizes are desired for area considerations, it is quite challenging to find a mapping for them, which is why 1.5 times larger sizes are preferred in [16]-[18] and [22]. The larger the crossbar, the easier it is to find a valid mapping, due to an exponential increase in the solution space with the number of possible permutations. Table IV shows runtime and success rate values of the proposed algorithm for benchmark circuits with a 15% stuck-open fault rate. We select a sample size of 600, around which the average runtime and success rate (probability of success, Psucc) values become steady. The success rate is calculated as the ratio of the number of samples with valid mappings/matchings to the total sample size of 600. As seen from the table, our algorithm successfully finds mappings for considerably large benchmark circuits. To our knowledge, no other algorithm is able to find a valid mapping for the benchmarks table5 and t481. Examining the numbers in Table IV, we see that our algorithm does not need a permutation for 1.5 times larger crossbars. We also see that although selecting 1.5 times larger crossbars always reduces the runtime values, it does not necessarily result in better fault tolerance performances. Optimal size crossbars can also perfectly tolerate faults. To elaborate on this, we perform an accuracy analysis as shown in Fig. 14. We compare our optimal size mapping results with those of an exhaustive search algorithm. Since it is intractable to implement an exhaustive search for crossbar sizes larger than 7 × 7, only results within this limit are presented in Fig. 14, which show an accuracy of at least 99% for eight different benchmarks BM1-BM8.
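The success rate and the exhaustive reference search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for Fig. 14: the function matrix marks used switches with 1, the crossbar marks functional crosspoints with 1 and stuck-open ones with 0, and a mapping is valid when every used switch lands on a functional crosspoint under some assignment of rows and columns. All names are hypothetical.

```python
from itertools import permutations


def row_fits(fm_row, xbar_row, col_map):
    """True if every used switch of fm_row lands on a functional crosspoint."""
    return all(xbar_row[col_map[j]] == 1 for j, used in enumerate(fm_row) if used)


def exhaustive_mapping_exists(fm, xbar):
    """Try every column and row assignment; feasible only for very small sizes."""
    n, m = len(fm), len(fm[0])
    rows, cols = len(xbar), len(xbar[0])
    for col_map in permutations(range(cols), m):
        for row_map in permutations(range(rows), n):
            if all(row_fits(fm[i], xbar[row_map[i]], col_map) for i in range(n)):
                return True
    return False


def success_rate(fm, crossbars):
    """Fraction of sampled crossbars for which a valid mapping exists."""
    hits = sum(exhaustive_mapping_exists(fm, xbar) for xbar in crossbars)
    return hits / len(crossbars)
```

The search visits a factorial number of assignments, which is consistent with the remark that exhaustive search becomes intractable beyond small crossbars. The accuracy of a heuristic could then be estimated by comparing its success count with this reference on the same samples.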
In Tables V and VI, runtime comparisons of the memetic algorithm with fitness approximation [22] and the proposed heuristic algorithm are given. We use the memetic algorithm since, to our knowledge, it is the fastest and the most efficient algorithm, especially for large crossbars. We run the publicly posted code from [22] and tailor it for our benchmark functions, which are not included in the referenced paper.
Examining the numbers in Tables V and VI, we see that our runtime values are always better than those of the memetic algorithm. The memetic algorithm is not able to find a valid mapping for large functions such as 9sao, table5, and t481 under a reasonable time constraint. Additionally, while the runtime values of the memetic algorithm for large benchmark circuits have a relatively high standard deviation, our runtimes are almost stable. Moreover, the memetic algorithm is not as immune to an increase in fault rate as the proposed algorithm is.
B. Effectiveness and Limitations
In our algorithm, if no matching is found initially, the column permutation is changed and matching is attempted again; this is repeated at most PL times. Experimentally, we set PL = 3000 for our benchmarks; this trial limit is selected to maintain a minimum success rate of 95%. Indeed, in most cases no repetition is needed. Especially for 1.5 times larger crossbar sizes, no permutation is needed at all; all results with nonzero success rates in Tables IV-VI do not need any permutation (PL = 0). However, for optimal sizes, we sometimes need permutations. Fig. 15 illustrates this by presenting the number of permutations for different benchmark circuits using 50 samples.
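The retry scheme with a permutation limit PL can be sketched as below. This is a hedged illustration, not the proposed algorithm: the row matching step is a stand-in that uses a standard augmenting-path bipartite matching instead of the index sorting and backtracking described in Section II, and new column permutations are drawn at random. The function names and the 0/1 encoding (1 = used switch in the function matrix, 1 = functional crosspoint in the crossbar) are assumptions.

```python
import random


def match_rows(fm, xbar, col_map):
    """Match every function-matrix row to a distinct crossbar row, if possible."""
    rows = len(xbar)

    def fits(i, r):
        return all(xbar[r][col_map[j]] == 1 for j, used in enumerate(fm[i]) if used)

    owner = [-1] * rows  # owner[r] = function row currently placed on crossbar row r

    def try_assign(i, seen):
        for r in range(rows):
            if r not in seen and fits(i, r):
                seen.add(r)
                # Take a free row, or relocate the current owner elsewhere.
                if owner[r] == -1 or try_assign(owner[r], seen):
                    owner[r] = i
                    return True
        return False

    return all(try_assign(i, set()) for i in range(len(fm)))


def map_with_retries(fm, xbar, pl, rng):
    """Return the number of permutations needed, or None if the limit is hit."""
    cols = list(range(len(xbar[0])))
    for trial in range(pl + 1):
        if trial > 0:
            rng.shuffle(cols)  # assumption: permutations are drawn at random
        if match_rows(fm, xbar, cols):
            return trial
    return None
```

A return value of 0 corresponds to the case in which no permutation is needed, and None corresponds to a sample counted as a failure once the limit is exhausted.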
We explore our algorithm's performance limitations by increasing fault rates and row/column sizes. The limitations are directly correlated with the size of the solution space. As expected, the solution space diminishes as fault rates approach IR and 1-IR in the presence of stuck-closed and stuck-open faults, respectively. This is illustrated in Fig. 16 for stuck-open faults using 1.5 times larger crossbars. Here, success rates drop sharply after certain threshold values that are positively correlated with the 1-IR values of the benchmarks.
Fig. 15. Number of permutations to find a valid mapping for each sample using optimal size crossbars.
Increasing row or column sizes also affects the solution space. Recall that our algorithm uses a constant permutation for one dimension (column) and advances through the other one (row), which reduces the number of operations for finding a valid mapping. Therefore, while increasing row sizes does not directly affect the solution space for matchings, an increase in column size dramatically reduces it. To overcome this problem, our algorithm transposes the given matrices to ensure that the number of columns is always less than or equal to the number of rows. To see the effects of column and row increases on our algorithm, we discard the transposing operation. The results are given in Fig. 17 for stuck-open faults using 1.5 times larger crossbars and IR = 0.4. As seen from the figure, the runtime sharply increases from 0.002 to 1.2 s if the crossbar size increases from 48×30 to 48×42. As a result, for crossbars of the same size, i.e., the same N·M, our algorithm works more satisfactorily if the column and row sizes are further apart from each other.
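The transposing step can be illustrated with a few lines. This is only a sketch under the assumption that the function matrix and the crossbar matrix are stored as lists of rows and are transposed together; the function names are hypothetical.

```python
def transpose(matrix):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def orient_for_mapping(fm, xbar):
    """Ensure the crossbar has no more columns than rows before mapping."""
    if len(xbar[0]) > len(xbar):
        # Both matrices are transposed so that the mapping problem is unchanged.
        return transpose(fm), transpose(xbar)
    return fm, xbar
```

Since a valid mapping of the transposed pair is a valid mapping of the original pair with the roles of rows and columns exchanged, the step changes only which dimension is held at a constant permutation.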
Another limitation of our algorithm is its accuracy when the solution space is small. Indeed, this is a general problem for heuristic algorithms. To overcome this problem, exact algorithms exploiting subgraph isomorphism can be used [29] if runtime is not a main concern. In addition, a slower algorithm using pruning techniques can be exploited [30].
V. CONCLUSION
In this paper, we propose a fast heuristic algorithm to tolerate permanent faults in nano-crossbar arrays by exploiting the techniques of index sorting, backtracking, and row matching. The algorithm's effectiveness is demonstrated on standard benchmark circuits in comparison with related studies in the literature. We also develop a method to accurately analyze the transient fault tolerance of nano-crossbar arrays. The method formally and recursively finds tolerable fault positions represented by Boolean logic expressions. Using the method, the transient fault tolerance performances of the crossbars can be calculated.
Throughout this paper, we treat stuck-closed and stuck-open faults separately. Indeed, for permanent faults, our algorithm also works properly when both fault types are present in a crossbar. Matrices are sorted according to stuck-closed faults when the stuck-closed fault rate is higher, and according to stuck-open faults when the stuck-open fault rate is higher. However, the efficiency of the algorithm would not be satisfactory if the two fault rates are close to each other. This is left as future work. Another future direction is to develop circuit design and optimization techniques for given fault tolerance specifications by simultaneously treating permanent and transient faults. We also aim to extend this work to be applicable to different emerging technologies, including magnetic and memristive switch-based nanoarrays.
